SOTAVerified

Benchmarking

Papers

Showing 2201–2225 of 5548 papers

Title | Status | Hype
Comparative analysis of neural network architectures for short-term FOREX forecasting | - | 0
Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | - | 0
NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition | Code | 0
oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving | - | 0
Benchmarking Cross-Domain Audio-Visual Deception Detection | - | 0
Replication Study and Benchmarking of Real-Time Object Detection Models | Code | 0
Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration | Code | 1
Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs | - | 0
Are EEG-to-Text Models Working? | Code | 3
Agent-oriented Joint Decision Support for Data Owners in Auction-based Federated Learning | - | 0
LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit | Code | 4
Aequitas Flow: Streamlining Fair ML Experimentation | Code | 4
OpenFactCheck: Building, Benchmarking Customized Fact-Checking Systems and Evaluating the Factuality of Claims and LLMs | Code | 2
Benchmarking Educational Program Repair | Code | 0
Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning | Code | 0
Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking | - | 0
AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets | Code | 1
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | Code | 3
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images | - | 0
ATG: Benchmarking Automated Theorem Generation for Generative Language Models | - | 0
Performance Evaluation of Real-Time Object Detection for Electric Scooters | Code | 0
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | Code | 2
Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models | Code | 0
PhilHumans: Benchmarking Machine Learning for Personal Health | - | 0
Systematic Review: Anomaly Detection in Connected and Autonomous Vehicles | - | 0
Page 89 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified