SOTAVerified

Benchmarking

Papers

Showing 3151–3200 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition | Code | 0 |
| Comparative analysis of neural network architectures for short-term FOREX forecasting | | 0 |
| UCCIX: Irish-eXcellence Large Language Model | | 0 |
| Divergent Creativity in Humans and Large Language Models | Code | 0 |
| oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving | | 0 |
| Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | | 0 |
| Benchmarking Cross-Domain Audio-Visual Deception Detection | | 0 |
| Replication Study and Benchmarking of Real-Time Object Detection Models | Code | 0 |
| Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs | | 0 |
| Agent-oriented Joint Decision Support for Data Owners in Auction-based Federated Learning | | 0 |
| Benchmarking Educational Program Repair | Code | 0 |
| Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking | | 0 |
| Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning | Code | 0 |
| UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images | | 0 |
| Performance Evaluation of Real-Time Object Detection for Electric Scooters | Code | 0 |
| ATG: Benchmarking Automated Theorem Generation for Generative Language Models | | 0 |
| Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models | Code | 0 |
| Systematic Review: Anomaly Detection in Connected and Autonomous Vehicles | | 0 |
| PhilHumans: Benchmarking Machine Learning for Personal Health | | 0 |
| A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System | | 0 |
| Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo | Code | 0 |
| Toward end-to-end interpretable convolutional neural networks for waveform signals | | 0 |
| CityLearn v2: Energy-flexible, resilient, occupant-centric, and carbon-aware management of grid-interactive communities | | 0 |
| A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News | | 0 |
| Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods | | 0 |
| The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA | Code | 0 |
| Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting | | 0 |
| Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models | | 0 |
| On the Impact of Data Heterogeneity in Federated Learning Environments with Application to Healthcare Networks | | 0 |
| MileBench: Benchmarking MLLMs in Long Context | | 0 |
| Detecting critical treatment effect bias in small subgroups | Code | 0 |
| Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods | Code | 0 |
| Efficient Exploration of Image Classifier Failures with Bayesian Optimization and Text-to-Image Models | | 0 |
| Stochastic Spiking Neural Networks with First-to-Spike Coding | | 0 |
| CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching | Code | 0 |
| Benchmarking Mobile Device Control Agents across Diverse Configurations | | 0 |
| DPO: A Differential and Pointwise Control Approach to Reinforcement Learning | | 0 |
| ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees | Code | 0 |
| Empirical Analysis of the Dynamic Binary Value Problem with IOHprofiler | | 0 |
| Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification | Code | 0 |
| The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking | | 0 |
| Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches | | 0 |
| Open Datasets for Satellite Radio Resource Control | | 0 |
| TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos | | 0 |
| EnzChemRED, a rich enzyme chemistry relation extraction dataset | | 0 |
| In-situ process monitoring and adaptive quality enhancement in laser additive manufacturing: a critical review | | 0 |
| Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News | Code | 0 |
| Bridging the Gap Between Theory and Practice: Benchmarking Transfer Evolutionary Optimization | | 0 |
| Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | | 0 |
| Integrated Sensing and Communication enabled Multiple Base Stations Cooperative UAV Detection | | 0 |
Page 64 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |