SOTAVerified

Benchmarking

Papers

Showing 1626–1650 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | | 0 |
| Present and Future Generalization of Synthetic Image Detectors | Code | 0 |
| Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators | Code | 0 |
| An Evolutionary Algorithm For the Vehicle Routing Problem with Drones with Interceptions | | 0 |
| Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection | | 0 |
| Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time | | 0 |
| YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models | Code | 1 |
| Robust Salient Object Detection on Compressed Images Using Convolutional Neural Networks | | 0 |
| CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data | | 0 |
| STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions | Code | 0 |
| Efficient Performance Tracking: Leveraging Large Language Models for Automated Construction of Scientific Leaderboards | Code | 0 |
| Arena 4.0: A Comprehensive ROS2 Development and Benchmarking Platform for Human-centric Navigation Using Generative-Model-based Environment Generation | | 0 |
| MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines | | 0 |
| Efficacy of Synthetic Data as a Benchmark | | 0 |
| PARAPHRASUS : A Comprehensive Benchmark for Evaluating Paraphrase Detection Models | Code | 0 |
| Hard-Label Cryptanalytic Extraction of Neural Network Models | Code | 0 |
| ASR Benchmarking: Need for a More Representative Conversational Dataset | Code | 0 |
| Advances in APPFL: A Comprehensive and Extensible Federated Learning Framework | Code | 2 |
| SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration | Code | 0 |
| WER We Stand: Benchmarking Urdu ASR Models | | 0 |
| Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II | Code | 0 |
| THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Code | 0 |
| The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection | Code | 0 |
| Quantum Kernel Learning for Small Dataset Modeling in Semiconductor Fabrication: Application to Ohmic Contact | | 0 |
| MetaFormer and CNN Hybrid Model for Polyp Image Segmentation | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |