SOTAVerified

Benchmarking

Papers

Showing 1926-1950 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis | | 0 |
| QuantBench: Benchmarking AI Methods for Quantitative Investment | | 0 |
| Token Sequence Compression for Efficient Multimodal Computing | | 0 |
| Design and benchmarking of a two degree of freedom tendon driver unit for cable-driven wearable technologies | | 0 |
| From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories | Code | 0 |
| MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark | Code | 0 |
| Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations | | 0 |
| Towards responsible AI for education: Hybrid human-AI to confront the Elephant in the room | | 0 |
| CLIRudit: Cross-Lingual Information Retrieval of Scientific Documents | | 0 |
| Fluorescence Reference Target Quantitative Analysis Library | Code | 0 |
| A Large-scale Class-level Benchmark Dataset for Code Generation with LLMs | | 0 |
| Benchmarking machine learning models for predicting aerofoil performance | | 0 |
| Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3 | | 0 |
| Establishing Reliability Metrics for Reward Models in Large Language Models | | 0 |
| Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture | | 0 |
| Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues | | 0 |
| Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation | Code | 0 |
| IXGS-Intraoperative 3D Reconstruction from Sparse, Arbitrarily Posed Real X-rays | | 0 |
| A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents | | 0 |
| Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation | | 0 |
| CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations | | 0 |
| AI Idea Bench 2025: AI Research Idea Generation Benchmark | | 0 |
| LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers | | 0 |
| Unreal Robotics Lab: A High-Fidelity Robotics Simulator with Advanced Physics and Rendering | | 0 |
| OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation | | 0 |
Page 78 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |