SOTAVerified

Benchmarking

Papers

Showing 4376–4400 of 5548 papers

Title | Status | Hype
Rearrangement: A Challenge for Embodied AI | | 0
Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models | | 0
Re-assessing ImageNet: How aligned is its single-label assumption with its multi-label nature? | | 0
A Comparative Analysis on Ethical Benchmarking in Large Language Models | | 0
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers | | 0
A Survey on Vision Autoregressive Model | | 0
A Survey on Temporal Sentence Grounding in Videos | | 0
A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams | | 0
RECipe: Does a Multi-Modal Recipe Knowledge Graph Fit a Multi-Purpose Recommendation System? | | 0
Recommendations for Baselines and Benchmarking Approximate Gaussian Processes | | 0
Reconstructing antibody repertoires from error-prone immunosequencing datasets | | 0
A Survey on Preserving Fairness Guarantees in Changing Environments | | 0
A Survey on Model Compression for Large Language Models | | 0
Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers | | 0
A Survey on Masked Facial Detection Methods and Datasets for Fighting Against COVID-19 | | 0
Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research | | 0
A Survey on LLM-based News Recommender Systems | | 0
Unitail: Detecting, Reading, and Matching in Retail Scene | | 0
A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking | | 0
A Survey of Spanish Clinical Language Models | | 0
Refer to Anything with Vision-Language Prompts | | 0
Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models | | 0
Unleashing OpenTitan's Potential: a Silicon-Ready Embedded Secure Element for Root of Trust and Cryptographic Offloading | | 0
A Survey of Small Language Models | | 0
Regularization of ML models for Earth systems by using longer model timesteps | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified