SOTAVerified

Benchmarking

Papers

Showing 1851-1875 of 5548 papers

Title | Status | Hype
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models | Code | 2
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | | 0
HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects | | 0
Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks | Code | 0
Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? | | 0
Feature interpretability in BCIs: exploring the role of network lateralization | Code | 0
Benchmarking the Attribution Quality of Vision Models | Code | 0
GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection | Code | 2
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification | | 0
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Code | 1
SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities | Code | 1
REMM: Rotation-Equivariant Framework for End-to-End Multimodal Image Matching | Code | 0
On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction | | 0
Separable Operator Networks | Code | 1
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Code | 1
AstroMLab 1: Who Wins Astronomy Jeopardy!? | | 0
ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation | | 0
When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark | Code | 1
Benchmarking Vision Language Models for Cultural Understanding | | 0
Experimental Benchmarking of Energy-saving Sub-Optimal Sliding Mode Control | | 0
Automated detection of gibbon calls from passive acoustic monitoring data using convolutional neural networks in the "torch for R" ecosystem | | 0
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Code | 1
NativQA: Multilingual Culturally-Aligned Natural Query for LLMs | | 0
Retrospective for the Dynamic Sensorium Competition for predicting large-scale mouse primary visual cortex activity from videos | Code | 1
Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Decision-Making in Dynamic Environment | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified