SOTAVerified

Benchmarking

Papers

Showing 1426-1450 of 5548 papers

Title | Status | Hype
ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry | Code | 1
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1
Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking | Code | 1
SHARP: Environment and Person Independent Activity Recognition with Commodity IEEE 802.11 Access Points | Code | 1
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning | Code | 1
A Critical Assessment of State-of-the-Art in Entity Alignment | Code | 1
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset | Code | 1
ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems | Code | 1
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Code | 1
JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes | Code | 1
Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking | Code | 1
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks | Code | 1
AQuA: A Benchmarking Tool for Label Quality Assessment | Code | 1
APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond | Code | 1
Evaluating histopathology transfer learning with ChampKit | Code | 1
Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking | Code | 1
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models | Code | 1
Evaluating Multimodal Representations on Visual Semantic Textual Similarity | Code | 1
ISSAFE: Improving Semantic Segmentation in Accidents by Fusing Event-based Data | Code | 1
Rethinking Machine Unlearning in Image Generation Models | Code | 1
JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds | Code | 1
Benchmark on Drug Target Interaction Modeling from a Structure Perspective | Code | 1
ClinicRealm: Re-evaluating Large Language Models with Conventional Machine Learning for Non-Generative Clinical Prediction Tasks | Code | 1
Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified