SOTAVerified

Benchmarking

Papers

Showing 4251–4275 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Extensible Logging and Empirical Attainment Function for IOHexperimenter | | 0 |
| Context-guided Triple Matching for Multiple Choice Question Answering | | 0 |
| PASS: An ImageNet replacement for self-supervised pretraining without humans | Code | 1 |
| FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding | Code | 1 |
| Disentangled Feature Representation for Few-shot Image Classification | Code | 1 |
| MetaDrive: Composing Diverse Driving Scenarios for Generalizable Reinforcement Learning | Code | 2 |
| Curb Your Carbon Emissions: Benchmarking Carbon Emissions in Machine Translation | | 0 |
| Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System | Code | 1 |
| Benchmarking Lane-changing Decision-making for Deep Reinforcement Learning | | 0 |
| Benchmarking Augmentation Methods for Learning Robust Navigation Agents: the Winning Entry of the 2021 iGibson Challenge | | 0 |
| SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking | Code | 1 |
| Efficiently solving the thief orienteering problem with a max-min ant colony optimization approach | Code | 0 |
| A Novel Cluster Detection of COVID-19 Patients and Medical Disease Conditions Using Improved Evolutionary Clustering Algorithm Star | | 0 |
| Hybrid Transceiver Design for Tera-Hertz MIMO Systems Relying on Bayesian Learning Aided Sparse Channel Estimation | | 0 |
| AI Accelerator Survey and Trends | Code | 1 |
| Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs | Code | 1 |
| Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | | 0 |
| DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction | | 0 |
| WiSoSuper: Benchmarking Super-Resolution Methods on Wind and Solar Data | | 0 |
| Messing Up 3D Virtual Environments: Transferable Adversarial 3D Objects | Code | 0 |
| Benchmarking Feature-based Algorithm Selection Systems for Black-box Numerical Optimization | Code | 0 |
| A Survey on Temporal Sentence Grounding in Videos | | 0 |
| OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication | Code | 1 |
| Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset | Code | 1 |
| Benchmarking the Spectrum of Agent Capabilities | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |