SOTAVerified

Benchmarking

Papers

Showing 2501-2525 of 5548 papers

Title | Status | Hype
The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse | Code | 0
Large-scale Benchmarking of Metaphor-based Optimization Heuristics | - | 0
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator | Code | 2
Multi-Fidelity Methods for Optimization: A Survey | - | 0
Recommendations for Baselines and Benchmarking Approximate Gaussian Processes | - | 0
Evaluation of simulation methods for tumor subclonal reconstruction | - | 0
Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking | Code | 1
MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models | Code | 2
Design and Realization of a Benchmarking Testbed for Evaluating Autonomous Platooning Algorithms | - | 0
Benchmarking multi-component signal processing methods in the time-frequency plane | Code | 0
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Code | 2
Privacy-Preserving Language Model Inference with Instance Obfuscation | - | 0
BdSLW60: A Word-Level Bangla Sign Language Dataset | Code | 0
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages | - | 0
Customizable Perturbation Synthesis for Robust SLAM Benchmarking | Code | 2
Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems | - | 0
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT | - | 0
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Code | 2
Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study | Code | 0
Explainable Global Wildfire Prediction Models using Graph Neural Networks | Code | 1
ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation | - | 0
Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices | - | 0
Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation | - | 0
Retrieve, Merge, Predict: Augmenting Tables with Data Lakes | Code | 1
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education | - | 0
Page 101 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified