SOTAVerified

Benchmarking

Papers

Showing 501-525 of 5548 papers

Title | Status | Hype
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification | Code | 1
Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Code | 1
RADAR: Benchmarking Language Models on Imperfect Tabular Data | Code | 1
Benchmarking Meaning Representations in Neural Semantic Parsing | Code | 1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Code | 1
Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK | Code | 1
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks | Code | 1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Code | 1
Benchmarking End-to-End Behavioural Cloning on Video Games | Code | 1
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Code | 1
Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective | Code | 1
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089 | Code | 1
An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction | Code | 1
Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Shape Reconstruction | Code | 1
Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies | Code | 1
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks | Code | 1
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark | Code | 1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Code | 1
An Empirical Study on Google Research Football Multi-agent Scenarios | Code | 1
Addressing the generalization of 3D registration methods with a featureless baseline and an unbiased benchmark | Code | 1
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates | Code | 1
New Protocols and Negative Results for Textual Entailment Data Collection | Code | 1
Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark | Code | 1
An Empirical Study of GPT-4o Image Generation Capabilities | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified