SOTAVerified

Benchmarking

Papers

Showing 4776-4800 of 5548 papers

Title | Status | Hype
MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Code | 0
Mirage: Model-Agnostic Graph Distillation for Graph Classification | Code | 0
Benchmarking Subset Selection from Large Candidate Solution Sets in Evolutionary Multi-objective Optimization | Code | 0
Sanity Simulations for Saliency Methods | Code | 0
From Variability to Stability: Advancing RecSys Benchmarking Practices | Code | 0
ALTIS: Modernizing GPGPU Benchmarking | Code | 0
From raw affiliations to organization identifiers | Code | 0
Automated Text-to-Table for Reasoning-Intensive Table QA: Pipeline Design and Benchmarking Insights | Code | 0
3D Face Reconstruction Error Decomposed: A Modular Benchmark for Fair and Fast Method Evaluation | Code | 0
MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning | Code | 0
From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories | Code | 0
The Multiple Subnetwork Hypothesis: Enabling Multidomain Learning by Isolating Task-Specific Subnetworks in Feedforward Neural Networks | Code | 0
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Code | 0
SATBench: Benchmarking the speed-accuracy tradeoff in object recognition by humans and dynamic neural networks | Code | 0
MLDebugging: Towards Benchmarking Code Debugging Across Multi-Library Scenarios | Code | 0
From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning | Code | 0
SAWEC: Sensing-Assisted Wireless Edge Computing | Code | 0
From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering | Code | 0
Vote'n'Rank: Revision of Benchmarking with Social Choice Theory | Code | 0
AlphaZip: Neural Network-Enhanced Lossless Text Compression | Code | 0
ML-Net: multi-label classification of biomedical texts with deep neural networks | Code | 0
From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology | Code | 0
mlOSP: Towards a Unified Implementation of Regression Monte Carlo Algorithms | Code | 0
From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation | Code | 0
MLPerf Inference Benchmark | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified