SOTAVerified

Benchmarking

Papers

Showing 501-525 of 5548 papers

Title | Status | Hype
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey | | 0
PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach | | 0
EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models | Code | 0
Overview and practical recommendations on using Shapley Values for identifying predictive biomarkers via CATE modeling | | 0
Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging | | 0
Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models | Code | 0
Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation | | 0
EnronQA: Towards Personalized RAG over Private Documents | | 0
InterLoc: LiDAR-based Intersection Localization using Road Segmentation with Automated Evaluation Method | | 0
MINERVA: Evaluating Complex Video Reasoning | Code | 2
AI-ready Snow Radar Echogram Dataset (SRED) for climate change monitoring | | 0
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook | Code | 2
GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation | Code | 1
From Precision to Perception: User-Centred Evaluation of Keyword Extraction Algorithms for Internet-Scale Contextual Advertising | | 0
Towards Robust and Generalizable Gerchberg Saxton based Physics Inspired Neural Networks for Computer Generated Holography: A Sensitivity Analysis Framework | | 0
Sadeed: Advancing Arabic Diacritization Through Small Language Model | | 0
Galvatron: An Automatic Distributed System for Efficient Foundation Model Training | | 0
Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking | | 0
Hydra: Marker-Free RGB-D Hand-Eye Calibration | | 0
OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification | Code | 1
The Leaderboard Illusion | | 0
TrueFake: A Real World Case Dataset of Last Generation Fake Images also Shared on Social Networks | Code | 1
On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks | | 0
LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs | | 0
SecRepoBench: Benchmarking LLMs for Secure Code Generation in Real-World Repositories | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified