SOTAVerified

Benchmarking

Papers

Showing 2251-2275 of 5548 papers

Title | Status | Hype
Forecasting time series with constraints | Code | 0
Zero-shot generation of synthetic neurosurgical data with large language models | Code | 0
SkyRover: A Modular Simulator for Cross-Domain Pathfinding | - | 0
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents | - | 0
AT-Drone: Benchmarking Adaptive Teaming in Multi-Drone Pursuit | - | 0
Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis | - | 0
Standardisation of Convex Ultrasound Data Through Geometric Analysis and Augmentation | - | 0
A Survey on LLM-based News Recommender Systems | - | 0
Machine learning for modelling unstructured grid data in computational physics: a review | - | 0
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | - | 0
Causal Analysis of ASR Errors for Children: Quantifying the Impact of Physiological, Cognitive, and Extrinsic Factors | - | 0
Handwritten Text Recognition: A Survey | - | 0
One-Shot Federated Learning with Classifier-Free Diffusion Models | - | 0
The Devil is in the Prompts: De-Identification Traces Enhance Memorization Risks in Synthetic Chest X-Ray Generation | Code | 0
exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem | Code | 0
Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring | Code | 0
CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories | - | 0
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations | - | 0
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation | - | 0
Decoding Complexity: Intelligent Pattern Exploration with CHPDA (Context Aware Hybrid Pattern Detection Algorithm) | - | 0
Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models | - | 0
Mol-MoE: Training Preference-Guided Routers for Molecule Generation | Code | 0
Surprise Potential as a Measure of Interactivity in Driving Scenarios | - | 0
PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data | Code | 0
Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified