SOTAVerified

Benchmarking

Papers

Showing 2301–2350 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | | 0 |
| Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | | 0 |
| A Meta-Engine Framework for Interleaved Task and Motion Planning using Topological Refinements | | 0 |
| BERT-GT: Cross-sentence n-ary relation extraction with BERT and Graph Transformer | | 0 |
| A Benchmark Dataset and Saliency-guided Stacked Autoencoders for Video-based Salient Object Detection | | 0 |
| BERT-based Chinese Text Classification for Emergency Domain with a Novel Loss Function | | 0 |
| Balanced Random Survival Forests for Extremely Unbalanced, Right Censored Data | | 0 |
| Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets | | 0 |
| Benefits and Challenges of Dynamic Modelling of Cascading Failures in Power Systems | | 0 |
| BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving | | 0 |
| Bench to the Future: A Pastcasting Benchmark for Forecasting Agents | | 0 |
| A Metadata-Driven Approach to Understand Graph Neural Networks | | 0 |
| Foundations for learning from noisy quantum experiments | | 0 |
| BenchMARL: Benchmarking Multi-Agent Reinforcement Learning | | 0 |
| BAGELS: Benchmarking the Automated Generation and Extraction of Limitations from Scholarly Text | | 0 |
| ACT-Bench: Towards Action Controllable World Models for Autonomous Driving | | 0 |
| Benchmarks as Microscopes: A Call for Model Metrology | | 0 |
| Formal Covariate Benchmarking to Bound Omitted Variable Bias | | 0 |
| Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge | | 0 |
| FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents | | 0 |
| Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods | | 0 |
| Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance | | 0 |
| ALT: A Python Package for Lightweight Feature Representation in Time Series Classification | | 0 |
| FOR-instance: a UAV laser scanning benchmark dataset for semantic and instance segmentation of individual trees | | 0 |
| Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text | | 0 |
| Benchmarking YOLOv8 for Optimal Crack Detection in Civil Infrastructure | | 0 |
| AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs | | 0 |
| Benchmarking XAI Explanations with Human-Aligned Evaluations | | 0 |
| A critical look at the current train/test split in machine learning | | 0 |
| Forecasting NIFTY 50 benchmark Index using Seasonal ARIMA time series models | | 0 |
| FORLAPS: An Innovative Data-Driven Reinforcement Learning Approach for Prescriptive Process Monitoring | | 0 |
| Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate | | 0 |
| Benchmarking with MIMIC-IV, an irregular, spare clinical time series dataset | | 0 |
| A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval | | 0 |
| Alpha Excel Benchmark | | 0 |
| Benchmarking Waitlist Mortality Prediction in Heart Transplantation Through Time-to-Event Modeling using New Longitudinal UNOS Dataset | | 0 |
| Benchmarking VLMs' Reasoning About Persuasive Atypical Images | | 0 |
| A Bayesian Committee Machine Potential for Oxygen-containing Organic Compounds | | 0 |
| Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression | | 0 |
| AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels | | 0 |
| Benchmarking Vision Language Models on German Factual Data | | 0 |
| Auto-tuning TensorFlow Threading Model for CPU Backend | | 0 |
| ForamViT-GAN: Exploring New Paradigms in Deep Learning for Micropaleontological Image Analysis | | 0 |
| Benchmarking Vision Language Models for Cultural Understanding | | 0 |
| ALP: Action-Aware Embodied Learning for Perception | | 0 |
| Autoregressive Stochastic Clock Jitter Compensation in Analog-to-Digital Converters | | 0 |
| A critical analysis of metrics used for measuring progress in artificial intelligence | | 0 |
| Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving | | 0 |
| Benchmarking Vision-Based Object Tracking for USVs in Complex Maritime Environments | | 0 |
| Benchmarking Video Frame Interpolation | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |