SOTAVerified

Benchmarking

Papers

Showing 2101-2150 of 5548 papers

Title | Status | Hype
Benchmarking Model Predictive Control Algorithms in Building Optimization Testing Framework (BOPTEST) |  | 0
A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches |  | 0
Exploration of TPUs for AI Applications |  | 0
CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation |  | 0
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography |  | 0
Analyzing the Effectiveness of Listwise Reranking with Positional Invariance on Temporal Generalizability |  | 0
CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing |  | 0
Call for Action: towards the next generation of symbolic regression benchmark |  | 0
Benchmarking Agility and Reconfigurability in Satellite Systems for Tropical Cyclone Monitoring |  | 0
A Data-Driven Method to Identify IBRs with Dominant Participation in Sub-Synchronous Oscillations |  | 0
Benchmarking Aggression Identification in Social Media |  | 0
Calibrating chemical multisensory devices for real world applications: An in-depth comparison of quantitative Machine Learning approaches |  | 0
Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline |  | 0
Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift |  | 0
Analyzing the behaviour of D'WAVE quantum annealer: fine-tuning parameterization and tests with restrictive Hamiltonian formulations |  | 0
Explicitly Multi-Modal Benchmarks for Multi-Objective Optimization |  | 0
Exploitation-Guided Exploration for Semantic Embodied Navigation |  | 0
Exploring and Benchmarking the Planning Capabilities of Large Language Models |  | 0
Extensible Logging and Empirical Attainment Function for IOHexperimenter |  | 0
CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations |  | 0
Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report |  | 0
CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods |  | 0
Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic |  | 0
Quantum Similarity Testing with Convolutional Neural Networks |  | 0
Explainable AI using expressive Boolean formulas |  | 0
Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering |  | 0
Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks |  | 0
Analyzing Hong Kong's Legal Judgments from a Computational Linguistics point-of-view |  | 0
Benchmarking Adversarial Robustness of Compressed Deep Learning Models |  | 0
Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs |  | 0
A Benchmarking Protocol for Pansharpening: Dataset, Preprocessing, and Quality Assessment |  | 0
Benchmarking Adversarial Robustness |  | 0
Experimenting with robotic intra-logistics domains |  | 0
Building benchmarking frameworks for supporting replicability and reproducibility: spatial and textual analysis as an example |  | 0
Experimental robustness benchmark of quantum neural network on a superconducting quantum processor |  | 0
Benchmarking Adversarially Robust Quantum Machine Learning at Scale |  | 0
Analysis of modular CMA-ES on strict box-constrained problems in the SBOX-COST benchmarking suite |  | 0
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists |  | 0
Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP) |  | 0
Benchmarking adversarial attacks and defenses for time-series data |  | 0
Analysis of different disparity estimation techniques on aerial stereo image datasets |  | 0
Building a De-identification System for Real Swedish Clinical Text Using Pseudonymised Clinical Text |  | 0
Building a continuous benchmarking ecosystem in bioinformatics |  | 0
Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches |  | 0
Benchmarking Adaptive Intelligence and Computer Vision on Human-Robot Collaboration |  | 0
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer |  | 0
AT-Drone: Benchmarking Adaptive Teaming in Multi-Drone Pursuit |  | 0
BuckTales : A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes |  | 0
Benchmarking Adaptative Variational Quantum Algorithms on QUBO Instances |  | 0
Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified