SOTAVerified

Benchmarking

Papers

Showing 5301-5350 of 5548 papers

Title | Status | Hype
Estimating Task Completion Times for Network Rollouts using Statistical Models within Partitioning-based Regression Methods |  | 0
Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices |  | 0
Estimating transmission from genetic and epidemiological data: a metric to compare transmission trees |  | 0
EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding |  | 0
Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization |  | 0
Challenges and Advancements in Modeling Shock Fronts with Physics-Informed Neural Networks: A Review and Benchmarking Study |  | 0
Tackling Visual Control via Multi-View Exploration Maximization |  | 0
Challenge Results Are Not Reproducible |  | 0
ChakmaNMT: A Low-resource Machine Translation On Chakma Language |  | 0
Evalita-LLM: Benchmarking Large Language Models on Italian |  | 0
Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning |  | 0
TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding |  | 0
Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI |  | 0
C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System |  | 0
Evaluating Cultural and Social Awareness of LLM Web Agents |  | 0
Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models |  | 0
Tactile MNIST: Benchmarking Active Tactile Perception |  | 0
Evaluating Financial Sentiment Analysis with Annotators Instruction Assisted Prompting: Enhancing Contextual Interpretation and Stock Prediction Accuracy |  | 0
Evaluating Generative AI-Enhanced Content: A Conceptual Framework Using Qualitative, Quantitative, and Mixed-Methods Approaches |  | 0
Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking |  | 0
CETBench: A Novel Dataset constructed via Transformations over Programs for Benchmarking LLMs for Code-Equivalence Checking |  | 0
Certifying almost all quantum states with few single-qubit measurements |  | 0
Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study |  | 0
Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines |  | 0
A Latent Fingerprint in the Wild Database |  | 0
CellCycleGAN: Spatiotemporal Microscopy Image Synthesis of Cell Populations using Statistical Shape Models and Conditional GANs |  | 0
Evaluating Music Recommender Systems for Groups |  | 0
Evaluating Nuanced Bias in Large Language Model Free Response Answers |  | 0
CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark |  | 0
Evaluating Robustness of LLMs on Crisis-Related Microblogs across Events, Information Types, and Linguistic Features |  | 0
Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning |  | 0
A Large-scale Study on Training Sample Memorization in Generative Modeling |  | 0
A large-scale, physically-based synthetic dataset for satellite pose estimation |  | 0
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics |  | 0
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance |  | 0
A Benchmarking Protocol for Pansharpening: Dataset, Preprocessing, and Quality Assessment |  | 0
Evaluating the Generation of Spatial Relations in Text and Image Generative Models |  | 0
Evaluating the Performance of Large Language Models via Debates |  | 0
A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning |  | 0
TARGET: Benchmarking Table Retrieval for Generative Tasks |  | 0
Efficient Demand Response Location Targeting for Price Spike Mitigation by Exploiting Price-demand Relationship |  | 0
Evaluating Visual Conversational Agents via Cooperative Human-AI Games |  | 0
Evaluation and Ensembling of Methods for Reverse Engineering of Brain Connectivity from Imaging Data |  | 0
Evaluation Methodology for Attacks Against Confidence Thresholding Models |  | 0
Evaluation Methods and Measures for Causal Learning Algorithms |  | 0
Evaluation of Algorithms for Multi-Modality Whole Heart Segmentation: An Open-Access Grand Challenge |  | 0
Evaluation of Architectural Synthesis Using Generative AI |  | 0
Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi |  | 0
CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs |  | 0
Evaluation of Popular XAI Applied to Clinical Prediction Models: Can They be Trusted? |  | 0
Page 107 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified