SOTAVerified

Benchmarking

Papers

Showing 2301-2350 of 5548 papers

Title | Status | Hype
Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency | | 0
Unraveling the Capabilities of Language Models in News Summarization | Code | 0
Evolving Hard Maximum Cut Instances for Quantum Approximate Optimization Algorithms | | 0
Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research | | 0
Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection | | 0
Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation | | 0
A Benchmarking Environment for Worker Flexibility in Flexible Job Shop Scheduling Problems | | 0
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | | 0
Benchmarking Quantum Reinforcement Learning | Code | 0
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding | | 0
Transfer of Knowledge through Reverse Annealing: A Preliminary Analysis of the Benefits and What to Share | | 0
Making Sense of Data in the Wild: Data Analysis Automation at Scale | | 0
Self-supervised Benchmark Lottery on ImageNet: Do Marginal Improvements Translate to Improvements on Similar Datasets? | | 0
CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry | | 0
Beyond Benchmarks: On The False Promise of AI Regulation | | 0
GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Code | 0
Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study | | 0
Benchmarking global optimization techniques for unmanned aerial vehicle path planning | | 0
Feature-based Evolutionary Diversity Optimization of Discriminating Instances for Chance-constrained Optimization Problems | | 0
The Karp Dataset | | 0
AEON: Adaptive Estimation of Instance-Dependent In-Distribution and Out-of-Distribution Label Noise for Robust Learning | | 0
You Only Crash Once v2: Perceptually Consistent Strong Features for One-Stage Domain Adaptive Detection of Space Terrain | | 0
DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale | | 0
CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization | | 0
Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities | | 0
Leveraging LLMs to Create a Haptic Devices' Recommendation System | | 0
Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning | Code | 0
RAG-Reward: Optimizing RAG with Reward Modeling and RLHF | | 0
Benchmarking Generative AI for Scoring Medical Student Interviews in Objective Structured Clinical Examinations (OSCEs) | | 0
Benchmarking Randomized Optimization Algorithms on Binary, Permutation, and Combinatorial Problem Landscapes | | 0
Optimally-Weighted Maximum Mean Discrepancy Framework for Continual Learning | | 0
Benchmarking Image Perturbations for Testing Automated Driving Assistance Systems | Code | 0
Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing | | 0
Algorithm Selection with Probing Trajectories: Benchmarking the Choice of Classifier Model | | 0
Benchmarking Large Language Models via Random Variables | | 0
An Interpretable Measure for Quantifying Predictive Dependence between Continuous Random Variables -- Extended Version | | 0
FORLAPS: An Innovative Data-Driven Reinforcement Learning Approach for Prescriptive Process Monitoring | | 0
ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance | Code | 0
Village-Net Clustering: A Rapid approach to Non-linear Unsupervised Clustering of High-Dimensional Data | | 0
PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU | Code | 0
Similarity-Quantized Relative Difference Learning for Improved Molecular Activity Prediction | | 0
Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging | | 0
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents | | 0
Evaluating SAT and SMT Solvers on Large-Scale Sudoku Puzzles | Code | 0
Off-policy Evaluation for Payments at Adyen | | 0
Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval | | 0
Data-driven inventory management for new products: An adjusted Dyna-Q approach with transfer learning | | 0
Keras Sig: Efficient Path Signature Computation on GPU in Keras 3 | | 0
Benchmarking Classical, Deep, and Generative Models for Human Activity Recognition | | 0
Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified