SOTAVerified

Benchmarking

Papers

Showing 5301–5325 of 5548 papers

Title | Status | Hype
Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation | Code | 0
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset | Code | 0
Cryo-RALib -- a modular library for accelerating alignment in cryo-EM | Code | 0
What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition | Code | 0
STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions | Code | 0
Cross-Lingual Text Classification of Transliterated Hindi and Malayalam | Code | 0
Benchmarking Flexible Electric Loads Scheduling Algorithms under Market Price Uncertainty | Code | 0
Yum-me: A Personalized Nutrient-based Meal Recommender System | Code | 0
Benchmarking Federated Learning for Semantic Datasets: Federated Scene Graph Generation | Code | 0
Cross-lingual sentiment classification in low-resource Bengali language | Code | 0
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation | Code | 0
STREETS: A Novel Camera Network Dataset for Traffic Flow | Code | 0
Benchmarking Feature-based Algorithm Selection Systems for Black-box Numerical Optimization | Code | 0
Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs | Code | 0
Benchmarking Failures in Tool-Augmented Language Models | Code | 0
CRNN: A Joint Neural Network for Redundancy Detection | Code | 0
Critical review of conformational B-cell epitope prediction methods | Code | 0
PICO Element Detection in Medical Text via Long Short-Term Memory Neural Networks | Code | 0
Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks | Code | 0
CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching | Code | 0
PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data | Code | 0
An Optical Control Environment for Benchmarking Reinforcement Learning Algorithms | Code | 0
STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking | Code | 0
An open unified deep graph learning framework for discovering drug leads | Code | 0
PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU | Code | 0
Page 213 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified