Benchmarking

Papers

Showing 5301–5325 of 5548 papers

Title	Date	Tasks	Status
Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation	Oct 23, 2024	Articles, Benchmarking	Code Available
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset	May 25, 2023	Benchmarking, Text to SQL	Code Available
Cryo-RALib -- a modular library for accelerating alignment in cryo-EM	Nov 11, 2020	Benchmarking, GPU	Code Available
What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition	Jan 23, 2024	Benchmarking	Code Available
STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions	Sep 20, 2024	Benchmarking, Sensitivity	Code Available
Cross-Lingual Text Classification of Transliterated Hindi and Malayalam	Aug 31, 2021	Benchmarking, Classification	Code Available
Benchmarking Flexible Electric Loads Scheduling Algorithms under Market Price Uncertainty	Feb 4, 2020	Benchmarking, Decision Making	Code Available
Yum-me: A Personalized Nutrient-based Meal Recommender System	May 25, 2016	Benchmarking, Recommendation Systems	Code Available
Benchmarking Federated Learning for Semantic Datasets: Federated Scene Graph Generation	Dec 11, 2024	Benchmarking, Federated Learning	Code Available
Cross-lingual sentiment classification in low-resource Bengali language	Nov 1, 2020	Benchmarking, Classification	Code Available
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation	May 4, 2025	Benchmarking, Feature Upsampling	Code Available
STREETS: A Novel Camera Network Dataset for Traffic Flow	Dec 1, 2019	Benchmarking	Code Available
Benchmarking Feature-based Algorithm Selection Systems for Black-box Numerical Optimization	Sep 17, 2021	Benchmarking	Code Available
Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs	Oct 17, 2024	Benchmarking	Code Available
Benchmarking Failures in Tool-Augmented Language Models	Mar 18, 2025	Benchmarking, Text Generation	Code Available
CRNN: A Joint Neural Network for Redundancy Detection	Jun 4, 2017	Benchmarking, General Classification	Code Available
Critical review of conformational B-cell epitope prediction methods	Jan 10, 2023	Benchmarking, Drug Design	Code Available
PICO Element Detection in Medical Text via Long Short-Term Memory Neural Networks	Jul 1, 2018	Benchmarking, Decision Making	Code Available
Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks	Jan 13, 2025	Benchmarking	Code Available
CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching	Apr 25, 2024	Benchmarking, Data Augmentation	Code Available
PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data	Feb 6, 2025	Benchmarking, Time Series	Code Available
An Optical Control Environment for Benchmarking Reinforcement Learning Algorithms	Mar 23, 2022	Benchmarking, Deep Reinforcement Learning	Code Available
STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking	Jul 4, 2025	Benchmarking, Navigate	Code Available
An open unified deep graph learning framework for discovering drug leads	Dec 6, 2022	Benchmarking, Drug Discovery	Code Available
PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU	Jan 16, 2025	Benchmarking, continuous-control	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified