Benchmarking

Papers

Showing 5301–5350 of 5548 papers

Title	Date	Tasks	Status
Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation	Oct 23, 2024	Articles, Benchmarking	Code Available
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset	May 25, 2023	Benchmarking, Text to SQL	Code Available
Cryo-RALib -- a modular library for accelerating alignment in cryo-EM	Nov 11, 2020	Benchmarking, GPU	Code Available
What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition	Jan 23, 2024	Benchmarking	Code Available
STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions	Sep 20, 2024	Benchmarking, Sensitivity	Code Available
Cross-Lingual Text Classification of Transliterated Hindi and Malayalam	Aug 31, 2021	Benchmarking, Classification	Code Available
Benchmarking Flexible Electric Loads Scheduling Algorithms under Market Price Uncertainty	Feb 4, 2020	Benchmarking, Decision Making	Code Available
Yum-me: A Personalized Nutrient-based Meal Recommender System	May 25, 2016	Benchmarking, Recommendation Systems	Code Available
Benchmarking Federated Learning for Semantic Datasets: Federated Scene Graph Generation	Dec 11, 2024	Benchmarking, Federated Learning	Code Available
Cross-lingual sentiment classification in low-resource Bengali language	Nov 1, 2020	Benchmarking, Classification	Code Available
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation	May 4, 2025	Benchmarking, Feature Upsampling	Code Available
STREETS: A Novel Camera Network Dataset for Traffic Flow	Dec 1, 2019	Benchmarking	Code Available
Benchmarking Feature-based Algorithm Selection Systems for Black-box Numerical Optimization	Sep 17, 2021	Benchmarking	Code Available
Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs	Oct 17, 2024	Benchmarking	Code Available
Benchmarking Failures in Tool-Augmented Language Models	Mar 18, 2025	Benchmarking, Text Generation	Code Available
CRNN: A Joint Neural Network for Redundancy Detection	Jun 4, 2017	Benchmarking, General Classification	Code Available
Critical review of conformational B-cell epitope prediction methods	Jan 10, 2023	Benchmarking, Drug Design	Code Available
PICO Element Detection in Medical Text via Long Short-Term Memory Neural Networks	Jul 1, 2018	Benchmarking, Decision Making	Code Available
Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks	Jan 13, 2025	Benchmarking	Code Available
CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching	Apr 25, 2024	Benchmarking, Data Augmentation	Code Available
PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data	Feb 6, 2025	Benchmarking, Time Series	Code Available
An Optical Control Environment for Benchmarking Reinforcement Learning Algorithms	Mar 23, 2022	Benchmarking, Deep Reinforcement Learning	Code Available
STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking	Jul 4, 2025	Benchmarking, Navigate	Code Available
An open unified deep graph learning framework for discovering drug leads	Dec 6, 2022	Benchmarking, Drug Discovery	Code Available
PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU	Jan 16, 2025	Benchmarking, continuous-control	Code Available
PixelHop: A Successive Subspace Learning (SSL) Method for Object Classification	Sep 17, 2019	Benchmarking, Decision Making	Code Available
pke: an open source python-based keyphrase extraction toolkit	Dec 1, 2016	Benchmarking, Keyphrase Extraction	Code Available
Benchmarking Educational Program Repair	May 8, 2024	Benchmarking, Program Repair	Code Available
A Benchmarking Study of Vision-based Robotic Grasping Algorithms	Mar 14, 2025	Benchmarking, Robotic Grasping	Code Available
CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction and Summarization	Oct 25, 2022	Abstractive Text Summarization, Benchmarking	Code Available
CREPO: An Open Repository to Benchmark Credal Network Algorithms	May 10, 2021	Benchmarking	Code Available
A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making	Sep 9, 2024	Benchmarking, Decision Making	Code Available
Creating and Leveraging a Synthetic Dataset of Cloud Optical Thickness Measures for Cloud Detection in MSI	Nov 23, 2023	Benchmarking, Cloud Detection	Code Available
CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models	Mar 18, 2025	Benchmarking, Spatial Reasoning	Code Available
ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets	Jun 20, 2022	Benchmarking, Fraud Detection	Code Available
PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison	Mar 1, 2017	Benchmarking, BIG-bench Machine Learning	Code Available
Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework	Sep 24, 2024	Benchmarking, counterfactual	Code Available
pmuBAGE: The Benchmarking Assortment of Generated PMU Data for Power System Events -- Part I: Overview and Results	Apr 3, 2022	Benchmarking	Code Available
pmuBAGE: The Benchmarking Assortment of Generated PMU Data for Power System Events	Oct 25, 2022	Benchmarking	Code Available
Continuous Optimization Benchmarks by Simulation	Aug 14, 2020	Benchmarking, Gaussian Processes	Code Available
Continual Learning Strategies for 3D Engineering Regression Problems: A Benchmarking Study	Apr 16, 2025	Benchmarking, Continual Learning	Code Available
Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems	Mar 5, 2025	Benchmarking, CPU	Code Available
Structured Prediction Problem Archive	Feb 4, 2022	Benchmarking, Prediction	Code Available
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking	Sep 23, 2024	Benchmarking, Diversity	Code Available
Benchmarking down-scaled (not so large) pre-trained language models	Sep 1, 2021	Benchmarking	Code Available
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics	Apr 6, 2024	Benchmarking, Hallucination	Code Available
ContextGNN goes to Elliot: Towards Benchmarking Relational Deep Learning for Static Link Prediction (aka Personalized Item Recommendation)	Mar 20, 2025	Benchmarking, Link Prediction	Code Available
Selected Languages are All You Need for Cross-lingual Truthfulness Transfer	Jun 20, 2024	All, Benchmarking	Code Available
Content-Aware Differential Privacy with Conditional Invertible Neural Networks	Jul 29, 2022	Benchmarking	Code Available
Population-wise Labeling of Sulcal Graphs using Multi-graph Matching	Jan 31, 2023	Benchmarking, Graph Matching	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified