Benchmarking

Papers

Showing 2401–2450 of 5548 papers

Title	Date	Tasks	Status	Hype
SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages	Mar 14, 2024	Benchmarking, Dimensionality Reduction	Code Available	0
Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors	Mar 14, 2024	Benchmarking, Domain Adaptation	Code Available	0
Recurrent Drafter for Fast Speculative Decoding in Large Language Models	Mar 14, 2024	Benchmarking, Knowledge Distillation	Code Available	3
Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows	Mar 13, 2024	Anomaly Detection, Benchmarking	Code Available	0
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models	Mar 12, 2024	Benchmarking	Code Available	9
IndicSTR12: A Dataset for Indic Scene Text Recognition	Mar 12, 2024	Benchmarking, Scene Text Recognition	Unverified	0
An Approach to Evaluate Modeling Adequacy for Small-Signal Stability Analysis of IBR-related SSOs in Multimachine Systems	Mar 12, 2024	Benchmarking	Unverified	0
A tutorial on multi-view autoencoders using the multi-view-AE library	Mar 12, 2024	Benchmarking	Unverified	0
Better than classical? The subtle art of benchmarking quantum machine learning models	Mar 11, 2024	Benchmarking, Binary Classification	Code Available	7
(N,K)-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model	Mar 11, 2024	Benchmarking, Language Modeling	Unverified	0
Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies	Mar 11, 2024	Benchmarking, Data Augmentation	Code Available	0
Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages	Mar 11, 2024	Benchmarking, Data Augmentation	Code Available	1
Leveraging Foundation Models for Content-Based Medical Image Retrieval in Radiology	Mar 11, 2024	Benchmarking, Content-Based Image Retrieval	Code Available	1
A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation	Mar 11, 2024	Benchmarking, Traffic Signal Control	Unverified	0
Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark	Mar 9, 2024	Benchmarking, Fairness	Code Available	1
Multi-GPU-Enabled Hybrid Quantum-Classical Workflow in Quantum-HPC Middleware: Applications in Quantum Simulations	Mar 9, 2024	Benchmarking, CPU	Code Available	0
Synth4bench: a framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms	Mar 8, 2024	Benchmarking, Synthetic Data Generation	Code Available	0
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications	Mar 8, 2024	Action Recognition, Benchmarking	Code Available	1
Benchmarking Large Language Models for Molecule Prediction Tasks	Mar 8, 2024	Benchmarking, Prediction	Code Available	0
Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents	Mar 8, 2024	Benchmarking, Decision Making	Code Available	1
Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume	Mar 8, 2024	Adversarial Robustness, Benchmarking	Unverified	0
R²-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations	Mar 7, 2024	Benchmarking	Code Available	1
NLPre: a revised approach towards language-centric benchmarking of Natural Language Preprocessing systems	Mar 7, 2024	Benchmarking, Dependency Parsing	Unverified	0
Benchmarking News Recommendation in the Era of Green AI	Mar 7, 2024	Benchmarking, GPU	Unverified	0
Improvements & Evaluations on the MLCommons CloudMask Benchmark	Mar 7, 2024	Benchmarking	Code Available	0
Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation	Mar 7, 2024	Benchmarking, Multimodal Recommendation	Code Available	1
Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI	Mar 7, 2024	Benchmarking	Code Available	0
Three Revisits to Node-Level Graph Anomaly Detection: Outliers, Message Passing and Hyperbolic Neural Networks	Mar 6, 2024	Anomaly Detection, Benchmarking	Code Available	0
Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task	Mar 6, 2024	Benchmarking	Code Available	0
A Density-Guided Temporal Attention Transformer for Indiscernible Object Counting in Underwater Video	Mar 6, 2024	Benchmarking, Crowd Counting	Unverified	0
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving	Mar 6, 2024	Automated Theorem Proving, Benchmarking	Unverified	0
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem	Mar 6, 2024	Benchmarking, Hallucination	Code Available	0
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents	Mar 5, 2024	Benchmarking, Language Modeling	Code Available	2
Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation	Mar 5, 2024	Benchmarking, In-Context Learning	Unverified	0
Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering	Mar 5, 2024	Benchmarking, Code Generation	Unverified	0
Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground	Mar 4, 2024	Benchmarking	Unverified	0
SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis	Mar 4, 2024	Benchmarking, Drug Discovery	Code Available	2
REAL-Colon: A dataset for developing real-world AI applications in colonoscopy	Mar 4, 2024	Benchmarking	Code Available	2
Classification of the Fashion-MNIST Dataset on a Quantum Computer	Mar 4, 2024	Benchmarking, Quantum Machine Learning	Unverified	0
Model Lakes	Mar 4, 2024	Benchmarking, Management	Unverified	0
Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks	Mar 4, 2024	Benchmarking	Code Available	0
a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification	Mar 3, 2024	Benchmarking, Speaker Verification	Code Available	0
A Bayesian Committee Machine Potential for Oxygen-containing Organic Compounds	Mar 2, 2024	Benchmarking, Position	Unverified	0
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing	Mar 2, 2024	Attribute, Benchmarking	Code Available	1
SINDy vs Hard Nonlinearities and Hidden Dynamics: a Benchmarking Study	Mar 1, 2024	Benchmarking	Unverified	0
Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms	Mar 1, 2024	Benchmarking, Stochastic Optimization	Unverified	0
Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance	Mar 1, 2024	Benchmarking, Stance Detection	Unverified	0
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models	Mar 1, 2024	Benchmarking, Mathematical Reasoning	Unverified	0
TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs	Mar 1, 2024	Benchmarking	Code Available	1
Imitation Learning Datasets: A Toolkit For Creating Datasets, Training Agents and Benchmarking	Mar 1, 2024	Benchmarking, Imitation Learning	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified