Benchmarking

Papers


Showing 2051–2075 of 5548 papers

Title	Date	Tasks	Status	Hype
On the Evaluation of Speech Foundation Models for Spoken Language Understanding	Jun 14, 2024	Benchmarking, Prediction	Unverified	0
Beyond Slow Signs in High-fidelity Model Extraction	Jun 14, 2024	Benchmarking, model	Code Available	0
LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data	Jun 14, 2024	Benchmarking, Decision Making	Code Available	1
Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency	Jun 14, 2024	Benchmarking	Code Available	1
TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs	Jun 14, 2024	Benchmarking, Knowledge Graphs	Code Available	3
CubeSat-Enabled Free-Space Optics: Joint Data Communication and Fine Beam Tracking	Jun 13, 2024	Benchmarking	Unverified	0
ResearchArena: Benchmarking LLMs' Ability to Collect and Organize Information as Research Agents	Jun 13, 2024	Benchmarking, Survey	Unverified	0
Decoding the Diversity: A Review of the Indic AI Research Landscape	Jun 13, 2024	Benchmarking, Diversity	Unverified	0
DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks	Jun 13, 2024	Benchmarking	Code Available	3
BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics	Jun 13, 2024	Benchmarking	Code Available	2
ECBD: Evidence-Centered Benchmark Design for NLP	Jun 13, 2024	Benchmarking	Code Available	0
StreamBench: Towards Benchmarking Continuous Improvement of Language Agents	Jun 13, 2024	Benchmarking, Language Modeling	Code Available	2
Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition	Jun 13, 2024	Benchmarking	Unverified	0
A Review of 315 Benchmark and Test Functions for Machine Learning Optimization Algorithms and Metaheuristics with Mathematical and Visual Descriptions	Jun 13, 2024	Benchmarking	Unverified	0
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models	Jun 13, 2024	Benchmarking	Code Available	1
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living	Jun 13, 2024	Benchmarking, Human-Object Interaction Detection	Unverified	0
Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT	Jun 13, 2024	Benchmarking, LLM-generated Text Detection	Code Available	1
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs	Jun 13, 2024	Benchmarking, GPU	Code Available	2
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs	Jun 13, 2024	Benchmarking, Question Answering	Code Available	2
SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution	Jun 13, 2024	Benchmarking, Image Super-Resolution	Code Available	1
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation	Jun 13, 2024	Benchmarking, Hallucination	Code Available	0
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases	Jun 12, 2024	Benchmarking, Model Compression	Unverified	0
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets	Jun 12, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks	Jun 12, 2024	Benchmarking, Chatbot	Code Available	3
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation	Jun 12, 2024	Benchmarking, Image Generation	Code Available	1


Page 83 of 222

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified