Benchmarking

Papers

Showing 2061–2070 of 5548 papers

Title	Date	Tasks	Status	Hype
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models	Jun 13, 2024	Benchmarking	Code Available	1
BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics	Jun 13, 2024	Benchmarking	Code Available	2
ECBD: Evidence-Centered Benchmark Design for NLP	Jun 13, 2024	Benchmarking	Code Available	0
Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition	Jun 13, 2024	Benchmarking	Unverified	0
A Review of 315 Benchmark and Test Functions for Machine Learning Optimization Algorithms and Metaheuristics with Mathematical and Visual Descriptions	Jun 13, 2024	Benchmarking	Unverified	0
Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT	Jun 13, 2024	Benchmarking, LLM-generated Text Detection	Code Available	1
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living	Jun 13, 2024	Benchmarking, Human-Object Interaction Detection	Unverified	0
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs	Jun 13, 2024	Benchmarking, Question Answering	Code Available	2
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs	Jun 13, 2024	Benchmarking, GPU	Code Available	2
SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution	Jun 13, 2024	Benchmarking, Image Super-Resolution	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified