SOTAVerified

Benchmarking

Papers

Showing 2661–2670 of 5548 papers

Title	Date	Tasks	Status	Hype
Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning	Oct 14, 2024	Atari Games, Benchmarking	Unverified	0
ChakmaNMT: A Low-resource Machine Translation On Chakma Language	Oct 14, 2024	Benchmarking, Machine Translation	Unverified	0
Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP)	Oct 14, 2024	Benchmarking, Multi-Task Learning	Unverified	0
The Trap of Presumed Equivalence: Artificial General Intelligence Should Not Be Assessed on the Scale of Human Intelligence	Oct 14, 2024	Benchmarking	Unverified	0
Personalised Feedback Framework for Online Education Programmes Using Generative AI	Oct 14, 2024	Benchmarking, Management	Unverified	0
SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing	Oct 14, 2024	Benchmarking, Management	Code Available	0
Revisiting and Benchmarking Graph Autoencoders: A Contrastive Learning Perspective	Oct 14, 2024	Benchmarking, Contrastive Learning	Code Available	0
LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English	Oct 12, 2024	Benchmarking	Code Available	0
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback	Oct 12, 2024	Benchmarking	Code Available	0
Yesterday's News: Benchmarking Multi-Dimensional Out-of-Distribution Generalisation of Misinformation Detection Models	Oct 12, 2024	Benchmarking, Misinformation	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified