SOTAVerified

Benchmarking

Papers

Showing 991–1000 of 5548 papers

Title	Date	Tasks	Status	Hype
Improving and Benchmarking Offline Reinforcement Learning Algorithms	Jun 1, 2023	Attribute, Benchmarking	Code Available	1
End-to-end Knowledge Retrieval with Multi-modal Queries	Jun 1, 2023	Benchmarking, Cross-Modal Retrieval	Code Available	1
Accurate and Efficient Structural Ensemble Generation of Macrocyclic Peptides using Internal Coordinate Diffusion	May 30, 2023	Benchmarking, Diversity	Code Available	1
IDToolkit: A Toolkit for Benchmarking and Developing Inverse Design Algorithms in Nanophotonics	May 30, 2023	Benchmarking	Code Available	1
SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models	May 30, 2023	Benchmarking, Code Generation	Code Available	1
Decoding the Underlying Meaning of Multimodal Hateful Memes	May 28, 2023	Benchmarking, Hateful Meme Classification	Code Available	1
Zero is Not Hero Yet: Benchmarking Zero-Shot Performance of LLMs for Financial Tasks	May 26, 2023	Benchmarking	Code Available	1
KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration	May 25, 2023	Benchmarking, Face Recognition	Code Available	1
ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment	May 23, 2023	Benchmarking, Cross-Lingual Transfer	Code Available	1
Exploring Large Language Models for Classical Philology	May 23, 2023	Benchmarking, Decoder	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified