LLM real-life tasks

Papers

Showing 1–9 of 9 papers

Title	Date	Tasks	Status	Hype	Score
Affordable AI Assistants with Knowledge Graph of Thoughts	Apr 3, 2025	Knowledge Graphs, LLM real-life tasks	Code Available	3	5
WebCanvas: Benchmarking Web Agents in Online Environments	Jun 18, 2024	AI Agent, Benchmarking	Code Available	3	5
ITINERA: Integrating Spatial Optimization with Large Language Models for Open-domain Urban Itinerary Planning	Feb 11, 2024	LLM real-life tasks, Open-Domain Question Answering	Code Available	2	5
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks	Mar 2, 2024	Instruction Following, LLM real-life tasks	Code Available	2	5
mAIstro: an open-source multi-agentic system for automated end-to-end development of radiomics and deep learning models for medical imaging	Apr 30, 2025	AI Agent, Classification	Code Available	2	5
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents	Nov 9, 2023	Instruction Following, LLM real-life tasks	Code Available	2	5
MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking	Jan 20, 2025	Decision Making, GSM8K	Code Available	1	5
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation	Nov 10, 2023	Benchmarking, Cloud Computing	Code Available	1	5
Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives	Dec 16, 2024	counterfactual, General Classification	Code Available	0	5

No leaderboard results yet.