SOTAVerified

LLM real-life tasks

Papers

Showing 1-9 of 9 papers

Title | Status | Hype
Affordable AI Assistants with Knowledge Graph of Thoughts | Code | 3
WebCanvas: Benchmarking Web Agents in Online Environments | Code | 3
ITINERA: Integrating Spatial Optimization with Large Language Models for Open-domain Urban Itinerary Planning | Code | 2
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | Code | 2
mAIstro: an open-source multi-agentic system for automated end-to-end development of radiomics and deep learning models for medical imaging | Code | 2
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents | Code | 2
MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking | Code | 1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives | Code | 0

No leaderboard results yet.