SOTAVerified

Benchmarking

Papers

Showing 321–330 of 5548 papers

Title | Status | Hype
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks | Code | 2
AiTLAS: Artificial Intelligence Toolbox for Earth Observation | Code | 2
InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior | Code | 2
InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior | Code | 2
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Code | 2
Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions | Code | 2
Craftium: An Extensible Framework for Creating Reinforcement Learning Environments | Code | 2
CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions | Code | 2
Investigating Tradeoffs in Real-World Video Super-Resolution | Code | 2
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified