SOTAVerified

Large Language Model

Papers

Showing 651–675 of 6097 papers

Title | Status | Hype
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model | Code | 1
SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents | Code | 1
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | Code | 1
ChatCFD: an End-to-End CFD Agent with Domain-specific Structured Thinking | Code | 1
REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning | Code | 1
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution | Code | 1
CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models | Code | 1
Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation | Code | 1
NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering | Code | 1
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging | Code | 1
Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion | Code | 1
REARANK: Reasoning Re-ranking Agent via Reinforcement Learning | Code | 1
UniTTS: An end-to-end TTS system without decoupling of acoustic and semantic information | Code | 1
Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens | Code | 1
A Comprehensive Evaluation of Contemporary ML-Based Solvers for Combinatorial Optimization | Code | 1
ChemMLLM: Chemical Multimodal Large Language Model | Code | 1
CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution | Code | 1
PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration | Code | 1
How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior | Code | 1
U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding | Code | 1
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation | Code | 1
Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation | Code | 1
Unifying Segment Anything in Microscopy with Multimodal Large Language Model | Code | 1
ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts | Code | 1
Measuring General Intelligence with Generated Games | Code | 1