SOTAVerified

Language Modeling

Papers

Showing 976-1000 of 14182 papers

Title | Status | Hype
AutoVerus: Automated Proof Generation for Rust Code | Code | 2
Generalized Interpolating Discrete Diffusion | Code | 2
A Generalist Agent | Code | 2
Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Code | 2
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer | Code | 2
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts | Code | 2
Training Diffusion Models with Reinforcement Learning | Code | 2
GIT: A Generative Image-to-text Transformer for Vision and Language | Code | 2
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM | Code | 2
GPT-Driver: Learning to Drive with GPT | Code | 2
Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities | Code | 2
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning | Code | 2
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples | Code | 2
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators | Code | 2
TrustRAG: Enhancing Robustness and Trustworthiness in RAG | Code | 2
Frontiers in Intelligent Colonoscopy | Code | 2
A Touch, Vision, and Language Dataset for Multimodal Alignment | Code | 2
A Training-free LLM-based Approach to General Chinese Character Error Correction | Code | 2
Forgetting Transformer: Softmax Attention with a Forget Gate | Code | 2
Formal Mathematics Statement Curriculum Learning | Code | 2
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities | Code | 2
A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models | Code | 2
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference | Code | 2
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets | Code | 2
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models | Code | 2
Page 40 of 568

No leaderboard results yet.