Paper | Date | Tasks | Status
CASTILLO: Characterizing Response Length Distributions of Large Language Models | May 22, 2025 | Instruction Following, Language Modeling | Code Available
ToDi: Token-wise Distillation via Fine-Grained Divergence Control | May 22, 2025 | Instruction Following, Knowledge Distillation | Unverified
ManipLVM-R1: Reinforcement Learning for Reasoning in Embodied Manipulation with Large Vision-Language Models | May 22, 2025 | Instruction Following, reinforcement-learning | Unverified
LIFEBench: Evaluating Length Instruction Following in Large Language Models | May 22, 2025 | Instruction Following, Text Generation | Code Available
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective | May 21, 2025 | Instruction Following, Language Modeling | Unverified
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought | May 21, 2025 | Chatbot, Instruction Following | Unverified
ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy | May 21, 2025 | Instruction Following, Transfer Learning | Unverified
FlowKV: Enhancing Multi-Turn Conversational Coherence in LLMs via Isolated Key-Value Cache Management | May 21, 2025 | Instruction Following, Management | Unverified
Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning | May 21, 2025 | Arithmetic Reasoning, Instruction Following | Unverified
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training | May 20, 2025 | All, Domain Generalization | Unverified
DecIF: Improving Instruction-Following through Meta-Decomposition | May 20, 2025 | Instruction Following, Response Generation | Unverified
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels | May 20, 2025 | Instruction Following, Knowledge Distillation | Unverified
Domain Adaptation of VLM for Soccer Video Understanding | May 20, 2025 | Action Classification, Domain Adaptation | Unverified
Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks | May 19, 2025 | Instruction Following | Unverified
Rethinking Predictive Modeling for LLM Routing: When Simple kNN Beats Complex Learned Routers | May 19, 2025 | Instruction Following, Question Answering | Unverified
What Prompts Don't Say: Understanding and Managing Underspecification in LLM Prompts | May 19, 2025 | Instruction Following | Code Available
Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers | May 19, 2025 | In-Context Learning, Instruction Following | Unverified
KIT's Offline Speech Translation and Instruction Following Submission for IWSLT 2025 | May 19, 2025 | Automatic Speech Recognition, Instruction Following | Unverified
CompBench: Benchmarking Complex Instruction-guided Image Editing | May 18, 2025 | Benchmarking, Instruction Following | Unverified
Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning | May 17, 2025 | Decoder, Instruction Following | Unverified
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors | May 17, 2025 | counterfactual, Instruction Following | Code Available
Navigating the Alpha Jungle: An LLM-Powered MCTS Framework for Formulaic Factor Mining | May 16, 2025 | Instruction Following | Unverified
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages | May 16, 2025 | Diversity, Instruction Following | Unverified
When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs | May 16, 2025 | In-Context Learning, Instruction Following | Unverified
GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents | May 16, 2025 | Benchmarking, Instruction Following | Unverified
UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation | May 15, 2025 | Diversity, Instruction Following | Unverified
Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation | May 13, 2025 | Code Generation, In-Context Learning | Unverified
Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning? | May 13, 2025 | Chart Question Answering, Fact Checking | Code Available
Efficient Telecom Specific LLM: TSLAM-Mini with QLoRA and Digital Twin Data | May 10, 2025 | Instruction Following, parameter-efficient fine-tuning | Unverified
Assessing Robustness to Spurious Correlations in Post-Training Language Models | May 9, 2025 | Instruction Following, Mathematical Reasoning | Unverified
T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models | May 8, 2025 | Instruction Following, Text-to-Video Generation | Unverified
Incentivizing Inclusive Contributions in Model Sharing Markets | May 5, 2025 | Federated Learning, Instruction Following | Unverified
PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents | May 2, 2025 | Instruction Following, Response Generation | Unverified
T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation | May 1, 2025 | counterfactual, Instruction Following | Unverified
UAV-VLN: End-to-End Vision Language guided Navigation for UAVs | Apr 30, 2025 | Common Sense Reasoning, Instruction Following | Unverified
Ask, Fail, Repeat: Meeseeks, an Iterative Feedback Benchmark for LLMs' Multi-turn Instruction-Following Ability | Apr 30, 2025 | Instruction Following, Intent Recognition | Unverified
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models | Apr 29, 2025 | Benchmarking, Dataset Generation | Code Available
CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks | Apr 29, 2025 | Instruction Following | Unverified
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs | Apr 24, 2025 | Image-text Retrieval, Instruction Following | Unverified
ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance | Apr 23, 2025 | Instruction Following, SSIM | Unverified
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost | Apr 23, 2025 | Instruction Following, Language Modeling | Unverified
Case Study: Fine-tuning Small Language Models for Accurate and Private CWE Detection in Python Code | Apr 23, 2025 | Instruction Following, Privacy Preserving | Unverified
DistilQwen2.5: Industrial Practices of Training Distilled Open Lightweight Language Models | Apr 21, 2025 | Computational Efficiency, Instruction Following | Unverified
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Apr 21, 2025 | Code Generation, Instruction Following | Code Available
Improving Instruct Models for Free: A Study on Partial Adaptation | Apr 15, 2025 | Few-Shot Learning, In-Context Learning | Unverified
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning | Apr 12, 2025 | Instruction Following | Unverified
Playpen: An Environment for Exploring Learning Through Conversational Interaction | Apr 11, 2025 | Instruction Following, Large Language Model | Code Available
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding | Apr 10, 2025 | Instruction Following, Video Understanding | Unverified
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models | Apr 10, 2025 | Instruction Following | Unverified
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models | Apr 9, 2025 | Instruction Following, Mathematical Problem-Solving | Unverified