When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs | May 16, 2025 | In-Context Learning, Instruction Following | Unverified 0
GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents | May 16, 2025 | Benchmarking, Instruction Following | Unverified 0
BLEUBERI: BLEU is a surprisingly effective reward for instruction following | May 16, 2025 | Instruction Following, Synthetic Data Generation | Code Available 1
MergeBench: A Benchmark for Merging Domain-Specialized LLMs | May 16, 2025 | Instruction Following | Code Available 1
Navigating the Alpha Jungle: An LLM-Powered MCTS Framework for Formulaic Factor Mining | May 16, 2025 | Instruction Following | Unverified 0
UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation | May 15, 2025 | Diversity, Instruction Following | Unverified 0
Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation | May 13, 2025 | Code Generation, In-Context Learning | Unverified 0
HealthBench: Evaluating Large Language Models Towards Improved Human Health | May 13, 2025 | Instruction Following, Multiple-choice | Code Available 7
Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning? | May 13, 2025 | Chart Question Answering, Fact Checking | Code Available 0
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models | May 12, 2025 | Instruction Following | Code Available 1
Efficient Telecom Specific LLM: TSLAM-Mini with QLoRA and Digital Twin Data | May 10, 2025 | Instruction Following, parameter-efficient fine-tuning | Unverified 0
Assessing Robustness to Spurious Correlations in Post-Training Language Models | May 9, 2025 | Instruction Following, Mathematical Reasoning | Unverified 0
MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks | May 9, 2025 | Diagnostic, Instruction Following | Code Available 1
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding | May 8, 2025 | document understanding, Instruction Following | Code Available 1
T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models | May 8, 2025 | Instruction Following, Text-to-Video Generation | Unverified 0
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | May 5, 2025 | Chatbot, Decoder | Code Available 3
Incentivizing Inclusive Contributions in Model Sharing Markets | May 5, 2025 | Federated Learning, Instruction Following | Unverified 0
PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents | May 2, 2025 | Instruction Following, Response Generation | Unverified 0
T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation | May 1, 2025 | counterfactual, Instruction Following | Unverified 0
Ask, Fail, Repeat: Meeseeks, an Iterative Feedback Benchmark for LLMs' Multi-turn Instruction-Following Ability | Apr 30, 2025 | Instruction Following, Intent Recognition | Unverified 0
UAV-VLN: End-to-End Vision Language guided Navigation for UAVs | Apr 30, 2025 | Common Sense Reasoning, Instruction Following | Unverified 0
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models | Apr 29, 2025 | Benchmarking, Dataset Generation | Code Available 0
CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks | Apr 29, 2025 | Instruction Following | Unverified 0
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs | Apr 24, 2025 | Image-text Retrieval, Instruction Following | Unverified 0
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost | Apr 23, 2025 | Instruction Following, Language Modeling | Unverified 0
ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance | Apr 23, 2025 | Instruction Following, SSIM | Unverified 0
Case Study: Fine-tuning Small Language Models for Accurate and Private CWE Detection in Python Code | Apr 23, 2025 | Instruction Following, Privacy Preserving | Unverified 0
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction | Apr 22, 2025 | Diversity, Domain Adaptation | Code Available 1
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Apr 21, 2025 | Code Generation, Instruction Following | Code Available 0
DistilQwen2.5: Industrial Practices of Training Distilled Open Lightweight Language Models | Apr 21, 2025 | Computational Efficiency, Instruction Following | Unverified 0
Chinese-Vicuna: A Chinese Instruction-following Llama-based Model | Apr 17, 2025 | Code Generation, CPU | Code Available 7
Improving Instruct Models for Free: A Study on Partial Adaptation | Apr 15, 2025 | Few-Shot Learning, In-Context Learning | Unverified 0
A Dual-Space Framework for General Knowledge Distillation of Large Language Models | Apr 15, 2025 | Code Generation, General Knowledge | Code Available 1
RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users | Apr 14, 2025 | Instruction Following | Code Available 1
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients | Apr 14, 2025 | Instruction Following | Code Available 2
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning | Apr 12, 2025 | Instruction Following | Unverified 0
Playpen: An Environment for Exploring Learning Through Conversational Interaction | Apr 11, 2025 | Instruction Following, Large Language Model | Code Available 0
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models | Apr 10, 2025 | Instruction Following | Unverified 0
MM-IFEngine: Towards Multimodal Instruction Following | Apr 10, 2025 | Instruction Following | Code Available 2
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding | Apr 10, 2025 | Instruction Following, Video Understanding | Unverified 0
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models | Apr 9, 2025 | Instruction Following, Mathematical Problem-Solving | Unverified 0
Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning | Apr 9, 2025 | Continual Learning, Decoder | Code Available 1
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | Apr 8, 2025 | In-Context Learning, Instruction Following | Unverified 0
Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations | Apr 8, 2025 | Instruction Following, Mixture-of-Experts | Unverified 0
Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators | Apr 8, 2025 | Instruction Following | Unverified 0
Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | Apr 7, 2025 | Dialogue Evaluation, Fairness | Code Available 2
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning | Apr 3, 2025 | Image Generation, Instruction Following | Code Available 3
CrystalFormer-RL: Reinforcement Fine-Tuning for Materials Design | Apr 3, 2025 | Band Gap, Dielectric Constant | Code Available 2
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection | Apr 3, 2025 | Instruction Following, Language Modeling | Code Available 1
The Hidden Space of Safety: Understanding Preference-Tuned LLMs in Multilingual context | Apr 3, 2025 | Instruction Following | Unverified 0