| Rethinking Diverse Human Preference Learning through Principal Component Analysis | Feb 18, 2025 | | Code Available | 2 |
| S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning | Feb 18, 2025 | Math | Code Available | 2 |
| H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking | Feb 18, 2025 | | Code Available | 2 |
| Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors | Feb 18, 2025 | Code Generation, Knowledge Tracing | Code Available | 2 |
| Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Feb 18, 2025 | Image Retrieval, Question Answering | Code Available | 2 |
| WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects | Feb 18, 2025 | Machine Translation | Code Available | 2 |
| UXAgent: An LLM Agent-Based Usability Testing Framework for Web Design | Feb 18, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| A Survey of Personalized Large Language Models: Progress and Future Directions | Feb 17, 2025 | Emotion Recognition, General Knowledge | Code Available | 2 |
| SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | Feb 17, 2025 | parameter-efficient fine-tuning | Code Available | 2 |
| HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation | Feb 17, 2025 | | Code Available | 2 |
| Continuous Diffusion Model for Language Modeling | Feb 17, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| PUGS: Zero-shot Physical Understanding with Gaussian Splatting | Feb 17, 2025 | Friction | Code Available | 2 |
| SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL | Feb 17, 2025 | Few-Shot Learning, Heuristic Search | Code Available | 2 |
| BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages | Feb 17, 2025 | Emotion Recognition | Code Available | 2 |
| JoLT: Joint Probabilistic Predictions on Tabular Data Using LLMs | Feb 17, 2025 | Imputation, In-Context Learning | Code Available | 2 |
| Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration | Feb 17, 2025 | | Code Available | 2 |
| Without Paired Labeled Data: An End-to-End Self-Supervised Paradigm for UAV-View Geo-Localization | Feb 17, 2025 | Computational Efficiency, Contrastive Learning | Code Available | 2 |
| Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More | Feb 17, 2025 | | Code Available | 2 |
| Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment | Feb 17, 2025 | Hallucination, Logical Reasoning | Code Available | 2 |
| Idiosyncrasies in Large Language Models | Feb 17, 2025 | | Code Available | 2 |
| Diffusion Models without Classifier-free Guidance | Feb 17, 2025 | Conditional Image Generation, Image Generation | Code Available | 2 |
| LLM Agents Making Agent Tools | Feb 17, 2025 | | Code Available | 2 |
| X-IL: Exploring the Design Space of Imitation Learning Policies | Feb 17, 2025 | Imitation Learning, Mamba | Code Available | 2 |
| Image Inversion: A Survey from GANs to Diffusion and Beyond | Feb 17, 2025 | Generative Adversarial Network, Style Transfer | Code Available | 2 |
| Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening | Feb 17, 2025 | Denoising | Code Available | 2 |
| Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems | Feb 16, 2025 | Open-Domain Question Answering, Question Answering | Code Available | 2 |
| FinMTEB: Finance Massive Text Embedding Benchmark | Feb 16, 2025 | Articles, Semantic Textual Similarity | Code Available | 2 |
| NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM | Feb 16, 2025 | Navigate, RAG | Code Available | 2 |
| How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training | Feb 16, 2025 | | Code Available | 2 |
| Hierarchical Expert Prompt for Large-Language-Model: An Approach Defeat Elite AI in TextStarCraft II for the First Time | Feb 16, 2025 | Decision Making, Language Modeling | Code Available | 2 |
| MasRouter: Learning to Route LLMs for Multi-Agent Systems | Feb 16, 2025 | HumanEval, mbpp | Code Available | 2 |
| RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation | Feb 16, 2025 | graph construction, Knowledge Graphs | Code Available | 2 |
| D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security | Feb 15, 2025 | Task Planning | Code Available | 2 |
| SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding | Feb 15, 2025 | Question Answering, Streaming video understanding | Code Available | 2 |
| Process Reward Models for LLM Agents: Practical Framework and Directions | Feb 14, 2025 | | Code Available | 2 |
| A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations | Feb 14, 2025 | Survey | Code Available | 2 |
| MonoForce: Learnable Image-conditioned Physics Engine | Feb 14, 2025 | Model Predictive Control, Trajectory Prediction | Code Available | 2 |
| Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal | Feb 14, 2025 | Denoising, Image Restoration | Code Available | 2 |
| Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning | Feb 14, 2025 | Reinforcement Learning (RL), Skills Assessment | Code Available | 2 |
| DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References | Feb 13, 2025 | Human-Object Interaction Detection, Imitation Learning | Code Available | 2 |
| CoSER: Coordinating LLM-Based Persona Simulation of Established Roles | Feb 13, 2025 | | Code Available | 2 |
| DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra | Feb 13, 2025 | Decoder, De novo molecule generation from MS/MS spectrum (bonus chemical formulae) | Code Available | 2 |
| Digi-Q: Learning Q-Value Functions for Training Device-Control Agents | Feb 13, 2025 | Q-Learning, Reinforcement Learning (RL) | Code Available | 2 |
| Diffusion Models for Molecules: A Survey of Methods and Tasks | Feb 13, 2025 | Diversity, Drug Discovery | Code Available | 2 |
| A Judge-free LLM Open-ended Generation Benchmark Based on the Distributional Hypothesis | Feb 13, 2025 | Text Generation | Code Available | 2 |
| CoT-Valve: Length-Compressible Chain-of-Thought Tuning | Feb 13, 2025 | GSM8K | Code Available | 2 |
| Harnessing Vision Models for Time Series Analysis: A Survey | Feb 13, 2025 | Survey, Time Series | Code Available | 2 |
| KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG | Feb 13, 2025 | Knowledge Graphs, Large Language Model | Code Available | 2 |
| TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument | Feb 13, 2025 | Audio Generation, Decoder | Code Available | 2 |
| Unlocking the Potential of Classic GNNs for Graph-level Tasks: Simple Architectures Meet Excellence | Feb 13, 2025 | Graph Classification, Graph Property Prediction | Code Available | 2 |