| Generative Multimodal Models are In-Context Learners | Dec 20, 2023 | In-Context Learning, Personalized Image Generation | Code Available | 3 |
| Attention is not not Explanation | Aug 13, 2019 | Decision Making, Diagnostic | Code Available | 3 |
| Evaluating Language Model Agency through Negotiations | Jan 9, 2024 | Decision Making, Language Modeling | Code Available | 3 |
| DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection | Jan 4, 2024 | Decoder, Denoising | Code Available | 3 |
| Pheme: Efficient and Conversational Speech Generation | Jan 5, 2024 | | Code Available | 3 |
| Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Jan 17, 2024 | Task Planning | Code Available | 3 |
| VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks | Jan 24, 2024 | | Code Available | 3 |
| FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design | Jan 25, 2024 | GPU, Quantization | Code Available | 3 |
| SliceGPT: Compress Large Language Models by Deleting Rows and Columns | Jan 26, 2024 | | Code Available | 3 |
| Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation | Jan 31, 2024 | Hierarchical Text Segmentation, parameter-efficient fine-tuning | Code Available | 3 |
| LongAlign: A Recipe for Long Context Alignment of Large Language Models | Jan 31, 2024 | Diversity, Instruction Following | Code Available | 3 |
| Noise Contrastive Alignment of Language Models with Explicit Rewards | Feb 8, 2024 | Language Modelling, Math | Code Available | 3 |
| HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting | Feb 9, 2024 | | Code Available | 3 |
| Pathformer: Multi-scale Transformers with Adaptive Pathways for Time Series Forecasting | Feb 4, 2024 | Time Series, Time Series Forecasting | Code Available | 3 |
| PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models | Feb 12, 2024 | Answer Generation, Hallucination | Code Available | 3 |
| Magic-Me: Identity-Specific Video Customized Diffusion | Feb 14, 2024 | Image Generation, Text to Image Generation | Code Available | 3 |
| BitDelta: Your Fine-Tune May Only Be Worth One Bit | Feb 15, 2024 | GPU | Code Available | 3 |
| QuRating: Selecting High-Quality Data for Training Language Models | Feb 15, 2024 | In-Context Learning | Code Available | 3 |
| LLMDFA: Analyzing Dataflow in Code with Large Language Models | Feb 16, 2024 | Hallucination | Code Available | 3 |
| Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive | Feb 20, 2024 | | Code Available | 3 |
| Codec-SUPERB: An In-Depth Analysis of Sound Codec Models | Feb 20, 2024 | | Code Available | 3 |
| Towards Building Multilingual Language Model for Medicine | Feb 21, 2024 | Domain Adaptation, Language Modeling | Code Available | 3 |
| ChatMusician: Understanding and Generating Music Intrinsically with LLM | Feb 25, 2024 | MMLU, Text Generation | Code Available | 3 |
| Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction | Feb 27, 2024 | Autonomous Driving | Code Available | 3 |
| Explicit Interaction for Fusion-Based Place Recognition | Feb 27, 2024 | Autonomous Vehicles | Code Available | 3 |
| Diffusion Language Models Are Versatile Protein Learners | Feb 28, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| CAMixerSR: Only Details Need More "Attention" | Feb 29, 2024 | 2k, 8k | Code Available | 3 |
| CLLMs: Consistency Large Language Models | Feb 28, 2024 | | Code Available | 3 |
| SynCode: LLM Generation with Grammar Augmentation | Mar 3, 2024 | Code Generation, valid | Code Available | 3 |
| Controllable Text Generation for Large Language Models: A Survey | Aug 22, 2024 | Attribute, Prompt Engineering | Code Available | 3 |
| RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection | Mar 9, 2024 | Anomaly Detection, feature selection | Code Available | 3 |
| Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields | Mar 14, 2024 | Denoising | Code Available | 3 |
| Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook | Mar 23, 2025 | 3D Generation, Medical Report Generation | Code Available | 3 |
| Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images | Mar 19, 2024 | Anomaly Classification, Anomaly Detection | Code Available | 3 |
| AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework | Mar 19, 2024 | Benchmarking, Financial Analysis | Code Available | 3 |
| Rotary Position Embedding for Vision Transformer | Mar 20, 2024 | Position | Code Available | 3 |
| AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation | Mar 21, 2024 | All, Blind All-in-One Image Restoration | Code Available | 3 |
| The Elements of Differentiable Programming | Mar 21, 2024 | | Code Available | 3 |
| Advancing LLM Reasoning Generalists with Preference Trees | Apr 2, 2024 | Benchmarking, Code Generation | Code Available | 3 |
| Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs | Apr 28, 2025 | Synthetic Data Generation | Code Available | 3 |
| OGBench: Benchmarking Offline Goal-Conditioned RL | Oct 26, 2024 | Benchmarking, reinforcement-learning | Code Available | 3 |
| HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention | Apr 9, 2024 | Autonomous Driving, Prediction | Code Available | 3 |
| Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs | Apr 10, 2024 | | Code Available | 3 |
| NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving | Apr 11, 2024 | Autonomous Driving, NeRF | Code Available | 3 |
| Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | Nov 5, 2024 | Benchmarking, Hallucination | Code Available | 3 |
| VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning | May 28, 2025 | RAG | Code Available | 3 |
| CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models | Apr 24, 2024 | Consistent Character Generation, Word Embeddings | Code Available | 3 |
| ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis | Jan 16, 2024 | Time Series, Time Series Analysis | Code Available | 3 |
| Efficient Multimodal Large Language Models: A Survey | May 17, 2024 | Edge-computing, Question Answering | Code Available | 3 |
| CV-VAE: A Compatible Video VAE for Latent Generative Video Models | May 30, 2024 | Quantization | Code Available | 3 |