| Controllable Text Generation for Large Language Models: A Survey | Aug 22, 2024 | Attribute, Prompt Engineering | Code Available | 3 |
| RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection | Mar 9, 2024 | Anomaly Detection, feature selection | Code Available | 3 |
| Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields | Mar 14, 2024 | Denoising | Code Available | 3 |
| Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook | Mar 23, 2025 | 3D Generation, Medical Report Generation | Code Available | 3 |
| Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images | Mar 19, 2024 | Anomaly Classification, Anomaly Detection | Code Available | 3 |
| AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework | Mar 19, 2024 | Benchmarking, Financial Analysis | Code Available | 3 |
| Rotary Position Embedding for Vision Transformer | Mar 20, 2024 | Position | Code Available | 3 |
| AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation | Mar 21, 2024 | All, Blind All-in-One Image Restoration | Code Available | 3 |
| The Elements of Differentiable Programming | Mar 21, 2024 | | Code Available | 3 |
| Advancing LLM Reasoning Generalists with Preference Trees | Apr 2, 2024 | Benchmarking, Code Generation | Code Available | 3 |
| Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs | Apr 28, 2025 | Synthetic Data Generation | Code Available | 3 |
| OGBench: Benchmarking Offline Goal-Conditioned RL | Oct 26, 2024 | Benchmarking, reinforcement-learning | Code Available | 3 |
| HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention | Apr 9, 2024 | Autonomous Driving, Prediction | Code Available | 3 |
| Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs | Apr 10, 2024 | | Code Available | 3 |
| NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving | Apr 11, 2024 | Autonomous Driving, NeRF | Code Available | 3 |
| Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | Nov 5, 2024 | Benchmarking, Hallucination | Code Available | 3 |
| VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning | May 28, 2025 | RAG | Code Available | 3 |
| CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models | Apr 24, 2024 | Consistent Character Generation, Word Embeddings | Code Available | 3 |
| ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis | Jan 16, 2024 | Time Series, Time Series Analysis | Code Available | 3 |
| Efficient Multimodal Large Language Models: A Survey | May 17, 2024 | Edge-computing, Question Answering | Code Available | 3 |
| CV-VAE: A Compatible Video VAE for Latent Generative Video Models | May 30, 2024 | Quantization | Code Available | 3 |
| MotionLLM: Understanding Human Behaviors from Human Motions and Videos | May 30, 2024 | | Code Available | 3 |
| MaskGIT: Masked Generative Image Transformer | Feb 8, 2022 | Decoder, Image Generation | Code Available | 3 |
| Unveiling Encoder-Free Vision-Language Models | Jun 17, 2024 | Decoder, Inductive Bias | Code Available | 3 |
| VoCo-LLaMA: Towards Vision Compression with Large Language Models | Jun 18, 2024 | Computational Efficiency, Question Answering | Code Available | 3 |
| Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models | Jan 16, 2024 | GPU, Quantization | Code Available | 3 |
| DF40: Toward Next-Generation Deepfake Detection | Jun 19, 2024 | DeepFake Detection, Face Reenactment | Code Available | 3 |
| Rho-1: Not All Tokens Are What You Need | Apr 11, 2024 | All, Continual Pretraining | Code Available | 3 |
| multiGradICON: A Foundation Model for Multimodal Medical Image Registration | Aug 1, 2024 | Anatomy, Deep Learning | Code Available | 3 |
| MANTIS: Interleaved Multi-Image Instruction Tuning | May 2, 2024 | | Code Available | 3 |
| GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization | Jun 24, 2024 | Image Manipulation, Image Manipulation Detection | Code Available | 3 |
| HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis | Jun 23, 2024 | Benchmarking, Representation Learning | Code Available | 3 |
| AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors | Jun 26, 2024 | Diversity | Code Available | 3 |
| YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation | Jul 5, 2024 | Drum Transcription, Drum Transcription in Music (DTM) | Code Available | 3 |
| EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Jun 28, 2024 | Interactive Segmentation, Language Modeling | Code Available | 3 |
| Evaluation of Text-to-Video Generation Models: A Dynamics Perspective | Jul 1, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Apr 9, 2025 | 2k, Decision Making | Code Available | 3 |
| OneRestore: A Universal Restoration Framework for Composite Degradation | Jul 5, 2024 | Image Dehazing, Image Restoration | Code Available | 3 |
| WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks | Jul 7, 2024 | Arithmetic Reasoning | Code Available | 3 |
| Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts | Jul 9, 2024 | 3D Object Editing, 3D Reconstruction | Code Available | 3 |
| Unified Approach for Hedging Impermanent Loss of Liquidity Provision | Jul 6, 2024 | | Code Available | 3 |
| Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation | Jul 10, 2024 | 3D human pose and shape estimation | Code Available | 3 |
| rLLM: Relational Table Learning with LLMs | Jul 29, 2024 | Classification, Node Classification | Code Available | 3 |
| WildGaussians: 3D Gaussian Splatting in the Wild | Jul 11, 2024 | 3DGS, 3D Scene Reconstruction | Code Available | 3 |
| VISA: Reasoning Video Object Segmentation via Large Language Models | Jul 16, 2024 | Decoder, Object | Code Available | 3 |
| Scaling Retrieval-Based Language Models with a Trillion-Token Datastore | Jul 9, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Compact Language Models via Pruning and Knowledge Distillation | Jul 19, 2024 | Knowledge Distillation, Language Modeling | Code Available | 3 |
| PyABSA: A Modularized Framework for Reproducible Aspect-based Sentiment Analysis | Aug 2, 2022 | Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA) | Code Available | 3 |
| Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection | Jul 30, 2024 | object-detection, Object Detection | Code Available | 3 |
| MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine | Aug 6, 2024 | Medical Visual Question Answering, Organ Detection | Code Available | 3 |