| NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields | Apr 1, 2024 | 3D Object Detection, NeRF | Code Available | 2 |
| Query2CAD: Generating CAD models using natural language queries | May 31, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers | Jan 7, 2025 | Diversity, Text-to-Video Generation | Code Available | 2 |
| What is the Role of Small Models in the LLM Era: A Survey | Sep 10, 2024 | | Code Available | 2 |
| Methods for Detoxification of Texts for the Russian Language | May 19, 2021 | Style Transfer | Code Available | 2 |
| NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation | Mar 3, 2022 | Decoder, Depth Estimation | Code Available | 2 |
| GTA: A Benchmark for General Tool Agents | Jul 11, 2024 | | Code Available | 2 |
| Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers | Dec 13, 2023 | 3D Question Answering (3D-QA), Attribute | Code Available | 2 |
| Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding | Feb 3, 2025 | Quantization | Code Available | 2 |
| Sketch and Refine: Towards Fast and Accurate Lane Detection | Jan 26, 2024 | Lane Detection | Code Available | 2 |
| Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On | Apr 1, 2024 | Denoising, Image Generation | Code Available | 2 |
| DEA-Net: Single image dehazing based on detail-enhanced convolution and content-guided attention | Jan 12, 2023 | Image Dehazing | Code Available | 2 |
| FairyGen: Storied Cartoon Video from a Single Child-Drawn Character | Jun 26, 2025 | | Code Available | 2 |
| MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models | May 21, 2025 | Computational Efficiency | Code Available | 2 |
| Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers | Oct 9, 2024 | Decoder, Re-Ranking | Code Available | 2 |
| DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization | May 18, 2025 | Mathematical Reasoning | Code Available | 2 |
| Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems | Feb 24, 2025 | Computational Efficiency, PDE Surrogate Modeling | Code Available | 2 |
| TripleMixer: A 3D Point Cloud Denoising Model for Adverse Weather | Aug 25, 2024 | Autonomous Driving, Denoising | Code Available | 2 |
| Mamba Meets Financial Markets: A Graph-Mamba Approach for Stock Price Prediction | Sep 26, 2024 | Mamba, Prediction | Code Available | 2 |
| Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening | Feb 17, 2025 | Denoising | Code Available | 2 |
| Audio-Synchronized Visual Animation | Mar 8, 2024 | | Code Available | 2 |
| InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning | Mar 8, 2023 | Semantic Segmentation | Code Available | 2 |
| SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design | Jan 29, 2024 | CPU, GPU | Code Available | 2 |
| LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding | Jun 29, 2023 | 16k, Image Captioning | Code Available | 2 |
| Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens | Nov 23, 2024 | Hallucination | Code Available | 2 |
| MaskBit: Embedding-free Image Generation via Bit Tokens | Sep 24, 2024 | Conditional Image Generation, Image Generation | Code Available | 2 |
| True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning | Jan 25, 2024 | Decision Making, Reinforcement Learning (RL) | Code Available | 2 |
| Emulating Self-attention with Convolution for Efficient Image Super-Resolution | Mar 9, 2025 | Computational Efficiency, Image Super-Resolution | Code Available | 2 |
| GuardReasoner: Towards Reasoning-based LLM Safeguards | Jan 30, 2025 | | Code Available | 2 |
| RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction | Mar 8, 2024 | Audio Generation, Computational Efficiency | Code Available | 2 |
| PPSURF: Combining Patches and Point Convolutions for Detailed Surface Reconstruction | Jan 16, 2024 | Surface Reconstruction | Code Available | 2 |
| Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse | Sep 17, 2024 | In-Context Learning, RAG | Code Available | 2 |
| Matryoshka Query Transformer for Large Vision-Language Models | May 29, 2024 | Language Modelling, Representation Learning | Code Available | 2 |
| Change Guiding Network: Incorporating Change Prior to Guide Change Detection in Remote Sensing Imagery | Apr 14, 2024 | Change Detection, Edge Detection | Code Available | 2 |
| DiffusionInst: Diffusion Model for Instance Segmentation | Dec 6, 2022 | Denoising, Instance Segmentation | Code Available | 2 |
| Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues | Apr 12, 2024 | Data Augmentation, Face Anti-Spoofing | Code Available | 2 |
| Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation | Jan 1, 2024 | Segmentation, Semantic Segmentation | Code Available | 2 |
| MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation | Mar 17, 2023 | Decoder, Image Segmentation | Code Available | 2 |
| Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting | May 28, 2022 | Time Series, Time Series Analysis | Code Available | 2 |
| In-Context Language Learning: Architectures and Algorithms | Jan 23, 2024 | In-Context Learning, Language Modeling | Code Available | 2 |
| Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement | Aug 2, 2024 | Image Enhancement, Low-Light Image Enhancement | Code Available | 2 |
| Fin-GAN: forecasting and classifying financial time series via generative adversarial networks | Jan 31, 2024 | Generative Adversarial Network, Probabilistic Time Series Forecasting | Code Available | 2 |
| INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models | Jun 7, 2023 | | Code Available | 2 |
| SpaceByte: Towards Deleting Tokenization from Large Language Modeling | Apr 22, 2024 | Decoder, Language Modeling | Code Available | 2 |
| Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator | Dec 20, 2023 | Data Augmentation, object-detection | Code Available | 2 |
| Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering | Nov 25, 2024 | Question Answering, Visual Question Answering | Code Available | 2 |
| Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction | Jul 30, 2021 | Click-Through Rate Prediction | Code Available | 2 |
| Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data | Jun 6, 2024 | Denoising, Language Modeling | Code Available | 2 |
| KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application | May 28, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| When Attention Sink Emerges in Language Models: An Empirical View | Oct 14, 2024 | Quantization | Code Available | 2 |