| VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control | Dec 30, 2024 | Denoising, Image Generation | Code Available | 2 |
| SoftPatch+: Fully Unsupervised Anomaly Classification and Segmentation | Dec 30, 2024 | Anomaly Classification, Anomaly Detection | Code Available | 2 |
| Edicho: Consistent Image Editing in the Wild | Dec 30, 2024 | Denoising | Code Available | 2 |
| Efficient Parallel Genetic Algorithm for Perturbed Substructure Optimization in Complex Network | Dec 30, 2024 | Combinatorial Optimization, Graph Mining | Code Available | 2 |
| MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks | Dec 29, 2024 | 3DGS, Novel View Synthesis | Code Available | 2 |
| Natural Language Fine-Tuning | Dec 29, 2024 | GSM8K, Large Language Model | Code Available | 2 |
| DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis | Dec 28, 2024 | 3DGS, Face Generation | Code Available | 2 |
| OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System | Dec 28, 2024 | | Code Available | 2 |
| MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration | Dec 28, 2024 | Deblurring, Denoising | Code Available | 2 |
| Learning an Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking | Dec 28, 2024 | Knowledge Distillation, Visual Tracking | Code Available | 2 |
| From Generalist to Specialist: A Survey of Large Language Models for Chemistry | Dec 28, 2024 | scientific discovery, Survey | Code Available | 2 |
| GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting | Dec 28, 2024 | Camera Localization, Pose Estimation | Code Available | 2 |
| MBQ: Modality-Balanced Quantization for Large Vision-Language Models | Dec 27, 2024 | GPU, Quantization | Code Available | 2 |
| Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation | Dec 27, 2024 | Image Segmentation, Semantic Segmentation | Code Available | 2 |
| ETTA: Elucidating the Design Space of Text-to-Audio Models | Dec 26, 2024 | AudioCaps, Audio captioning | Code Available | 2 |
| SUTrack: Towards Simple and Unified Single Object Tracking | Dec 26, 2024 | Object Tracking, Rgb-T Tracking | Code Available | 2 |
| RecLM: Recommendation Instruction Tuning | Dec 26, 2024 | Collaborative Filtering, Diversity | Code Available | 2 |
| Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment | Dec 26, 2024 | | Code Available | 2 |
| WeatherGS: 3D Scene Reconstruction in Adverse Weather Conditions via Gaussian Splatting | Dec 25, 2024 | 3DGS, 3D Reconstruction | Code Available | 2 |
| CGCOD: Class-Guided Camouflaged Object Detection | Dec 25, 2024 | Object, object-detection | Code Available | 2 |
| Simultaneously Recovering Multi-Person Meshes and Multi-View Cameras with Human Semantics | Dec 25, 2024 | Camera Calibration | Code Available | 2 |
| EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation | Dec 24, 2024 | Image Captioning, Image Generation | Code Available | 2 |
| ZenSVI: An Open-Source Software for the Integrated Acquisition, Processing and Analysis of Street View Imagery Towards Scalable Urban Science | Dec 24, 2024 | | Code Available | 2 |
| Long-Form Speech Generation with Spoken Language Models | Dec 24, 2024 | Form, Language Modeling | Code Available | 2 |
| Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models | Dec 24, 2024 | Question Answering, Video Question Answering | Code Available | 2 |
| 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding | Dec 24, 2024 | Natural Language Understanding, Scene Understanding | Code Available | 2 |
| Token-Budget-Aware LLM Reasoning | Dec 24, 2024 | | Code Available | 2 |
| Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection | Dec 23, 2024 | Anomaly Detection, Video Anomaly Detection | Code Available | 2 |
| Reasoning to Attend: Try to Understand How <SEG> Token Works | Dec 23, 2024 | Semantic Similarity, Semantic Textual Similarity | Code Available | 2 |
| Large Language Model Safety: A Holistic Survey | Dec 23, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners | Dec 23, 2024 | Mathematical Reasoning | Code Available | 2 |
| ActiveGS: Active Scene Reconstruction Using Gaussian Splatting | Dec 23, 2024 | | Code Available | 2 |
| Scenario-Wise Rec: A Multi-Scenario Recommendation Benchmark | Dec 23, 2024 | | Code Available | 2 |
| Cross-View Referring Multi-Object Tracking | Dec 23, 2024 | Cross-view Referring Multi-Object Tracking, Multi-Object Tracking | Code Available | 2 |
| Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction | Dec 23, 2024 | | Code Available | 2 |
| Evaluation of Bio-Inspired Models under Different Learning Settings For Energy Efficiency in Network Traffic Prediction | Dec 23, 2024 | Privacy Preserving, Traffic Prediction | Code Available | 2 |
| DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder | Dec 23, 2024 | | Code Available | 2 |
| Reconstructing People, Places, and Cameras | Dec 23, 2024 | Camera Pose Estimation, Pose Estimation | Code Available | 2 |
| xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition | Dec 23, 2024 | Multivariate Time Series Forecasting, Time Series | Code Available | 2 |
| Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization | Dec 23, 2024 | Position | Code Available | 2 |
| Guided Real Image Dehazing using YCbCr Color Space | Dec 23, 2024 | Image Dehazing | Code Available | 2 |
| Evaluating LLM Reasoning in the Operations Research Domain with ORQA | Dec 22, 2024 | Question Answering | Code Available | 2 |
| Anchor3DLane++: 3D Lane Detection via Sample-Adaptive Sparse 3D Anchor Regression | Dec 22, 2024 | 3D Lane Detection, Lane Detection | Code Available | 2 |
| An OpenMind for 3D medical vision self-supervised learning | Dec 22, 2024 | Benchmarking, Self-Supervised Learning | Code Available | 2 |
| Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection | Dec 22, 2024 | | Code Available | 2 |
| OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning | Dec 22, 2024 | | Code Available | 2 |
| Where am I? Cross-View Geo-localization with Natural Language Descriptions | Dec 22, 2024 | geo-localization, Image Retrieval | Code Available | 2 |
| Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching | Dec 22, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| WPMixer: Efficient Multi-Resolution Mixing for Long-Term Time Series Forecasting | Dec 22, 2024 | Financial Analysis, Load Forecasting | Code Available | 2 |
| A Generalizable Anomaly Detection Method in Dynamic Graphs | Dec 21, 2024 | Anomaly Detection, Diversity | Code Available | 2 |