| DH-Mamba: Exploring Dual-domain Hierarchical State Space Models for MRI Reconstruction | Jan 14, 2025 | Diversity, Mamba | Code Available | 1 |
| OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training | Jan 14, 2025 | | Code Available | 1 |
| Dataset Distillation via Committee Voting | Jan 13, 2025 | Dataset Distillation | Code Available | 1 |
| TimberVision: A Multi-Task Dataset and Framework for Log-Component Segmentation and Tracking in Autonomous Forestry Operations | Jan 13, 2025 | Benchmarking, Domain Adaptation | Code Available | 1 |
| Toward Realistic Camouflaged Object Detection: Benchmarks and Method | Jan 13, 2025 | Instance Segmentation, Object | Code Available | 1 |
| Estimating Musical Surprisal in Audio | Jan 13, 2025 | EEG | Code Available | 1 |
| Split Federated Learning Empowered Vehicular Edge Intelligence: Concept, Adaptive Design, and Future Directions | Jan 13, 2025 | Edge-computing, Federated Learning | Code Available | 1 |
| RePoseD: Efficient Relative Pose Estimation With Known Depth Information | Jan 13, 2025 | Camera Pose Estimation, Depth Estimation | Code Available | 1 |
| Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method | Jan 13, 2025 | Anomaly Detection In Surveillance Videos, Multiple Instance Learning | Code Available | 1 |
| LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models | Jan 13, 2025 | Autonomous Driving | Code Available | 1 |
| Breaking Memory Limits: Gradient Wavelet Transform Enhances LLMs Training | Jan 13, 2025 | | Code Available | 1 |
| How GPT learns layer by layer | Jan 13, 2025 | Navigate, Representation Learning | Code Available | 1 |
| A Survey on Dynamic Neural Networks: from Computer Vision to Multi-modal Sensor Fusion | Jan 13, 2025 | Dynamic neural networks, Model Compression | Code Available | 1 |
| MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning | Jan 13, 2025 | Causal Discovery, Causal Inference | Code Available | 1 |
| Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion | Jan 13, 2025 | 3D Semantic Scene Completion, Mamba | Code Available | 1 |
| MathReader : Text-to-Speech for Mathematical Documents | Jan 13, 2025 | Optical Character Recognition (OCR), text-to-speech | Code Available | 1 |
| RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment | Jan 13, 2025 | Concept Alignment, Image Captioning | Code Available | 1 |
| D3MES: Diffusion Transformer with multihead equivariant self-attention for 3D molecule generation | Jan 13, 2025 | 3D Molecule Generation | Code Available | 1 |
| Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning | Jan 12, 2025 | Mixture-of-Experts, Multi-Task Learning | Code Available | 1 |
| Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | Jan 12, 2025 | Image Segmentation, Referring Expression | Code Available | 1 |
| ZNO-Eval: Benchmarking reasoning capabilities of large language models in Ukrainian | Jan 12, 2025 | Benchmarking, Math | Code Available | 1 |
| UR2P-Dehaze: Learning a Simple Image Dehaze Enhancer via Unpaired Rich Physical Prior | Jan 12, 2025 | Image Dehazing, Image Reconstruction | Code Available | 1 |
| SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training | Jan 12, 2025 | Time Series Forecasting | Code Available | 1 |
| VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning | Jan 12, 2025 | Dense Video Captioning, Video Captioning | Code Available | 1 |
| CULTURE3D: Cultural Landmarks and Terrain Dataset for 3D Applications | Jan 12, 2025 | NeRF | Code Available | 1 |
| 3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes | Jan 12, 2025 | Navigate, Object | Code Available | 1 |
| Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping | Jan 11, 2025 | GPU, Large Language Model | Code Available | 1 |
| VASparse: Towards Efficient Visual Hallucination Mitigation for Large Vision-Language Model via Visual-Aware Sparsification | Jan 11, 2025 | Hallucination, Language Modeling | Code Available | 1 |
| Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Jan 11, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 |
| Flash Window Attention: speedup the attention computation for Swin Transformer | Jan 11, 2025 | Computational Efficiency | Code Available | 1 |
| O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning | Jan 11, 2025 | Decision Making, Diagnostic | Code Available | 1 |
| NVS-SQA: Exploring Self-Supervised Quality Representation Learning for Neurally Synthesized Scenes without References | Jan 11, 2025 | NeRF, Representation Learning | Code Available | 1 |
| Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis | Jan 11, 2025 | Attribute, Benchmarking | Code Available | 1 |
| CoreNet: Conflict Resolution Network for Point-Pixel Misalignment and Sub-Task Suppression of 3D LiDAR-Camera Object Detection | Jan 11, 2025 | 3D Object Detection, Object Detection | Code Available | 1 |
| Challenging reaction prediction models to generalize to novel chemistry | Jan 11, 2025 | Prediction | Code Available | 1 |
| Exploring Pose-Based Anomaly Detection for Retail Security: A Real-World Shoplifting Dataset and Benchmark | Jan 11, 2025 | Anomaly Detection, Pose-based Anomaly Detection | Code Available | 1 |
| HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection | Jan 10, 2025 | DeepFake Detection, Face Swapping | Code Available | 1 |
| EDNet: Edge-Optimized Small Target Detection in UAV Imagery -- Faster Context Attention, Better Feature Fusion, and Hardware Acceleration | Jan 10, 2025 | Object Detection | Code Available | 1 |
| Merging Feed-Forward Sublayers for Compressed Transformers | Jan 10, 2025 | Image Classification | Code Available | 1 |
| kANNolo: Sweet and Smooth Approximate k-Nearest Neighbors Search | Jan 10, 2025 | Information Retrieval, Quantization | Code Available | 1 |
| Understanding Impact of Human Feedback via Influence Functions | Jan 10, 2025 | | Code Available | 1 |
| Interpretable Enzyme Function Prediction via Residue-Level Detection | Jan 10, 2025 | Multi-Label Classification | Code Available | 1 |
| Pose-independent 3D Anthropometry from Sparse Data | Jan 10, 2025 | | Code Available | 1 |
| From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training | Jan 10, 2025 | Reinforcement Learning (RL) | Code Available | 1 |
| From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities | Jan 10, 2025 | Human-Object Interaction Detection, Knowledge Distillation | Code Available | 1 |
| Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages | Jan 10, 2025 | Machine Translation | Code Available | 1 |
| Super-class guided Transformer for Zero-Shot Attribute Classification | Jan 10, 2025 | Attribute, Classification | Code Available | 1 |
| ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification | Jan 10, 2025 | Speaker Verification | Code Available | 1 |
| StructSR: Refuse Spurious Details in Real-World Image Super-Resolution | Jan 10, 2025 | Image Super-Resolution, SSIM | Code Available | 1 |
| Learning to generate feasible graphs using graph grammars | Jan 10, 2025 | | Code Available | 1 |