| Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification | May 20, 2024 | Hyperspectral Image Classification, image-classification | Code Available | 2 |
| SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | May 20, 2024 | Audio Classification, GPU | Code Available | 2 |
| Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices | May 20, 2024 | Image Generation, Video Editing | Code Available | 2 |
| A Simulation Tool for V2G Enabled Demand Response Based on Model Predictive Control | May 20, 2024 | energy management, Management | Code Available | 2 |
| Diff-BGM: A Diffusion Model for Video Background Music Generation | May 20, 2024 | Diversity, Music Generation | Code Available | 2 |
| Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography | May 20, 2024 | Breast Cancer Detection, Diversity | Code Available | 2 |
| xFinder: Robust and Pinpoint Answer Extraction for Large Language Models | May 20, 2024 | | Code Available | 2 |
| Imp: Highly Capable Large Multimodal Models for Mobile Devices | May 20, 2024 | Quantization, Visual Question Answering | Code Available | 2 |
| AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field | May 20, 2024 | 3DGS, Novel View Synthesis | Code Available | 2 |
| End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music | May 20, 2024 | Synthetic Data Generation | Code Available | 2 |
| MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise | May 20, 2024 | | Code Available | 2 |
| CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization | May 20, 2024 | 3DGS, Novel View Synthesis | Code Available | 2 |
| SEMv3: A Fast and Robust Approach to Table Separation Line Detection | May 20, 2024 | Line Detection | Code Available | 2 |
| MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | May 20, 2024 | College Mathematics, GSM8K | Code Available | 2 |
| AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements | May 20, 2024 | 3D Pose Estimation, Pose Estimation | Code Available | 2 |
| NetMamba: Efficient Network Traffic Classification via Pre-training Unidirectional Mamba | May 19, 2024 | Classification, Few-Shot Learning | Code Available | 2 |
| Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries | May 19, 2024 | 6D Pose Estimation, GPU | Code Available | 2 |
| SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization | May 19, 2024 | image-classification, Image Classification | Code Available | 2 |
| Your Transformer is Secretly Linear | May 19, 2024 | | Code Available | 2 |
| Transcriptomics-guided Slide Representation Learning in Computational Pathology | May 19, 2024 | Contrastive Learning, Representation Learning | Code Available | 2 |
| MAMCA -- Optimal on Accuracy and Efficiency for Automatic Modulation Classification with Extended Signal Length | May 18, 2024 | Denoising, GPU | Code Available | 2 |
| MotionGS : Compact Gaussian Splatting SLAM by Motion Filter | May 18, 2024 | 3DGS, NeRF | Code Available | 2 |
| MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | May 18, 2024 | Code Generation, HumanEval | Code Available | 2 |
| Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching | May 18, 2024 | 3D Generation, Denoising | Code Available | 2 |
| MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection | May 18, 2024 | Anomaly Detection, Decision Making | Code Available | 2 |
| GinAR: An End-To-End Multivariate Time Series Forecasting Model Suitable for Variable Missing | May 18, 2024 | Multivariate Time Series Forecasting, Time Series | Code Available | 2 |
| Heterogeneity-Informed Meta-Parameter Learning for Spatiotemporal Time Series Forecasting | May 17, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| Layer-Condensed KV Cache for Efficient Inference of Large Language Models | May 17, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| TexPainter: Generative Mesh Texturing with Multi-view Consistency | May 17, 2024 | Denoising | Code Available | 2 |
| Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance | May 17, 2024 | Crowd Counting | Code Available | 2 |
| Observational Scaling Laws and the Predictability of Language Model Performance | May 17, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning | May 17, 2024 | Dictionary Learning | Code Available | 2 |
| Many-Shot In-Context Learning in Multimodal Foundation Models | May 16, 2024 | image-classification, Image Classification | Code Available | 2 |
| LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation | May 16, 2024 | | Code Available | 2 |
| SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network | May 16, 2024 | Binary Classification, Decoder | Code Available | 2 |
| LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery | May 16, 2024 | Bilevel Optimization, scientific discovery | Code Available | 2 |
| Libra: Building Decoupled Vision System on Large Language Models | May 16, 2024 | Image to text, Language Modeling | Code Available | 2 |
| Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models | May 16, 2024 | Music Generation | Code Available | 2 |
| IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model | May 16, 2024 | Image Enhancement, Image Reconstruction | Code Available | 2 |
| HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition | May 16, 2024 | Contrastive Learning, Surgical phase recognition | Code Available | 2 |
| PyTorch-IE: Fast and Reproducible Prototyping for Information Extraction | May 16, 2024 | | Code Available | 2 |
| Active Learning with Fully Bayesian Neural Networks for Discontinuous and Nonstationary Data | May 16, 2024 | Active Learning, scientific discovery | Code Available | 2 |
| DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data | May 16, 2024 | Data Augmentation, Diversity | Code Available | 2 |
| DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection | May 16, 2024 | Adversarial Attack, Face Recognition | Code Available | 2 |
| Grounded 3D-LLM with Referent Tokens | May 16, 2024 | Dense Captioning, Diversity | Code Available | 2 |
| SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection | May 16, 2024 | object-detection, Object Detection | Code Available | 2 |
| Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model | May 15, 2024 | GPU, Language Modeling | Code Available | 2 |
| From NeRFs to Gaussian Splats, and Back | May 15, 2024 | SSIM | Code Available | 2 |
| PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models | May 15, 2024 | Benchmarking | Code Available | 2 |
| EchoTracker: Advancing Myocardial Point Tracking in Echocardiography | May 14, 2024 | Diagnostic, Motion Estimation | Code Available | 2 |