| Modality-Independent Graph Neural Networks with Global Transformers for Multimodal Recommendation | Dec 18, 2024 | Graph Learning, Multi-modal Recommendation | Code Available | 2 | 5 |
| SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models | Apr 25, 2025 | Spatial Reasoning, Text to 3D | Code Available | 2 | 5 |
| TetWeave: Isosurface Extraction using On-The-Fly Delaunay Tetrahedral Grids for Gradient-Based Mesh Optimization | May 7, 2025 | 3D Reconstruction, Fairness | Code Available | 2 | 5 |
| MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | Dec 19, 2022 | cross-modal alignment, Denoising | Code Available | 2 | 5 |
| MathOptAI.jl: Embed trained machine learning predictors into JuMP models | Jul 3, 2025 | CPU, Gaussian Processes | Code Available | 2 | 5 |
| SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | May 20, 2024 | Audio Classification, GPU | Code Available | 2 | 5 |
| DETRPose: Real-time end-to-end transformer model for multi-person pose estimation | Jun 16, 2025 | 2D Pose Estimation, Decoder | Code Available | 2 | 5 |
| On the Role of Attention Heads in Large Language Model Safety | Oct 17, 2024 | Attribute, Language Modeling | Code Available | 2 | 5 |
| Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | Jan 17, 2024 | GPU, Image Classification | Code Available | 2 | 5 |
| All-in-one foundational models learning across quantum chemical levels | Sep 18, 2024 | All, Cloud Computing | Code Available | 2 | 5 |
| Reinforcement learning-based motion imitation for physiologically plausible musculoskeletal motor control | Mar 18, 2025 | Humanoid Control, Motion Synthesis | Code Available | 2 | 5 |
| Simplified and Generalized Masked Diffusion for Discrete Data | Jun 6, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification | Dec 14, 2024 | Mixture-of-Experts, Object | Code Available | 2 | 5 |
| TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation | Feb 11, 2025 | Image Generation | Code Available | 2 | 5 |
| Dynamic Graph Induced Contour-aware Heat Conduction Network for Event-based Object Detection | May 19, 2025 | Event-based vision, Object | Code Available | 2 | 5 |
| V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization | Nov 5, 2024 | Hallucination, Language Modeling | Code Available | 2 | 5 |
| BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities | Dec 10, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 2 | 5 |
| HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions | Jul 28, 2022 | Image Classification, Object Detection | Code Available | 2 | 5 |
| FG^2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching | Mar 24, 2025 | Weakly-supervised Learning | Code Available | 2 | 5 |
| Open-Vocabulary DETR with Conditional Matching | Mar 22, 2022 | Language Modelling, object-detection | Code Available | 2 | 5 |
| BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis | Nov 13, 2024 | NeRF, Novel View Synthesis | Code Available | 2 | 5 |
| Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment | Feb 24, 2025 | image-classification, Image Classification | Code Available | 2 | 5 |
| CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards | Jul 12, 2025 | | Code Available | 2 | 5 |
| Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss | Jan 5, 2024 | Knowledge Distillation | Code Available | 2 | 5 |
| Combinatorial Client-Master Multiagent Deep Reinforcement Learning for Task Offloading in Mobile Edge Computing | Feb 18, 2024 | Deep Reinforcement Learning, Edge-computing | Code Available | 2 | 5 |
| CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model | Feb 6, 2024 | Decoder, Image Segmentation | Code Available | 2 | 5 |
| HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset | Dec 3, 2024 | 3D Generation | Code Available | 2 | 5 |
| ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models | Jun 26, 2024 | Classification | Code Available | 2 | 5 |
| mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data | Feb 12, 2025 | cross-modal alignment, Large Language Model | Code Available | 2 | 5 |
| Multiview Scene Graph | Oct 15, 2024 | Decoder, Object | Code Available | 2 | 5 |
| MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation | Nov 22, 2024 | Video Generation | Code Available | 2 | 5 |
| N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting | Jan 30, 2022 | Time Series, Time Series Analysis | Code Available | 2 | 5 |
| DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection | Sep 24, 2024 | Depression Detection, Mamba | Code Available | 2 | 5 |
| Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models | Feb 26, 2024 | | Code Available | 2 | 5 |
| Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors | Jul 13, 2024 | Super-Resolution, Video Super-Resolution | Code Available | 2 | 5 |
| DiMeR: Disentangled Mesh Reconstruction Model | Apr 24, 2025 | Image to 3D, model | Code Available | 2 | 5 |
| Can Large Language Model Agents Simulate Human Trust Behavior? | Feb 7, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing | Mar 14, 2025 | | Code Available | 2 | 5 |
| Fast Best-of-N Decoding via Speculative Rejection | Oct 26, 2024 | | Code Available | 2 | 5 |
| Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Apr 10, 2024 | Speech Synthesis, text-to-speech | Code Available | 2 | 5 |
| On the Generalization of BasicVSR++ to Video Deblurring and Denoising | Apr 11, 2022 | Deblurring, Denoising | Code Available | 2 | 5 |
| Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation | May 4, 2025 | Knowledge Distillation, Multivariate Time Series Forecasting | Code Available | 2 | 5 |
| PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation | Jan 23, 2024 | Decoder, Image Segmentation | Code Available | 2 | 5 |
| One Quantizer is Enough: Toward a Lightweight Audio Codec | Apr 7, 2025 | | Code Available | 2 | 5 |
| Side Adapter Network for Open-Vocabulary Semantic Segmentation | Feb 23, 2023 | Language Modelling, Open Vocabulary Semantic Segmentation | Code Available | 2 | 5 |
| Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems | Jan 8, 2025 | Bayesian Inference, Variational Inference | Code Available | 2 | 5 |
| More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity | Jul 7, 2022 | Object Detection, Semantic Segmentation | Code Available | 2 | 5 |
| KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting? | Aug 21, 2024 | Mixture-of-Experts, Time Series | Code Available | 2 | 5 |
| Mobius: Text to Seamless Looping Video Generation via Latent Shift | Feb 27, 2025 | Denoising, Video Generation | Code Available | 2 | 5 |
| PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals | May 30, 2024 | | Code Available | 2 | 5 |