| Joint Audio and Speech Understanding | Sep 25, 2023 | | Code Available | 2 | 5 |
| AdaLomo: Low-memory Optimization with Adaptive Learning Rate | Oct 16, 2023 | | Code Available | 2 | 5 |
| Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? | Jul 15, 2024 | Code Generation | Code Available | 2 | 5 |
| Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting | Oct 16, 2023 | | Code Available | 2 | 5 |
| Learning for CasADi: Data-driven Models in Numerical Optimization | Dec 10, 2023 | | Code Available | 2 | 5 |
| Tokenize Anything via Prompting | Dec 14, 2023 | Decoder, Visual Prompting | Code Available | 2 | 5 |
| Diffusion Models without Classifier-free Guidance | Feb 17, 2025 | Conditional Image Generation, Image Generation | Code Available | 2 | 5 |
| FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning | Apr 1, 2025 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 2 | 5 |
| BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models | Jan 20, 2024 | Backdoor Attack | Code Available | 2 | 5 |
| General Flow as Foundation Affordance for Scalable Robot Learning | Jan 21, 2024 | Prediction | Code Available | 2 | 5 |
| VOLoc: Visual Place Recognition by Querying Compressed Lidar Map | Feb 25, 2024 | Pose Estimation, Transfer Learning | Code Available | 2 | 5 |
| DBConformer: Dual-Branch Convolutional Transformer for EEG Decoding | Jun 26, 2025 | EEG, EEG Decoding | Code Available | 2 | 5 |
| CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification | Mar 14, 2024 | Classification, Crowd Counting | Code Available | 2 | 5 |
| GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction | Apr 18, 2024 | Graph structure learning, Joint Entity and Relation Extraction | Code Available | 2 | 5 |
| RRHF: Rank Responses to Align Language Models with Human Feedback without tears | Apr 11, 2023 | Language Modelling, Large Language Model | Code Available | 2 | 5 |
| GSGAN: Adversarial Learning for Hierarchical Generation of 3D Gaussian Splats | Jun 5, 2024 | 3D-Aware Image Synthesis, 3D Generation | Code Available | 2 | 5 |
| CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning | Jun 7, 2024 | Instruction Following, Math | Code Available | 2 | 5 |
| Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language | Jun 9, 2024 | Contrastive Learning, Cross-Modal Retrieval | Code Available | 2 | 5 |
| Enhancing Diagnostic Accuracy in Rare and Common Fundus Diseases with a Knowledge-Rich Vision-Language Model | Jun 13, 2024 | Diagnostic, Image Retrieval | Code Available | 2 | 5 |
| S^3 -- Semantic Signal Separation | Jun 13, 2024 | blind source separation, Topic Models | Code Available | 2 | 5 |
| Dissecting Adversarial Robustness of Multimodal LM Agents | Jun 18, 2024 | Adversarial Robustness, Adversarial Text | Code Available | 2 | 5 |
| BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis | Mar 17, 2024 | 3D Generation, Text to 3D | Code Available | 2 | 5 |
| UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | Nov 17, 2022 | Video Understanding | Code Available | 2 | 5 |
| Deep Learning for Text Style Transfer: A Survey | Nov 1, 2020 | Articles, Deep Learning | Code Available | 2 | 5 |
| Segment Any Mesh: Zero-shot Mesh Part Segmentation via Lifting Segment Anything 2 to 3D | Aug 24, 2024 | Diversity, Segmentation | Code Available | 2 | 5 |
| GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers | Sep 6, 2024 | 3DGS, 3D human pose and shape estimation | Code Available | 2 | 5 |
| Integrating Neural Operators with Diffusion Models Improves Spectral Representation in Turbulence Modeling | Sep 13, 2024 | Computational Efficiency | Code Available | 2 | 5 |
| SkinMamba: A Precision Skin Lesion Segmentation Architecture with Cross-Scale Global State Modeling and Frequency Boundary Guidance | Sep 17, 2024 | Decoder, Lesion Segmentation | Code Available | 2 | 5 |
| Pairwise Comparisons Are All You Need | Mar 13, 2024 | All, Face Image Quality Assessment | Code Available | 2 | 5 |
| CycleBNN: Cyclic Precision Training in Binary Neural Networks | Sep 28, 2024 | Inference Optimization | Code Available | 2 | 5 |
| StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization | Oct 11, 2024 | RAG, Retrieval-augmented Generation | Code Available | 2 | 5 |
| Enforcing geometric constraints of virtual normal for depth prediction | Jul 29, 2019 | Depth Estimation, Depth Prediction | Code Available | 2 | 5 |
| DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving | Dec 12, 2024 | 4D reconstruction, Autonomous Driving | Code Available | 2 | 5 |
| Open Universal Arabic ASR Leaderboard | Dec 18, 2024 | Benchmarking | Code Available | 2 | 5 |
| TryOnDiffusion: A Tale of Two UNets | Jun 14, 2023 | Virtual Try-on | Code Available | 2 | 5 |
| Training Deep Learning Models with Norm-Constrained LMOs | Feb 11, 2025 | Deep Learning | Code Available | 2 | 5 |
| Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning | May 20, 2023 | Logical Reasoning | Code Available | 2 | 5 |
| Facilitating Advanced Sentinel-2 Analysis Through a Simplified Computation of Nadir BRDF Adjusted Reflectance | Apr 24, 2024 | | Code Available | 2 | 5 |
| TrackOcc: Camera-based 4D Panoptic Occupancy Tracking | Mar 11, 2025 | 3D Object Tracking, Object Tracking | Code Available | 2 | 5 |
| SlicerNNInteractive: A 3D Slicer extension for nnInteractive | Apr 7, 2025 | Image Segmentation, Semantic Segmentation | Code Available | 2 | 5 |
| WavReward: Spoken Dialogue Models With Generalist Reward Evaluators | May 14, 2025 | Spoken Dialogue Systems | Code Available | 2 | 5 |
| balance -- a Python package for balancing biased data samples | Jul 12, 2023 | | Code Available | 2 | 5 |
| Continual Training of Language Models for Few-Shot Learning | Oct 11, 2022 | Continual Learning, Continual Pretraining | Code Available | 2 | 5 |
| AutoFormer: Searching Transformers for Visual Recognition | Jul 1, 2021 | AutoML, Fine-Grained Image Classification | Code Available | 2 | 5 |
| I-BERT: Integer-only BERT Quantization | Jan 5, 2021 | GPU, Natural Language Inference | Code Available | 2 | 5 |
| Flow-Guided Diffusion for Video Inpainting | Nov 26, 2023 | Denoising, Image Generation | Code Available | 2 | 5 |
| GNS: A generalizable Graph Neural Network-based simulator for particulate and fluid modeling | Nov 18, 2022 | Graph Neural Network | Code Available | 2 | 5 |
| Probabilistic Time Series Forecasting with Implicit Quantile Networks | Jul 8, 2021 | Probabilistic Time Series Forecasting, Time Series | Code Available | 2 | 5 |
| Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents | Apr 19, 2023 | Information Retrieval, Passage Ranking | Code Available | 2 | 5 |
| Optimal Transport Tools (OTT): A JAX Toolbox for all things Wasserstein | Jan 28, 2022 | All | Code Available | 2 | 5 |