| DaCapo: a modular deep learning framework for scalable 3D image segmentation | Aug 5, 2024 | Image Segmentation, Management | Code Available | 2 |
| ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems | Aug 5, 2024 | AI Agent | Code Available | 2 |
| YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition | Aug 5, 2024 | Action Detection | Code Available | 2 |
| VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge | Aug 5, 2024 | Clinical Knowledge, Diagnostic | Code Available | 2 |
| Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models | Aug 4, 2024 | Hallucination | Code Available | 2 |
| Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models | Aug 4, 2024 | | Code Available | 2 |
| radarODE: An ODE-Embedded Deep Learning Model for Contactless ECG Reconstruction from Millimeter-Wave Radar | Aug 3, 2024 | Decoder | Code Available | 2 |
| VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling | Aug 2, 2024 | Image Generation | Code Available | 2 |
| StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation | Aug 2, 2024 | Segmentation, Semantic Segmentation | Code Available | 2 |
| Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement | Aug 2, 2024 | Image Enhancement, Low-Light Image Enhancement | Code Available | 2 |
| CFBench: A Comprehensive Constraints-Following Benchmark for LLMs | Aug 2, 2024 | | Code Available | 2 |
| Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach | Aug 2, 2024 | cross-modal alignment, Multiple Object Tracking | Code Available | 2 |
| Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning | Aug 1, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| MESA: Effective Matching Redundancy Reduction by Semantic Area Segmentation | Aug 1, 2024 | Patch Matching | Code Available | 2 |
| Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation | Aug 1, 2024 | Open Vocabulary Panoptic Segmentation, Open Vocabulary Semantic Segmentation | Code Available | 2 |
| Towards Reliable Advertising Image Generation Using Human Feedback | Aug 1, 2024 | Image Generation | Code Available | 2 |
| TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models | Aug 1, 2024 | | Code Available | 2 |
| Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models | Aug 1, 2024 | Math | Code Available | 2 |
| Segment anything model 2: an application to 2D and 3D medical images | Aug 1, 2024 | Computed Tomography (CT), Segmentation | Code Available | 2 |
| DeliLaw: A Chinese Legal Counselling System Based on a Large Language Model | Aug 1, 2024 | Articles, Hallucination | Code Available | 2 |
| Tamper-Resistant Safeguards for Open-Weight LLMs | Aug 1, 2024 | Red Teaming, TAR | Code Available | 2 |
| Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention | Aug 1, 2024 | Image Generation | Code Available | 2 |
| TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods | Jul 31, 2024 | | Code Available | 2 |
| CAMAv2: A Vision-Centric Approach for Static Map Element Annotation | Jul 31, 2024 | | Code Available | 2 |
| Detecting, Explaining, and Mitigating Memorization in Diffusion Models | Jul 31, 2024 | Image Generation, Memorization | Code Available | 2 |
| RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining | Jul 31, 2024 | Optical Flow Estimation, Rain Removal | Code Available | 2 |
| MetaOpenFOAM: an LLM-based multi-agent framework for CFD | Jul 31, 2024 | RAG, Retrieval-augmented Generation | Code Available | 2 |
| MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction | Jul 31, 2024 | Autonomous Driving, Prediction | Code Available | 2 |
| MIST: A Simple and Scalable End-To-End 3D Medical Imaging Segmentation Framework | Jul 31, 2024 | 3D Medical Imaging Segmentation, Medical Image Segmentation | Code Available | 2 |
| MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training | Jul 31, 2024 | RAG, Reranking | Code Available | 2 |
| Tabular Data Augmentation for Machine Learning: Progress and Prospects of Embracing Generative AI | Jul 31, 2024 | | Code Available | 2 |
| Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent | Jul 31, 2024 | Translation, valid | Code Available | 2 |
| MSA^2Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation | Jul 31, 2024 | Decoder, Image Segmentation | Code Available | 2 |
| H-Watch: An Open, Connected Platform for AI-Enhanced COVID19 Infection Symptoms Monitoring and Contact Tracing | Jul 31, 2024 | | Code Available | 2 |
| Accelerating Image Super-Resolution Networks with Pixel-Level Classification | Jul 31, 2024 | | Code Available | 2 |
| Revisiting Tampered Scene Text Detection in the Era of Generative AI | Jul 31, 2024 | Misinformation, Scene Text Detection | Code Available | 2 |
| Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends | Jul 31, 2024 | coreference-resolution, Coreference Resolution | Code Available | 2 |
| ARCLE: The Abstraction and Reasoning Corpus Learning Environment for Reinforcement Learning | Jul 30, 2024 | ARC, reinforcement-learning | Code Available | 2 |
| Zero Shot Health Trajectory Prediction Using Transformer | Jul 30, 2024 | ICU Admission, ICU Mortality | Code Available | 2 |
| Interpretable Pre-Trained Transformers for Heart Time-Series Data | Jul 30, 2024 | Decoder, Electrocardiography (ECG) | Code Available | 2 |
| SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | Jul 30, 2024 | Caption Generation, Question Answering | Code Available | 2 |
| Palu: Compressing KV-Cache with Low-Rank Projection | Jul 30, 2024 | GPU, Quantization | Code Available | 2 |
| Machine Unlearning in Generative AI: A Survey | Jul 30, 2024 | Machine Unlearning, Survey | Code Available | 2 |
| Autonomous Improvement of Instruction Following Skills via Foundation Models | Jul 30, 2024 | Image Generation, Instruction Following | Code Available | 2 |
| MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls | Jul 30, 2024 | Gesture Generation, Motion Generation | Code Available | 2 |
| Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation | Jul 30, 2024 | Disentanglement, Music Generation | Code Available | 2 |
| XHand: Real-time Expressive Hand Avatar | Jul 30, 2024 | | Code Available | 2 |
| Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network | Jul 29, 2024 | Decoder, Super-Resolution | Code Available | 2 |
| Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities | Jul 29, 2024 | Contrastive Learning, DeepFake Detection | Code Available | 2 |
| Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Jul 29, 2024 | GSM8K, Math | Code Available | 2 |