| MACE: Mass Concept Erasure in Diffusion Models | Mar 10, 2024 | Text-to-Image Generation | Code Available | 3 |
| Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models | Mar 10, 2024 | Visual Question Answering | Code Available | 3 |
| What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? | Mar 10, 2024 | Depth Estimation; Image Matting | Code Available | 3 |
| uniGradICON: A Foundation Model for Medical Image Registration | Mar 9, 2024 | Image Registration; Medical Image Registration | Code Available | 3 |
| RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection | Mar 9, 2024 | Anomaly Detection; feature selection | Code Available | 3 |
| LightM-UNet: Mamba Assists in Lightweight UNet for Medical Image Segmentation | Mar 8, 2024 | Image Segmentation; Mamba | Code Available | 3 |
| RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Mar 8, 2024 | Code Generation; Hallucination | Code Available | 3 |
| Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | Mar 8, 2024 | 1 Image, 2*2 Stitching; Code Generation | Code Available | 3 |
| Unbiased Estimator for Distorted Conics in Camera Calibration | Mar 7, 2024 | Camera Calibration | Code Available | 3 |
| Embodied Understanding of Driving Scenarios | Mar 7, 2024 | Autonomous Driving; Language Modeling | Code Available | 3 |
| Bridging Language and Items for Retrieval and Recommendation | Mar 6, 2024 | Retrieval; Sentence | Code Available | 3 |
| KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents | Mar 5, 2024 | Hallucination; Self-Learning | Code Available | 3 |
| PromptKD: Unsupervised Prompt Distillation for Vision-Language Models | Mar 5, 2024 | Knowledge Distillation; Prompt Engineering | Code Available | 3 |
| Scaling Rectified Flow Transformers for High-Resolution Image Synthesis | Mar 5, 2024 | Image Generation | Code Available | 3 |
| Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling | Mar 5, 2024 | Mamba | Code Available | 3 |
| Learning to Use Tools via Cooperative and Interactive Agents | Mar 5, 2024 | | Code Available | 3 |
| NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | Mar 5, 2024 | Quantization; Speech Synthesis | Code Available | 3 |
| Behavior Generation with Latent Actions | Mar 5, 2024 | Autonomous Driving; Decision Making | Code Available | 3 |
| Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models | Mar 5, 2024 | TextVQA; Visual Question Answering | Code Available | 3 |
| Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | Mar 4, 2024 | GPU; Scheduling | Code Available | 3 |
| Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents | Mar 4, 2024 | Contrastive Learning | Code Available | 3 |
| NeuSpeech: Decode Neural signal as Speech | Mar 4, 2024 | Brain Computer Interface; EEG | Code Available | 3 |
| ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models | Mar 4, 2024 | Denoising; Image Generation | Code Available | 3 |
| Diffusion-TS: Interpretable Diffusion for General Time Series Generation | Mar 4, 2024 | Audio Synthesis; Decoder | Code Available | 3 |
| Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation | Mar 4, 2024 | Age And Gender Classification; Age and Gender Estimation | Code Available | 3 |
| Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review | Mar 4, 2024 | Medical Report Generation; Question Answering | Code Available | 3 |
| The Hidden Attention of Mamba Models | Mar 3, 2024 | Mamba | Code Available | 3 |
| 3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos | Mar 3, 2024 | 3DGS; Neural Rendering | Code Available | 3 |
| Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey | Mar 3, 2024 | Property Prediction | Code Available | 3 |
| DUFOMap: Efficient Dynamic Awareness Mapping | Mar 3, 2024 | Computational Efficiency | Code Available | 3 |
| SynCode: LLM Generation with Grammar Augmentation | Mar 3, 2024 | Code Generation; valid | Code Available | 3 |
| Logit Standardization in Knowledge Distillation | Mar 3, 2024 | Knowledge Distillation | Code Available | 3 |
| GuardT2I: Defending Text-to-Image Models from Adversarial Prompts | Mar 3, 2024 | Binary Classification; Language Modeling | Code Available | 3 |
| Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling | Mar 2, 2024 | | Code Available | 3 |
| IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact | Mar 2, 2024 | Language Modeling | Code Available | 3 |
| OpenGraph: Towards Open Graph Foundation Models | Mar 2, 2024 | Data Augmentation; Graph Learning | Code Available | 3 |
| DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions | Mar 2, 2024 | Neural Architecture Search | Code Available | 3 |
| ptwt - The PyTorch Wavelet Toolbox | Mar 1, 2024 | | Code Available | 3 |
| VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks | Mar 1, 2024 | Image Classification; Image Generation | Code Available | 3 |
| Theoretically Achieving Continuous Representation of Oriented Bounding Boxes | Feb 29, 2024 | Fairness; object-detection | Code Available | 3 |
| BigGait: Learning Gait Representation You Want by Large Vision Models | Feb 29, 2024 | Gait Recognition | Code Available | 3 |
| CAMixerSR: Only Details Need More "Attention" | Feb 29, 2024 | 2k; 8k | Code Available | 3 |
| ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL | Feb 29, 2024 | Language Modeling | Code Available | 3 |
| Towards Generalizable Tumor Synthesis | Feb 29, 2024 | Computed Tomography (CT) | Code Available | 3 |
| Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping | Feb 29, 2024 | Image Generation | Code Available | 3 |
| RiNALMo: General-Purpose RNA Language Models Can Generalize Well on Structure Prediction Tasks | Feb 29, 2024 | Language Modeling | Code Available | 3 |
| Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis | Feb 28, 2024 | Decoder; Image Generation | Code Available | 3 |
| CLLMs: Consistency Large Language Models | Feb 28, 2024 | | Code Available | 3 |
| Diffusion Language Models Are Versatile Protein Learners | Feb 28, 2024 | Language Modeling | Code Available | 3 |
| Simple linear attention language models balance the recall-throughput tradeoff | Feb 28, 2024 | Language Modelling; Mamba | Code Available | 3 |