| Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Apr 10, 2024 | Speech Synthesis, text-to-speech | Code Available | 2 |
| Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping | Apr 9, 2024 | Image Retrieval, Object | Code Available | 2 |
| AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval | Apr 9, 2024 | All, Information Retrieval | Code Available | 2 |
| YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images | Apr 9, 2024 | Object, object-detection | Code Available | 2 |
| Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks | Apr 9, 2024 | Answer Selection, Long-Context Understanding | Code Available | 2 |
| GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis | Apr 9, 2024 | Image Generation, Zero-shot Generalization | Code Available | 2 |
| Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation | Apr 9, 2024 | Knowledge Distillation, Language Modeling | Code Available | 2 |
| ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization | Apr 9, 2024 | Colorization | Code Available | 2 |
| Policy-Guided Diffusion | Apr 9, 2024 | | Code Available | 2 |
| Hash3D: Training-free Acceleration for 3D Generation | Apr 9, 2024 | 3D Generation, Image to 3D | Code Available | 2 |
| Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion | Apr 9, 2024 | 3D Generation | Code Available | 2 |
| Autonomous Evaluation and Refinement of Digital Agents | Apr 9, 2024 | | Code Available | 2 |
| RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos | Apr 9, 2024 | Mamba | Code Available | 2 |
| GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation | Apr 9, 2024 | Go to AnyThing, Navigate | Code Available | 2 |
| SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions | Apr 9, 2024 | | Code Available | 2 |
| Robust Confidence Intervals in Stereo Matching using Possibility Theory | Apr 9, 2024 | Stereo Matching | Code Available | 2 |
| Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero shot Medical Image Segmentation | Apr 9, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 2 |
| VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? | Apr 9, 2024 | Optical Character Recognition (OCR) | Code Available | 2 |
| Test-Time Zero-Shot Temporal Action Localization | Apr 8, 2024 | Action Localization, Language Modelling | Code Available | 2 |
| Evaluating Mathematical Reasoning Beyond Accuracy | Apr 8, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement | Apr 8, 2024 | Binarization, Document Enhancement | Code Available | 2 |
| TIM: A Time Interval Machine for Audio-Visual Action Recognition | Apr 8, 2024 | Action Detection, Action Recognition | Code Available | 2 |
| ATFNet: Adaptive Time-Frequency Ensembled Network for Long-term Time Series Forecasting | Apr 8, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| Dual-Camera Smooth Zoom on Mobile Phones | Apr 7, 2024 | | Code Available | 2 |
| DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology | Apr 7, 2024 | Diagnostic, Multiple Instance Learning | Code Available | 2 |
| Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer | Apr 7, 2024 | 3D Human Reconstruction, 3D Object Reconstruction | Code Available | 2 |
| VMambaMorph: a Multi-Modality Deformable Image Registration Framework based on Visual State Space Model with Cross-Scan Module | Apr 7, 2024 | Image Registration | Code Available | 2 |
| UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection | Apr 7, 2024 | Action Detection, Moment Queries | Code Available | 2 |
| Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM | Apr 7, 2024 | Marine Animal Segmentation | Code Available | 2 |
| 3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions | Apr 7, 2024 | 3D Reconstruction | Code Available | 2 |
| Rethinking Diffusion Model for Multi-Contrast MRI Super-Resolution | Apr 7, 2024 | Decoder, Super-Resolution | Code Available | 2 |
| Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models | Apr 7, 2024 | Denoising | Code Available | 2 |
| LHU-Net: A Light Hybrid U-Net for Cost-Efficient, High-Performance Volumetric Medical Image Segmentation | Apr 7, 2024 | Computational Efficiency, Image Segmentation | Code Available | 2 |
| Diffusion Time-step Curriculum for One Image to 3D Generation | Apr 6, 2024 | 3D Generation, Image to 3D | Code Available | 2 |
| Bridging the Gap Between End-to-End and Two-Step Text Spotting | Apr 6, 2024 | Text Spotting | Code Available | 2 |
| ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming | Apr 6, 2024 | Adversarial Robustness, Dialogue Safety Prediction | Code Available | 2 |
| MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | Apr 6, 2024 | Logical Reasoning, Math | Code Available | 2 |
| Aligning Diffusion Models by Optimizing Human Utility | Apr 6, 2024 | | Code Available | 2 |
| OmniColor: A Global Camera Pose Optimization Approach of LiDAR-360Camera Fusion for Colorizing Point Clouds | Apr 6, 2024 | 3D Reconstruction | Code Available | 2 |
| InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization | Apr 6, 2024 | valid | Code Available | 2 |
| Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes | Apr 6, 2024 | Point Cloud Registration | Code Available | 2 |
| Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models | Apr 6, 2024 | Image Generation, Unconditional Image Generation | Code Available | 2 |
| MedIAnomaly: A comparative study of anomaly detection in medical images | Apr 6, 2024 | Anomaly Classification, Anomaly Detection | Code Available | 2 |
| Dynamic Prompt Optimizing for Text-to-Image Generation | Apr 5, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation | Apr 5, 2024 | Image Generation | Code Available | 2 |
| Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models | Apr 5, 2024 | Data Augmentation | Code Available | 2 |
| ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing | Apr 5, 2024 | Image Manipulation | Code Available | 2 |
| Extract, Define, Canonicalize: An LLM-based Framework for Knowledge Graph Construction | Apr 5, 2024 | graph construction, Open Information Extraction | Code Available | 2 |
| Hypothesis Generation with Large Language Models | Apr 5, 2024 | Multi-Armed Bandits | Code Available | 2 |
| Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs | Apr 5, 2024 | 3D Generation, Image to 3D | Code Available | 2 |