| SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications | Mar 27, 2023 | | Code Available | 2 |
| Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective | Mar 27, 2023 | Image Quality Assessment, No-Reference Image Quality Assessment | Code Available | 2 |
| Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching | Mar 27, 2023 | Decoder, Few-Shot Learning | Code Available | 2 |
| Learned Image Compression with Mixed Transformer-CNN Architectures | Mar 27, 2023 | Image Compression | Code Available | 2 |
| Label-Free Liver Tumor Segmentation | Mar 27, 2023 | Segmentation, Tumor Segmentation | Code Available | 2 |
| Anti-DreamBooth: Protecting users from personalized text-to-image synthesis | Mar 27, 2023 | Image Generation | Code Available | 2 |
| SimpleNet: A Simple Network for Image Anomaly Detection and Localization | Mar 27, 2023 | Anomaly Classification, Anomaly Detection | Code Available | 2 |
| High-fidelity 3D Human Digitization from Single 2K Resolution Images | Mar 27, 2023 | 2k, 3D Human Reconstruction | Code Available | 2 |
| CelebV-Text: A Large-Scale Facial Text-Video Dataset | Mar 26, 2023 | Text Generation, Text-to-Video Generation | Code Available | 2 |
| Learning Generative Structure Prior for Blind Text Image Super-resolution | Mar 26, 2023 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation | Mar 26, 2023 | Anomaly Classification, Anomaly Detection | Code Available | 2 |
| GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents | Mar 26, 2023 | Contrastive Learning, Gesture Generation | Code Available | 2 |
| OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering | Mar 26, 2023 | | Code Available | 2 |
| You Only Segment Once: Towards Real-Time Panoptic Segmentation | Mar 26, 2023 | Decoder, Panoptic Segmentation | Code Available | 2 |
| Human Preference Score: Better Aligning Text-to-Image Models with Human Preference | Mar 25, 2023 | | Code Available | 2 |
| Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | Mar 25, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2 |
| PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters | Mar 25, 2023 | 3D Architecture, 3D Reconstruction | Code Available | 2 |
| EfficientAD: Accurate Visual Anomaly Detection at Millisecond-Level Latencies | Mar 25, 2023 | Anomaly Detection, Computational Efficiency | Code Available | 2 |
| MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer | Mar 25, 2023 | Image Generation | Code Available | 2 |
| Conditional Image-to-Video Generation with Latent Flow Diffusion Models | Mar 24, 2023 | Image to Video Generation, Motion Generation | Code Available | 2 |
| TRAK: Attributing Model Behavior at Scale | Mar 24, 2023 | model | Code Available | 2 |
| Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation | Mar 24, 2023 | Text to 3D | Code Available | 2 |
| Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | Mar 24, 2023 | Highlight Detection, Moment Retrieval | Code Available | 2 |
| ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale | Mar 24, 2023 | | Code Available | 2 |
| GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning | Mar 24, 2023 | Virtual Try-on | Code Available | 2 |
| FedGH: Heterogeneous Federated Learning with Generalized Global Header | Mar 23, 2023 | Federated Learning, Prediction | Code Available | 2 |
| Towards Better Dynamic Graph Learning: New Architecture and Unified Library | Mar 23, 2023 | Dynamic Link Prediction, Dynamic Node Classification | Code Available | 2 |
| Masked Image Training for Generalizable Deep Image Denoising | Mar 23, 2023 | Deep Learning, Denoising | Code Available | 2 |
| NOPE: Novel Object Pose Estimation from a Single Image | Mar 23, 2023 | Object, Pose Estimation | Code Available | 2 |
| ReVersion: Diffusion-Based Relation Inversion from Images | Mar 23, 2023 | Contrastive Learning, Few-Shot Learning | Code Available | 2 |
| Neural Preset for Color Style Transfer | Mar 23, 2023 | 4k, Color Normalization | Code Available | 2 |
| Learning Human-Inspired Force Strategies for Robotic Assembly | Mar 22, 2023 | | Code Available | 2 |
| Dense Distinct Query for End-to-End Object Detection | Mar 22, 2023 | Object, object-detection | Code Available | 2 |
| SHERF: Generalizable Human NeRF from a Single Image | Mar 22, 2023 | 3D Human Reconstruction, NeRF | Code Available | 2 |
| Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval | Mar 22, 2023 | Image-text matching, Language Modeling | Code Available | 2 |
| The Shaky Foundations of Clinical Foundation Models: A Survey of Large Language Models and Foundation Models for EMRs | Mar 22, 2023 | | Code Available | 2 |
| ExBEHRT: Extended Transformer for Electronic Health Records to Predict Disease Subtypes & Progressions | Mar 22, 2023 | | Code Available | 2 |
| Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions | Mar 22, 2023 | NeRF | Code Available | 2 |
| RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation | Mar 22, 2023 | Code Completion, Language Modeling | Code Available | 2 |
| Spherical Transformer for LiDAR-based 3D Recognition | Mar 22, 2023 | 3D Object Detection, 3D Semantic Segmentation | Code Available | 2 |
| Emotionally Enhanced Talking Face Generation | Mar 21, 2023 | Face Generation, Talking Face Generation | Code Available | 2 |
| CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation | Mar 21, 2023 | Image Segmentation, Open Vocabulary Semantic Segmentation | Code Available | 2 |
| Detecting Everything in the Open World: Towards Universal Object Detection | Mar 21, 2023 | object-detection, Object Detection | Code Available | 2 |
| Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection | Mar 21, 2023 | 3D Multi-Object Tracking, 3D Object Detection | Code Available | 2 |
| An Effective Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds | Mar 21, 2023 | 3D Single Object Tracking, Autonomous Driving | Code Available | 2 |
| Learning A Sparse Transformer Network for Effective Image Deraining | Mar 21, 2023 | Image Reconstruction, Image Restoration | Code Available | 2 |
| 3D Human Mesh Estimation from Virtual Markers | Mar 21, 2023 | 3D Human Pose Estimation, 3D Pose Estimation | Code Available | 2 |
| BigSmall: Efficient Multi-Task Learning for Disparate Spatial and Temporal Physiological Measurements | Mar 21, 2023 | Multi-Task Learning | Code Available | 2 |
| Large AI Models in Health Informatics: Applications, Challenges, and the Future | Mar 21, 2023 | Decision Making, Drug Discovery | Code Available | 2 |
| Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion | Mar 21, 2023 | Optical Flow Estimation, Scene Flow Estimation | Code Available | 2 |