| PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling | Mar 24, 2024 | NeRF, Novel View Synthesis | Code Available | 2 |
| SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking | Mar 24, 2024 | Object Tracking, Rgb-T Tracking | Code Available | 2 |
| EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World | Mar 24, 2024 | Action Anticipation, Action Quality Assessment | Code Available | 2 |
| Omni-Kernel Network for Image Restoration | Mar 24, 2024 | Deblurring, Image Defocus Deblurring | Code Available | 2 |
| The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization | Mar 24, 2024 | reinforcement-learning | Code Available | 2 |
| Towards Large-Scale Training of Pathology Foundation Models | Mar 24, 2024 | Nuclear Segmentation, Self-Supervised Learning | Code Available | 2 |
| Space Group Informed Transformer for Crystalline Materials Generation | Mar 23, 2024 | | Code Available | 2 |
| Adaptive Super Resolution For One-Shot Talking-Head Generation | Mar 23, 2024 | Decoder, Super-Resolution | Code Available | 2 |
| In-Context Matting | Mar 23, 2024 | Image Matting | Code Available | 2 |
| An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning | Mar 23, 2024 | Federated Learning, Transfer Learning | Code Available | 2 |
| Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation | Mar 22, 2024 | Earth Observation | Code Available | 2 |
| Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers | Mar 22, 2024 | Information Retrieval | Code Available | 2 |
| LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels | Mar 22, 2024 | 3D Semantic Segmentation, LIDAR Semantic Segmentation | Code Available | 2 |
| MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis | Mar 22, 2024 | Medical Diagnosis, Medical Visual Question Answering | Code Available | 2 |
| FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | Mar 22, 2024 | Information Retrieval, Retrieval | Code Available | 2 |
| LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Mar 22, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| InterFusion: Text-Driven Generation of 3D Human-Object Interaction | Mar 22, 2024 | 3D Generation, global-optimization | Code Available | 2 |
| Transfer CLIP for Generalizable Image Denoising | Mar 22, 2024 | Decoder, Denoising | Code Available | 2 |
| LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement | Mar 22, 2024 | Data Augmentation, GSM8K | Code Available | 2 |
| YOLOv5-6D: Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries | Mar 22, 2024 | 6D Pose Estimation using RGB, GPU | Code Available | 2 |
| Addressing Concept Shift in Online Time Series Forecasting: Detect-then-Adapt | Mar 22, 2024 | Data Augmentation, Time Series | Code Available | 2 |
| Construction of a Japanese Financial Benchmark for Large Language Models | Mar 22, 2024 | | Code Available | 2 |
| Shadow Generation for Composite Image Using Diffusion model | Mar 22, 2024 | Image-to-Image Translation | Code Available | 2 |
| MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection | Mar 21, 2024 | Anomaly Detection, Anomaly Detection In Surveillance Videos | Code Available | 2 |
| SoftPatch: Unsupervised Anomaly Detection with Noisy Data | Mar 21, 2024 | Anomaly Detection, Unsupervised Anomaly Detection | Code Available | 2 |
| View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network | Mar 21, 2024 | Person Re-Identification | Code Available | 2 |
| Volumetric Environment Representation for Vision-Language Navigation | Mar 21, 2024 | 3D geometry, Multi-Task Learning | Code Available | 2 |
| Protein Conformation Generation via Force-Guided SE(3) Diffusion Models | Mar 21, 2024 | Diversity | Code Available | 2 |
| AutoRE: Document-Level Relation Extraction with Large Language Models | Mar 21, 2024 | Document-level Relation Extraction, Relation | Code Available | 2 |
| SyncTweedies: A General Generative Framework Based on Synchronized Diffusions | Mar 21, 2024 | Denoising | Code Available | 2 |
| Understanding the Ranking Loss for Recommendation with Sparse User Feedback | Mar 21, 2024 | Binary Classification, Click-Through Rate Prediction | Code Available | 2 |
| Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models | Mar 21, 2024 | Image Generation, Semantic Segmentation | Code Available | 2 |
| Model Uncertainty in Evolutionary Optimization and Bayesian Optimization: A Comparative Analysis | Mar 21, 2024 | Bayesian Optimization | Code Available | 2 |
| Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization | Mar 21, 2024 | geo-localization, Re-Ranking | Code Available | 2 |
| SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks | Mar 21, 2024 | | Code Available | 2 |
| Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data | Mar 20, 2024 | Memorization | Code Available | 2 |
| Fast-Poly: A Fast Polyhedral Framework For 3D Multi-Object Tracking | Mar 20, 2024 | 3D Multi-Object Tracking, CPU | Code Available | 2 |
| LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models | Mar 20, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Certified Human Trajectory Prediction | Mar 20, 2024 | Autonomous Vehicles, Prediction | Code Available | 2 |
| Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation | Mar 20, 2024 | Semantic Segmentation, Weakly supervised Semantic Segmentation | Code Available | 2 |
| vid-TLDR: Training Free Token merging for Light-weight Video Transformer | Mar 20, 2024 | Action Recognition, Computational Efficiency | Code Available | 2 |
| AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior | Mar 20, 2024 | | Code Available | 2 |
| RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition | Mar 20, 2024 | Contrastive Learning, Fine-Grained Visual Recognition | Code Available | 2 |
| SocialBench: Sociality Evaluation of Role-Playing Conversational Agents | Mar 20, 2024 | | Code Available | 2 |
| TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer | Mar 20, 2024 | Keyword Spotting | Code Available | 2 |
| Nellie: Automated organelle segmentation, tracking, and hierarchical feature extraction in 2D/3D live-cell microscopy | Mar 20, 2024 | | Code Available | 2 |
| PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns | Mar 20, 2024 | Multimodal Reasoning | Code Available | 2 |
| DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance | Mar 20, 2024 | | Code Available | 2 |
| Diversified and Personalized Multi-rater Medical Image Segmentation | Mar 20, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 2 |
| eRST: A Signaled Graph Theory of Discourse Relations and Organization | Mar 20, 2024 | | Code Available | 2 |