| EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera | May 14, 2024 | Depth Estimation, Surface Reconstruction | Code Available | 2 |
| Learning Multi-Agent Communication from Graph Modeling Perspective | May 14, 2024 | | Code Available | 2 |
| ADA-Track++: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association | May 14, 2024 | 3D Multi-Object Tracking, Decoder | Code Available | 2 |
| EchoTracker: Advancing Myocardial Point Tracking in Echocardiography | May 14, 2024 | Diagnostic, Motion Estimation | Code Available | 2 |
| Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation | May 14, 2024 | Decoder | Code Available | 2 |
| Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark | May 14, 2024 | | Code Available | 2 |
| GREEN: a lightweight architecture using learnable wavelets and Riemannian geometry for biomarker exploration | May 14, 2024 | EEG | Code Available | 2 |
| Autonomous clustering by fast find of mass and distance peaks | May 13, 2024 | Astronomy, Clustering | Code Available | 2 |
| FreeVA: Offline MLLM as Training-Free Video Assistant | May 13, 2024 | Fairness, Question Answering | Code Available | 2 |
| OverlapMamba: Novel Shift State Space Model for LiDAR-based Place Recognition | May 13, 2024 | Decision Making, Loop Closure Detection | Code Available | 2 |
| GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting | May 13, 2024 | 3D scene Editing, Virtual Try-on | Code Available | 2 |
| Evaluation of Retrieval-Augmented Generation: A Survey | May 13, 2024 | Information Retrieval, RAG | Code Available | 2 |
| AdFlush: A Real-World Deployable Machine Learning Solution for Effective Advertisement and Web Tracker Prevention | May 13, 2024 | Blocking, CPU | Code Available | 2 |
| CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution | May 13, 2024 | Diversity, Image Super-Resolution | Code Available | 2 |
| DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation | May 13, 2024 | 3D Generation, Decoder | Code Available | 2 |
| Zero-Shot Tokenizer Transfer | May 13, 2024 | XLM-R | Code Available | 2 |
| Transferable Neural Wavefunctions for Solids | May 13, 2024 | Variational Monte Carlo | Code Available | 2 |
| RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors | May 13, 2024 | Adversarial Robustness, Text Detection | Code Available | 2 |
| Localizing Task Information for Improved Model Merging and Compression | May 13, 2024 | Task Arithmetic | Code Available | 2 |
| PHUDGE: Phi-3 as Scalable Judge | May 12, 2024 | Data Augmentation | Code Available | 2 |
| Learnable Item Tokenization for Generative Recommendation | May 12, 2024 | Diversity, World Knowledge | Code Available | 2 |
| BoQ: A Place is Worth a Bag of Learnable Queries | May 12, 2024 | Image Similarity Search, Retrieval | Code Available | 2 |
| IPDnet: A Universal Direct-Path IPD Estimation Network for Sound Source Localization | May 11, 2024 | Sound Source Localization | Code Available | 2 |
| Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training | May 11, 2024 | | Code Available | 2 |
| MRSegmentator: Multi-Modality Segmentation of 40 Classes in MRI and CT | May 10, 2024 | Model Optimization, Organ Segmentation | Code Available | 2 |
| Self-Consistent Recursive Diffusion Bridge for Medical Image Translation | May 10, 2024 | Denoising, Scheduling | Code Available | 2 |
| Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation | May 10, 2024 | Semantic Segmentation | Code Available | 2 |
| What Can Natural Language Processing Do for Peer Review? | May 10, 2024 | Articles | Code Available | 2 |
| Linearizing Large Language Models | May 10, 2024 | In-Context Learning, Mamba | Code Available | 2 |
| GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs | May 10, 2024 | graph construction, image-classification | Code Available | 2 |
| Learning A Spiking Neural Network for Efficient Image Deraining | May 10, 2024 | Image Reconstruction, Rain Removal | Code Available | 2 |
| Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting | May 10, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| PLeak: Prompt Leaking Attacks against Large Language Model Applications | May 10, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention | May 10, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 2 |
| State-Free Inference of State-Space Models: The Transfer Function Approach | May 10, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Memory Mosaics | May 10, 2024 | Disentanglement, In-Context Learning | Code Available | 2 |
| Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba | May 9, 2024 | Action Recognition, Mamba | Code Available | 2 |
| OpenFactCheck: Building, Benchmarking Customized Fact-Checking Systems and Evaluating the Factuality of Claims and LLMs | May 9, 2024 | Benchmarking, Fact Checking | Code Available | 2 |
| HMT: Hierarchical Memory Transformer for Long Context Language Processing | May 9, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation | May 9, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask | May 9, 2024 | Anomaly Detection, Imputation | Code Available | 2 |
| CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | May 9, 2024 | Image Captioning, Instruction Following | Code Available | 2 |
| FloorSet -- a VLSI Floorplanning Dataset with Design Constraints of Real-World SoCs | May 9, 2024 | Combinatorial Optimization | Code Available | 2 |
| LMVD: A Large-Scale Multimodal Vlog Dataset for Depression Detection in the Wild | May 9, 2024 | Depression Detection, Navigate | Code Available | 2 |
| Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning | May 9, 2024 | parameter-efficient fine-tuning, Visual Prompting | Code Available | 2 |
| Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference | May 9, 2024 | | Code Available | 2 |
| Outlier-robust Kalman Filtering through Generalised Bayes | May 9, 2024 | Bayesian Inference, Computational Efficiency | Code Available | 2 |
| HMANet: Hybrid Multi-Axis Aggregation Network for Image Super-Resolution | May 8, 2024 | Image Super-Resolution | Code Available | 2 |
| Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models | May 8, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID | May 8, 2024 | Language Modelling, Large Language Model | Code Available | 2 |