| Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting | Jan 24, 2023 | Optical Flow Estimation, Video Inpainting | Code Available | 2 |
| Encrypted Vector Similarity Computations Using Partially Homomorphic Encryption: Applications and Performance Analysis | Mar 7, 2025 | Image Retrieval, Privacy Preserving | Code Available | 2 |
| Scientific QA System with Verifiable Answers | Jul 16, 2024 | Articles, Information Retrieval | Code Available | 2 |
| LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation | Apr 30, 2024 | Attribute, Semantic Segmentation | Code Available | 2 |
| Overview of the EHRSQL 2024 Shared Task on Reliable Text-to-SQL Modeling on Electronic Health Records | May 4, 2024 | Information Retrieval, Question Answering | Code Available | 2 |
| Oscillatory State-Space Models | Oct 4, 2024 | Mamba, State Space Models | Code Available | 2 |
| RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback | Feb 6, 2024 | reinforcement-learning, Reinforcement Learning (RL) | Code Available | 2 |
| Locality Alignment Improves Vision-Language Models | Oct 14, 2024 | Semantic Segmentation, Spatial Reasoning | Code Available | 2 |
| DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer | Jun 13, 2024 | Face Image Quality, Face Image Quality Assessment | Code Available | 2 |
| Mr. DETR: Instructive Multi-Route Training for Detection Transformers | Dec 13, 2024 | Decoder, Object Detection | Code Available | 2 |
| Self-supervised Contrastive Representation Learning for Semi-supervised Time-Series Classification | Aug 13, 2022 | Contrastive Learning, Data Augmentation | Code Available | 2 |
| InstructDiffusion: A Generalist Modeling Interface for Vision Tasks | Sep 7, 2023 | Keypoint Detection | Code Available | 2 |
| QAMPARI: An Open-domain Question Answering Benchmark for Questions with Many Answers from Multiple Paragraphs | May 25, 2022 | Answer Generation, Natural Questions | Code Available | 2 |
| Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | Feb 11, 2021 | Cross-Modal Retrieval, Fine-Grained Image Classification | Code Available | 2 |
| Data augmentation and multimodal learning for predicting drug response in patient-derived xenografts from gene expressions and histology images | Apr 25, 2022 | Data Augmentation, Drug Response Prediction | Code Available | 2 |
| Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation | Feb 23, 2022 | Efficient Exploration, Navigate | Code Available | 2 |
| Is ChatGPT a General-Purpose Natural Language Processing Task Solver? | Feb 8, 2023 | Arithmetic Reasoning, Zero-Shot Learning | Code Available | 2 |
| The UD-NewsCrawl Treebank: Reflections and Challenges from a Large-scale Tagalog Syntactic Annotation Project | May 26, 2025 | | Code Available | 2 |
| M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus | Dec 29, 2022 | Music Transcription, Singing Voice Synthesis | Code Available | 2 |
| SVDiff: Compact Parameter Space for Diffusion Fine-Tuning | Mar 20, 2023 | Data Augmentation, Diffusion Personalization | Code Available | 2 |
| WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation | Mar 26, 2023 | Anomaly Classification, Anomaly Detection | Code Available | 2 |
| https://arxiv.org/pdf/2409.07491 | Sep 13, 2024 | Brain Computer Interface, EEG | Code Available | 2 |
| Neuroevolution of Self-Interpretable Agents | Mar 18, 2020 | Reinforcement Learning, Reinforcement Learning (RL) | Code Available | 2 |
| Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking | Jan 12, 2024 | 4D reconstruction, Denoising | Code Available | 2 |
| SoccerTrack: A Dataset and Tracking Algorithm for Soccer With Fish-Eye and Drone Videos | Jun 20, 2022 | 4k, 8k | Code Available | 2 |
| MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning | May 14, 2025 | Anomaly Detection, Anomaly Segmentation | Code Available | 2 |
| MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | May 26, 2025 | Math, Problem Decomposition | Code Available | 2 |
| Video Diffusion Models | Apr 7, 2022 | Unconditional Video Generation, Video Generation | Code Available | 2 |
| Synthetic QA Corpora Generation with Roundtrip Consistency | Jun 12, 2019 | Question Answering, Question Generation | Code Available | 2 |
| Boosting Knowledge Graph Generation from Tabular Data with RML Views | May 22, 2023 | Data Integration, Graph Generation | Code Available | 2 |
| StyleDrop: Text-to-Image Generation in Any Style | Jun 1, 2023 | Image Generation, Text to Image Generation | Code Available | 2 |
| SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents | Oct 18, 2023 | | Code Available | 2 |
| WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation | Mar 4, 2025 | Hallucination | Code Available | 2 |
| Human Performance Modeling and Rendering via Neural Animated Mesh | Sep 18, 2022 | | Code Available | 2 |
| Vision Transformers for Single Image Dehazing | Apr 8, 2022 | Image Dehazing, Single Image Dehazing | Code Available | 2 |
| EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion | May 2, 2024 | 3D Object Retrieval, Denoising | Code Available | 2 |
| A Simple Baseline for Efficient Hand Mesh Reconstruction | Mar 4, 2024 | 3D Hand Pose Estimation, Computational Efficiency | Code Available | 2 |
| The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning | Dec 4, 2023 | In-Context Learning | Code Available | 2 |
| Combining Hough Transform and Deep Learning Approaches to Reconstruct ECG Signals From Printouts | Oct 18, 2024 | ECG Digitization | Code Available | 2 |
| Behavior Sequence Transformer for E-commerce Recommendation in Alibaba | May 15, 2019 | Click-Through Rate Prediction, Recommendation Systems | Code Available | 2 |
| Revitalizing Convolutional Network for Image Restoration | Jun 25, 2024 | Deblurring, Image Deblurring | Code Available | 2 |
| DTrOCR: Decoder-only Transformer for Optical Character Recognition | Aug 30, 2023 | Decoder, Handwritten Text Recognition | Code Available | 2 |
| Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture | Oct 18, 2023 | 4k, image-classification | Code Available | 2 |
| High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity | Oct 14, 2024 | Denoising, Dichotomous Image Segmentation | Code Available | 2 |
| Neighborhood-Enhanced Supervised Contrastive Learning for Collaborative Filtering | Feb 18, 2024 | Collaborative Filtering, Contrastive Learning | Code Available | 2 |
| Dual-domain strip attention for image restoration | Mar 1, 2024 | Deblurring, Denoising | Code Available | 2 |
| Robot Cooking with Stir-fry: Bimanual Non-prehensile Manipulation of Semi-fluid Objects | May 12, 2022 | Deformable Object Manipulation | Code Available | 2 |
| Avoiding Shortcuts: Enhancing Channel-Robust Specific Emitter Identification via Single-Source Domain Generalization | Jan 20, 2025 | Contrastive Learning, Domain Generalization | Code Available | 2 |
| FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing | Oct 9, 2023 | Optical Flow Estimation, Text-to-Video Editing | Code Available | 2 |
| Scaling Laws for Precision | Nov 7, 2024 | Quantization | Code Available | 2 |