| Towards Grand Unification of Object Tracking | Jul 14, 2022 | Multi-Object Tracking, Multi-Object Tracking and Segmentation | Code Available | 2 |
| Scene Text Recognition with Permuted Autoregressive Sequence Models | Jul 14, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| Language Modelling with Pixels | Jul 14, 2022 | Language Modelling, Named Entity Recognition | Code Available | 2 |
| Relighting4D: Neural Relightable Human from Videos | Jul 14, 2022 | | Code Available | 2 |
| EGSDE: Unpaired Image-to-Image Translation via Energy-Guided Stochastic Differential Equations | Jul 14, 2022 | Image-to-Image Translation, Translation | Code Available | 2 |
| Structure PLP-SLAM: Efficient Sparse Mapping and Localization using Point, Line and Plane for Monocular, RGB-D and Stereo Cameras | Jul 13, 2022 | Camera Localization, Pose Tracking | Code Available | 2 |
| DocPrompting: Generating Code by Retrieving the Docs | Jul 13, 2022 | Code Generation | Code Available | 2 |
| Open High-Resolution Satellite Imagery: The WorldStrat Dataset -- With Application to Super-Resolution | Jul 13, 2022 | Humanitarian, Multi-Frame Super-Resolution | Code Available | 2 |
| Learning Deep Time-index Models for Time Series Forecasting | Jul 13, 2022 | Inductive Bias, Meta-Learning | Code Available | 2 |
| Bootstrap Latent Representations for Multi-modal Recommendation | Jul 13, 2022 | Multi-modal Recommendation | Code Available | 2 |
| PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images | Jul 13, 2022 | 3D human pose and shape estimation, 3D Human Pose Estimation | Code Available | 2 |
| Wayformer: Motion Forecasting via Simple & Efficient Attention Networks | Jul 12, 2022 | Autonomous Driving, Decoder | Code Available | 2 |
| Earthformer: Exploring Space-Time Transformers for Earth System Forecasting | Jul 12, 2022 | Earth Observation, Earth Surface Forecasting | Code Available | 2 |
| Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios | Jul 12, 2022 | Image Classification | Code Available | 2 |
| Accelerating Certifiable Estimation with Preconditioned Eigensolvers | Jul 12, 2022 | | Code Available | 2 |
| Collaborative Neural Rendering using Anime Character Sheets | Jul 12, 2022 | Image Generation, Image to 3D | Code Available | 2 |
| Audio-Visual Segmentation | Jul 11, 2022 | Segmentation | Code Available | 2 |
| Fourier Neural Operator with Learned Deformations for PDEs on General Geometries | Jul 11, 2022 | valid | Code Available | 2 |
| Dual Vision Transformer | Jul 11, 2022 | | Code Available | 2 |
| Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics | Jul 11, 2022 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 |
| Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning | Jul 11, 2022 | Image Classification, Instance Segmentation | Code Available | 2 |
| CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer | Jul 11, 2022 | Image-to-Image Translation, Style Transfer | Code Available | 2 |
| No Language Left Behind: Scaling Human-Centered Machine Translation | Jul 11, 2022 | Machine Translation, Mixture-of-Experts | Code Available | 2 |
| PSP-HDRI+: A Synthetic Dataset Generator for Pre-Training of Human-Centric Computer Vision Models | Jul 11, 2022 | Keypoint Estimation | Code Available | 2 |
| 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds | Jul 10, 2022 | 3D Semantic Segmentation, Autonomous Driving | Code Available | 2 |
| LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action | Jul 10, 2022 | Instruction Following, Language Modeling | Code Available | 2 |
| SFNet: Faster, Accurate, and Domain Agnostic Semantic Segmentation via Semantic Flow | Jul 10, 2022 | Real-Time Semantic Segmentation, Semantic Segmentation | Code Available | 2 |
| DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer | Jul 10, 2022 | Form, Inductive Bias | Code Available | 2 |
| Improving Entity Disambiguation by Reasoning over a Knowledge Base | Jul 8, 2022 | Entity Disambiguation, Entity Linking | Code Available | 2 |
| ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking | Jul 8, 2022 | Entity Disambiguation, Entity Linking | Code Available | 2 |
| Accelerating Material Design with the Generative Toolkit for Scientific Discovery | Jul 8, 2022 | Drug Discovery, Materials Screening | Code Available | 2 |
| HierarchicalForecast: A Reference Framework for Hierarchical Forecasting in Python | Jul 7, 2022 | BIG-bench Machine Learning, Decision Making | Code Available | 2 |
| More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity | Jul 7, 2022 | Object Detection, Semantic Segmentation | Code Available | 2 |
| VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning | Jul 7, 2022 | Benchmarking, Multi-agent Reinforcement Learning | Code Available | 2 |
| Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection | Jul 7, 2022 | Object, Open Vocabulary Attribute Detection | Code Available | 2 |
| Few-Shot Scene Classification of Optical Remote Sensing Images Leveraging Calibrated Pretext Tasks | Jul 6, 2022 | Contrastive Learning, Few-Shot Learning | Code Available | 2 |
| DCT-Net: Domain-Calibrated Translation for Portrait Stylization | Jul 6, 2022 | Few-Shot Learning, Style Transfer | Code Available | 2 |
| FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling | Jul 6, 2022 | Video Quality Assessment | Code Available | 2 |
| Softmax-free Linear Transformers | Jul 5, 2022 | Computational Efficiency | Code Available | 2 |
| Probability density estimation for sets of large graphs with respect to spectral information using stochastic block models | Jul 5, 2022 | Density Estimation | Code Available | 2 |
| CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | Jul 5, 2022 | Code Generation, Decoder | Code Available | 2 |
| CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers | Jul 5, 2022 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation | Jul 5, 2022 | Autonomous Driving, Collision Avoidance | Code Available | 2 |
| Neural Networks and the Chomsky Hierarchy | Jul 5, 2022 | | Code Available | 2 |
| Egocentric Video-Language Pretraining @ Ego4D Challenge 2022 | Jul 4, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks | Jul 4, 2022 | | Code Available | 2 |
| Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Jul 4, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding | Jul 4, 2022 | Benchmarking, Document Ranking | Code Available | 2 |
| Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022 | Jul 4, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| Harmonizer: Learning to Perform White-Box Image and Video Harmonization | Jul 4, 2022 | Image Harmonization, Video Harmonization | Code Available | 2 |