| Task | Area | Papers | Results |
|---|---|---|---|
| Class Incremental Learning | Foundations & Efficiency | 634 | 15 |
| MRI Reconstruction: In its most basic form, MRI reconstruction consists in retri… | Medical & Scientific | 441 | 15 |
| Saliency Detection: Saliency Detection is a preprocessing step in computer visio… | Computer Vision | 364 | 15 |
| Scene Segmentation: Scene segmentation is the task of splitting a scene into its… | Computer Vision | 283 | 15 |
| Speaker Identification | Audio & Speech | 248 | 15 |
| Explanation Generation | Language & Reasoning | 235 | 15 |
| COVID-19 Diagnosis: COVID-19 diagnosis is the task of diagnosing the presence of… | Medical & Scientific | 211 | 15 |
| Keyword Extraction: Keyword extraction is tasked with the automatic identificati… | Language & Reasoning | 172 | 15 |
| Point Tracking: Point Tracking, often referred to as Tracking Any Point (TAP… | Computer Vision | 151 | 15 |
| Code Search: The goal of Code Search is to retrieve code fragments from a… | Language & Reasoning | 125 | 15 |
| Talking Head Generation: Talking head generation is the task of generating a talking … | Generative Models | 119 | 15 |
| Protein Function Prediction: For GO terms prediction, given the specific function predict… | Medical & Scientific | 79 | 15 |
| Weakly-supervised instance segmentation | Computer Vision | 39 | 15 |
| Table-based Fact Verification: Verifying facts given semi-structured data. | Language & Reasoning | 26 | 15 |
| Beat Tracking: Determine the positions of all beats in a music recording. | Audio & Speech | 19 | 15 |
| Dense Pixel Correspondence Estimation | Computer Vision | 17 | 15 |
| Ad-hoc video search: The Ad-hoc search task ended a 3-year cycle from 2016-2018 w… | Multimodal & Vision-Language | 13 | 15 |
| Image Retrieval with Multi-Modal Query: The problem of retrieving images from a database based on a … | Multimodal & Vision-Language | 10 | 15 |
| GPS Embeddings: GPS Embeddings is the collective name for a set of feature-l… | Foundations & Efficiency | 1 | 15 |
| Time Series Analysis: Time Series Analysis is a statistical technique used to anal… | Time Series & Forecasting | 6,748 | 14 |
| Binary Classification | Foundations & Efficiency | 2,574 | 14 |
| Model Compression: Model Compression is an actively pursued area of research ov… | Foundations & Efficiency | 1,356 | 14 |
| 3D Pose Estimation | Computer Vision | 379 | 14 |
| Task-Oriented Dialogue Systems: Achieving a pre-defined task through a dialog. | Language & Reasoning | 308 | 14 |
| 3D Object Classification: 3D Object Classification is the task of predicting the class… | Computer Vision | 93 | 14 |
| Multi-target Domain Adaptation: The idea of Multi-target Domain Adaptation is to adapt a mod… | Foundations & Efficiency | 39 | 14 |
| Interpretability Techniques for Deep Learning | Foundations & Efficiency | 25 | 14 |
| Audio Super-Resolution: Audio super-resolution, especially speech, refers to the pro… | Audio & Speech | 22 | 14 |
| 3D Multi-Person Pose Estimation (absolute): This task aims to solve absolute 3D multi-person pose Estima… | Computer Vision | 20 | 14 |
| Font Recognition: Font recognition (also called visual font recognition or opt… | Computer Vision | 19 | 14 |
| Semi-Supervised Instance Segmentation | Computer Vision | 19 | 14 |
| MMR total: Sum of all scores of the 11 distinct tasks involving texts, … | Foundations & Efficiency | 12 | 14 |
| Story Continuation: The task involves providing an initial scene that can be obt… | Language & Reasoning | 10 | 14 |
| Image/Document Clustering | Multimodal & Vision-Language | 8 | 14 |
| Horizon Line Estimation | Computer Vision | 7 | 14 |
| Core set discovery: A core set in machine learning is defined as the minimal set… | Foundations & Efficiency | 1 | 14 |
| Meta-Learning: Meta-learning is a methodology concerned with "learning to … | Foundations & Efficiency | 3,569 | 13 |
| Outlier Detection: Outlier Detection is the task of identifying a subset of a giv… | Computer Vision | 703 | 13 |
| Event Extraction: Determine the extent of the events in a text. Other names: E… | Language & Reasoning | 446 | 13 |
| Multimodal Reasoning: Reasoning over multimodal inputs. | Multimodal & Vision-Language | 302 | 13 |
| Saliency Prediction: A saliency map is a model that predicts eye fixations on a v… | Foundations & Efficiency | 268 | 13 |
| ECG Classification | Medical & Scientific | 116 | 13 |
| Aspect Extraction: Aspect extraction is the task of identifying and extracting … | Language & Reasoning | 92 | 13 |
| 3D Action Recognition | Computer Vision | 91 | 13 |
| Instrument Recognition | Computer Vision | 39 | 13 |
| Video deraining | Generative Models | 30 | 13 |
| Binary text classification | Language & Reasoning | 20 | 13 |
| Grounded Situation Recognition: Grounded Situation Recognition aims to produce the structure… | Computer Vision | 15 | 13 |
| Situation Recognition: Situation Recognition aims to produce the structured image s… | Computer Vision | 12 | 13 |
| Downbeat Tracking: Determine the positions of all downbeats in a music recordin… | Audio & Speech | 11 | 13 |