| Task | Area | Papers | Results |
|---|---|---|---|
| Video Summarization Video Summarization aims to generate a short synopsis that s… | Multimodal & Vision-Language | 280 | 18 |
| Medical Image Registration Image registration, also known as image fusion or image matc… | Medical & Scientific | 198 | 18 |
| Sentence Ordering Sentence ordering task deals with finding the correct order … | Language & Reasoning | 50 | 18 |
| Crop Classification | Foundations & Efficiency | 48 | 18 |
| inverse tone mapping For stack-based inverse tone mapping | Generative Models | 35 | 18 |
| Music Modeling | Audio & Speech | 34 | 18 |
| Natural Language Visual Grounding | Multimodal & Vision-Language | 32 | 18 |
| Image Matching | Computer Vision | 18 | 18 |
| Human Instance Segmentation Instance segmentation is the task of detecting and delineati… | Computer Vision | 16 | 18 |
| Video-based Generative Performance Benchmarking (Correctness of Information) The benchmark evaluates a generative Video Conversational Mo… | Generative Models | 15 | 18 |
| Arabic Text Diacritization Addition of diacritics for undiacritized Arabic texts for wo… | Language & Reasoning | 13 | 18 |
| Human action generation Yan et al. (2019) CSGN: "When the dancer is stepping, jumpin… | Generative Models | 13 | 18 |
| Colorectal Gland Segmentation | Medical & Scientific | 9 | 18 |
| Meter Reading | Computer Vision | 9 | 18 |
| Text-based Person Retrieval with Noisy Correspondence This is a benchmark about text-based person retrieval with n… | Multimodal & Vision-Language | 6 | 18 |
| Atomic number classification Predict the atomic number of a node in a molecular/material/… | Foundations & Efficiency | 1 | 18 |
| Contrastive Learning Contrastive Learning is a deep learning technique for unsupe… | Foundations & Efficiency | 6,661 | 17 |
| Information Retrieval Information retrieval is the task of ranking a list of docum… | Recommendation & Retrieval | 4,740 | 17 |
| Image Restoration Image Restoration is a family of inverse problems for obtain… | Generative Models | 1,459 | 17 |
| Community Detection Community Detection is one of the fundamental problems in ne… | Graphs & Structured Data | 919 | 17 |
| Knowledge Graph Completion Knowledge graphs $G$ are represented as a collection of trip… | Graphs & Structured Data | 482 | 17 |
| Scene Generation | Generative Models | 309 | 17 |
| Change Point Detection Change Point Detection is concerned with the accurate detect… | Computer Vision | 285 | 17 |
| Keyphrase Extraction A classic task to extract salient phrases that best summariz… | Language & Reasoning | 153 | 17 |
| Zero Shot Segmentation | Computer Vision | 134 | 17 |
| Point Cloud Generation | Generative Models | 117 | 17 |
| Speech-to-Speech Translation Speech-to-speech translation (S2ST) consists of translating … | Audio & Speech | 117 | 17 |
| Sketch-Based Image Retrieval | Multimodal & Vision-Language | 110 | 17 |
| Multimodal Machine Translation Multimodal machine translation is the task of doing machine … | Multimodal & Vision-Language | 108 | 17 |
| Stereo Depth Estimation | Computer Vision | 97 | 17 |
| Single-Source Domain Generalization In this task a model is trained in a single source domain an… | Computer Vision | 48 | 17 |
| Nested Mention Recognition Nested mention recognition is the task of correctly modeling… | Computer Vision | 11 | 17 |
| Fine-Grained Urban Flow Inference Fine-grained urban flow inference (FUFI) aims to infer the f… | Time Series & Forecasting | 5 | 17 |
| Conversational Web Navigation The problem of conversational web navigation is described as… | Reinforcement Learning & Robotics | 3 | 17 |
| Multi-Task Learning Multi-task learning aims to learn multiple different tasks s… | Foundations & Efficiency | 3,687 | 16 |
| Human Activity Recognition Classify various human activities | Computer Vision | 744 | 16 |
| Sentence Classification | Language & Reasoning | 303 | 16 |
| Pose Tracking Pose Tracking is the task of estimating multi-person human p… | Computer Vision | 191 | 16 |
| Mortality Prediction | Medical & Scientific | 189 | 16 |
| Physical Simulations | Medical & Scientific | 100 | 16 |
| Anomaly Classification Anomaly Classification is the task of identifying and catego… | Foundations & Efficiency | 72 | 16 |
| Surgical phase recognition The first 40 videos are used for training, the last 40 video… | Medical & Scientific | 69 | 16 |
| Temporal Sentence Grounding Temporal sentence grounding (TSG) aims to locate a specific … | Time Series & Forecasting | 43 | 16 |
| Distance regression Prediction of the distance between connected nodes in molecu… | Foundations & Efficiency | 19 | 16 |
| Graph Ranking | Graphs & Structured Data | 18 | 16 |
| Breast Tumour Classification | Medical & Scientific | 13 | 16 |
| Generalized Few-Shot Learning | Medical & Scientific | 13 | 16 |
| Factual Inconsistency Detection in Chart Captioning Detect factual inconsistency between charts and captions. | Computer Vision | 4 | 16 |
| regression | Foundations & Efficiency | 9,424 | 15 |
| Denoising Denoising is a task in image processing and computer vision … | Generative Models | 7,282 | 15 |