| Task | Description | Area | Papers | Results |
|---|---|---|---|---|
| Click-Through Rate Prediction | Click-through rate prediction is the task of predicting the … | Recommendation & Retrieval | 391 | 127 |
| 3D Object Tracking | 3D Object Tracking is a computer vision task dedicated to mo… | Computer Vision | 67 | 127 |
| Video Prediction | | Generative Models | 394 | 126 |
| Zero-Shot Video Retrieval | Zero-shot video retrieval is the task of retrieving relevant… | Multimodal & Vision-Language | 40 | 126 |
| Temporal Action Localization | Temporal Action Localization aims to detect activities in th… | Time Series & Forecasting | 1,477 | 125 |
| Open Vocabulary Semantic Segmentation | | Computer Vision | 113 | 124 |
| Motion Synthesis | Creating a video where people in the images move (such as bl… | Generative Models | 282 | 123 |
| Metric Learning | The goal of Metric Learning is to learn a representation fun… | Foundations & Efficiency | 1,648 | 122 |
| Speech Enhancement | Speech Enhancement is a signal processing task that involves… | Audio & Speech | 982 | 122 |
| Data-to-Text Generation | A classic problem in natural-language generation (NLG) invol… | Language & Reasoning | 219 | 122 |
| Action Segmentation | Action Segmentation is a challenging problem in high-level v… | Computer Vision | 219 | 120 |
| Image Dehazing | (Image credit: [Densely Connected Pyramid Dehazing Network]… | Generative Models | 295 | 117 |
| Continuous Control | Continuous control in the context of playing games, especial… | Reinforcement Learning & Robotics | 1,161 | 116 |
| Incremental Learning | Incremental learning aims to develop artificially intelligen… | Foundations & Efficiency | 1,371 | 112 |
| Cross-Modal Retrieval | Cross-Modal Retrieval (CMR) is a task of retrieving items ac… | Multimodal & Vision-Language | 522 | 111 |
| Few-Shot Object Detection | Few-Shot Object Detection is a computer vision task that inv… | Computer Vision | 179 | 111 |
| Image Denoising | Image Denoising is a computer vision task that involves remo… | Generative Models | 1,220 | 110 |
| Slot Filling | The goal of Slot Filling is to identify from a running dialo… | Language & Reasoning | 458 | 110 |
| Semi-Supervised Object Detection | Semi-supervised object detection uses both labeled data and … | Computer Vision | 115 | 110 |
| Open-Domain Question Answering | Open-domain question answering is the task of question answe… | Language & Reasoning | 494 | 109 |
| Age Estimation | Age Estimation is the task of estimating the age of a person… | Computer Vision | 254 | 109 |
| Open Information Extraction | In natural language processing, open information extraction … | Language & Reasoning | 207 | 108 |
| Weakly Supervised Action Localization | In this task, the training data consists of videos with a li… | Computer Vision | 55 | 107 |
| Visual Dialog | Visual Dialog requires an AI agent to hold a meaningful dial… | Multimodal & Vision-Language | 118 | 106 |
| Image Manipulation Detection | The task of detecting images or image parts that have been t… | Generative Models | 73 | 106 |
| Novel View Synthesis | Synthesize a target image with an arbitrary target camera po… | Generative Models | 1,441 | 103 |
| Human-Object Interaction Detection | Human-Object Interaction (HOI) detection is a task of identi… | Computer Vision | 449 | 103 |
| Zero-Shot Action Recognition | | Computer Vision | 83 | 103 |
| 3D Hand Pose Estimation | Image: [Zimmermann et al.](https://arxiv.org/pdf/1705.01389v3.… | Computer Vision | 178 | 102 |
| Sign Language Recognition | Sign Language Recognition is a computer vision and natural l… | Multimodal & Vision-Language | 297 | 101 |
| SMAC | Benchmarks for Efficient Exploration of Completion of Multi-s… | Reinforcement Learning & Robotics | 121 | 101 |
| Abstractive Text Summarization | Abstractive Text Summarization is the task of generating a s… | Language & Reasoning | 846 | 99 |
| Entity Alignment | Entity Alignment is the task of finding entities in two know… | Language & Reasoning | 190 | 97 |
| Unsupervised Semantic Segmentation | Models that learn to segment each image (i.e. assign a class… | Computer Vision | 95 | 97 |
| Drug Discovery | Drug discovery is the task of applying machine learning to d… | Medical & Scientific | 1,337 | 96 |
| Video Object Segmentation | Video object segmentation is a binary labeling problem aimin… | Computer Vision | 551 | 96 |
| Zero-Shot Transfer Image Classification | | Computer Vision | 19 | 95 |
| 3D Face Reconstruction | 3D Face Reconstruction is a computer vision task that involv… | Generative Models | 211 | 94 |
| Multi-Person Pose Estimation | Multi-person pose estimation is the task of estimating the p… | Computer Vision | 151 | 94 |
| Lipreading | Lipreading is a process of extracting speech by watching lip… | Multimodal & Vision-Language | 103 | 94 |
| Pedestrian Detection | Pedestrian detection is the task of detecting pedestrians fr… | Computer Vision | 438 | 92 |
| Reading Comprehension | Most current question answering datasets frame the task as r… | Language & Reasoning | 1,760 | 91 |
| Knowledge Distillation | Knowledge distillation is the process of transferring knowle… | Foundations & Efficiency | 4,240 | 90 |
| Conversational Response Selection | Conversational response selection refers to the task of iden… | Language & Reasoning | 46 | 90 |
| Sentence Completion | | Language & Reasoning | 91 | 89 |
| Face Recognition | Facial Recognition is the task of making a positive identifi… | Computer Vision | 2,329 | 88 |
| Image Manipulation Localization | The task of segmenting parts of images or image parts that h… | Generative Models | 31 | 87 |
| Video Captioning | Video Captioning is a task of automatic captioning a video b… | Multimodal & Vision-Language | 473 | 86 |
| Text-To-SQL | Text-to-SQL is a task in natural language processing (NLP) w… | Language & Reasoning | 424 | 86 |
| Zero-Shot Learning | Zero-shot learning (ZSL) is a model's ability to detect clas… | Foundations & Efficiency | 1,864 | 84 |