| Task | Area | Papers | Results |
|---|---|---|---|
| Video Question Answering | Multimodal & Vision-Language | 460 | 254 |
| Recommendation Systems A Recommendation Sy… | Recommendation & Retrieval | 6,047 | 252 |
| Video Super-Resolution Video Super-Resolution is a computer vision task that aims t… | Generative Models | 281 | 247 |
| Math Word Problem Solving A math word problem is a mathematical exercise (such as in a… | Language & Reasoning | 107 | 245 |
| Code Generation Code Generation is an important field to predict explicit co… | Language & Reasoning | 1,697 | 241 |
| Graph Regression The regression task is similar to graph classification but u… | Graphs & Structured Data | 145 | 240 |
| Pose Estimation Pose Estimation is a computer vision task where the goal is … | Computer Vision | 4,228 | 236 |
| Node Property Prediction | Graphs & Structured Data | 54 | 235 |
| Text Summarization Text Summarization is a natural language processing (NLP) ta… | Language & Reasoning | 1,340 | 233 |
| Time Series Classification Time Series Classification is a general task that can be use… | Time Series & Forecasting | 697 | 231 |
| Multi-Object Tracking Multi-Object Tracking is a task in computer vision that invo… | Computer Vision | 671 | 227 |
| Molecular Property Prediction Molecular property prediction is the task of predicting the … | Medical & Scientific | 354 | 224 |
| 3D Point Cloud Classification | Computer Vision | 202 | 216 |
| Panoptic Segmentation Panoptic Segmentation is a computer vision task that combine… | Computer Vision | 462 | 208 |
| Audio Classification Audio Classification is a machine learning task that involve… | Audio & Speech | 361 | 202 |
| Deblurring Deblurring is a computer vision task that involves removing … | Generative Models | 999 | 200 |
| Image-to-Image Translation Image-to-Image Translation is a task in computer vision and … | Multimodal & Vision-Language | 1,184 | 196 |
| Visual Reasoning Ability to understand actions and reasoning associated with … | Multimodal & Vision-Language | 698 | 192 |
| Traffic Prediction Traffic Prediction is a task that involves forecasting traff… | Time Series & Forecasting | 375 | 192 |
| Prompt Engineering Prompt engineering is the process of designing and refining … | Language & Reasoning | 1,236 | 191 |
| Point Cloud Registration Point Cloud Registration is a fundamental problem in 3D comp… | Computer Vision | 447 | 190 |
| Scene Text Recognition (see Scene Text Detection) | Computer Vision | 269 | 190 |
| Video Quality Assessment Video Quality Assessment is a computer vision task aiming to… | Computer Vision | 216 | 186 |
| Word Sense Disambiguation The task of Word Sense Disambiguation (WSD) consists of asso… | Language & Reasoning | 1,035 | 181 |
| Vision and Language Navigation | Multimodal & Vision-Language | 223 | 169 |
| Arithmetic Reasoning | Language & Reasoning | 175 | 169 |
| Facial Expression Recognition (FER) Facial Expression Recognition (FER) is a computer vision tas… | Computer Vision | 492 | 167 |
| Text-to-Image Generation | Multimodal & Vision-Language | 1,085 | 160 |
| Lane Detection Lane Detection is a computer vision task that involves ident… | Reinforcement Learning & Robotics | 251 | 160 |
| Link Property Prediction | Foundations & Efficiency | 13 | 160 |
| RGB Salient Object Detection RGB Salient object detection is a task based on a visual att… | Computer Vision | 222 | 159 |
| Video Frame Interpolation The goal of Video Frame Interpolation is to synthesize sever… | Generative Models | 204 | 157 |
| Coreference Resolution | Language & Reasoning | 880 | 154 |
| Change Detection Change Detection is a computer vision task that involves det… | Computer Vision | 919 | 146 |
| Scene Text Detection Scene Text Detection is a computer vision task that involves… | Computer Vision | 213 | 146 |
| 3D Semantic Segmentation 3D Semantic Segmentation is a computer vision task that invo… | Computer Vision | 348 | 145 |
| 3D Multi-Object Tracking | Computer Vision | 101 | 140 |
| Face Detection Face Detection is a computer vision task that involves autom… | Computer Vision | 536 | 139 |
| Aspect-Based Sentiment Analysis (ABSA) Aspect-Based Sentiment Analysis (ABSA) is a Natural Language… | Language & Reasoning | 469 | 139 |
| Classification Classification is the task of categorizing a set of data int… | Foundations & Efficiency | 12,815 | 134 |
| Continual Learning Continual Learning (also known as Incremental Learning, Life… | Foundations & Efficiency | 2,644 | 133 |
| Optical Flow Estimation Optical Flow Estimation is a computer vision task that invol… | Computer Vision | 2,184 | 133 |
| Entity Linking Assigning a unique identity to entities (such as famous indi… | Language & Reasoning | 735 | 133 |
| Video Instance Segmentation The goal of video instance segmentation is simultaneous dete… | Computer Vision | 148 | 133 |
| Trajectory Prediction Trajectory Prediction is the problem of predicting the short… | Foundations & Efficiency | 1,004 | 132 |
| Video Generation | Generative Models | 1,466 | 131 |
| Face Verification Face Verification is a machine learning task in computer vis… | Computer Vision | 360 | 130 |
| Speech Separation The task of extracting all overlapping speech sources in a g… | Audio & Speech | 359 | 129 |
| Graph Property Prediction | Graphs & Structured Data | 56 | 128 |
| Keyword Spotting In speech processing, keyword spotting deals with the identi… | Audio & Speech | 407 | 127 |