| Task | Area | Papers | Results |
|---|---|---|---|
| Edge Detection Edge Detection is a fundamental image processing technique w… | Computer Vision | 490 | 27 |
| 6D Pose Estimation | Computer Vision | 255 | 27 |
| Age And Gender Classification Age and gender classification is a dual-task of identifying … | Computer Vision | 34 | 27 |
| Social Media Popularity Prediction Social Media Popularity Prediction (SMPP) aims to predict th… | Foundations & Efficiency | 7 | 27 |
| Unsupervised Anomaly Detection with Specified Settings -- 1% anomaly | Time Series & Forecasting | 6 | 27 |
| Data Augmentation Data augmentation involves techniques used for increasing th… | Generative Models | 8,378 | 26 |
| Image Enhancement Image Enhancement is the process of improving the interpretabilit… | Generative Models | 983 | 26 |
| Video Semantic Segmentation The goal of video semantic segmentation is to assign a prede… | Computer Vision | 895 | 26 |
| Gesture Recognition Gesture Recognition is an active field of research with appl… | Computer Vision | 572 | 26 |
| Audio Generation Audio generation (synthesis) is the task of generating raw a… | Audio & Speech | 270 | 26 |
| Video Reconstruction | Generative Models | 145 | 26 |
| Multi-Label Image Classification Multi-Label Image Classification focuses on predicting l… | Computer Vision | 124 | 26 |
| Line Segment Detection | Computer Vision | 37 | 26 |
| Emotion Interpretation | Language & Reasoning | 6 | 26 |
| Point Cloud Classification Point Cloud Classification is a task involving the classific… | Computer Vision | 265 | 25 |
| Sound Event Detection Sound Event Detection (SED) is the task of recognizing the s… | Audio & Speech | 194 | 25 |
| Entity Typing Entity Typing is an important task in text analysis. Assigni… | Language & Reasoning | 170 | 25 |
| Single-View 3D Reconstruction | Generative Models | 98 | 25 |
| Time Series Regression Predicting one or more scalars for an entire time series exa… | Time Series & Forecasting | 82 | 25 |
| Clustering Algorithms Evaluation | Foundations & Efficiency | 12 | 25 |
| Description-guided molecule generation The significance of description-based molecule generation li… | Medical & Scientific | 2 | 25 |
| Chunking Chunking, also known as shallow parsing, identifies continuo… | Language & Reasoning | 447 | 24 |
| Virtual Try-on Virtual try-on of clothing or other items such as glasses an… | Generative Models | 276 | 24 |
| Audio-Visual Speech Recognition Audio-visual speech recognition is the task of transcribing … | Multimodal & Vision-Language | 100 | 24 |
| Temporal Relation Extraction Temporal relation extraction systems aim to identify and cla… | Time Series & Forecasting | 88 | 24 |
| Visual Relationship Detection Visual relationship detection (VRD) is a newly developed c… | Computer Vision | 82 | 24 |
| Emotional Intelligence Emotional Intelligence (EI) is a measure of "The ability to … | Language & Reasoning | 77 | 24 |
| Dense Video Captioning Most natural videos contain numerous events. For example, in… | Multimodal & Vision-Language | 76 | 24 |
| Subjectivity Analysis A related task to sentiment analysis is the subjectivity ana… | Language & Reasoning | 63 | 24 |
| Meme Classification Meme classification refers to the task of classifying intern… | Language & Reasoning | 59 | 24 |
| Image Attribution Image attribution algorithms aim to identify important regio… | Computer Vision | 26 | 24 |
| Protein Secondary Structure Prediction Protein secondary structure prediction is a vital task in bi… | Medical & Scientific | 26 | 24 |
| 3D Point Cloud Linear Classification Training a linear classifier (e.g. SVM) on the embeddings/rep… | Computer Vision | 21 | 24 |
| Autonomous Driving Autonomous driving is the task of driving a vehicle without … | Reinforcement Learning & Robotics | 6,092 | 23 |
| Time Series Anomaly Detection | Time Series & Forecasting | 264 | 23 |
| Sign Language Translation Given a video containing sign language, the task is to predi… | Multimodal & Vision-Language | 153 | 23 |
| Stock Market Prediction | Time Series & Forecasting | 104 | 23 |
| Hypernym Discovery Given a corpus and a target term (hyponym), the task of hype… | Language & Reasoning | 33 | 23 |
| Human Interaction Recognition Human Interaction Recognition (HIR) is a field of study that… | Computer Vision | 22 | 23 |
| Video-based Generative Performance Benchmarking The benchmark evaluates a generative Video Conversational Mo… | Generative Models | 20 | 23 |
| Multi-tissue Nucleus Segmentation | Medical & Scientific | 14 | 23 |
| Semantic entity labeling One of the Form Understanding tasks (Word grouping, Semantic en… | Language & Reasoning | 14 | 23 |
| Unsupervised Panoptic Segmentation Unsupervised Panoptic Segmentation aims to partition an imag… | Computer Vision | 4 | 23 |
| Image Compression Image Compression is an application of data compression for … | Generative Models | 1,008 | 22 |
| Image Registration Image registration is the process of transforming different … | Computer Vision | 953 | 22 |
| Point Processes | Foundations & Efficiency | 541 | 22 |
| Gaze Estimation Gaze Estimation is a task to predict where a person is looki… | Computer Vision | 248 | 22 |
| Scene Flow Estimation Optical flow is a two-dimensional motion field in the image … | Computer Vision | 152 | 22 |
| Extractive Text Summarization Given a document, selecting a subset of the words or sentenc… | Language & Reasoning | 95 | 22 |
| Text-to-Music Generation | Audio & Speech | 37 | 22 |