| Task | Area | Papers | Results |
|---|---|---|---|
| Spoken Language Understanding | Language & Reasoning | 550 | 36 |
| Text-to-Video Generation | Multimodal & Vision-Language | 201 | 36 |
| No-Reference Image Quality Assessment An Image Quality Assessment approach where no reference imag… | Computer Vision | 155 | 36 |
| Highlight Detection https://youtu.be/pJ0auP7dbcY?si=vSiZevfJ57YUKC2q | Computer Vision | 78 | 36 |
| Tabular Data Generation Generation of tabular data using generative models | Generative Models | 73 | 36 |
| Image Reconstruction | Generative Models | 2,143 | 35 |
| Stochastic Optimization Stochastic Optimization is the task of optimizing certain ob… | Foundations & Efficiency | 1,387 | 35 |
| Network Pruning Network Pruning is a popular approach to reduce a heavy netw… | Foundations & Efficiency | 534 | 35 |
| Malware Classification Malware Classification is the process of assigning a malware… | Language & Reasoning | 146 | 35 |
| Single-step retrosynthesis | Medical & Scientific | 34 | 35 |
| Image Segmentation Image Segmentation is a computer vision task that involves d… | Computer Vision | 5,035 | 34 |
| Text Clustering Grouping a set of texts in such a way that objects in the sa… | Language & Reasoning | 123 | 34 |
| Referring expression generation Generate referring expressions | Multimodal & Vision-Language | 84 | 34 |
| 3D Anomaly Detection 3D-only Anomaly Detection. Structures out of normal distribu… | Time Series & Forecasting | 36 | 34 |
| Adversarial Robustness Adversarial Robustness evaluates the vulnerabilities of mach… | Foundations & Efficiency | 1,746 | 33 |
| Relation Classification Relation Classification is the task of identifying the seman… | Language & Reasoning | 445 | 33 |
| Visual Storytelling | Multimodal & Vision-Language | 115 | 33 |
| Image-to-Text Retrieval Image-text retrieval is the process of retrieving relevant i… | Multimodal & Vision-Language | 59 | 33 |
| Graph Clustering Graph Clustering is the process of grouping the nodes of the… | Graphs & Structured Data | 393 | 32 |
| Depth Completion The Depth Completion task is a sub-problem of depth estimati… | Computer Vision | 242 | 32 |
| Reflection Removal Remove the spots from the mirror and clear up the picture | Generative Models | 81 | 32 |
| Shadow Detection | Computer Vision | 79 | 32 |
| Object Recognition Object recognition is a computer vision technique for detect… | Computer Vision | 2,042 | 31 |
| Speech Synthesis Speech synthesis is the task of generating speech from some … | Audio & Speech | 1,249 | 31 |
| Fake News Detection Fake News Detection is a natural language processing task th… | Language & Reasoning | 490 | 31 |
| Video Classification Video Classification is the task of producing a label that i… | Computer Vision | 455 | 31 |
| Visual Localization Visual Localization is the problem of estimating the camera … | Computer Vision | 402 | 31 |
| Scene Graph Generation A scene graph is a structured representation of an image, wh… | Graphs & Structured Data | 318 | 31 |
| Speech-to-Text Translation Translate audio signals of speech in one language into text … | Audio & Speech | 146 | 31 |
| Person Search Person Search is a task which aims at matching a specific pe… | Computer Vision | 139 | 31 |
| Music Transcription Music transcription is the task of converting an acoustic mu… | Audio & Speech | 96 | 31 |
| Collaborative Filtering | Recommendation & Retrieval | 1,309 | 30 |
| DeepFake Detection DeepFake Detection is the task of detecting fake videos or i… | Language & Reasoning | 580 | 30 |
| Document Layout Analysis Document Layout Analysis is performed to determine physical… | Language & Reasoning | 99 | 30 |
| Phrase Grounding Given an image and a corresponding caption, the Phrase Groun… | Multimodal & Vision-Language | 88 | 30 |
| Pedestrian Attribute Recognition Pedestrian attribute recognition is the task of recognizin… | Computer Vision | 56 | 30 |
| Weather Forecasting Weather Forecasting is the prediction of future weather cond… | Time Series & Forecasting | 420 | 29 |
| Audio Captioning Audio Captioning is the task of describing audio using text.… | Audio & Speech | 119 | 29 |
| Surface Normals Estimation Surface normal estimation deals with the task of predicting … | Computer Vision | 39 | 29 |
| Bird's-Eye View Semantic Segmentation | Reinforcement Learning & Robotics | 26 | 29 |
| Parking Space Occupancy | Computer Vision | 5 | 29 |
| Multiple Instance Learning Multiple Instance Learning is a type of weakly supervised le… | Foundations & Efficiency | 744 | 28 |
| Adversarial Defense | Foundations & Efficiency | 403 | 28 |
| Stance Detection Stance detection is the extraction of a subject's reaction t… | Language & Reasoning | 343 | 28 |
| Code Completion | Language & Reasoning | 212 | 28 |
| Text to Audio Retrieval | Audio & Speech | 20 | 28 |
| Unsupervised Anomaly Detection with Specified Settings -- 0.1% anomaly | Time Series & Forecasting | 6 | 28 |
| Unsupervised Anomaly Detection with Specified Settings -- 10% anomaly | Time Series & Forecasting | 6 | 28 |
| Unsupervised Anomaly Detection with Specified Settings -- 20% anomaly | Time Series & Forecasting | 6 | 28 |
| Unsupervised Anomaly Detection with Specified Settings -- 30% anomaly | Time Series & Forecasting | 6 | 28 |