| Task | Description | Area | Papers | Results |
|---|---|---|---|---|
| Image Classification | Image Classification is a fundamental task in vision recogni… | Computer Vision | 10,419 | 2,912 |
| Atari Games | The Atari 2600 Games task (and dataset) involves training an… | Reinforcement Learning & Robotics | 625 | 2,519 |
| Semantic Segmentation | | Computer Vision | 14,763 | 1,920 |
| Node Classification | Node Classification is a machine learning task in graph-base… | Graphs & Structured Data | 1,860 | 1,793 |
| Question Answering | Question answering can be segmented into domain-specific tas… | Language & Reasoning | 10,817 | 1,784 |
| Object Detection | | Computer Vision | 10,957 | 981 |
| Few-Shot Image Classification | Few-Shot Image Classification is a computer vision task that… | Computer Vision | 353 | 913 |
| Image Generation | Image Generation (synthesis) is the task of generating new i… | Generative Models | 6,689 | 871 |
| 3D Object Detection | 3D Object Detection is a task in computer vision where the g… | Computer Vision | 1,576 | 809 |
| Graph Classification | Graph Classification is a task that involves classifying a g… | Graphs & Structured Data | 927 | 809 |
| Image Super-Resolution | Image Super-Resolution is a machine learning task where the … | Generative Models | 1,589 | 748 |
| Visual Question Answering (VQA) | Visual Question Answering (VQA) is a task in computer vision… | Multimodal & Vision-Language | 2,167 | 727 |
| Anomaly Detection | Anomaly Detection is a binary classification identifying unu… | Time Series & Forecasting | 4,856 | 669 |
| Action Recognition | Action Recognition is a computer vision task that involves r… | Computer Vision | 2,759 | 650 |
| Domain Generalization | The idea of Domain Generalization is to learn from one or mu… | Computer Vision | 1,751 | 570 |
| Time Series Forecasting | Time Series Forecasting is the task of fitting a model to hi… | Time Series & Forecasting | 1,609 | 538 |
| Person Re-Identification | Person Re-Identification is a computer vision task in which … | Computer Vision | 1,488 | 533 |
| Natural Language Inference | Natural language inference (NLI) is the task of determining … | Language & Reasoning | 1,961 | 505 |
| Link Prediction | Link Prediction is a task in graph and network analysis wher… | Graphs & Structured Data | 1,949 | 501 |
| Language Modelling | A language model is a model of natural language. Language mo… | Language & Reasoning | 17,610 | 467 |
| Few-Shot Semantic Segmentation | Few-shot semantic segmentation (FSS) learns to segment targe… | Computer Vision | 168 | 458 |
| Semi-Supervised Image Classification | Semi-supervised image classification leverages unlabelled da… | Computer Vision | 167 | 456 |
| 3D Human Pose Estimation | 3D Human Pose Estimation is a computer vision task that invo… | Computer Vision | 665 | 454 |
| Named Entity Recognition (NER) | Named Entity Recognition (NER) is a task of Natural Language… | Language & Reasoning | 2,874 | 439 |
| Machine Translation | Machine translation is the task of translating a sentence in… | Language & Reasoning | 10,752 | 438 |
| Neural Architecture Search | Neural architecture search (NAS) is a technique for automati… | Foundations & Efficiency | 1,915 | 424 |
| Image Captioning | Image Captioning is the task of describing the content of an… | Multimodal & Vision-Language | 1,878 | 422 |
| Long-tail Learning | Long-tailed learning, one of the most challenging problems i… | Foundations & Efficiency | 131 | 421 |
| Speech Recognition | Speech Recognition is the task of converting spoken language… | Audio & Speech | 6,433 | 398 |
| Common Sense Reasoning | Common sense reasoning tasks are intended to require the mod… | Language & Reasoning | 939 | 397 |
| Unsupervised Domain Adaptation | Unsupervised Domain Adaptation is a learning framework to tr… | Foundations & Efficiency | 1,951 | 393 |
| Domain Adaptation | Domain Adaptation is the task of adapting models across doma… | Foundations & Efficiency | 6,439 | 391 |
| Sentiment Analysis | Sentiment Analysis is the task of classifying the polarity o… | Language & Reasoning | 5,630 | 382 |
| Semi-Supervised Video Object Segmentation | The semi-supervised scenario assumes the user inputs a full … | Computer Vision | 147 | 380 |
| Relation Extraction | Relation Extraction is the task of predicting attributes and… | Language & Reasoning | 1,977 | 377 |
| Fine-Grained Image Classification | Fine-Grained Image Classification is a task in computer visi… | Computer Vision | 353 | 377 |
| Image Retrieval | Image Retrieval is a fundamental and long-standing computer … | Multimodal & Vision-Language | 2,239 | 372 |
| Text Classification | Text Classification is the task of assigning a sentence or d… | Language & Reasoning | 3,635 | 341 |
| Instance Segmentation | Instance Segmentation is a computer vision task that involve… | Computer Vision | 2,262 | 340 |
| Visual Question Answering | MLLM Leaderboard | Multimodal & Vision-Language | 2,177 | 334 |
| Medical Image Segmentation | Medical Image Segmentation is a computer vision task that in… | Medical & Scientific | 2,089 | 333 |
| Image Clustering | Models that partition the dataset into semantically meaningf… | Computer Vision | 236 | 329 |
| Referring Expression Segmentation | The task aims at labeling the pixels of an image or video th… | Multimodal & Vision-Language | 145 | 317 |
| Video Retrieval | The objective of video retrieval is as follows: given a text… | Multimodal & Vision-Language | 486 | 309 |
| Multi-Label Classification | multilabel graph classification with highest result | Foundations & Efficiency | 1,198 | 302 |
| Motion Forecasting | Motion forecasting is the task of predicting the location of… | Time Series & Forecasting | 205 | 299 |
| Visual Object Tracking | Visual Object Tracking is an important research topic in com… | Computer Vision | 341 | 289 |
| Semantic Textual Similarity | Semantic textual similarity deals with determining how simil… | Language & Reasoning | 2,381 | 280 |
| Out-of-Distribution Detection | Detect out-of-distribution or anomalous examples. | Foundations & Efficiency | 888 | 269 |
| Visual Place Recognition | Visual Place Recognition is the task of matching a view of a… | Multimodal & Vision-Language | 297 | 265 |