Video Semantic Segmentation
The goal of video semantic segmentation is to assign a label from a predefined set of classes to every pixel in every frame of a video. The model must therefore predict accurate segmentation masks and also keep those masks temporally consistent from frame to frame. The task has broad applications in areas such as autonomous driving, medical video analysis, and AR/VR.
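The two requirements in the task description, per-pixel accuracy and temporal consistency, can be illustrated with a minimal NumPy sketch. It is not the evaluation code of any benchmark listed on this page. The function names `mean_iou` and `label_stability` are hypothetical, labels are assumed to be integers in `[0, num_classes)` with no ignore label, and `label_stability` is a deliberately crude proxy (it ignores motion, whereas published temporal-consistency metrics typically warp predictions with optical flow first).

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in pred or target.

    pred, target: integer label arrays of identical shape, e.g. (T, H, W).
    Assumes every label lies in [0, num_classes); no ignore label is handled.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both pred and target
    return float((intersection[valid] / union[valid]).mean())


def label_stability(pred):
    """Fraction of pixels whose label is unchanged between consecutive frames.

    A rough consistency proxy only: real scene motion also lowers this score.
    """
    pred = np.asarray(pred)
    return float((pred[1:] == pred[:-1]).mean())


# Toy clip: 2 frames of 2x2 pixels, left column class 0, right column class 1.
target = np.zeros((2, 2, 2), dtype=int)
target[:, :, 1] = 1
pred = target.copy()
pred[1, 0, 0] = 1  # one wrong pixel in the second frame

print(mean_iou(pred, target, num_classes=2))
print(label_stability(pred))
```

`mean_iou` returns a fraction in [0, 1]; leaderboards usually report mIoU as that value multiplied by 100.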
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TMANet-50 | mIoU | 80.3 | — | Unverified |
| 2 | TDNet-50 [9] | mIoU | 79.9 | — | Unverified |
| 3 | DeltaDist-DDRNet-39 | mIoU | 79.9 | — | Unverified |
| 4 | PSPNet-101 [20] | mIoU | 79.7 | — | Unverified |
| 5 | PSPNet-50 [20] | mIoU | 78.1 | — | Unverified |
| 6 | LVS [12] | mIoU | 76.8 | — | Unverified |
| 7 | GRFP [15] | mIoU | 73.6 | — | Unverified |
| 8 | FCN-50 [14] | mIoU | 70.1 | — | Unverified |
| 9 | DFF [22] | mIoU | 69.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DVIS++(VIT-L) | mIoU | 63.8 | — | Unverified |
| 2 | UniVS(Swin-L) | mIoU | 59.8 | — | Unverified |
| 3 | Tube-Link(Swin-large) | mIoU | 59.6 | — | Unverified |
| 4 | MRCFA(MiT-B5) | mIoU | 49.9 | — | Unverified |
| 5 | CFFM(MiT-B5) | mIoU | 49.3 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | WaSR-T (ResNet-101) | Q | 60.1 | — | Unverified |
| 2 | TMANet (ResNet-50) | Q | 57.5 | — | Unverified |
| 3 | CSANet (ResNet-101) | Q | 49.1 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MVNet(DeepLabV3) | mIoU | 54.52 | — | Unverified |
| 2 | MVNet(PSPNet) | mIoU | 54.36 | — | Unverified |
| 3 | MVNet(FCN) | mIoU | 53.9 | — | Unverified |