Long-Context Understanding
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4o | 1 Image, 4*4 Stitching, Exact Accuracy | 83 | — | Unverified |
| 2 | GPT-4V | 1 Image, 4*4 Stitching, Exact Accuracy | 54.72 | — | Unverified |
| 3 | Gemini Pro 1.5 | 1 Image, 4*4 Stitching, Exact Accuracy | 39.85 | — | Unverified |
| 4 | Gemini Pro 1.0 | 1 Image, 4*4 Stitching, Exact Accuracy | 24.78 | — | Unverified |
| 5 | LLaVA-Llama-3 | 1 Image, 4*4 Stitching, Exact Accuracy | 17.5 | — | Unverified |
| 6 | Claude 3 Opus | 1 Image, 4*4 Stitching, Exact Accuracy | 12.3 | — | Unverified |
| 7 | IDEFICS2-8B | 1 Image, 4*4 Stitching, Exact Accuracy | 7.8 | — | Unverified |
| 8 | InstructBLIP-Flan-T5-XXL | 1 Image, 4*4 Stitching, Exact Accuracy | 6.2 | — | Unverified |
| 9 | CogVLM2-Llama-3 | 1 Image, 4*4 Stitching, Exact Accuracy | 0.9 | — | Unverified |
| 10 | mPLUG-Owl-v2 | 1 Image, 4*4 Stitching, Exact Accuracy | 0.3 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4-Turbo-1106 | 1k | 74 | — | Unverified |
| 2 | GPT-4-Turbo-0125 | 1k | 73.5 | — | Unverified |
| 3 | Claude-2 | 1k | 65 | — | Unverified |
| 4 | GPT-3.5-Turbo-1106 | 1k | 61.5 | — | Unverified |
| 5 | InternLM2-7b | 1k | 58.6 | — | Unverified |
| 6 | Vicuna-13b-v1.5-16k | 1k | 53.4 | — | Unverified |
| 7 | ChatGLM3-6b-32k | 1k | 39.8 | — | Unverified |
| 8 | Vicuna-7b-v1.5-16k | 1k | 37 | — | Unverified |
| 9 | LongChat-7b-v1.5-32k | 1k | 32.4 | — | Unverified |
| 10 | ChatGLM2-6b-32k | 1k | 31.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4-Turbo-1106 | 2k | 18.5 | — | Unverified |
| 2 | GPT-4-Turbo-0125 | 2k | 15.5 | — | Unverified |
| 3 | Vicuna-13b-v1.5-16k | 2k | 5.4 | — | Unverified |
| 4 | LongChat-7b-v1.5-32k | 2k | 5.3 | — | Unverified |
| 5 | Vicuna-7b-v1.5-16k | 2k | 5.3 | — | Unverified |
| 6 | InternLM2-7b | 2k | 5.1 | — | Unverified |
| 7 | Claude-2 | 2k | 5 | — | Unverified |
| 8 | GPT-3.5-Turbo-1106 | 2k | 4 | — | Unverified |
| 9 | ChatGLM3-6b-32k | 2k | 2.3 | — | Unverified |
| 10 | ChatGLM2-6b-32k | 2k | 0.9 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GALI(Llama3-8b-ins-4k-to-16k) | Average Score | 59.21 | — | Unverified |
| 2 | GALI(Llama3-8b-ins-4k-to-32k) | Average Score | 59.1 | — | Unverified |
| 3 | GALI(Llama3-8b-ins-8k-to-32k) | Average Score | 42.79 | — | Unverified |
| 4 | GALI(Llama3-8b-ins-8k-to-16k) | Average Score | 42.32 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GALI(Llama3-8b-ins-4k-to-16k) | Average Score | 46.22 | — | Unverified |
| 2 | GALI(Llama3-8b-ins-8k-to-32k) | Average Score | 45.38 | — | Unverified |
| 3 | GALI(Llama3-8b-ins-8k-to-16k) | Average Score | 45.17 | — | Unverified |