| PG-Video-LLaVA: Pixel Grounding Large Video-Language Models | Nov 22, 2023 | Benchmarking, Phrase Grounding | Code Available | 2 |
| Exponentially Faster Language Modelling | Nov 15, 2023 | Benchmarking, CPU | Code Available | 2 |
| What's In My Big Data? | Oct 31, 2023 | Benchmarking | Code Available | 2 |
| Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks | Oct 30, 2023 | Benchmarking, object-detection | Code Available | 2 |
| Formalizing and Benchmarking Prompt Injection Attacks and Defenses | Oct 19, 2023 | Benchmarking | Code Available | 2 |
| Octopus: Embodied Vision-Language Programmer from Environmental Feedback | Oct 12, 2023 | Benchmarking, Code Generation | Code Available | 2 |
| ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons | Oct 11, 2023 | Benchmarking, Position | Code Available | 2 |
| MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation | Oct 5, 2023 | Benchmarking, Decision Making | Code Available | 2 |
| RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models | Oct 1, 2023 | Benchmarking | Code Available | 2 |
| GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond | Sep 28, 2023 | Benchmarking | Code Available | 2 |