| Advancing LLM Reasoning Generalists with Preference Trees | Apr 2, 2024 | BenchmarkingCode Generation | CodeCode Available | 3 |
| AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework | Mar 19, 2024 | BenchmarkingFinancial Analysis | CodeCode Available | 3 |
| Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection | Mar 19, 2024 | Anomaly DetectionBenchmarking | CodeCode Available | 3 |
| Recurrent Drafter for Fast Speculative Decoding in Large Language Models | Mar 14, 2024 | BenchmarkingKnowledge Distillation | CodeCode Available | 3 |
| MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries | Jan 27, 2024 | BenchmarkingRAG | CodeCode Available | 3 |
| AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents | Jan 24, 2024 | Benchmarking | CodeCode Available | 3 |
| Benchmarking LLMs via Uncertainty Quantification | Jan 23, 2024 | BenchmarkingUncertainty Quantification | CodeCode Available | 3 |
| A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation | Jan 22, 2024 | BenchmarkingDiagnostic | CodeCode Available | 3 |
| SEED-Bench: Benchmarking Multimodal Large Language Models | Jan 1, 2024 | BenchmarkingImage Generation | CodeCode Available | 3 |
| AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One | Dec 10, 2023 | AllBenchmarking | CodeCode Available | 3 |