| Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training | Jun 18, 2025 | MedQAMMLU | —Unverified | 0 | 0 |
| G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | Oct 15, 2024 | HumanEvalLanguage Modelling | —Unverified | 0 | 0 |
| GEB-1.3B: Open Lightweight Large Language Model | Jun 14, 2024 | CPULanguage Modeling | —Unverified | 0 | 0 |
| GECKO: Generative Language Model for English, Code and Korean | May 24, 2024 | kmmluLanguage Modeling | —Unverified | 0 | 0 |
| GEM: Empowering LLM for both Embedding Generation and Language Understanding | Jun 4, 2025 | DecoderLarge Language Model | —Unverified | 0 | 0 |
| A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets | May 9, 2025 | MMLU | —Unverified | 0 | 0 |
| Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation | Dec 4, 2024 | MMLU | —Unverified | 0 | 0 |
| GRIN: GRadient-INformed MoE | Sep 18, 2024 | HellaSwagHumanEval | —Unverified | 0 | 0 |
| HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI | Jan 26, 2025 | MMLUMultiple-choice | —Unverified | 0 | 0 |
| Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models | Feb 19, 2024 | MMLU | —Unverified | 0 | 0 |
| Humanity's Last Exam | Jan 24, 2025 | Humanity's Last ExamLanguage Modeling | —Unverified | 0 | 0 |
| Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents | Dec 1, 2024 | Mathematical ReasoningMMLU | —Unverified | 0 | 0 |
| IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding | Jan 27, 2025 | BenchmarkingDiversity | —Unverified | 0 | 0 |
| INFERENCEDYNAMICS: Efficient Routing Across LLMs through Structured Capability and Knowledge Profiling | May 22, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 | 0 |
| Inference-Time-Compute: More Faithful? A Research Note | Jan 14, 2025 | AttributeMMLU | —Unverified | 0 | 0 |
| Instance-adaptive Zero-shot Chain-of-Thought Prompting | Sep 30, 2024 | GSM8KMath | —Unverified | 0 | 0 |
| Instruction Tuning with Human Curriculum | Oct 14, 2023 | ARCMMLU | —Unverified | 0 | 0 |
| Integrating External Tools with Large Language Models to Improve Accuracy | Jul 9, 2025 | Mathematical ReasoningMMLU | —Unverified | 0 | 0 |
| Interleaved Reasoning for Large Language Models via Reinforcement Learning | May 26, 2025 | Logical ReasoningMath | —Unverified | 0 | 0 |
| Investigating Data Contamination in Modern Benchmarks for Large Language Models | Nov 16, 2023 | Common Sense ReasoningMMLU | —Unverified | 0 | 0 |
| Irreducible Curriculum for Language Model Pretraining | Oct 23, 2023 | Language ModelingLanguage Modelling | —Unverified | 0 | 0 |
| Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs | Jan 21, 2025 | GSM8KIn-Context Learning | —Unverified | 0 | 0 |
| KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations | Mar 3, 2024 | MedQAMMLU | —Unverified | 0 | 0 |
| KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning | May 14, 2025 | BenchmarkingMMLU | —Unverified | 0 | 0 |
| KurTail : Kurtosis-based LLM Quantization | Mar 3, 2025 | GPULanguage Modeling | —Unverified | 0 | 0 |