| Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency | Sep 29, 2023 | Code Generation, HumanEval | Code Available | 0 | 5 |
| Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding | May 12, 2025 | Code Generation, Comment Generation | Code Available | 0 | 5 |
| CoCoNUT: Structural Code Understanding does not fall out of a tree | Jan 27, 2025 | Code Generation, HumanEval | Code Available | 0 | 5 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Oct 14, 2024 | Fairness, GSM8K | Code Available | 0 | 5 |
| Can Programming Languages Boost Each Other via Instruction Tuning? | Aug 31, 2023 | HumanEval | Code Available | 0 | 5 |
| Multi-Programming Language Ensemble for Code Generation in Large Language Model | Sep 6, 2024 | Code Generation, HumanEval | Code Available | 0 | 5 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Jun 18, 2025 | GSM8K, HumanEval | Code Available | 0 | 5 |
| Can Github issues be solved with Tree Of Thoughts? | May 20, 2024 | Code Generation, GitHub issue resolution | Code Available | 0 | 5 |
| JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Jun 10, 2024 | Benchmarking, Code Generation | Code Available | 0 | 5 |
| Large Language Models of Code Fail at Completing Code with Potential Bugs | Jun 6, 2023 | Code Completion, HumanEval | Code Available | 0 | 5 |