CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | Jul 18, 2025 | Code Generation, GPU | Unverified 0
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts | Jul 17, 2025 | Code Generation | Unverified 0
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks | Jul 16, 2025 | Code Generation | Unverified 0
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training | Jul 16, 2025 | Code Generation, Math | Unverified 0
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs | Jul 15, 2025 | Code Generation, Safety Alignment | Code Available 2
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks | Jul 14, 2025 | Benchmarking, Code Generation | Unverified 0
Turning the Tide: Repository-based Code Reflection | Jul 14, 2025 | Code Generation, Diversity | Unverified 0
CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance | Jul 14, 2025 | Benchmarking, Code Generation | Unverified 0
Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding | Jul 14, 2025 | Code Generation, Language Modeling | Code Available 9
Multilingual Multimodal Software Developer for Code Generation | Jul 11, 2025 | Code Generation, Instruction Following | Unverified 0
OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique | Jul 11, 2025 | Code Generation | Unverified 0
Automating MD simulations for Proteins using Large language Models: NAMD-Agent | Jul 10, 2025 | Code Generation, Navigate | Unverified 0
Rethinking Verification for LLM Code Generation: From Generation to Testing | Jul 9, 2025 | Code Generation, HumanEval | Code Available 1
ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation | Jul 7, 2025 | Code Generation | Unverified 0
Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning | Jul 7, 2025 | Code Generation, reinforcement-learning | Unverified 0
CoreCodeBench: A Configurable Multi-Scenario Repository-Level Benchmark | Jul 4, 2025 | Bug fixing, Code Generation | Code Available 1
EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Jul 4, 2025 | Code Generation, Math | Code Available 7
CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks | Jul 3, 2025 | Benchmarking, Code Generation | Unverified 0
LLM-based Realistic Safety-Critical Driving Video Generation | Jul 2, 2025 | Autonomous Driving, Autonomous Vehicles | Unverified 0
A Large Language Model-Empowered Agent for Reliable and Robust Structural Analysis | Jun 27, 2025 | Code Generation, Language Modeling | Unverified 0
Estimating Correctness Without Oracles in LLM-Based Code Generation | Jun 26, 2025 | Code Generation | Code Available 0
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test | Jun 26, 2025 | Code Generation, Large Language Model | Unverified 0
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization | Jun 25, 2025 | Code Generation, HumanEval | Unverified 0
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | Jun 25, 2025 | Code Generation, Denoising | Code Available 4
SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models | Jun 25, 2025 | Code Generation, In-Context Learning | Unverified 0
Language Modeling by Language Models | Jun 25, 2025 | Code Generation, Language Modeling | Code Available 2
ReCode: Updating Code API Knowledge with Reinforcement Learning | Jun 25, 2025 | Code Generation, reinforcement-learning | Code Available 1
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study | Jun 24, 2025 | Code Generation, Diversity | Unverified 0
From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking | Jun 24, 2025 | Code Generation, scientific discovery | Code Available 0
QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges | Jun 24, 2025 | Benchmarking, Code Generation | Unverified 0
Steering Conceptual Bias via Transformer Latent-Subspace Activation | Jun 23, 2025 | Code Generation | Unverified 0
The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs | Jun 23, 2025 | Code Generation | Unverified 0
Use Property-Based Testing to Bridge LLM Code Generation and Validation | Jun 23, 2025 | Code Generation, test driven development | Unverified 0
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation | Jun 22, 2025 | Code Generation | Unverified 0
TeXpert: A Multi-Level Benchmark for Evaluating LaTeX Code Generation by LLMs | Jun 20, 2025 | Code Generation | Code Available 1
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree | Jun 18, 2025 | Chunking, Code Generation | Code Available 2
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Jun 17, 2025 | Code Generation, GSM8K | Code Available 1
Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality | Jun 17, 2025 | Code Generation, Mathematical Reasoning | Unverified 0
Sampling from Your Language Model One Byte at a Time | Jun 17, 2025 | Code Generation, Language Modeling | Code Available 1
Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification | Jun 17, 2025 | Code Generation | Code Available 2
How Does LLM Reasoning Work for Code? A Survey and a Call to Action | Jun 16, 2025 | Code Generation, GitHub issue resolution | Unverified 0
LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning | Jun 16, 2025 | Code Generation, Mathematical Problem-Solving | Code Available 0
A Technical Study into Small Reasoning Language Models | Jun 16, 2025 | Code Generation, Computational Efficiency | Unverified 0
FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation | Jun 16, 2025 | Code Generation | Unverified 0
The Safety Reminder: A Soft Prompt to Reactivate Delayed Safety Awareness in Vision-Language Models | Jun 15, 2025 | Chatbot, Code Generation | Unverified 0
Humanity's Last Code Exam: Can Advanced LLMs Conquer Human's Hardest Code Competition? | Jun 15, 2025 | Code Generation | Code Available 2
Structured Program Synthesis using LLMs: Results and Insights from the IPARC Challenge | Jun 15, 2025 | ARC, Code Generation | Unverified 0
SWE-Bench-CL: Continual Learning for Coding Agents | Jun 13, 2025 | Code Generation, Continual Learning | Code Available 0
code_transformed: The Influence of Large Language Models on Code | Jun 13, 2025 | Code Generation | Unverified 0
ReVeal: Self-Evolving Code Agents via Iterative Generation-Verification | Jun 13, 2025 | Code Generation, reinforcement-learning | Unverified 0