Exploring the Capabilities of the Frontier Large Language Models for Nuclear Energy Research | Jun 10, 2025 | Code Generation, Prompt Engineering | Unverified
Repeton: Structured Bug Repair with ReAct-Guided Patch-and-Test Cycles | Jun 9, 2025 | Code Generation, RAG | Unverified
Worst-Case Symbolic Constraints Analysis and Generalisation with Large Language Models | Jun 9, 2025 | Code Generation | Unverified
ProtocolLLM: RTL Benchmark for SystemVerilog Generation of Communication Protocols | Jun 9, 2025 | Code Generation, Specificity | Unverified
LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges | Jun 9, 2025 | Code Generation | Code Available
Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation | Jun 8, 2025 | Code Generation, Mathematical Problem-Solving | Code Available
VeriLoC: Line-of-Code Level Prediction of Hardware Design Quality from Verilog Code | Jun 8, 2025 | Code Generation, Prediction | Unverified
Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems | Jun 7, 2025 | Code Generation, valid | Unverified
CP-Bench: Evaluating Large Language Models for Constraint Modelling | Jun 6, 2025 | Code Generation | Unverified
SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code | Jun 6, 2025 | Code Generation, Vulnerability Detection | Unverified
Can Theoretical Physics Research Benefit from Language Agents? | Jun 6, 2025 | Code Generation, Mathematical Reasoning | Unverified
hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation | Jun 5, 2025 | Code Generation, Code Translation | Unverified
ScaleRTL: Scaling LLMs with Reasoning Data and Test-Time Compute for Accurate RTL Code Generation | Jun 5, 2025 | Code Generation | Unverified
ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests | Jun 5, 2025 | Code Generation | Unverified
Deployability-Centric Infrastructure-as-Code Generation: An LLM-based Iterative Framework | Jun 5, 2025 | Code Generation | Code Available
Demonstrations of Integrity Attacks in Multi-Agent Systems | Jun 5, 2025 | Code Generation, Natural Language Understanding | Unverified
Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science | Jun 4, 2025 | Articles, Code Generation | Code Available
From Virtual Agents to Robot Teams: A Multi-Robot Framework Evaluation in High-Stakes Healthcare Context | Jun 4, 2025 | Code Generation | Unverified
CETBench: A Novel Dataset constructed via Transformations over Programs for Benchmarking LLMs for Code-Equivalence Checking | Jun 4, 2025 | Benchmarking, Code Generation | Unverified
Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems | Jun 4, 2025 | Benchmarking, Code Generation | Unverified
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation | Jun 4, 2025 | Code Generation | Unverified
Rethinking the effects of data contamination in Code Intelligence | Jun 3, 2025 | Code Generation, Code Summarization | Unverified
Adaptive Graph Pruning for Multi-Agent Communication | Jun 3, 2025 | Code Generation, Large Language Model | Code Available
DiaBlo: Diagonal Blocks Are Sufficient For Finetuning | Jun 3, 2025 | Arithmetic Reasoning, Code Generation | Code Available
How do Pre-Trained Models Support Software Engineering? An Empirical Study in Hugging Face | Jun 3, 2025 | Code Generation, Text Generation | Unverified
ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code | Jun 2, 2025 | Benchmarking, Code Generation | Unverified
SALAD: Systematic Assessment of Machine Unlearing on LLM-Aided Hardware Design | Jun 2, 2025 | Code Generation, Machine Unlearning | Unverified
Flow2Code: Evaluating Large Language Models for Flowchart-based Code Generation Capability | Jun 2, 2025 | Code Generation | Code Available
Legal Compliance Evaluation of Smart Contracts Generated By Large Language Models | Jun 1, 2025 | Code Generation | Unverified
CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval | May 31, 2025 | Code Generation, Information Retrieval | Unverified
SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation | May 30, 2025 | Code Generation, HumanEval | Unverified
HardTests: Synthesizing High-Quality Test Cases for LLM Coding | May 30, 2025 | Code Generation, Language Modeling | Unverified
RMoA: Optimizing Mixture-of-Agents through Diversity Maximization and Residual Compensation | May 30, 2025 | Code Generation, Diversity | Code Available
Cascading Adversarial Bias from Injection to Distillation in Language Models | May 30, 2025 | Bias Detection, Code Generation | Unverified
A Reward-driven Automated Webshell Malicious-code Generator for Red-teaming | May 30, 2025 | Code Generation, Diversity | Unverified
Eye of Judgement: Dissecting the Evaluation of Russian-speaking LLMs with POLLUX | May 30, 2025 | Code Generation | Unverified
Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards | May 30, 2025 | Code Generation | Unverified
OSS-UAgent: An Agent-based Usability Evaluation Framework for Open Source Software | May 29, 2025 | Code Generation | Code Available
Self-Correcting Code Generation Using Small Language Models | May 29, 2025 | Code Generation, HumanEval | Code Available
Infinite-Instruct: Synthesizing Scaling Code instruction Data with Bidirectional Synthesis and Static Verification | May 29, 2025 | Code Generation | Code Available
Using Reasoning Models to Generate Search Heuristics that Solve Open Instances of Combinatorial Design Problems | May 29, 2025 | Code Generation | Code Available
LLM Performance for Code Generation on Noisy Tasks | May 29, 2025 | Benchmarking, Code Generation | Code Available
SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving | May 29, 2025 | Code Generation | Unverified
Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach | May 29, 2025 | Code Generation, HumanEval | Unverified
HiLDe: Intentional Code Generation via Human-in-the-Loop Decoding | May 28, 2025 | Code Completion, Code Generation | Unverified
DeepRTL2: A Versatile Model for RTL-Related Tasks | May 28, 2025 | Code Generation, Code Search | Unverified
Rendering-Aware Reinforcement Learning for Vector Graphics Generation | May 27, 2025 | Code Generation, reinforcement-learning | Unverified
An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks | May 27, 2025 | Code Generation, Code Summarization | Unverified
RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving | May 27, 2025 | Code Generation | Code Available
An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code Generation | May 26, 2025 | Code Generation, GitHub issue resolution | Code Available