| DSGBench: A Diverse Strategic Game Benchmark for Evaluating LLM-based Agents in Complex Decision-Making Environments | Mar 8, 2025 | Decision Making, Large Language Model | Code Available | 0 |
| No Free Labels: Limitations of LLM-as-a-Judge Without Human Grounding | Mar 7, 2025 | Large Language Model | Unverified | 0 |
| Revitalizing Saturated Benchmarks: A Weighted Metric Approach for Differentiating Large Language Model Performance | Mar 7, 2025 | ARC, Language Modeling | Unverified | 0 |
| SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding | Mar 7, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning | Mar 7, 2025 | Emotion Recognition, Language Modeling | Code Available | 5 |
| A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval | Mar 7, 2025 | Information Retrieval, Language Modeling | Code Available | 2 |
| LLM-based Iterative Approach to Metamodeling in Automotive | Mar 7, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs | Mar 7, 2025 | Large Language Model, Multiple-choice | Code Available | 0 |
| DETQUS: Decomposition-Enhanced Transformers for QUery-focused Summarization | Mar 7, 2025 | Decoder, Language Modeling | Unverified | 0 |
| QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation | Mar 7, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Leveraging Approximate Caching for Faster Retrieval-Augmented Generation | Mar 7, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation | Mar 7, 2025 | Large Language Model, Medical Report Generation | Code Available | 0 |
| TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator | Mar 7, 2025 | Large Language Model, RAG | Unverified | 0 |
| Unveiling Biases in AI: ChatGPT's Political Economy Perspectives and Human Comparisons | Mar 7, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Better Process Supervision with Bi-directional Rewarding Signals | Mar 6, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model | Mar 6, 2025 | General Knowledge, Image Captioning | Code Available | 2 |
| AgentSafe: Safeguarding Large Language Model-based Multi-agent Systems via Hierarchical Data Management | Mar 6, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | Mar 6, 2025 | document understanding, Language Modeling | Unverified | 0 |
| Measuring temporal effects of agent knowledge by date-controlled tool use | Mar 6, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining | Mar 6, 2025 | GPU, Hyperparameter Optimization | Unverified | 0 |
| KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease | Mar 6, 2025 | Chunking, Language Modeling | Code Available | 0 |
| Architecture for a Trustworthy Quantum Chatbot | Mar 6, 2025 | Chatbot, Large Language Model | Unverified | 0 |
| The Next Frontier of LLM Applications: Open Ecosystems and Hardware Synergy | Mar 6, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services | Mar 6, 2025 | Deep Reinforcement Learning, Language Modeling | Unverified | 0 |
| ToolFuzz -- Automated Agent Tool Testing | Mar 6, 2025 | Large Language Model, Prompt Engineering | Unverified | 0 |
| Leveraging Large Language Models to Address Data Scarcity in Machine Learning: Applications in Graphene Synthesis | Mar 6, 2025 | Binary Classification, Imputation | Code Available | 0 |
| An Egocentric Vision-Language Model based Portable Real-time Smart Assistant | Mar 6, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges | Mar 6, 2025 | Benchmarking, Language Modeling | Unverified | 0 |
| Multimodal Stock Price Prediction: A Case Study of the Russian Securities Market | Mar 5, 2025 | Articles, Large Language Model | Unverified | 0 |
| Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm | Mar 5, 2025 | Collision Avoidance, Fairness | Unverified | 0 |
| Towards Understanding Multi-Round Large Language Model Reasoning: Approximability, Learnability and Generalizability | Mar 5, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization | Mar 5, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems | Mar 5, 2025 | Decision Making, Language Modeling | Code Available | 3 |
| PAIR: A Novel Large Language Model-Guided Selection Strategy for Evolutionary Algorithms | Mar 5, 2025 | Diversity, Evolutionary Algorithms | Code Available | 0 |
| LLM-TabFlow: Synthetic Tabular Data Generation with Inter-column Logical Relationship Preservation | Mar 4, 2025 | Large Language Model, Tabular Data Generation | Code Available | 0 |
| Hierarchical Re-ranker Retriever (HRR) | Mar 4, 2025 | Information Retrieval, Language Modeling | Unverified | 0 |
| Towards Explainable Doctor Recommendation with Large Language Models | Mar 4, 2025 | Fairness, Large Language Model | Unverified | 0 |
| Measuring Political Preferences in AI Systems: An Integrative Approach | Mar 4, 2025 | Large Language Model, Sentiment Analysis | Unverified | 0 |
| DriveGen: Towards Infinite Diverse Traffic Scenarios with Large Models | Mar 4, 2025 | Autonomous Driving, Diversity | Unverified | 0 |
| Trust, Experience, and Innovation: Key Factors Shaping American Attitudes About AI | Mar 4, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Generator-Assistant Stepwise Rollback Framework for Large Language Model Agent | Mar 4, 2025 | Decision Making, Language Modeling | Code Available | 0 |
| InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model | Mar 4, 2025 | es-en, Language Modeling | Code Available | 1 |
| BatchGEMBA: Token-Efficient Machine Translation Evaluation with Batched Prompting and Prompt Compression | Mar 4, 2025 | Large Language Model, Machine Translation | Code Available | 0 |
| DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models | Mar 4, 2025 | Diversity, GPU | Code Available | 2 |
| Haste Makes Waste: Evaluating Planning Abilities of LLMs for Efficient and Feasible Multitasking with Time Constraints Between Actions | Mar 4, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| ATLaS: Agent Tuning via Learning Critical Steps | Mar 4, 2025 | Decision Making, Language Modeling | Unverified | 0 |
| Multimodal AI predicts clinical outcomes of drug combinations from preclinical data | Mar 4, 2025 | Large Language Model | Code Available | 1 |
| RedChronos: A Large Language Model-Based Log Analysis System for Insider Threat Detection in Enterprises | Mar 4, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Text2Scenario: Text-Driven Scenario Generation for Autonomous Driving Test | Mar 4, 2025 | Autonomous Driving, Descriptive | Unverified | 0 |
| Use Me Wisely: AI-Driven Assessment for LLM Prompting Skills Development | Mar 4, 2025 | feature selection, Few-Shot Learning | Unverified | 0 |