| LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts | Dec 31, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| LLM-SR: Scientific Equation Discovery via Programming with Large Language Models | Apr 29, 2024 | Equation DiscoveryInterpretable Machine Learning | CodeCode Available | 1 | 5 |
| LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research | Jun 19, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text | Mar 25, 2025 | Cross-Modal RetrievalHallucination | CodeCode Available | 1 | 5 |
| LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models | Nov 11, 2024 | Knowledge DistillationLanguage Modeling | CodeCode Available | 1 | 5 |
| AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework | Dec 31, 2023 | Large Language ModelMultimodal Large Language Model | CodeCode Available | 1 | 5 |
| LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins | Sep 19, 2023 | Language ModellingLarge Language Model | CodeCode Available | 1 | 5 |
| Adaptive KalmanNet: Data-Driven Kalman Filter with Fast Adaptation | Sep 13, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving | Jun 21, 2025 | Autonomous DrivingDescriptive | CodeCode Available | 1 | 5 |
| DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer | Nov 27, 2023 | In-Context LearningLanguage Modeling | CodeCode Available | 1 | 5 |
| LLM-in-the-loop: Leveraging Large Language Model for Thematic Analysis | Oct 23, 2023 | In-Context LearningLanguage Modeling | CodeCode Available | 1 | 5 |
| LLMDet: A Third Party Large Language Models Generated Text Detection Tool | May 24, 2023 | Language ModellingLarge Language Model | CodeCode Available | 1 | 5 |
| LLM experiments with simulation: Large Language Model Multi-Agent System for Simulation Model Parametrization in Digital Twins | May 28, 2024 | Decision MakingLanguage Modeling | CodeCode Available | 1 | 5 |
| DOMINO: A Dual-System for Multi-step Visual Language Reasoning | Oct 4, 2023 | Arithmetic ReasoningLanguage Modeling | CodeCode Available | 1 | 5 |
| Automated Spinal MRI Labelling from Reports Using a Large Language Model | Oct 22, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| LLMBind: A Unified Modality-Task Integration Framework | Feb 22, 2024 | AI AgentAudio Generation | CodeCode Available | 1 | 5 |
| Do Large Language Model Benchmarks Test Reliability? | Feb 5, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment | Oct 28, 2024 | BenchmarkingLanguage Modeling | CodeCode Available | 1 | 5 |
| A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon | Jun 12, 2025 | Large Language ModelStarcraft | CodeCode Available | 1 | 5 |
| LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-Explanations | Jan 23, 2024 | counterfactualFact Checking | CodeCode Available | 1 | 5 |
| Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents | Feb 27, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling | Mar 2, 2024 | Language ModellingLarge Language Model | CodeCode Available | 1 | 5 |
| DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints | May 29, 2024 | DiversityLanguage Modeling | CodeCode Available | 1 | 5 |
| Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators | Mar 25, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Dec 9, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |