| Compositional Chain-of-Thought Prompting for Large Multimodal Models | Nov 27, 2023 | Language Modelling, Large Language Model | Code Available | 1 | 5 |
| AttributionBench: How Hard is Automatic Attribution Evaluation? | Feb 23, 2024 | Binary Classification, Language Modeling | Code Available | 1 | 5 |
| Establishing baselines for generative discovery of inorganic crystals | Jan 4, 2025 | Band Gap, Language Modeling | Code Available | 1 | 5 |
| CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning | Jul 30, 2024 | Contrastive Learning, Diagnostic | Code Available | 1 | 5 |
| DefenderBench: A Toolkit for Evaluating Language Agents in Cybersecurity Environments | May 31, 2025 | Large Language Model | Code Available | 1 | 5 |
| Making Language Models Better Tool Learners with Execution Feedback | May 22, 2023 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers | Apr 25, 2025 | Large Language Model | Code Available | 1 | 5 |
| Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception | Mar 5, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation | Aug 20, 2023 | Code Generation, Language Modeling | Code Available | 1 | 5 |
| MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration | Nov 14, 2023 | Benchmarking, Language Modeling | Code Available | 1 | 5 |