| Affordable AI Assistants with Knowledge Graph of Thoughts | Apr 3, 2025 | Knowledge Graphs, LLM real-life tasks | Code Available | 3 | 5 |
| WebCanvas: Benchmarking Web Agents in Online Environments | Jun 18, 2024 | AI Agent, Benchmarking | Code Available | 3 | 5 |
| ITINERA: Integrating Spatial Optimization with Large Language Models for Open-domain Urban Itinerary Planning | Feb 11, 2024 | LLM real-life tasks, Open-Domain Question Answering | Code Available | 2 | 5 |
| AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | Mar 2, 2024 | Instruction Following, LLM real-life tasks | Code Available | 2 | 5 |
| mAIstro: an open-source multi-agentic system for automated end-to-end development of radiomics and deep learning models for medical imaging | Apr 30, 2025 | AI Agent, Classification | Code Available | 2 | 5 |
| LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents | Nov 9, 2023 | Instruction Following, LLM real-life tasks | Code Available | 2 | 5 |
| MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking | Jan 20, 2025 | Decision Making, GSM8K | Code Available | 1 | 5 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Nov 10, 2023 | Benchmarking, Cloud Computing | Code Available | 1 | 5 |
| Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives | Dec 16, 2024 | counterfactual, General Classification | Code Available | 0 | 5 |