| Segment Anything in Medical Images and Videos: Benchmark and Deployment | Aug 6, 2024 | Benchmarking, Segmentation | Code Available | 7 |
| Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline | Aug 6, 2024 | Benchmarking | Unverified | 0 |
| MaterioMiner -- An ontology-based text mining dataset for extraction of process-structure-property entities | Aug 5, 2024 | Benchmarking, Graph Generation | Unverified | 0 |
| From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | Aug 5, 2024 | Benchmarking, Code Generation | Unverified | 0 |
| LMEMs for post-hoc analysis of HPO Benchmarking | Aug 5, 2024 | Benchmarking, Hyperparameter Optimization | Code Available | 0 |
| User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance | Aug 4, 2024 | Action Anticipation, Benchmarking | Unverified | 0 |
| SPINEX-TimeSeries: Similarity-based Predictions with Explainable Neighbors Exploration for Time Series and Forecasting Problems | Aug 4, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Visual-Inertial SLAM for Unstructured Outdoor Environments: Benchmarking the Benefits and Computational Costs of Loop Closing | Aug 3, 2024 | Autonomous Navigation, Benchmarking | Code Available | 0 |
| Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test Data | Aug 3, 2024 | Benchmarking, Knowledge Graphs | Code Available | 0 |
| Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations | Aug 3, 2024 | Benchmarking, Deep Reinforcement Learning | Unverified | 0 |
| IBB Traffic Graph Data: Benchmarking and Road Traffic Prediction Model | Aug 2, 2024 | Benchmarking, Feature Engineering | Unverified | 0 |
| Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics | Aug 2, 2024 | Adversarial Attack, Adversarial Purification | Code Available | 1 |
| Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions | Aug 2, 2024 | Benchmarking, multimodal interaction | Code Available | 0 |
| RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Aug 2, 2024 | Benchmarking, Dataset Generation | Code Available | 3 |
| PINNs for Medical Image Analysis: A Survey | Aug 2, 2024 | Anatomy, Benchmarking | Unverified | 0 |
| IN-Sight: Interactive Navigation through Sight | Aug 1, 2024 | Benchmarking, Navigate | Unverified | 0 |
| High-Quality, ROS Compatible Video Encoding and Decoding for High-Definition Datasets | Aug 1, 2024 | Benchmarking, Simultaneous Localization and Mapping | Code Available | 0 |
| Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Jul 31, 2024 | Benchmarking, Large Language Model | Code Available | 0 |
| KemenkeuGPT: Leveraging a Large Language Model on Indonesia's Government Financial Data and Regulations to Enhance Decision Making | Jul 31, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| Efficient Channel Estimation for Millimeter Wave and Terahertz Systems Enabled by Integrated Super-resolution Sensing and Communication | Jul 30, 2024 | Benchmarking, Super-Resolution | Unverified | 0 |
| TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models | Jul 30, 2024 | Benchmarking, Code Completion | Unverified | 0 |
| GNUMAP: A Parameter-Free Approach to Unsupervised Dimensionality Reduction via Graph Neural Networks | Jul 30, 2024 | Benchmarking, Contrastive Learning | Unverified | 0 |
| Benchmarking Histopathology Foundation Models for Ovarian Cancer Bevacizumab Treatment Response Prediction from Whole Slide Images | Jul 30, 2024 | Benchmarking, Multiple Instance Learning | Unverified | 0 |
| Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks | Jul 29, 2024 | Benchmarking, Language Model Evaluation | Unverified | 0 |
| Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning | Jul 29, 2024 | Anomaly Detection, Benchmarking | Unverified | 0 |