| No Dataset Needed for Downstream Knowledge Benchmarking: Response Dispersion Inversely Correlates with Accuracy on Domain-specific QA | Aug 24, 2024 | Benchmarking, Chatbot | Unverified | 0 |
| Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory | Aug 24, 2024 | Benchmarking, Data Augmentation | Unverified | 0 |
| Open Llama2 Model for the Lithuanian Language | Aug 23, 2024 | Benchmarking, model | Unverified | 0 |
| Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection | Aug 23, 2024 | Benchmarking, Binary Classification | Unverified | 0 |
| S3Simulator: A benchmarking Side Scan Sonar Simulator dataset for Underwater Image Analysis | Aug 23, 2024 | Benchmarking | Code Available | 0 |
| Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures | Aug 22, 2024 | Benchmarking, Trajectory Prediction | Unverified | 0 |
| Benchmarking Counterfactual Interpretability in Deep Learning Models for Time Series Classification | Aug 22, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| WCEbleedGen: A wireless capsule endoscopy dataset and its benchmarking for automatic bleeding classification, detection, and segmentation | Aug 22, 2024 | Benchmarking, Classification | Code Available | 0 |
| MultiMed: Massively Multimodal and Multitask Medical Understanding | Aug 22, 2024 | Benchmarking, Medical Question Answering | Unverified | 0 |
| Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis | Aug 22, 2024 | Benchmarking | Unverified | 0 |
| WeQA: A Benchmark for Retrieval Augmented Generation in Wind Energy Domain | Aug 21, 2024 | Answer Generation, Benchmarking | Unverified | 0 |
| Advances in Preference-based Reinforcement Learning: A Review | Aug 21, 2024 | Benchmarking, reinforcement-learning | Unverified | 0 |
| SimBench: A Rule-Based Multi-Turn Interaction Benchmark for Evaluating an LLM's Ability to Generate Digital Twins | Aug 21, 2024 | Benchmarking | Code Available | 0 |
| RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands | Aug 20, 2024 | Benchmarking, Contact-rich Manipulation | Unverified | 0 |
| QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning | Aug 20, 2024 | Benchmarking, Language Modelling | Unverified | 0 |
| UKAN: Unbound Kolmogorov-Arnold Network Accompanied with Accelerated Library | Aug 20, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| ISLES'24: Improving final infarct prediction in ischemic stroke using multimodal imaging and clinical data | Aug 20, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Aug 20, 2024 | Benchmarking, In-Context Learning | Code Available | 0 |
| Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving | Aug 19, 2024 | Benchmarking, Machine Translation | Unverified | 0 |
| Benchmarking quantum machine learning kernel training for classification tasks | Aug 17, 2024 | Benchmarking, Quantum Machine Learning | Code Available | 0 |
| Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors | Aug 15, 2024 | Benchmarking, Management | Unverified | 0 |
| XCompress: LLM assisted Python-based text compression toolkit | Aug 12, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| A Novel Momentum-Based Deep Learning Techniques for Medical Image Classification and Segmentation | Aug 11, 2024 | Benchmarking, image-classification | Unverified | 0 |
| A Meta-Engine Framework for Interleaved Task and Motion Planning using Topological Refinements | Aug 11, 2024 | Benchmarking, Motion Planning | Unverified | 0 |
| Benchmarking Conventional and Learned Video Codecs with a Low-Delay Configuration | Aug 9, 2024 | Benchmarking, Video Compression | Unverified | 0 |
| Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy | Aug 9, 2024 | Benchmarking, Medical Image Analysis | Code Available | 0 |
| h4rm3l: A language for Composable Jailbreak Attack Synthesis | Aug 9, 2024 | Benchmarking, Program Synthesis | Unverified | 0 |
| FedAD-Bench: A Unified Benchmark for Federated Unsupervised Anomaly Detection in Tabular Data | Aug 8, 2024 | Anomaly Detection, Benchmarking | Unverified | 0 |
| SegXAL: Explainable Active Learning for Semantic Segmentation in Driving Scene Scenarios | Aug 8, 2024 | Active Learning, Benchmarking | Unverified | 0 |
| Towards Explainable Network Intrusion Detection using Large Language Models | Aug 8, 2024 | Benchmarking, Intrusion Detection | Unverified | 0 |
| Soft-Hard Attention U-Net Model and Benchmark Dataset for Multiscale Image Shadow Removal | Aug 7, 2024 | Benchmarking, Hard Attention | Unverified | 0 |
| Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions | Aug 7, 2024 | Anomaly Detection, Benchmarking | Unverified | 0 |
| Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline | Aug 6, 2024 | Benchmarking | Unverified | 0 |
| From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | Aug 5, 2024 | Benchmarking, Code Generation | Unverified | 0 |
| LMEMs for post-hoc analysis of HPO Benchmarking | Aug 5, 2024 | Benchmarking, Hyperparameter Optimization | Code Available | 0 |
| MaterioMiner -- An ontology-based text mining dataset for extraction of process-structure-property entities | Aug 5, 2024 | Benchmarking, Graph Generation | Unverified | 0 |
| SPINEX-TimeSeries: Similarity-based Predictions with Explainable Neighbors Exploration for Time Series and Forecasting Problems | Aug 4, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance | Aug 4, 2024 | Action Anticipation, Benchmarking | Unverified | 0 |
| Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations | Aug 3, 2024 | Benchmarking, Deep Reinforcement Learning | Unverified | 0 |
| Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test Data | Aug 3, 2024 | Benchmarking, Knowledge Graphs | Code Available | 0 |
| Visual-Inertial SLAM for Unstructured Outdoor Environments: Benchmarking the Benefits and Computational Costs of Loop Closing | Aug 3, 2024 | Autonomous Navigation, Benchmarking | Code Available | 0 |
| IBB Traffic Graph Data: Benchmarking and Road Traffic Prediction Model | Aug 2, 2024 | Benchmarking, Feature Engineering | Unverified | 0 |
| Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions | Aug 2, 2024 | Benchmarking, multimodal interaction | Code Available | 0 |
| PINNs for Medical Image Analysis: A Survey | Aug 2, 2024 | Anatomy, Benchmarking | Unverified | 0 |
| IN-Sight: Interactive Navigation through Sight | Aug 1, 2024 | Benchmarking, Navigate | Unverified | 0 |
| High-Quality, ROS Compatible Video Encoding and Decoding for High-Definition Datasets | Aug 1, 2024 | Benchmarking, Simultaneous Localization and Mapping | Code Available | 0 |
| KemenkeuGPT: Leveraging a Large Language Model on Indonesia's Government Financial Data and Regulations to Enhance Decision Making | Jul 31, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Jul 31, 2024 | Benchmarking, Large Language Model | Code Available | 0 |
| Efficient Channel Estimation for Millimeter Wave and Terahertz Systems Enabled by Integrated Super-resolution Sensing and Communication | Jul 30, 2024 | Benchmarking, Super-Resolution | Unverified | 0 |
| TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models | Jul 30, 2024 | Benchmarking, Code Completion | Unverified | 0 |