| No Dataset Needed for Downstream Knowledge Benchmarking: Response Dispersion Inversely Correlates with Accuracy on Domain-specific QA | Aug 24, 2024 | Benchmarking, Chatbot | Unverified | 0 |
| Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory | Aug 24, 2024 | Benchmarking, Data Augmentation | Unverified | 0 |
| Open Llama2 Model for the Lithuanian Language | Aug 23, 2024 | Benchmarking, model | Unverified | 0 |
| Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection | Aug 23, 2024 | Benchmarking, Binary Classification | Unverified | 0 |
| S3Simulator: A benchmarking Side Scan Sonar Simulator dataset for Underwater Image Analysis | Aug 23, 2024 | Benchmarking | Code Available | 0 |
| Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures | Aug 22, 2024 | Benchmarking, Trajectory Prediction | Unverified | 0 |
| Benchmarking Counterfactual Interpretability in Deep Learning Models for Time Series Classification | Aug 22, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| WCEbleedGen: A wireless capsule endoscopy dataset and its benchmarking for automatic bleeding classification, detection, and segmentation | Aug 22, 2024 | Benchmarking, Classification | Code Available | 0 |
| MultiMed: Massively Multimodal and Multitask Medical Understanding | Aug 22, 2024 | Benchmarking, Medical Question Answering | Unverified | 0 |
| Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis | Aug 22, 2024 | Benchmarking | Unverified | 0 |
| WeQA: A Benchmark for Retrieval Augmented Generation in Wind Energy Domain | Aug 21, 2024 | Answer Generation, Benchmarking | Unverified | 0 |
| Advances in Preference-based Reinforcement Learning: A Review | Aug 21, 2024 | Benchmarking, reinforcement-learning | Unverified | 0 |
| SimBench: A Rule-Based Multi-Turn Interaction Benchmark for Evaluating an LLM's Ability to Generate Digital Twins | Aug 21, 2024 | Benchmarking | Code Available | 0 |
| RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands | Aug 20, 2024 | Benchmarking, Contact-rich Manipulation | Unverified | 0 |
| QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning | Aug 20, 2024 | Benchmarking, Language Modelling | Unverified | 0 |
| UKAN: Unbound Kolmogorov-Arnold Network Accompanied with Accelerated Library | Aug 20, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| ISLES'24: Improving final infarct prediction in ischemic stroke using multimodal imaging and clinical data | Aug 20, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Aug 20, 2024 | Benchmarking, In-Context Learning | Code Available | 0 |
| Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving | Aug 19, 2024 | Benchmarking, Machine Translation | Unverified | 0 |
| Benchmarking quantum machine learning kernel training for classification tasks | Aug 17, 2024 | Benchmarking, Quantum Machine Learning | Code Available | 0 |
| Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors | Aug 15, 2024 | Benchmarking, Management | Unverified | 0 |
| XCompress: LLM assisted Python-based text compression toolkit | Aug 12, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| A Novel Momentum-Based Deep Learning Techniques for Medical Image Classification and Segmentation | Aug 11, 2024 | Benchmarking, image-classification | Unverified | 0 |
| A Meta-Engine Framework for Interleaved Task and Motion Planning using Topological Refinements | Aug 11, 2024 | Benchmarking, Motion Planning | Unverified | 0 |
| Benchmarking Conventional and Learned Video Codecs with a Low-Delay Configuration | Aug 9, 2024 | Benchmarking, Video Compression | Unverified | 0 |