| MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection | Aug 16, 2024 | Event Detection, Sound Event Detection | Code Available | 2 |
| AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents | Aug 15, 2024 | | Code Available | 2 |
| SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training | Aug 15, 2024 | Continual Learning, image-classification | Code Available | 2 |
| GOReloc: Graph-based Object-Level Relocalization for Visual SLAM | Aug 15, 2024 | Object, object-detection | Code Available | 2 |
| Snuffy: Efficient Whole Slide Image Classifier | Aug 15, 2024 | Breast Cancer Detection, Lung Cancer Diagnosis | Code Available | 2 |
| Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning | Aug 15, 2024 | Segmentation, Video Segmentation | Code Available | 2 |
| HAIR: Hypernetworks-based All-in-One Image Restoration | Aug 15, 2024 | 5-Degradation Blind All-in-One Image Restoration, All | Code Available | 2 |
| Text2BIM: Generating Building Models Using a Large Language Model-based Multi-Agent Framework | Aug 15, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| SustainDC: Benchmarking for Sustainable Data Center Control | Aug 14, 2024 | Benchmarking, Management | Code Available | 2 |
| BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning | Aug 14, 2024 | Backdoor Attack, Prompt Learning | Code Available | 2 |
| ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area | Aug 14, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis | Aug 14, 2024 | Anomaly Detection, Boundary Detection | Code Available | 2 |
| Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration | Aug 14, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 2 |
| Causal Agent based on Large Language Model | Aug 13, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Parallel Speculative Decoding with Adaptive Draft Length | Aug 13, 2024 | Text Generation | Code Available | 2 |
| Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective | Aug 13, 2024 | Image Generation, Synthetic Image Detection | Code Available | 2 |
| ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation | Aug 13, 2024 | Segmentation, Semantic Segmentation | Code Available | 2 |
| BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training | Aug 12, 2024 | Data Augmentation, Virtual Try-on | Code Available | 2 |
| Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment | Aug 12, 2024 | Contrastive Learning | Code Available | 2 |
| Efficient and Scalable Point Cloud Generation with Sparse Point-Voxel Diffusion Models | Aug 12, 2024 | Computational Efficiency, Point Cloud Completion | Code Available | 2 |
| Strategy Game-Playing with Size-Constrained State Abstraction | Aug 12, 2024 | | Code Available | 2 |
| Post-Training Sparse Attention with Double Sparsity | Aug 11, 2024 | | Code Available | 2 |
| SSL: A Self-similarity Loss for Improving Generative Image Super-resolution | Aug 11, 2024 | Hallucination, Image Super-Resolution | Code Available | 2 |
| FuXi Weather: A data-to-forecast machine learning system for global weather | Aug 10, 2024 | Computational Efficiency, Weather Forecasting | Code Available | 2 |
| Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network | Aug 10, 2024 | geo-localization, Image Retrieval | Code Available | 2 |
| ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | Aug 9, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 2 |
| In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | Aug 9, 2024 | Image to text, Object | Code Available | 2 |
| Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection | Aug 8, 2024 | object-detection, Object Detection | Code Available | 2 |
| MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents | Aug 8, 2024 | | Code Available | 2 |
| wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech | Aug 8, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| mbrs: A Library for Minimum Bayes Risk Decoding | Aug 8, 2024 | Text Generation | Code Available | 2 |
| Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP | Aug 8, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| EfficientRAG: Efficient Retriever for Multi-Hop Question Answering | Aug 8, 2024 | Multi-hop Question Answering, Question Answering | Code Available | 2 |
| Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters | Aug 7, 2024 | GPU | Code Available | 2 |
| Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks | Aug 7, 2024 | Attribute, In-Context Learning | Code Available | 2 |
| RL-ADN: A High-Performance Deep Reinforcement Learning Environment for Optimal Energy Storage Systems Dispatch in Active Distribution Networks | Aug 7, 2024 | Computational Efficiency, Data Augmentation | Code Available | 2 |
| L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection | Aug 7, 2024 | 3D Object Detection, Autonomous Navigation | Code Available | 2 |
| CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications | Aug 7, 2024 | image-classification, Image Classification | Code Available | 2 |
| PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model | Aug 7, 2024 | 3D Human Pose Estimation, Long-range modeling | Code Available | 2 |
| TrafficGPT: An LLM Approach for Open-Set Encrypted Traffic Classification | Aug 6, 2024 | Traffic Classification | Code Available | 2 |
| 500xCompressor: Generalized Prompt Compression for Large Language Models | Aug 6, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement | Aug 6, 2024 | Speech Enhancement, Speech Separation | Code Available | 2 |
| Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs | Aug 6, 2024 | Knowledge Graphs, Natural Language Queries | Code Available | 2 |
| GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI | Aug 6, 2024 | Question Answering, Visual Question Answering | Code Available | 2 |
| LumiGauss: Relightable Gaussian Splatting in the Wild | Aug 6, 2024 | 3D Reconstruction, NeRF | Code Available | 2 |
| DaCapo: a modular deep learning framework for scalable 3D image segmentation | Aug 5, 2024 | Image Segmentation, Management | Code Available | 2 |
| Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation | Aug 5, 2024 | Rhythm, Self-Supervised Learning | Code Available | 2 |
| YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition | Aug 5, 2024 | Action Detection | Code Available | 2 |
| XMainframe: A Large Language Model for Mainframe Modernization | Aug 5, 2024 | Code Summarization, Language Modeling | Code Available | 2 |
| ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems | Aug 5, 2024 | AI Agent | Code Available | 2 |