| In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | Aug 9, 2024 | Image to text, Object | Code Available | 2 |
| ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | Aug 9, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 2 |
| EfficientRAG: Efficient Retriever for Multi-Hop Question Answering | Aug 8, 2024 | Multi-hop Question Answering, Question Answering | Code Available | 2 |
| mbrs: A Library for Minimum Bayes Risk Decoding | Aug 8, 2024 | Text Generation | Code Available | 2 |
| Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection | Aug 8, 2024 | object-detection, Object Detection | Code Available | 2 |
| MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents | Aug 8, 2024 | | Code Available | 2 |
| wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech | Aug 8, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP | Aug 8, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model | Aug 7, 2024 | 3D Human Pose Estimation, Long-range modeling | Code Available | 2 |
| RL-ADN: A High-Performance Deep Reinforcement Learning Environment for Optimal Energy Storage Systems Dispatch in Active Distribution Networks | Aug 7, 2024 | Computational Efficiency, Data Augmentation | Code Available | 2 |
| L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection | Aug 7, 2024 | 3D Object Detection, Autonomous Navigation | Code Available | 2 |
| Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters | Aug 7, 2024 | GPU | Code Available | 2 |
| Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks | Aug 7, 2024 | Attribute, In-Context Learning | Code Available | 2 |
| CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications | Aug 7, 2024 | image-classification, Image Classification | Code Available | 2 |
| LumiGauss: Relightable Gaussian Splatting in the Wild | Aug 6, 2024 | 3D Reconstruction, NeRF | Code Available | 2 |
| GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI | Aug 6, 2024 | Question Answering, Visual Question Answering | Code Available | 2 |
| 500xCompressor: Generalized Prompt Compression for Large Language Models | Aug 6, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs | Aug 6, 2024 | Knowledge Graphs, Natural Language Queries | Code Available | 2 |
| TrafficGPT: An LLM Approach for Open-Set Encrypted Traffic Classification | Aug 6, 2024 | Traffic Classification | Code Available | 2 |
| TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement | Aug 6, 2024 | Speech Enhancement, Speech Separation | Code Available | 2 |
| DaCapo: a modular deep learning framework for scalable 3D image segmentation | Aug 5, 2024 | Image Segmentation, Management | Code Available | 2 |
| Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation | Aug 5, 2024 | Rhythm, Self-Supervised Learning | Code Available | 2 |
| XMainframe: A Large Language Model for Mainframe Modernization | Aug 5, 2024 | Code Summarization, Language Modeling | Code Available | 2 |
| ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems | Aug 5, 2024 | AI Agent | Code Available | 2 |
| Multistain Pretraining for Slide Representation Learning in Pathology | Aug 5, 2024 | Representation Learning, Self-Supervised Learning | Code Available | 2 |