| Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency | Dec 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation | Dec 17, 2024 | Fact Verification, Knowledge Graphs | Code Available | 2 |
| ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers | Dec 17, 2024 | Articles, Form | Code Available | 2 |
| Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach | Dec 16, 2024 | Representation Learning, Retrieval | Code Available | 2 |
| Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection | Dec 16, 2024 | LLM-generated Text Detection, Text Detection | Code Available | 2 |
| Predicting the Original Appearance of Damaged Historical Documents | Dec 16, 2024 | Binarization | Code Available | 2 |
| BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions | Dec 16, 2024 | Knowledge Distillation, Motion Estimation | Code Available | 2 |
| Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Dec 16, 2024 | Hallucination, Robot Manipulation | Code Available | 2 |
| SCoralDet: Efficient real-time underwater soft coral detection with YOLO | Dec 16, 2024 | 2D Object Detection, object-detection | Code Available | 2 |
| DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis | Dec 16, 2024 | Disentanglement, Multimodal Sentiment Analysis | Code Available | 2 |
| The dark side of the forces: assessing non-conservative force models for atomistic machine learning | Dec 16, 2024 | Computational chemistry, Computational Efficiency | Code Available | 2 |
| HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection | Dec 16, 2024 | 3D Object Detection, 3D Object Detection on View-of-Delft (val) | Code Available | 2 |
| No More Adam: Learning Rate Scaling at Initialization is All You Need | Dec 16, 2024 | All | Code Available | 2 |
| Gramian Multimodal Representation Learning and Alignment | Dec 16, 2024 | Contrastive Learning, Representation Learning | Code Available | 2 |
| RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation | Dec 16, 2024 | RAG, Retrieval | Code Available | 2 |
| FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning | Dec 16, 2024 | DeepFake Detection, diffusion-generated faces detection | Code Available | 2 |
| Generative Inbetweening through Frame-wise Conditions-Driven Video Generation | Dec 16, 2024 | Video Generation | Code Available | 2 |
| Causal Diffusion Transformers for Generative Modeling | Dec 16, 2024 | Decoder, Image Generation | Code Available | 2 |
| DINO-Foresight: Looking into the Future with DINO | Dec 16, 2024 | Autonomous Driving, Scene Understanding | Code Available | 2 |
| LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts | Dec 16, 2024 | General Knowledge, Instruction Following | Code Available | 2 |
| ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data | Dec 16, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| AirMorph: Topology-Preserving Deep Learning for Pulmonary Airway Analysis | Dec 15, 2024 | Anatomy, Deep Learning | Code Available | 2 |
| SHMT: Self-supervised Hierarchical Makeup Transfer via Latent Diffusion Models | Dec 15, 2024 | | Code Available | 2 |
| Exploring Enhanced Contextual Information for Video-Level Object Tracking | Dec 15, 2024 | Object, Object Tracking | Code Available | 2 |
| Reliable, Reproducible, and Really Fast Leaderboards with Evalica | Dec 15, 2024 | | Code Available | 2 |