| Leveraging Self-Supervised Learning for Speaker Diarization | Sep 14, 2024 | Self-Supervised Learning, speaker-diarization | Code Available | 3 |
| The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech | Sep 14, 2024 | Self-Supervised Learning, Transfer Learning | Code Available | 3 |
| ASFT: Aligned Supervised Fine-Tuning through Absolute Likelihood | Sep 14, 2024 | Instruction Following, Text Generation | Code Available | 3 |
| Breaking reCAPTCHAv2 | Sep 13, 2024 | Image Segmentation, Semantic Segmentation | Code Available | 3 |
| Apollo: Band-sequence Modeling for High-Quality Audio Restoration | Sep 13, 2024 | Computational Efficiency, Speech Enhancement | Code Available | 3 |
| wgatools: an ultrafast toolkit for manipulating whole genome alignments | Sep 13, 2024 | | Code Available | 3 |
| RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision | Sep 13, 2024 | Decoder, object-detection | Code Available | 3 |
| Neural Message Passing Induced by Energy-Constrained Diffusion | Sep 13, 2024 | Inductive Bias | Code Available | 3 |
| SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity | Sep 13, 2024 | Deep Attention, Representation Learning | Code Available | 3 |
| WhisperNER: Unified Open Named Entity and Speech Recognition | Sep 12, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 |
| FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally | Sep 12, 2024 | | Code Available | 3 |
| RePlay: a Recommendation Framework for Experimentation and Production Use | Sep 11, 2024 | Recommendation Systems | Code Available | 3 |
| Agent Workflow Memory | Sep 11, 2024 | AI Agent, Language Modeling | Code Available | 3 |
| Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Sep 11, 2024 | 3D Generation, 3D Reconstruction | Code Available | 3 |
| StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos | Sep 11, 2024 | Video Inpainting | Code Available | 3 |
| Alignment of Diffusion Models: Fundamentals, Challenges, and Future | Sep 11, 2024 | | Code Available | 3 |
| One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion | Sep 10, 2024 | All, Deep Reinforcement Learning | Code Available | 3 |
| Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments | Sep 9, 2024 | Imitation Learning | Code Available | 3 |
| BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec | Sep 9, 2024 | Quantization | Code Available | 3 |
| HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale | Sep 9, 2024 | Code Generation, Fault localization | Code Available | 3 |
| Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models | Sep 7, 2024 | Chunking, Retrieval | Code Available | 3 |
| Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers | Sep 6, 2024 | Experimental Design, scientific discovery | Code Available | 3 |
| DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes | Sep 6, 2024 | Video Generation | Code Available | 3 |
| VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation | Sep 6, 2024 | Image Generation | Code Available | 3 |
| Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task | Sep 6, 2024 | Video Generation | Code Available | 3 |
| Theory, Analysis, and Best Practices for Sigmoid Self-Attention | Sep 6, 2024 | | Code Available | 3 |
| Attention Heads of Large Language Models: A Survey | Sep 5, 2024 | Survey | Code Available | 3 |
| The Role of Generative Systems in Historical Photography Management: A Case Study on Catalan Archives | Sep 5, 2024 | Management, Transfer Learning | Code Available | 3 |
| Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching | Sep 5, 2024 | | Code Available | 3 |
| LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture | Sep 4, 2024 | GPU, Mamba | Code Available | 3 |
| EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video | Sep 3, 2024 | 3D Reconstruction, Scene Understanding | Code Available | 3 |
| LinFusion: 1 GPU, 1 Minute, 16K Image | Sep 3, 2024 | 16k, Causal Inference | Code Available | 3 |
| Affordance-based Robot Manipulation with Flow Matching | Sep 2, 2024 | Action Generation, Robot Manipulation | Code Available | 3 |
| ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems | Sep 2, 2024 | Benchmarking, Instruction Following | Code Available | 3 |
| ContextCite: Attributing Model Generation to Context | Sep 1, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| TinyAgent: Function Calling at the Edge | Sep 1, 2024 | Language Modelling, Quantization | Code Available | 3 |
| Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Aug 30, 2024 | Audio Compression, Audio Generation | Code Available | 3 |
| VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters | Aug 30, 2024 | Image Reconstruction, Time Series | Code Available | 3 |
| CTNet: A Convolutional Transformer Network for EEG-Based Motor Imagery Classification | Aug 30, 2024 | Brain Computer Interface, EEG | Code Available | 3 |
| SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners | Aug 29, 2024 | Segmentation | Code Available | 3 |
| LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Aug 28, 2024 | Computational Efficiency, Hallucination | Code Available | 3 |
| InstanSeg: an embedding-based instance segmentation algorithm optimized for accurate, efficient and portable cell segmentation | Aug 28, 2024 | Cell Segmentation, GPU | Code Available | 3 |
| LRP4RAG: Detecting Hallucinations in Retrieval-Augmented Generation via Layer-wise Relevance Propagation | Aug 28, 2024 | RAG, Retrieval | Code Available | 3 |
| The Mamba in the Llama: Distilling and Accelerating Hybrid Models | Aug 27, 2024 | GPU, Language Modeling | Code Available | 3 |
| OctFusion: Octree-based Diffusion Models for 3D Shape Generation | Aug 27, 2024 | 3D Generation, 3D Shape Generation | Code Available | 3 |
| A Survey of Camouflaged Object Detection and Beyond | Aug 26, 2024 | Instance Segmentation, Object | Code Available | 3 |
| SWE-bench-java: A GitHub Issue Resolving Benchmark for Java | Aug 26, 2024 | | Code Available | 3 |
| Foundation Models for Music: A Survey | Aug 26, 2024 | In-Context Learning, Representation Learning | Code Available | 3 |
| Recent Event Camera Innovations: A Survey | Aug 24, 2024 | Articles, Event-based vision | Code Available | 3 |
| LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs | Aug 24, 2024 | Language Modeling, Language Modelling | Code Available | 3 |