| EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Oct 1, 2024 | Emotional Speech Synthesis, Speech Synthesis | Code Available | 2 |
| GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving | Oct 1, 2024 | Autonomous Driving, Autonomous Vehicles | Code Available | 2 |
| Generative causal testing to bridge data-driven models and scientific theories in language neuroscience | Oct 1, 2024 | | Code Available | 2 |
| Uncertainty Modelling and Robust Observer Synthesis using the Koopman Operator | Oct 1, 2024 | | Code Available | 2 |
| Recent Advances in Speech Language Models: A Survey | Oct 1, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages | Oct 1, 2024 | Automatic Speech Recognition, speech-recognition | Code Available | 2 |
| PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection | Oct 1, 2024 | 3D Anomaly Detection, Anomaly Detection | Code Available | 2 |
| EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics | Oct 1, 2024 | | Code Available | 2 |
| CaRtGS: Computational Alignment for Real-Time Gaussian Splatting SLAM | Oct 1, 2024 | 3DGS, Simultaneous Localization and Mapping | Code Available | 2 |
| DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction | Sep 30, 2024 | 3D Object Detection, 3D Semantic Occupancy Prediction | Code Available | 2 |
| FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" | Sep 30, 2024 | counterfactual, Hallucination | Code Available | 2 |
| RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models | Sep 30, 2024 | Contrastive Learning | Code Available | 2 |
| HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes | Sep 30, 2024 | Object, object-detection | Code Available | 2 |
| End-to-end Piano Performance-MIDI to Score Conversion with Transformers | Sep 30, 2024 | | Code Available | 2 |
| Frequency Adaptive Normalization For Non-stationary Time Series Forecasting | Sep 30, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation | Sep 30, 2024 | Attribute, Collaborative Filtering | Code Available | 2 |
| Towards Robust Multimodal Sentiment Analysis with Incomplete Data | Sep 30, 2024 | Multimodal Sentiment Analysis, Sentiment Analysis | Code Available | 2 |
| QAEncoder: Towards Aligned Representation Learning in Question Answering System | Sep 30, 2024 | Document Embedding, Question Answering | Code Available | 2 |
| Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning | Sep 30, 2024 | Instruction Following, Language Modeling | Code Available | 2 |
| PerCo (SD): Open Perceptual Compression | Sep 30, 2024 | Attribute, Image Compression | Code Available | 2 |
| LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models | Sep 30, 2024 | Fairness | Code Available | 2 |
| Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models | Sep 30, 2024 | Benchmarking, Continual Learning | Code Available | 2 |
| Melody-Guided Music Generation | Sep 30, 2024 | cross-modal alignment, Music Generation | Code Available | 2 |
| DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data | Sep 30, 2024 | Instruction Following, Language Modeling | Code Available | 2 |
| Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation | Sep 30, 2024 | Cross-Modal Retrieval, Dynamic Time Warping | Code Available | 2 |