| Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Oct 10, 2024 | GSM8K, Math | Code Available | 2 |
| Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs) | Oct 10, 2024 | EEG, Text Generation | Code Available | 2 |
| Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts | Oct 10, 2024 | Mixture-of-Experts | Code Available | 2 |
| Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation | Oct 10, 2024 | | Code Available | 2 |
| Q-VLM: Post-training Quantization for Large Vision-Language Models | Oct 10, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Window Function-less DFT with Reduced Noise and Latency for Real-Time Music Analysis | Oct 10, 2024 | | Code Available | 2 |
| Interactive4D: Interactive 4D LiDAR Segmentation | Oct 10, 2024 | Interactive Segmentation, Segmentation | Code Available | 2 |
| Spiking GS: Towards High-Accuracy and Low-Cost Surface Reconstruction via Spiking Neuron-based Gaussian Splatting | Oct 9, 2024 | Surface Reconstruction | Code Available | 2 |
| Enhancing Soccer Camera Calibration Through Keypoint Exploitation | Oct 9, 2024 | Camera Calibration, Camera Pose Estimation | Code Available | 2 |
| Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training | Oct 9, 2024 | Caption Generation, Contrastive Learning | Code Available | 2 |
| Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates | Oct 9, 2024 | | Code Available | 2 |
| Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond | Oct 9, 2024 | Benchmarking | Code Available | 2 |
| Compositional Entailment Learning for Hyperbolic Vision-Language Models | Oct 9, 2024 | Language Modelling, Representation Learning | Code Available | 2 |
| CursorCore: Assist Programming through Aligning Anything | Oct 9, 2024 | Code Completion | Code Available | 2 |
| Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate | Oct 9, 2024 | cross-modal alignment, Visual Question Answering | Code Available | 2 |
| EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models | Oct 9, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| Towards Interpreting Visual Information Processing in Vision-Language Models | Oct 9, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration | Oct 9, 2024 | | Code Available | 2 |
| LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction | Oct 9, 2024 | Decoder, Form | Code Available | 2 |
| An Undetectable Watermark for Generative Image Models | Oct 9, 2024 | | Code Available | 2 |
| Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions | Oct 9, 2024 | Semantic Compression | Code Available | 2 |
| Sylber: Syllabic Embedding Representation of Speech from Raw Audio | Oct 9, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers | Oct 9, 2024 | Decoder, Re-Ranking | Code Available | 2 |
| MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses | Oct 9, 2024 | scientific discovery, valid | Code Available | 2 |
| Towards Natural Image Matting in the Wild via Real-Scenario Prior | Oct 9, 2024 | Decoder, Image Matting | Code Available | 2 |
| MatMamba: A Matryoshka State Space Model | Oct 9, 2024 | model, Representation Learning | Code Available | 2 |
| ReFIR: Grounding Large Restoration Models with Retrieval Augmentation | Oct 8, 2024 | Hallucination, Image Restoration | Code Available | 2 |
| FedGraph: A Research Library and Benchmark for Federated Graph Learning | Oct 8, 2024 | Benchmarking, Federated Learning | Code Available | 2 |
| Think While You Generate: Discrete Diffusion with Planned Denoising | Oct 8, 2024 | Denoising, Image Generation | Code Available | 2 |
| TRACE: Temporal Grounding Video LLM via Causal Event Modeling | Oct 8, 2024 | Text Generation, Video Understanding | Code Available | 2 |
| DeMo: Decoupling Motion Forecasting into Directional Intentions and Dynamic States | Oct 8, 2024 | Autonomous Driving, Mamba | Code Available | 2 |
| TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data | Oct 8, 2024 | Change Detection, Earth Observation | Code Available | 2 |
| LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs | Oct 8, 2024 | Knowledge Graphs, RAG | Code Available | 2 |
| Large Continual Instruction Assistant | Oct 8, 2024 | Question Answering, Semantic Similarity | Code Available | 2 |
| Motion Forecasting in Continuous Driving | Oct 8, 2024 | Autonomous Driving, Motion Forecasting | Code Available | 2 |
| MedUniSeg: 2D and 3D Medical Image Segmentation via a Prompt-driven Universal Model | Oct 8, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 2 |
| Prompting DirectSAM for Semantic Contour Extraction in Remote Sensing Images | Oct 8, 2024 | | Code Available | 2 |
| TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation | Oct 8, 2024 | Video Generation | Code Available | 2 |
| Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought | Oct 8, 2024 | | Code Available | 2 |
| Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See | Oct 8, 2024 | | Code Available | 2 |
| MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | Oct 8, 2024 | Mixture-of-Experts, Quantization | Code Available | 2 |
| LeanAgent: Lifelong Learning for Formal Theorem Proving | Oct 8, 2024 | Abstract Algebra, Automated Theorem Proving | Code Available | 2 |
| BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-wide Mobile Manipulation | Oct 8, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling | Oct 8, 2024 | document understanding, Language Modeling | Code Available | 2 |
| BEVLoc: Cross-View Localization and Matching via Birds-Eye-View Synthesis | Oct 8, 2024 | Autonomous Driving, Contrastive Learning | Code Available | 2 |
| Causal Context Adjustment Loss for Learned Image Compression | Oct 7, 2024 | Image Compression | Code Available | 2 |
| Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Oct 7, 2024 | Causal Inference, counterfactual | Code Available | 2 |
| Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting | Oct 7, 2024 | 3DGS | Code Available | 2 |
| Towards Ultra-Low-Power Neuromorphic Speech Enhancement with Spiking-FullSubNet | Oct 7, 2024 | Denoising, Speech Denoising | Code Available | 2 |
| TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention | Oct 7, 2024 | Position | Code Available | 2 |