| SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | Jun 18, 2024 | Landmark-based Lipreading, Lipreading | Code Available | 2 |
| Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Jun 18, 2024 | Anomaly Detection, Anomaly Localization | Code Available | 2 |
| PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers | Jun 18, 2024 | Decision Making, RAG | Code Available | 2 |
| SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents | Jun 18, 2024 | Code Generation, Code Repair | Code Available | 2 |
| Duoduo CLIP: Efficient 3D Understanding with Multi-View Images | Jun 17, 2024 | GPU, Object | Code Available | 2 |
| DiffMM: Multi-Modal Diffusion Model for Recommendation | Jun 17, 2024 | Contrastive Learning, model | Code Available | 2 |
| DistPred: A Distribution-Free Probabilistic Inference Method for Regression and Forecasting | Jun 17, 2024 | Bayesian Inference, Computational Efficiency | Code Available | 2 |
| Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning | Jun 17, 2024 | Data Augmentation, Mathematical Reasoning | Code Available | 2 |
| Understanding Multi-Granularity for Open-Vocabulary Part Segmentation | Jun 17, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 2 |
| Residual and bidirectional LSTM for epileptic seizure detection | Jun 17, 2024 | EEG, Electroencephalogram (EEG) | Code Available | 2 |
| Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Jun 17, 2024 | Benchmarking | Code Available | 2 |
| Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement | Jun 17, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| A Robust Online Multi-Camera People Tracking System With Geometric Consistency and State-aware Re-ID Correction | Jun 17, 2024 | Multi-Object Tracking, Multiple People Tracking | Code Available | 2 |
| Large Scale Transfer Learning for Tabular Data via Language Modeling | Jun 17, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Transcoders Find Interpretable LLM Feature Circuits | Jun 17, 2024 | | Code Available | 2 |
| Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation | Jun 17, 2024 | Decoder, Segmentation | Code Available | 2 |
| Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging | Jun 17, 2024 | | Code Available | 2 |
| MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs | Jun 17, 2024 | Visual Question Answering | Code Available | 2 |
| GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities | Jun 17, 2024 | Audio Question Answering, Instruction Following | Code Available | 2 |
| Zero-Shot Scene Change Detection | Jun 17, 2024 | Change Detection, Scene Change Detection | Code Available | 2 |
| Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models | Jun 17, 2024 | | Code Available | 2 |
| OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations | Jun 17, 2024 | Depth Completion | Code Available | 2 |
| GUICourse: From General Vision Language Models to Versatile GUI Agents | Jun 17, 2024 | Natural Language Visual Grounding, Optical Character Recognition (OCR) | Code Available | 2 |
| ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO | Jun 17, 2024 | Language Modelling, Question Answering | Code Available | 2 |
| In-Context Editing: Learning Knowledge from Self-Induced Distributions | Jun 17, 2024 | Image Editing, In-Context Learning | Code Available | 2 |