| Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction | Jun 18, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding | Jun 18, 2024 | Image Captioning, Question Answering | Code Available | 2 |
| Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Jun 18, 2024 | Anomaly Detection, Anomaly Localization | Code Available | 2 |
| Universal Score-based Speech Enhancement with High Content Preservation | Jun 18, 2024 | Speech Enhancement | Code Available | 2 |
| MegaScenes: Scene-Level View Synthesis at Scale | Jun 17, 2024 | Novel View Synthesis | Code Available | 2 |
| Task Me Anything | Jun 17, 2024 | 2k, Attribute | Code Available | 2 |
| Scaling Efficient Masked Image Modeling on Large Remote Sensing Dataset | Jun 17, 2024 | Aerial Scene Classification, Diversity | Code Available | 2 |
| Solving the Inverse Problem of Electrocardiography for Cardiac Digital Twins: A Survey | Jun 17, 2024 | Anatomy, Computational Efficiency | Code Available | 2 |
| DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors | Jun 17, 2024 | text-to-speech, Text to Speech | Code Available | 2 |
| mDPO: Conditional Preference Optimization for Multimodal Large Language Models | Jun 17, 2024 | Hallucination, Language Modeling | Code Available | 2 |
| A Robust Online Multi-Camera People Tracking System With Geometric Consistency and State-aware Re-ID Correction | Jun 17, 2024 | Multi-Object Tracking, Multiple People Tracking | Code Available | 2 |
| Residual and bidirectional LSTM for epileptic seizure detection | Jun 17, 2024 | EEG, Electroencephalogram (EEG) | Code Available | 2 |
| Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement | Jun 17, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging | Jun 17, 2024 | | Code Available | 2 |
| Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning | Jun 17, 2024 | Data Augmentation, Mathematical Reasoning | Code Available | 2 |
| MedCalc-Bench: Evaluating Large Language Models for Medical Calculations | Jun 17, 2024 | Descriptive, Medical Diagnosis | Code Available | 2 |
| GUICourse: From General Vision Language Models to Versatile GUI Agents | Jun 17, 2024 | Natural Language Visual Grounding, Optical Character Recognition (OCR) | Code Available | 2 |
| Transcoders Find Interpretable LLM Feature Circuits | Jun 17, 2024 | | Code Available | 2 |
| In-Context Editing: Learning Knowledge from Self-Induced Distributions | Jun 17, 2024 | Image Editing, In-Context Learning | Code Available | 2 |
| ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO | Jun 17, 2024 | Language Modelling, Question Answering | Code Available | 2 |
| Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% | Jun 17, 2024 | image-classification, Image Classification | Code Available | 2 |
| Understanding Multi-Granularity for Open-Vocabulary Part Segmentation | Jun 17, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 2 |
| Zero-Shot Scene Change Detection | Jun 17, 2024 | Change Detection, Scene Change Detection | Code Available | 2 |
| Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models | Jun 17, 2024 | | Code Available | 2 |
| Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation | Jun 17, 2024 | Decoder, Segmentation | Code Available | 2 |
| MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs | Jun 17, 2024 | Visual Question Answering | Code Available | 2 |
| GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities | Jun 17, 2024 | Audio Question Answering, Instruction Following | Code Available | 2 |
| OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations | Jun 17, 2024 | Depth Completion | Code Available | 2 |
| Large Scale Transfer Learning for Tabular Data via Language Modeling | Jun 17, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Jun 17, 2024 | Benchmarking | Code Available | 2 |
| DistPred: A Distribution-Free Probabilistic Inference Method for Regression and Forecasting | Jun 17, 2024 | Bayesian Inference, Computational Efficiency | Code Available | 2 |
| Duoduo CLIP: Efficient 3D Understanding with Multi-View Images | Jun 17, 2024 | GPU, Object | Code Available | 2 |
| DiffMM: Multi-Modal Diffusion Model for Recommendation | Jun 17, 2024 | Contrastive Learning, model | Code Available | 2 |
| Ontology Embedding: A Survey of Methods, Applications and Resources | Jun 16, 2024 | Logical Reasoning, Ontology Embedding | Code Available | 2 |
| STAR: Scale-wise Text-to-image generation via Auto-Regressive representations | Jun 16, 2024 | Diversity, Image Generation | Code Available | 2 |
| Kolmogorov Arnold Informed neural network: A physics-informed deep learning framework for solving forward and inverse problems based on Kolmogorov Arnold Networks | Jun 16, 2024 | Form, Kolmogorov-Arnold Networks | Code Available | 2 |
| RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models | Jun 16, 2024 | Adversarial Attack, Benchmarking | Code Available | 2 |
| ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models | Jun 16, 2024 | Video Generation | Code Available | 2 |
| CrossFuse: A Novel Cross Attention Mechanism based Infrared and Visible Image Fusion Approach | Jun 15, 2024 | Decoder, Infrared And Visible Image Fusion | Code Available | 2 |
| Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection | Jun 15, 2024 | 3D Object Detection, Computational Efficiency | Code Available | 2 |
| Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly | Jun 15, 2024 | | Code Available | 2 |
| CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation | Jun 15, 2024 | In-Context Learning, Text Generation | Code Available | 2 |
| Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights | Jun 15, 2024 | | Code Available | 2 |
| PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting | Jun 14, 2024 | NeRF, Novel View Synthesis | Code Available | 2 |
| SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Jun 14, 2024 | Graph Generation, Relation | Code Available | 2 |
| QQQ: Quality Quattuor-Bit Quantization for Large Language Models | Jun 14, 2024 | Quantization | Code Available | 2 |
| GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion | Jun 14, 2024 | 3D Generation, GPU | Code Available | 2 |
| SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages | Jun 14, 2024 | Diversity | Code Available | 2 |
| Evolving Self-Assembling Neural Networks: From Spontaneous Activity to Experience-Dependent Learning | Jun 14, 2024 | | Code Available | 2 |
| EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models | Jun 14, 2024 | 3D Object Detection, 3D Reconstruction | Code Available | 2 |