| Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models | Jun 17, 2024 | | Code Available | 2 |
| Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging | Jun 17, 2024 | | Code Available | 2 |
| ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO | Jun 17, 2024 | Language Modelling, Question Answering | Code Available | 2 |
| MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs | Jun 17, 2024 | Visual Question Answering | Code Available | 2 |
| GUICourse: From General Vision Language Models to Versatile GUI Agents | Jun 17, 2024 | Natural Language Visual Grounding, Optical Character Recognition (OCR) | Code Available | 2 |
| Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Jun 17, 2024 | Benchmarking | Code Available | 2 |
| Understanding Multi-Granularity for Open-Vocabulary Part Segmentation | Jun 17, 2024 | Open-Vocabulary Semantic Segmentation | Code Available | 2 |
| Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% | Jun 17, 2024 | Image Classification | Code Available | 2 |
| STAR: Scale-wise Text-to-image generation via Auto-Regressive representations | Jun 16, 2024 | Diversity, Image Generation | Code Available | 2 |
| Ontology Embedding: A Survey of Methods, Applications and Resources | Jun 16, 2024 | Logical Reasoning, Ontology Embedding | Code Available | 2 |
| ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models | Jun 16, 2024 | Video Generation | Code Available | 2 |
| RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models | Jun 16, 2024 | Adversarial Attack, Benchmarking | Code Available | 2 |
| Kolmogorov Arnold Informed neural network: A physics-informed deep learning framework for solving forward and inverse problems based on Kolmogorov Arnold Networks | Jun 16, 2024 | Form, Kolmogorov-Arnold Networks | Code Available | 2 |
| Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly | Jun 15, 2024 | | Code Available | 2 |
| Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection | Jun 15, 2024 | 3D Object Detection, Computational Efficiency | Code Available | 2 |
| CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation | Jun 15, 2024 | In-Context Learning, Text Generation | Code Available | 2 |
| Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights | Jun 15, 2024 | | Code Available | 2 |
| CrossFuse: A Novel Cross Attention Mechanism based Infrared and Visible Image Fusion Approach | Jun 15, 2024 | Decoder, Infrared And Visible Image Fusion | Code Available | 2 |
| CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models | Jun 14, 2024 | Multiple-choice, Question Answering | Code Available | 2 |
| GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion | Jun 14, 2024 | 3D Generation, GPU | Code Available | 2 |
| Make It Count: Text-to-Image Generation with an Accurate Number of Objects | Jun 14, 2024 | Denoising, Image Generation | Code Available | 2 |
| Evolving Self-Assembling Neural Networks: From Spontaneous Activity to Experience-Dependent Learning | Jun 14, 2024 | | Code Available | 2 |
| EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models | Jun 14, 2024 | 3D Object Detection, 3D Reconstruction | Code Available | 2 |
| SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Jun 14, 2024 | Graph Generation, Relation | Code Available | 2 |
| Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection | Jun 14, 2024 | Decoder, Speech Recognition | Code Available | 2 |