| End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Nov 8, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Nov 8, 2024 | Task Planning, Zero-shot Generalization | Code Available | 2 |
| DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions | Nov 8, 2024 | Pose Estimation | Code Available | 2 |
| LLM-PySC2: Starcraft II learning environment for Large Language Models | Nov 8, 2024 | Decision Making, Language Modelling | Code Available | 2 |
| AlignXIE: Improving Multilingual Information Extraction by Cross-Lingual Alignment | Nov 7, 2024 | Code Generation | Code Available | 2 |
| Lightning IR: Straightforward Fine-tuning and Inference of Transformer-based Language Models for Information Retrieval | Nov 7, 2024 | Information Retrieval, Re-Ranking | Code Available | 2 |
| Scaling Laws for Precision | Nov 7, 2024 | Quantization | Code Available | 2 |
| Dialectal Coverage And Generalization in Arabic Speech Recognition | Nov 7, 2024 | Arabic Speech Recognition, Automatic Speech Recognition | Code Available | 2 |
| Improved Multi-Task Brain Tumour Segmentation with Synthetic Data Augmentation | Nov 7, 2024 | Data Augmentation, Synthetic Data Generation | Code Available | 2 |
| PhoneLM:an Efficient and Capable Small Language Model Family through Principled Pre-training | Nov 7, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Brain Tumour Removing and Missing Modality Generation using 3D WDM | Nov 7, 2024 | GPU, Prediction | Code Available | 2 |
| Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives | Nov 7, 2024 | Large Language Model | Code Available | 2 |
| HourVideo: 1-Hour Video-Language Understanding | Nov 7, 2024 | Benchmarking, counterfactual | Code Available | 2 |
| AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making | Nov 6, 2024 | Decision Making, Diversity | Code Available | 2 |
| Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View Synthesis | Nov 6, 2024 | 3DGS, NeRF | Code Available | 2 |
| 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement | Nov 6, 2024 | 3DGS, Change Detection | Code Available | 2 |
| Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding | Nov 6, 2024 | ARC, GSM8K | Code Available | 2 |
| Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation? | Nov 6, 2024 | | Code Available | 2 |
| StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding | Nov 6, 2024 | Image Comprehension, Streaming video understanding | Code Available | 2 |
| VQA^2: Visual Question Answering for Video Quality Assessment | Nov 6, 2024 | Question Answering, Video Quality Assessment | Code Available | 2 |
| GIS Copilot: Towards an Autonomous GIS Agent for Spatial Analysis | Nov 5, 2024 | Code Generation | Code Available | 2 |
| FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models | Nov 5, 2024 | | Code Available | 2 |
| Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping | Nov 5, 2024 | Benchmarking, Code Generation | Code Available | 2 |
| V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization | Nov 5, 2024 | Hallucination, Language Modeling | Code Available | 2 |
| Learning General-Purpose Biomedical Volume Representations using Randomized Synthesis | Nov 4, 2024 | Contrastive Learning, Diversity | Code Available | 2 |
| CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments | Nov 4, 2024 | | Code Available | 2 |
| Attacking Vision-Language Computer Agents via Pop-ups | Nov 4, 2024 | | Code Available | 2 |
| DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Nov 4, 2024 | GPU, Robot Manipulation | Code Available | 2 |
| EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector | Nov 4, 2024 | Decoder, Emotional Speech Synthesis | Code Available | 2 |
| RAGViz: Diagnose and Visualize Retrieval-Augmented Generation | Nov 4, 2024 | Answer Generation, GPU | Code Available | 2 |
| PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Nov 4, 2024 | Caption Generation, Multiple-choice | Code Available | 2 |
| Adaptive Length Image Tokenization via Recurrent Allocation | Nov 4, 2024 | Decoder | Code Available | 2 |
| Combining Induction and Transduction for Abstract Reasoning | Nov 4, 2024 | ARC, Program Synthesis | Code Available | 2 |
| INQUIRE: A Natural World Text-to-Image Retrieval Benchmark | Nov 4, 2024 | Image Retrieval, Reranking | Code Available | 2 |
| Foundations and Recent Trends in Multimodal Mobile Agents: A Survey | Nov 4, 2024 | multimodal interaction, Survey | Code Available | 2 |
| Training on test proteins improves fitness, structure, and function prediction | Nov 4, 2024 | Prediction, Protein Structure Prediction | Code Available | 2 |
| Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation | Nov 4, 2024 | Earth Observation, Object | Code Available | 2 |
| Real-Time Polygonal Semantic Mapping for Humanoid Robot Stair Climbing | Nov 4, 2024 | Computational Efficiency, GPU | Code Available | 2 |
| Mapping Global Floods with 10 Years of Satellite Radar Data | Nov 3, 2024 | Disaster Response | Code Available | 2 |
| GarmentLab: A Unified Simulation and Benchmark for Garment Manipulation | Nov 2, 2024 | Imitation Learning | Code Available | 2 |
| Unlocking the Archives: Using Large Language Models to Transcribe Handwritten Historical Documents | Nov 2, 2024 | Handwritten Text Recognition, HTR | Code Available | 2 |
| X-Drive: Cross-modality consistent multi-sensor data synthesis for driving scenarios | Nov 2, 2024 | Denoising | Code Available | 2 |
| Toward Automated Algorithm Design: A Survey and Practical Guide to Meta-Black-Box-Optimization | Nov 1, 2024 | Computational Efficiency, In-Context Learning | Code Available | 2 |
| On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR | Nov 1, 2024 | 3D Semantic Segmentation, Autonomous Driving | Code Available | 2 |
| A Survey of Financial AI: Architectures, Advances and Open Challenges | Nov 1, 2024 | Decision Making, Portfolio Optimization | Code Available | 2 |
| SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models | Nov 1, 2024 | Mixture-of-Experts | Code Available | 2 |
| Communication Learning in Multi-Agent Systems from Graph Modeling Perspective | Nov 1, 2024 | | Code Available | 2 |
| APEBench: A Benchmark for Autoregressive Neural Emulators of PDEs | Oct 31, 2024 | | Code Available | 2 |
| What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective | Oct 31, 2024 | | Code Available | 2 |
| On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection | Oct 31, 2024 | Video Forensics | Code Available | 2 |