| General Detection-based Text Line Recognition | Sep 25, 2024 | HTR, Optical Character Recognition (OCR) | Code Available | 2 |
| Progressive Representation Learning for Real-Time UAV Tracking | Sep 25, 2024 | Object, Object Tracking | Code Available | 2 |
| E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL | Sep 25, 2024 | Natural Language Queries, Text to SQL | Code Available | 2 |
| EEGUnity: Open-Source Tool in Facilitating Unified EEG Datasets Towards Large-Scale EEG Model | Sep 24, 2024 | EEG, Electroencephalogram (EEG) | Code Available | 2 |
| Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving | Sep 24, 2024 | Autonomous Driving, Imitation Learning | Code Available | 2 |
| TFG: Unified Training-Free Guidance for Diffusion Models | Sep 24, 2024 | | Code Available | 2 |
| GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization | Sep 24, 2024 | 3D geometry, 3DGS | Code Available | 2 |
| Low Latency Point Cloud Rendering with Learned Splatting | Sep 24, 2024 | | Code Available | 2 |
| Stage-Wise Reward Shaping for Acrobatic Robots: A Constrained Multi-Objective Reinforcement Learning Approach | Sep 24, 2024 | Multi-Objective Reinforcement Learning, Reinforcement Learning (RL) | Code Available | 2 |
| Fields of The World: A Machine Learning Benchmark Dataset For Global Agricultural Field Boundary Segmentation | Sep 24, 2024 | Diversity, Instance Segmentation | Code Available | 2 |
| Self-Supervised Any-Point Tracking by Contrastive Random Walks | Sep 24, 2024 | Contrastive Learning, Data Augmentation | Code Available | 2 |
| MaskBit: Embedding-free Image Generation via Bit Tokens | Sep 24, 2024 | Conditional Image Generation, Image Generation | Code Available | 2 |
| DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection | Sep 24, 2024 | Depression Detection, Mamba | Code Available | 2 |
| Small Language Models: Survey, Measurements, and Insights | Sep 24, 2024 | Benchmarking, Decoder | Code Available | 2 |
| LTNtorch: PyTorch Implementation of Logic Tensor Networks | Sep 24, 2024 | Binary Classification, Logical Reasoning | Code Available | 2 |
| HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models | Sep 24, 2024 | Long-Context Understanding, Text Generation | Code Available | 2 |
| MonoFormer: One Transformer for Both Diffusion and Autoregression | Sep 24, 2024 | Image Generation, Text Generation | Code Available | 2 |
| MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents | Sep 24, 2024 | Code Generation, Management | Code Available | 2 |
| Inference-Friendly Models With MixAttention | Sep 23, 2024 | | Code Available | 2 |
| Archon: An Architecture Search Framework for Inference-Time Techniques | Sep 23, 2024 | Hyperparameter Optimization, Instruction Following | Code Available | 2 |
| OmniBench: Towards The Future of Universal Omni-Language Models | Sep 23, 2024 | Instruction Following | Code Available | 2 |
| SocialCircle+: Learning the Angle-based Conditioned Interaction Representation for Pedestrian Trajectory Prediction | Sep 23, 2024 | counterfactual, Pedestrian Trajectory Prediction | Code Available | 2 |
| Autonomous Exploration and Semantic Updating of Large-Scale Indoor Environments with Mobile Robots | Sep 23, 2024 | | Code Available | 2 |
| Phantom of Latent for Large Language and Vision Models | Sep 23, 2024 | Visual Question Answering | Code Available | 2 |
| Gaussian Deja-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities | Sep 23, 2024 | 3DGS, NeRF | Code Available | 2 |
| PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs | Sep 23, 2024 | | Code Available | 2 |
| MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding | Sep 23, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models | Sep 23, 2024 | Robot Task Planning, Task Planning | Code Available | 2 |
| A Survey on Multimodal Benchmarks: In the Era of Large AI Models | Sep 21, 2024 | Benchmarking, Survey | Code Available | 2 |
| R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models | Sep 21, 2024 | | Code Available | 2 |
| Revisiting BPR: A Replicability Study of a Common Recommender System Baseline | Sep 21, 2024 | Collaborative Filtering, Recommendation Systems | Code Available | 2 |
| Dynamic 2D Gaussians: Geometrically accurate radiance fields for dynamic objects | Sep 21, 2024 | | Code Available | 2 |
| Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis | Sep 21, 2024 | Model Editing, Prediction | Code Available | 2 |
| Diabetica: Adapting Large Language Model to Enhance Multiple Medical Tasks in Diabetes Care and Management | Sep 20, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts | Sep 20, 2024 | Prompt Engineering | Code Available | 2 |
| PyGRF: An improved Python Geographical Random Forest model and case studies in public health and natural disasters | Sep 20, 2024 | | Code Available | 2 |
| LiSenNet: Lightweight Sub-band and Dual-Path Modeling for Real-Time Speech Enhancement | Sep 20, 2024 | Speech Enhancement | Code Available | 2 |
| Towards Zero-shot Point Cloud Anomaly Detection: A Multi-View Projection Framework | Sep 20, 2024 | Anomaly Detection, Specificity | Code Available | 2 |
| Longitudinal Segmentation of MS Lesions via Temporal Difference Weighting | Sep 20, 2024 | Inductive Bias, Lesion Detection | Code Available | 2 |
| V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians | Sep 20, 2024 | 3DGS | Code Available | 2 |
| CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction | Sep 20, 2024 | Depth Estimation, Prediction | Code Available | 2 |
| PointSAM: Pointly-Supervised Segment Anything Model for Remote Sensing Images | Sep 20, 2024 | Image Segmentation, Semantic Segmentation | Code Available | 2 |
| Occupancy-Based Dual Contouring | Sep 20, 2024 | 3D Reconstruction, GPU | Code Available | 2 |
| From Cognition to Precognition: A Future-Aware Framework for Social Navigation | Sep 20, 2024 | Future prediction, Navigate | Code Available | 2 |
| Towards Interactive and Learnable Cooperative Driving Automation: a Large Language Model-Driven Decision-Making Framework | Sep 19, 2024 | Autonomous Vehicles, Decision Making | Code Available | 2 |
| Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization | Sep 19, 2024 | GPU, Language Modeling | Code Available | 2 |
| TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation | Sep 19, 2024 | Vision-Language-Action | Code Available | 2 |
| AutoVerus: Automated Proof Generation for Rust Code | Sep 19, 2024 | Code Generation, Language Modeling | Code Available | 2 |
| HSIGene: A Foundation Model For Hyperspectral Image Generation | Sep 19, 2024 | Data Augmentation, Denoising | Code Available | 2 |
| GStex: Per-Primitive Texturing of 2D Gaussian Splatting for Decoupled Appearance and Geometry Modeling | Sep 19, 2024 | Novel View Synthesis | Code Available | 2 |