| Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation | Jul 11, 2024 | object-detection, Object Detection | Code Available | 2 |
| β-DPO: Direct Preference Optimization with Dynamic β | Jul 11, 2024 | Informativeness | Code Available | 2 |
| An Economic Framework for 6-DoF Grasp Detection | Jul 11, 2024 | Robotic Grasping | Code Available | 2 |
| LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | Jul 11, 2024 | Image Retrieval, Image to text | Code Available | 2 |
| WalkTheDog: Cross-Morphology Motion Alignment via Phase Manifolds | Jul 11, 2024 | Retrieval | Code Available | 2 |
| DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception | Jul 11, 2024 | Visual Question Answering | Code Available | 2 |
| Gradient Boosting Reinforcement Learning | Jul 11, 2024 | GPU, reinforcement-learning | Code Available | 2 |
| SALT: Introducing a Framework for Hierarchical Segmentations in Medical Imaging using Softmax for Arbitrary Label Trees | Jul 11, 2024 | Diagnostic | Code Available | 2 |
| MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos | Jul 11, 2024 | NeRF | Code Available | 2 |
| Adaptive Parametric Activation | Jul 11, 2024 | imbalanced classification, Instance Segmentation | Code Available | 2 |
| WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving | Jul 11, 2024 | Autonomous Driving, Benchmarking | Code Available | 2 |
| AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization | Jul 11, 2024 | Contrastive Learning, Transfer Learning | Code Available | 2 |
| Transformer Circuit Faithfulness Metrics are not Robust | Jul 11, 2024 | | Code Available | 2 |
| Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data | Jul 11, 2024 | Autonomous Navigation, Prediction | Code Available | 2 |
| Exploiting Scale-Variant Attention for Segmenting Small Medical Objects | Jul 10, 2024 | Cell Segmentation, MRI segmentation | Code Available | 2 |
| MoreFixes: A Large-Scale Dataset of CVE Fix Commits Mined through Enhanced Repository Discovery | Jul 10, 2024 | Vulnerability Detection | Code Available | 2 |
| TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data | Jul 10, 2024 | Contrastive Learning, multimodal interaction | Code Available | 2 |
| Coherent and Multi-modality Image Inpainting via Latent Space Optimization | Jul 10, 2024 | Denoising, Image Inpainting | Code Available | 2 |
| SaMoye: Zero-shot Singing Voice Conversion Model Based on Feature Disentanglement and Enhancement | Jul 10, 2024 | Disentanglement, Voice Conversion | Code Available | 2 |
| IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection | Jul 10, 2024 | Decoder, Image Segmentation | Code Available | 2 |
| Density Estimation via Binless Multidimensional Integration | Jul 10, 2024 | Density Estimation | Code Available | 2 |
| InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior | Jul 10, 2024 | Benchmarking, Decoder | Code Available | 2 |
| Satellite Image Time Series Semantic Change Detection: Novel Architecture and Analysis of Domain Shift | Jul 10, 2024 | Change Detection, Disaster Response | Code Available | 2 |
| ViTime: A Visual Intelligence-Based Foundation Model for Time Series Forecasting | Jul 10, 2024 | Time Series, Time Series Analysis | Code Available | 2 |
| GLBench: A Comprehensive Benchmark for Graph with Large Language Models | Jul 10, 2024 | | Code Available | 2 |
| PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer | Jul 10, 2024 | Decoder, Handwritten Mathmatical Expression Recognition | Code Available | 2 |
| Generative Image as Action Models | Jul 10, 2024 | Image Generation, Robot Manipulation | Code Available | 2 |
| MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis | Jul 10, 2024 | GPU, Image Generation | Code Available | 2 |
| LitSearch: A Retrieval Benchmark for Scientific Literature Search | Jul 10, 2024 | Articles, Reranking | Code Available | 2 |
| Adversarial Attacks and Defenses on Text-to-Image Diffusion Models: A Survey | Jul 10, 2024 | Adversarial Attack, Image Generation | Code Available | 2 |
| Exploring the Causality of End-to-End Autonomous Driving | Jul 9, 2024 | Autonomous Driving, counterfactual | Code Available | 2 |
| Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning | Jul 9, 2024 | Image Generation, Sentence | Code Available | 2 |
| Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis | Jul 9, 2024 | | Code Available | 2 |
| ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction | Jul 9, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps | Jul 9, 2024 | Articles, Hallucination | Code Available | 2 |
| LuSNAR:A Lunar Segmentation, Navigation and Reconstruction Dataset based on Muti-sensor for Autonomous Exploration | Jul 9, 2024 | 3D Reconstruction, Autonomous Navigation | Code Available | 2 |
| Decomposition Betters Tracking Everything Everywhere | Jul 9, 2024 | Motion Estimation, Point Tracking | Code Available | 2 |
| RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models | Jul 9, 2024 | Decoder, Scheduling | Code Available | 2 |
| Vision language models are blind: Failing to translate detailed visual features into words | Jul 9, 2024 | | Code Available | 2 |
| FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making | Jul 9, 2024 | Decision Making | Code Available | 2 |
| HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance | Jul 9, 2024 | Benchmarking, Conditional Image Generation | Code Available | 2 |
| Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model | Jul 9, 2024 | Chart Understanding, Language Modeling | Code Available | 2 |
| FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation | Jul 9, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention | Jul 9, 2024 | Autonomous Driving, Decoder | Code Available | 2 |
| Hyperion - A fast, versatile symbolic Gaussian Belief Propagation framework for Continuous-Time SLAM | Jul 9, 2024 | Simultaneous Localization and Mapping | Code Available | 2 |
| Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems | Jul 9, 2024 | | Code Available | 2 |
| Graph Neural Networks and Deep Reinforcement Learning Based Resource Allocation for V2X Communications | Jul 9, 2024 | Deep Reinforcement Learning | Code Available | 2 |
| ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement | Jul 9, 2024 | Attribute, Disentanglement | Code Available | 2 |
| MEEG and AT-DGNN: Improving EEG Emotion Recognition with Music Introducing and Graph-based Learning | Jul 8, 2024 | Arousal Estimation, EEG | Code Available | 2 |
| InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation | Jul 8, 2024 | | Code Available | 2 |