| MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Dec 28, 2023 | Language Identification, Math | Code Available | 2 |
| SCTNet: Single-Branch CNN with Transformer Semantic Information for Real-Time Segmentation | Dec 28, 2023 | Real-Time Semantic Segmentation, Semantic Segmentation | Code Available | 2 |
| Any-point Trajectory Modeling for Policy Learning | Dec 28, 2023 | Trajectory Modeling, Transfer Learning | Code Available | 2 |
| Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels | Dec 28, 2023 | Aesthetics Quality Assessment, Image Quality Assessment | Code Available | 2 |
| Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis | Dec 28, 2023 | 8k, Feature Splatting | Code Available | 2 |
| Learning Vision from Models Rivals Learning Vision from Data | Dec 28, 2023 | Contrastive Learning, Image Captioning | Code Available | 2 |
| I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models | Dec 27, 2023 | Video Generation | Code Available | 2 |
| Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss | Dec 27, 2023 | | Code Available | 2 |
| SVGDreamer: Text Guided SVG Generation with Diffusion Model | Dec 27, 2023 | Diversity, Vector Graphics | Code Available | 2 |
| State-of-the-Art in Nudity Classification: A Comparative Analysis | Dec 26, 2023 | Classification, image-classification | Code Available | 2 |
| Semantic Guidance Tuning for Text-To-Image Diffusion Models | Dec 26, 2023 | Zero-shot Generalization | Code Available | 2 |
| LLMLight: Large Language Models as Traffic Signal Control Agents | Dec 26, 2023 | Decision Making, Management | Code Available | 2 |
| Inter-X: Towards Versatile Human-Human Interaction Analysis | Dec 26, 2023 | Motion Synthesis | Code Available | 2 |
| LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving | Dec 26, 2023 | Autonomous Driving | Code Available | 2 |
| DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision | Dec 26, 2023 | Deep Learning, NeRF | Code Available | 2 |
| LeanVec: Searching vectors faster by making them fit | Dec 26, 2023 | Cross-Modal Retrieval, Dimensionality Reduction | Code Available | 2 |
| Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models | Dec 26, 2023 | Decoder, Reranking | Code Available | 2 |
| Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 | Dec 26, 2023 | All | Code Available | 2 |
| EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI | Dec 26, 2023 | Scene Understanding | Code Available | 2 |
| What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning | Dec 25, 2023 | | Code Available | 2 |
| UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | Dec 25, 2023 | Image Segmentation, Object | Code Available | 2 |
| YAYI-UIE: A Chat-Enhanced Instruction Tuning Framework for Universal Information Extraction | Dec 24, 2023 | UIE | Code Available | 2 |
| Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference | Dec 23, 2023 | GPU, High-Level Synthesis | Code Available | 2 |
| Narrowing the semantic gaps in U-Net with learnable skip connections: The case of medical image segmentation | Dec 23, 2023 | Decoder, Image Segmentation | Code Available | 2 |
| Prototype-based Cross-Modal Object Tracking | Dec 22, 2023 | Object, Object Tracking | Code Available | 2 |
| Aurora:Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning | Dec 22, 2023 | Instruction Following, Mixture-of-Experts | Code Available | 2 |
| Cross-Modal Object Tracking via Modality-Aware Fusion Network and A Large-Scale Dataset | Dec 22, 2023 | Object Tracking, Visual Tracking | Code Available | 2 |
| TACO: Topics in Algorithmic COde generation dataset | Dec 22, 2023 | Code Generation | Code Available | 2 |
| T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step | Dec 21, 2023 | Instruction Following, Retrieval | Code Available | 2 |
| V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs | Dec 21, 2023 | Visual Question Answering, World Knowledge | Code Available | 2 |
| PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models | Dec 21, 2023 | Image Animation | Code Available | 2 |
| The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction | Dec 21, 2023 | | Code Available | 2 |
| VCoder: Versatile Vision Encoders for Multimodal Large Language Models | Dec 21, 2023 | Image Captioning, Image Generation | Code Available | 2 |
| TinySAM: Pushing the Envelope for Efficient Segment Anything Model | Dec 21, 2023 | Knowledge Distillation, Quantization | Code Available | 2 |
| LingoQA: Visual Question Answering for Autonomous Driving | Dec 21, 2023 | Autonomous Driving, Decision Making | Code Available | 2 |
| Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models | Dec 21, 2023 | 2k | Code Available | 2 |
| HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models | Dec 21, 2023 | 2k, Image Inpainting | Code Available | 2 |
| AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | Dec 20, 2023 | Code Generation, HumanEval | Code Available | 2 |
| Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting | Dec 20, 2023 | 3D Generation, Image Generation | Code Available | 2 |
| OpenRL: A Unified Reinforcement Learning Framework | Dec 20, 2023 | reinforcement-learning, Reinforcement Learning | Code Available | 2 |
| Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation | Dec 20, 2023 | Robot Manipulation, Zero-shot Generalization | Code Available | 2 |
| Machine Mindset: An MBTI Exploration of Large Language Models | Dec 20, 2023 | Large Language Model, Personality Alignment | Code Available | 2 |
| SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | Dec 20, 2023 | Attribute, Cross-Modal Retrieval | Code Available | 2 |
| Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator | Dec 20, 2023 | Data Augmentation, object-detection | Code Available | 2 |
| Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy | Dec 20, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process | Dec 19, 2023 | Denoising, Dichotomous Image Segmentation | Code Available | 2 |
| XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX | Dec 19, 2023 | Diversity, GPU | Code Available | 2 |
| CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation | Dec 19, 2023 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 2 |
| Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models | Dec 19, 2023 | Denoising, Neural Architecture Search | Code Available | 2 |
| Intrinsic Image Diffusion for Indoor Single-view Material Estimation | Dec 19, 2023 | | Code Available | 2 |