| Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket | Jan 4, 2024 | image-classification, Image Classification | Code Available | 3 |
| Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model | Jan 4, 2024 | Combinatorial Optimization, Language Modeling | Code Available | 3 |
| LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry | Jan 3, 2024 | Point Tracking, Visual Odometry | Code Available | 3 |
| CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation | Jan 2, 2024 | | Code Available | 3 |
| EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals | Jan 1, 2024 | EEG, Representation Learning | Code Available | 3 |
| LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning | Jan 1, 2024 | 3D dense captioning, Dense Captioning | Code Available | 3 |
| Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods | Jan 1, 2024 | Image Manipulation, Image Manipulation Localization | Code Available | 3 |
| Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models | Jan 1, 2024 | Image Generation, Text to Image Generation | Code Available | 3 |
| Inversion-Free Image Editing with Language-Guided Diffusion Models | Jan 1, 2024 | Denoising, Image Manipulation | Code Available | 3 |
| Stronger Fewer & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation | Jan 1, 2024 | Domain Generalization, Semantic Segmentation | Code Available | 3 |
| Towards Automatic Power Battery Detection: New Challenge Benchmark Dataset and Baseline | Jan 1, 2024 | Crowd Counting, object-detection | Code Available | 3 |
| SEED-Bench: Benchmarking Multimodal Large Language Models | Jan 1, 2024 | Benchmarking, Image Generation | Code Available | 3 |
| Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation | Jan 1, 2024 | Segmentation, Semantic Segmentation | Code Available | 3 |
| Improving Text Embeddings with Large Language Models | Dec 31, 2023 | Decoder, Diversity | Code Available | 3 |
| EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling | Dec 31, 2023 | 3D Face Animation, Diversity | Code Available | 3 |
| Fairness in Serving Large Language Models | Dec 31, 2023 | Fairness, Scheduling | Code Available | 3 |
| Large Language Models for Generative Information Extraction: A Survey | Dec 29, 2023 | Survey | Code Available | 3 |
| TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | Dec 28, 2023 | Computational Efficiency, Image Captioning | Code Available | 3 |
| MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | Dec 28, 2023 | AutoML, CPU | Code Available | 3 |
| LangSplat: 3D Language Gaussian Splatting | Dec 26, 2023 | NeRF, Object Localization | Code Available | 3 |
| XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library | Dec 25, 2023 | CPU, Deep Reinforcement Learning | Code Available | 3 |
| SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling | Dec 23, 2023 | Instruction Following, Language Modeling | Code Available | 3 |
| emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation | Dec 23, 2023 | Emotion Recognition, Self-Supervised Learning | Code Available | 3 |
| DriveLM: Driving with Graph Visual Question Answering | Dec 21, 2023 | Autonomous Driving, Question Answering | Code Available | 3 |
| Splatter Image: Ultra-Fast Single-View 3D Reconstruction | Dec 20, 2023 | 3D Object Reconstruction, 3D Reconstruction | Code Available | 3 |
| Generative Multimodal Models are In-Context Learners | Dec 20, 2023 | In-Context Learning, Personalized Image Generation | Code Available | 3 |
| Compact 3D Scene Representation via Self-Organizing Gaussian Grids | Dec 19, 2023 | 3DGS | Code Available | 3 |
| pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction | Dec 19, 2023 | 3D Reconstruction, Generalizable Novel View Synthesis | Code Available | 3 |
| Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint | Dec 18, 2023 | Language Modeling, Language Modelling | Code Available | 3 |
| DreamTalk: When Emotional Talking Head Generation Meets Diffusion Probabilistic Models | Dec 15, 2023 | Denoising, Talking Head Generation | Code Available | 3 |
| SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery | Dec 15, 2023 | Contrastive Learning, Earth Observation | Code Available | 3 |
| Point Transformer V3: Simpler, Faster, Stronger | Dec 15, 2023 | 3D Semantic Segmentation, LIDAR Semantic Segmentation | Code Available | 3 |
| General Object Foundation Model for Images and Videos at Scale | Dec 14, 2023 | Instance Segmentation, Long-tail Video Object Segmentation | Code Available | 3 |
| WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion | Dec 12, 2023 | 3D Human Pose Estimation | Code Available | 3 |
| Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution | Dec 11, 2023 | Decoder, Super-Resolution | Code Available | 3 |
| EasyVolcap: Accelerating Neural Volumetric Video Research | Dec 11, 2023 | | Code Available | 3 |
| EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM | Dec 11, 2023 | Decoder | Code Available | 3 |
| 4M: Massively Multimodal Masked Modeling | Dec 11, 2023 | Decoder | Code Available | 3 |
| Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models | Dec 11, 2023 | Chart Understanding, Decoder | Code Available | 3 |
| AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One | Dec 10, 2023 | All, Benchmarking | Code Available | 3 |
| RepViT-SAM: Towards Real-Time Segmenting Anything | Dec 10, 2023 | | Code Available | 3 |
| KwaiAgents: Generalized Information-seeking Agent System with Large Language Models | Dec 8, 2023 | | Code Available | 3 |
| An LLM Compiler for Parallel Function Calling | Dec 7, 2023 | | Code Available | 3 |
| PyThaiNLP: Thai Natural Language Processing in Python | Dec 7, 2023 | | Code Available | 3 |
| Visual Geometry Grounded Deep Structure From Motion | Dec 7, 2023 | Point Tracking | Code Available | 3 |
| Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification | Dec 6, 2023 | All, Speaker Verification | Code Available | 3 |
| Efficient Large Language Models: A Survey | Dec 6, 2023 | Natural Language Understanding, Survey | Code Available | 3 |
| Physical Symbolic Optimization | Dec 6, 2023 | regression, reinforcement-learning | Code Available | 3 |
| Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting | Dec 6, 2023 | Simultaneous Localization and Mapping | Code Available | 3 |
| MatterGen: a generative model for inorganic materials design | Dec 6, 2023 | model | Code Available | 3 |