| UniVS: Unified and Universal Video Segmentation with Prompts as Queries | Feb 28, 2024 | Decoder, Referring Expression Segmentation | Code Available | 3 |
| Simple linear attention language models balance the recall-throughput tradeoff | Feb 28, 2024 | Language Modelling, Mamba | Code Available | 3 |
| Training-Free Long-Context Scaling of Large Language Models | Feb 27, 2024 | 16k | Code Available | 3 |
| TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation | Feb 27, 2024 | Protein Design | Code Available | 3 |
| Explicit Interaction for Fusion-Based Place Recognition | Feb 27, 2024 | Autonomous Vehicles | Code Available | 3 |
| VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | Feb 27, 2024 | Contrastive Learning, Medical Image Analysis | Code Available | 3 |
| SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition | Feb 27, 2024 | Instruction Following, Language Modeling | Code Available | 3 |
| ShapeLLM: Universal 3D Object Understanding for Embodied Interaction | Feb 27, 2024 | 3D geometry, 3D Object Captioning | Code Available | 3 |
| Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction | Feb 27, 2024 | Autonomous Driving | Code Available | 3 |
| DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning | Feb 27, 2024 | Code Generation | Code Available | 3 |
| VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction | Feb 27, 2024 | NeRF | Code Available | 3 |
| PreRoutGNN for Timing Prediction with Order Preserving Partition: Global Circuit Pre-training, Local Delay Learning and Attentional Cell Modeling | Feb 27, 2024 | Graph Embedding | Code Available | 3 |
| Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning | Feb 26, 2024 | GPU, Minecraft | Code Available | 3 |
| TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis | Feb 26, 2024 | Anomaly Detection, Imputation | Code Available | 3 |
| A Survey on Data Selection for Language Models | Feb 26, 2024 | Survey, Unsupervised Pre-training | Code Available | 3 |
| ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors | Feb 26, 2024 | | Code Available | 3 |
| Why Transformers Need Adam: A Hessian Perspective | Feb 26, 2024 | | Code Available | 3 |
| ChatMusician: Understanding and Generating Music Intrinsically with LLM | Feb 25, 2024 | MMLU, Text Generation | Code Available | 3 |
| UrbanGPT: Spatio-Temporal Large Language Models | Feb 25, 2024 | 10-shot image generation | Code Available | 3 |
| Exploring gene content with pangene graphs | Feb 25, 2024 | | Code Available | 3 |
| Seamless Human Motion Composition with Blended Positional Encodings | Feb 23, 2024 | Denoising, Motion Generation | Code Available | 3 |
| State Space Models for Event Cameras | Feb 23, 2024 | Event-based vision, Object Detection | Code Available | 3 |
| Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing | Feb 23, 2024 | Lipreading, Lip Reading | Code Available | 3 |
| Genie: Generative Interactive Environments | Feb 23, 2024 | | Code Available | 3 |
| Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding | Feb 22, 2024 | Diversity, Scene Understanding | Code Available | 3 |
| IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus | Feb 22, 2024 | Zero-shot Generalization | Code Available | 3 |
| Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot | Feb 22, 2024 | 3D Human Pose Estimation, 3D Human Reconstruction | Code Available | 3 |
| OmniPred: Language Models as Universal Regressors | Feb 22, 2024 | Experimental Design, regression | Code Available | 3 |
| MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding | Feb 22, 2024 | Computational Efficiency, Prediction | Code Available | 3 |
| Cleaner Pretraining Corpus Curation with Neural Web Scraping | Feb 22, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition | Feb 22, 2024 | Re-Ranking, Visual Place Recognition | Code Available | 3 |
| Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping | Feb 21, 2024 | Decision Making, Decoder | Code Available | 3 |
| Towards Building Multilingual Language Model for Medicine | Feb 21, 2024 | Domain Adaptation, Language Modeling | Code Available | 3 |
| LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens | Feb 21, 2024 | 8k | Code Available | 3 |
| ∞Bench: Extending Long Context Evaluation Beyond 100K Tokens | Feb 21, 2024 | | Code Available | 3 |
| Visual Style Prompting with Swapping Self-Attention | Feb 20, 2024 | Denoising, Image Generation | Code Available | 3 |
| Video ReCap: Recursive Captioning of Hour-Long Videos | Feb 20, 2024 | EgoSchema, Video Captioning | Code Available | 3 |
| TorchCP: A Python Library for Conformal Prediction | Feb 20, 2024 | Conformal Prediction, Deep Learning | Code Available | 3 |
| Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive | Feb 20, 2024 | | Code Available | 3 |
| Codec-SUPERB: An In-Depth Analysis of Sound Codec Models | Feb 20, 2024 | | Code Available | 3 |
| FiT: Flexible Vision Transformer for Diffusion Model | Feb 19, 2024 | Computational Efficiency, Image Cropping | Code Available | 3 |
| A Chinese Dataset for Evaluating the Safeguards in Large Language Models | Feb 19, 2024 | | Code Available | 3 |
| UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction | Feb 19, 2024 | Decision Making, Management | Code Available | 3 |
| DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation | Feb 19, 2024 | Image Generation | Code Available | 3 |
| Language-Codec: Bridging Discrete Codec Representations and Speech Language Models | Feb 19, 2024 | Audio Compression, Audio Generation | Code Available | 3 |
| Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding | Feb 19, 2024 | | Code Available | 3 |
| ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning | Feb 19, 2024 | | Code Available | 3 |
| GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations | Feb 19, 2024 | Card Games, Logical Reasoning | Code Available | 3 |
| Major TOM: Expandable Datasets for Earth Observation | Feb 19, 2024 | Earth Observation | Code Available | 3 |
| Query-Based Adversarial Prompt Generation | Feb 19, 2024 | Language Modeling, Language Modelling | Code Available | 3 |