| Text-Guided Synthesis of Eulerian Cinemagraphs | Jul 6, 2023 | Image Animation | Code Available | 2 |
| Lost in the Middle: How Language Models Use Long Contexts | Jul 6, 2023 | Language Modelling, Position | Code Available | 2 |
| FITS: Modeling Time Series with 10k Parameters | Jul 6, 2023 | Anomaly Detection, Time Series | Code Available | 2 |
| DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models | Jul 5, 2023 | Object | Code Available | 2 |
| Building Cooperative Embodied Agents Modularly with Large Language Models | Jul 5, 2023 | Text Generation | Code Available | 2 |
| NMS Threshold matters for Ego4D Moment Queries -- 2nd place solution to the Ego4D Moment Queries Challenge 2023 | Jul 5, 2023 | Action Localization, Moment Queries | Code Available | 2 |
| Evaluating AI systems under uncertain ground truth: a case study in dermatology | Jul 5, 2023 | Diagnostic, Medical Diagnosis | Code Available | 2 |
| tsdownsample: high-performance time series downsampling for scalable visualization | Jul 5, 2023 | CPU, Time Series | Code Available | 2 |
| EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models | Jul 5, 2023 | | Code Available | 2 |
| What Matters in Training a GPT4-Style Language Model with Multimodal Inputs? | Jul 5, 2023 | Instruction Following, Language Modeling | Code Available | 2 |
| DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation | Jul 4, 2023 | 3D Shape Generation, Denoising | Code Available | 2 |
| Empirical Sample Complexity of Neural Network Mixed State Reconstruction | Jul 4, 2023 | | Code Available | 2 |
| Spike-driven Transformer | Jul 4, 2023 | | Code Available | 2 |
| ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling | Jul 4, 2023 | Benchmarking, Weather Forecasting | Code Available | 2 |
| FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation | Jul 4, 2023 | Autonomous Driving, Prediction Of Occupancy Grid Maps | Code Available | 2 |
| SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis | Jul 4, 2023 | Image Generation | Code Available | 2 |
| Temporal Graph Benchmark for Machine Learning on Temporal Graphs | Jul 3, 2023 | Node Property Prediction, Property Prediction | Code Available | 2 |
| SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions | Jul 3, 2023 | | Code Available | 2 |
| Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset | Jul 3, 2023 | Human Mesh Recovery, Motion Generation | Code Available | 2 |
| MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion | Jul 3, 2023 | Image Generation | Code Available | 2 |
| Hierarchical Open-vocabulary Universal Image Segmentation | Jul 3, 2023 | Image Comprehension, Image Segmentation | Code Available | 2 |
| JourneyDB: A Benchmark for Generative Image Understanding | Jul 3, 2023 | Image Captioning, Image Comprehension | Code Available | 2 |
| MedCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval | Jul 2, 2023 | Biomedical Information Retrieval, Contrastive Learning | Code Available | 2 |
| Numerical Association Rule Mining: A Systematic Literature Review | Jul 2, 2023 | Articles, Systematic Literature Review | Code Available | 2 |
| BatGPT: A Bidirectional Autoregessive Talker from Generative Pre-trained Transformer | Jul 1, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation | Jun 30, 2023 | Action Detection, Pose Prediction | Code Available | 2 |
| Provable Robust Watermarking for AI-Generated Text | Jun 30, 2023 | Language Modelling | Code Available | 2 |
| MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying | Jun 30, 2023 | Autonomous Driving, Decoder | Code Available | 2 |
| Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation | Jun 29, 2023 | 3D Shape Generation, Decoder | Code Available | 2 |
| MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset | Jun 29, 2023 | Image Segmentation, Medical Image Segmentation | Code Available | 2 |
| DreamDiffusion: Generating High-Quality Images from Brain EEG Signals | Jun 29, 2023 | EEG, Electroencephalogram (EEG) | Code Available | 2 |
| Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train | Jun 29, 2023 | Segmentation, Transfer Learning | Code Available | 2 |
| Towards Zero-Shot Scale-Aware Monocular Depth Estimation | Jun 29, 2023 | Decoder, Depth Estimation | Code Available | 2 |
| Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models | Jun 29, 2023 | Audio Synthesis | Code Available | 2 |
| BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion | Jun 29, 2023 | Synthetic Data Generation | Code Available | 2 |
| SkiROS2: A skill-based Robot Control Platform for ROS | Jun 29, 2023 | Scheduling, Task Planning | Code Available | 2 |
| LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding | Jun 29, 2023 | 16k, Image Captioning | Code Available | 2 |
| Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio | Jun 28, 2023 | Language Modelling, Text Generation | Code Available | 2 |
| RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model | Jun 28, 2023 | Image Segmentation, Instance Segmentation | Code Available | 2 |
| Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language | Jun 28, 2023 | Descriptive, Language Modeling | Code Available | 2 |
| Towards Open Vocabulary Learning: A Survey | Jun 28, 2023 | Open Set Learning, Out-of-Distribution Detection | Code Available | 2 |
| BayesFlow: Amortized Bayesian Workflows With Neural Networks | Jun 28, 2023 | Bayesian Inference, Data Compression | Code Available | 2 |
| cuSLINK: Single-linkage Agglomerative Clustering on the GPU | Jun 28, 2023 | Clustering, GPU | Code Available | 2 |
| MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning | Jun 28, 2023 | Deep Learning, Multimodal Deep Learning | Code Available | 2 |
| PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment | Jun 27, 2023 | Camera Pose Estimation, Pose Estimation | Code Available | 2 |
| Detector-Free Structure from Motion | Jun 27, 2023 | Keypoint Detection | Code Available | 2 |
| Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic | Jun 27, 2023 | Image Captioning, Referring Expression Segmentation | Code Available | 2 |
| CellViT: Vision Transformers for Precise Cell Segmentation and Classification | Jun 27, 2023 | Cell Detection, Cell Segmentation | Code Available | 2 |
| HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution | Jun 27, 2023 | 4k, In-Context Learning | Code Available | 2 |
| Evidential Detection and Tracking Collaboration: New Problem, Benchmark and Algorithm for Robust Anti-UAV System | Jun 27, 2023 | | Code Available | 2 |