| WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models | Aug 21, 2023 | | Code Available | 2 |
| SRFormer: Text Detection Transformer with Incorporated Segmentation and Regression | Aug 21, 2023 | Decoder, regression | Code Available | 2 |
| Giraffe: Adventures in Expanding Context Lengths in LLMs | Aug 21, 2023 | 16k, 4k | Code Available | 2 |
| SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding | Aug 21, 2023 | Entity Typing, Event Extraction | Code Available | 2 |
| Texture Generation on 3D Meshes with Point-UV Diffusion | Aug 21, 2023 | Denoising, Texture Synthesis | Code Available | 2 |
| Turning a CLIP Model into a Scene Text Spotter | Aug 21, 2023 | object-detection, Object Detection | Code Available | 2 |
| STAEformer: Spatio-Temporal Adaptive Embedding Makes Vanilla Transformer SOTA for Traffic Forecasting | Aug 21, 2023 | Time Series, Traffic Prediction | Code Available | 2 |
| Towards Real-World Visual Tracking with Temporal Contexts | Aug 20, 2023 | Visual Tracking | Code Available | 2 |
| ExpeL: LLM Agents Are Experiential Learners | Aug 20, 2023 | Decision Making, Transfer Learning | Code Available | 2 |
| Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders | Aug 19, 2023 | Inductive Bias, Motion Forecasting | Code Available | 2 |
| BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions | Aug 19, 2023 | MME, Optical Character Recognition (OCR) | Code Available | 2 |
| DiffusionTrack: Diffusion Model For Multi-Object Tracking | Aug 19, 2023 | Denoising, model | Code Available | 2 |
| FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models | Aug 19, 2023 | Multiple-choice | Code Available | 2 |
| SwinJSCC: Taming Swin Transformer for Deep Joint Source-Channel Coding | Aug 18, 2023 | | Code Available | 2 |
| Diffusion Models for Image Restoration and Enhancement -- A Comprehensive Survey | Aug 18, 2023 | Deblurring, Image Restoration | Code Available | 2 |
| Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization | Aug 18, 2023 | | Code Available | 2 |
| SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos | Aug 18, 2023 | 3D Object Detection, Object | Code Available | 2 |
| LibreFace: An Open-Source Toolkit for Deep Facial Expression Analysis | Aug 18, 2023 | Facial Expression Recognition, Knowledge Distillation | Code Available | 2 |
| Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement | Aug 17, 2023 | Bandwidth Extension, Decoder | Code Available | 2 |
| Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes | Aug 17, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models | Aug 17, 2023 | Decision Making, Hallucination | Code Available | 2 |
| CMB: A Comprehensive Medical Benchmark in Chinese | Aug 17, 2023 | | Code Available | 2 |
| DeDoDe: Detect, Don't Describe -- Describe, Don't Detect for Local Feature Matching | Aug 16, 2023 | 3D Reconstruction, Binary Classification | Code Available | 2 |
| DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory | Aug 16, 2023 | Trajectory Modeling, Video Generation | Code Available | 2 |
| MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | Aug 16, 2023 | Motion Expressions Guided Video Segmentation, Object | Code Available | 2 |
| TeCH: Text-guided Reconstruction of Lifelike Clothed Humans | Aug 16, 2023 | Descriptive, Question Answering | Code Available | 2 |
| Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | Aug 15, 2023 | Arithmetic Reasoning, Math | Code Available | 2 |
| ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection | Aug 15, 2023 | Multispectral Object Detection, object-detection | Code Available | 2 |
| UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation | Aug 15, 2023 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs | Aug 14, 2023 | Blind Face Restoration | Code Available | 2 |
| ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate | Aug 14, 2023 | Text Generation | Code Available | 2 |
| S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields | Aug 14, 2023 | NeRF, Novel View Synthesis | Code Available | 2 |
| Global Features are All You Need for Image Retrieval and Reranking | Aug 14, 2023 | All, Image Retrieval | Code Available | 2 |
| Platypus: Quick, Cheap, and Powerful Refinement of LLMs | Aug 14, 2023 | GPU | Code Available | 2 |
| Bayesian Flow Networks | Aug 14, 2023 | Bayesian Inference, Data Compression | Code Available | 2 |
| Machine Unlearning: Solutions and Challenges | Aug 14, 2023 | Machine Unlearning | Code Available | 2 |
| Large Language Models for Information Retrieval: A Survey | Aug 14, 2023 | Information Retrieval, Question Answering | Code Available | 2 |
| The Sound Demixing Challenge 2023 – Music Demixing Track | Aug 14, 2023 | Music Source Separation | Code Available | 2 |
| Language is All a Graph Needs | Aug 14, 2023 | All, Graph Learning | Code Available | 2 |
| EcomGPT: Instruction-tuning Large Language Models with Chain-of-Task Tasks for E-commerce | Aug 14, 2023 | Diversity, Instruction Following | Code Available | 2 |
| #InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models | Aug 14, 2023 | Diversity, Instruction Following | Code Available | 2 |
| AerialVLN: Vision-and-Language Navigation for UAVs | Aug 13, 2023 | cross-modal alignment, Navigate | Code Available | 2 |
| Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks | Aug 13, 2023 | Graph Representation Learning, Link Prediction | Code Available | 2 |
| A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommendations | Aug 13, 2023 | Adversarial Robustness, Network Pruning | Code Available | 2 |
| GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher | Aug 12, 2023 | Ethics, Red Teaming | Code Available | 2 |
| Tiny and Efficient Model for the Edge Detection Generalization | Aug 12, 2023 | Boundary Detection, Contour Detection | Code Available | 2 |
| Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow | Aug 11, 2023 | Denoising, Image Generation | Code Available | 2 |
| BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents | Aug 11, 2023 | Benchmarking, Decision Making | Code Available | 2 |
| Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion | Aug 11, 2023 | Voice Conversion | Code Available | 2 |
| DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models | Aug 11, 2023 | Dataset Generation, Decoder | Code Available | 2 |