SOTAVerified

2k

Papers

Showing 1-50 of 288 papers

Title | Status | Hype
Long-context LLMs Struggle with Long In-context Learning | Code | 5
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Code | 5
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | Code | 4
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection | Code | 4
Scaling Granite Code Models to 128K Context | Code | 4
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets | Code | 4
Highly Accurate Dichotomous Image Segmentation | Code | 4
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis | Code | 3
1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Code | 3
CAMixerSR: Only Details Need More "Attention" | Code | 3
FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Code | 3
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction | Code | 3
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Code | 2
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis | Code | 2
HHAvatar: Gaussian Head Avatar with Dynamic Hairs | Code | 2
Linear Attention Sequence Parallelism | Code | 2
FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset | Code | 2
Towards Metrical Reconstruction of Human Faces | Code | 2
Hyena Hierarchy: Towards Larger Convolutional Language Models | Code | 2
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning | Code | 2
XGen-7B Technical Report | Code | 2
Elevating Flow-Guided Video Inpainting with Reference Generation | Code | 2
Ultra-Resolution Adaptation with Ease | Code | 2
360MonoDepth: High-Resolution 360° Monocular Depth Estimation | Code | 2
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes | Code | 2
Any-resolution Training for High-resolution Image Synthesis | Code | 2
VFIMamba: Video Frame Interpolation with State Space Models | Code | 2
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars | Code | 2
STICKERCONV: Generating Multimodal Empathetic Responses from Scratch | Code | 2
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Code | 2
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models | Code | 2
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models | Code | 2
Task Me Anything | Code | 2
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Code | 2
High-fidelity 3D Human Digitization from Single 2K Resolution Images | Code | 2
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training | Code | 2
Gated Linear Attention Transformers with Hardware-Efficient Training | Code | 1
MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes | Code | 1
360MonoDepth: High-Resolution 360° Monocular Depth Estimation | Code | 1
HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration | Code | 1
LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models | Code | 1
Meticulous Object Segmentation | Code | 1
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic | Code | 1
Efficient and Generic Point Model for Lossless Point Cloud Attribute Compression | Code | 1
Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis | Code | 1
CascadeV: An Implementation of Wurstchen Architecture for Video Generation | Code | 1
Double Domain Guided Real-Time Low-Light Image Enhancement for Ultra-High-Definition Transportation Surveillance | Code | 1
Dual Adversarial Domain Adaptation | Code | 1
Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models | Code | 1
End-to-End Speech Recognition from Federated Acoustic Models | Code | 1

No leaderboard results yet.