SOTAVerified

Position

Papers

Showing 1–50 of 3684 papers

Title | Status | Hype
Moonshine: Speech Recognition for Live Transcription and Voice Commands | Code | 9
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold | Code | 7
YaRN: Efficient Context Window Extension of Large Language Models | Code | 6
Extending Context Window of Large Language Models via Positional Interpolation | Code | 6
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models | Code | 5
Cosmos World Foundation Model Platform for Physical AI | Code | 5
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression | Code | 5
MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis | Code | 4
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction | Code | 4
KeyPoint Relative Position Encoding for Face Recognition | Code | 4
Programming Is Hard -- Or at Least It Used to Be: Educational Opportunities And Challenges of AI Code Generation | Code | 4
Desiderata for next generation of ML model serving | Code | 4
VideoRoPE: What Makes for Good Video Rotary Position Embedding? | Code | 3
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training | Code | 3
ElasTST: Towards Robust Varied-Horizon Forecasting with Elastic Time-Series Transformer | Code | 3
Relation DETR: Exploring Explicit Position Relation Prior for Object Detection | Code | 3
Scaling Diffusion Transformers to 16 Billion Parameters | Code | 3
Transformers Can Do Arithmetic with the Right Embeddings | Code | 3
Rotary Position Embedding for Vision Transformer | Code | 3
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | Code | 3
Position: Graph Foundation Models are Already Here | Code | 3
PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images | Code | 3
PETR: Position Embedding Transformation for Multi-View 3D Object Detection | Code | 3
RoFormer: Enhanced Transformer with Rotary Position Embedding | Code | 3
Shifting AI Efficiency From Model-Centric to Data-Centric Compression | Code | 2
Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs | Code | 2
Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images | Code | 2
An Approach for Air Drawing Using Background Subtraction and Contour Extraction | Code | 2
FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization | Code | 2
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization | Code | 2
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding | Code | 2
Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View Synthesis | Code | 2
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention | Code | 2
1st Place Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024 | Code | 2
PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders | Code | 2
OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection | Code | 2
PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration | Code | 2
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training | Code | 2
PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer | Code | 2
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance | Code | 2
GLACE: Global Local Accelerated Coordinate Encoding | Code | 2
GSGAN: Adversarial Learning for Hierarchical Generation of 3D Gaussian Splats | Code | 2
Position: Foundation Agents as the Paradigm Shift for Decision Making | Code | 2
CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario | Code | 2
Commonsense Prototype for Outdoor Unsupervised 3D Object Detection | Code | 2
FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization | Code | 2
LongEmbed: Extending Embedding Models for Long Context Retrieval | Code | 2
An End-to-End Structure with Novel Position Mechanism and Improved EMD for Stock Forecasting | Code | 2
Counting-Stars: A Multi-evidence, Position-aware, and Scalable Benchmark for Evaluating Long-Context Large Language Models | Code | 2
ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents | Code | 2

No leaderboard results yet.