DeepSeek DSpark Boosts V4 Inference Speed by Up to 85% via Speculative Sampling

2026-06-27 22:01

Woofun AI reports that DeepSeek, in collaboration with Peking University, has released DSpark, a framework that accelerates inference through speculative token sampling, and has open-sourced the DeepSpec codebase. DSpark is already deployed in DeepSeek-V4 online services, where it raises single-user generation speed by 60% to 85% for the Flash version and by 57% to 78% for the Pro version while keeping the output lossless.

The framework targets two throughput bottlenecks in earlier approaches: low acceptance rates in parallel draft models and wasted computation in multi-token verification. DSpark uses DFlash for parallel backbone generation and adds a lightweight Markov head with confidence prediction. This enables zero-cost scheduling and keeps the large model from verifying high-risk tokens at the tail of a draft. The accompanying DeepSpec toolchain supports models such as Qwen3 and Gemma and provides a complete Python workflow for local deployment.
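To make the "lossless" claim and the tail-skipping idea concrete, here is a minimal sketch of the generic speculative sampling acceptance rule from the research literature, plus a confidence cutoff applied to the draft. It is not DSpark's or DeepSpec's actual code: the function names `truncate_draft` and `speculative_step`, the dictionary-based probability tables, and the 0.5 threshold are all assumptions made for illustration.

```python
import random


def truncate_draft(draft_tokens, confidences, threshold=0.5):
    """Hypothetical helper: keep drafted tokens up to the first low-confidence one.

    This mimics the idea of not sending risky tail tokens to the large model.
    The threshold value is an arbitrary assumption.
    """
    kept = []
    for tok, conf in zip(draft_tokens, confidences):
        if conf < threshold:
            break
        kept.append(tok)
    return kept


def speculative_step(target_probs, draft_probs, draft_tokens, rng):
    """Verify drafted tokens with the standard accept/reject rule.

    target_probs[i] and draft_probs[i] map token -> probability at draft
    position i. Each drafted token is accepted with probability
    min(1, p_target / p_draft). On the first rejection, a replacement is
    sampled from the normalized residual max(0, p_target - p_draft) and the
    step ends. This rule preserves the target model's output distribution.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p = target_probs[i].get(tok, 0.0)
        q = draft_probs[i][tok]  # > 0 because the draft model sampled tok
        if rng.random() < min(1.0, p / q):
            out.append(tok)
            continue
        # Rejected: resample from the part of the target the draft under-covers.
        residual = {
            t: max(0.0, pt - draft_probs[i].get(t, 0.0))
            for t, pt in target_probs[i].items()
        }
        tokens = [t for t, w in residual.items() if w > 0.0]
        weights = [residual[t] for t in tokens]
        out.append(rng.choices(tokens, weights=weights)[0])
        break
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    draft = truncate_draft(["the", "cat", "sat"], [0.9, 0.8, 0.3])
    dists = [{"the": 1.0}, {"cat": 1.0}]
    print(speculative_step(dists, dists, draft, rng))
```

The sketch omits the extra token that the target model usually contributes when every drafted token is accepted, and it treats confidence scores as given rather than predicted by a learned head.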

