Woofun AI reports that DeepSeek, in collaboration with Peking University, has released DSpark, a speculative sampling acceleration framework, and open-sourced the DeepSpec codebase. DSpark is now deployed in DeepSeek-V4's online services. It increases single-user generation speed by 60% to 85% for the Flash version and by 57% to 78% for the Pro version, while keeping output lossless.
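The report does not describe DSpark's sampler in detail. In speculative decoding generally, "lossless" means the accepted tokens follow exactly the target model's distribution. The standard speculative sampling accept/reject rule guarantees this. Below is a minimal Python sketch of that textbook rule over a toy three-token vocabulary. It is not DSpark or DeepSpec code, and `speculative_step` is a hypothetical name.

```python
import random

def speculative_step(p, q, rng):
    """One accept/reject step of standard speculative sampling (illustrative sketch).

    p: target-model distribution over the vocabulary (floats summing to 1)
    q: draft-model distribution over the same vocabulary
    The returned token is distributed exactly according to p.
    """
    vocab = range(len(p))
    # The draft model proposes a token sampled from q (so q[x] > 0).
    x = rng.choices(vocab, weights=q)[0]
    # Accept the proposal with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    # On rejection, resample from the normalized residual max(0, p - q).
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(vocab, weights=residual)[0]

# Usage: empirical frequencies approach p even though proposals come from q.
rng = random.Random(0)
p = [0.5, 0.3, 0.2]  # toy target distribution
q = [0.2, 0.3, 0.5]  # toy draft distribution (deliberately mismatched)
n = 100_000
counts = [0, 0, 0]
for _ in range(n):
    counts[speculative_step(p, q, rng)] += 1
print([c / n for c in counts])
```

A low acceptance rate does not change the output distribution. It only means more proposals are rejected, so fewer tokens are gained per target-model pass.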
The framework targets two earlier throughput bottlenecks: low acceptance rates in parallel draft models, and compute wasted when many draft tokens are verified at once. DSpark uses DFlash as a parallel drafting backbone and adds a lightweight Markov head with confidence prediction. This enables zero-cost scheduling and keeps the large model from spending verification on high-risk tail tokens that are likely to be rejected. The accompanying DeepSpec toolchain supports models such as Qwen3 and Gemma and provides a complete Python workflow for local deployment.
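The article does not specify how DSpark's Markov head or confidence predictor work. The sketch below shows only the general idea of confidence-gated verification. It assumes the draft side emits one confidence score per token and cuts the draft at the first token below a threshold. The target then verifies the remaining prefix with greedy matching, which is lossless for greedy decoding. All names (`truncate_by_confidence`, `greedy_verify`, `toy_target`) and the confidence values are hypothetical and are not DeepSpec's API. A real system also verifies all draft positions in a single target forward pass, rather than one call per position as here.

```python
from typing import Callable, List, Sequence

def truncate_by_confidence(draft: Sequence[int], conf: Sequence[float],
                           threshold: float) -> List[int]:
    """Keep the longest draft prefix whose confidence stays >= threshold (hypothetical)."""
    kept: List[int] = []
    for tok, c in zip(draft, conf):
        if c < threshold:
            break  # drop this token and the rest of the high-risk tail
        kept.append(tok)
    return kept

def greedy_verify(prefix: List[int], draft: List[int],
                  target_argmax: Callable[[List[int]], int]) -> List[int]:
    """Accept draft tokens while they match the target's greedy choice.

    Always appends one target token (a correction or a bonus), so the result
    equals what greedy decoding with the target model alone would produce.
    """
    out: List[int] = []
    ctx = list(prefix)
    for tok in draft:
        t = target_argmax(ctx)
        if t != tok:
            out.append(t)  # first mismatch: emit the target's token and stop
            return out
        out.append(tok)
        ctx.append(tok)
    out.append(target_argmax(ctx))  # whole draft accepted: bonus token
    return out

def toy_target(ctx: List[int]) -> int:
    """Toy stand-in for the large model: next token is (last + 1) mod 10."""
    return (ctx[-1] + 1) % 10

prefix = [3]
draft = [4, 5, 9, 7]           # third draft token is wrong
conf = [0.95, 0.9, 0.4, 0.3]   # hypothetical per-token confidences
kept = truncate_by_confidence(draft, conf, threshold=0.5)
print(kept)                                   # [4, 5]
print(greedy_verify(prefix, kept, toy_target))  # [4, 5, 6]
```

Verifying the full draft `[4, 5, 9, 7]` yields the same `[4, 5, 6]`. Under these assumptions, truncation leaves the output unchanged and only reduces the number of positions the target model must check.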