
deepseek-ai/DeepSeek-V3 · Hugging Face
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.
Paper page - DeepSeek-V3 Technical Report - Hugging Face
Deepseek V3 (All Versions) - a unsloth Collection - Hugging Face
Jan 7, 2025 · Deepseek V3 - available in bf16, original, and GGUF formats, with support for 2, 3, 4, 5, 6 and 8-bit quantized versions.
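The bit-widths above map roughly onto on-disk size. As a minimal sketch, assuming size ≈ parameters × bits-per-weight ÷ 8 and ignoring metadata and per-block quantization overhead (real GGUF files run somewhat larger):

```python
# Rough size estimate for a quantized checkpoint; the 671B figure is the
# DeepSeek-V3 main-model parameter count cited elsewhere in these results.
TOTAL_PARAMS = 671e9

def approx_size_gb(params: float, bits: int) -> float:
    """Approximate on-disk size in gigabytes for a given bit width."""
    return params * bits / 8 / 1e9

for bits in (2, 3, 4, 5, 6, 8):
    print(f"{bits}-bit: ~{approx_size_gb(TOTAL_PARAMS, bits):.0f} GB")
```

At 8 bits this gives roughly 671 GB, in line with the ~641 GB file sizes reported below once sharding and format details are accounted for.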
DeepSeek Releases 700B Params V3 Model Update, DeepSeek-V3 …
2 days ago · DeepSeek, a Chinese AI lab, has released an update to its DeepSeek V3 model, now available as DeepSeek-V3-0324 on Hugging Face. This update, which is licensed under MIT, is a significant enhancement to the base model, with the total size of the files amounting to 641 GB and 700 billion parameters. The updated model …
DeepSeek Releases Powerful V3-0324 Model with 685 Billion …
2 days ago · DeepSeek has released an updated version of its AI model, DeepSeek-V3-0324, which is now available on the Hugging Face platform. This version features enhancements in post-training capabilities, particularly improving performance in mathematics and coding tasks. The model is powered by a new 32k GPU cluster and boasts …
GitHub - deepseek-ai/DeepSeek-V3
The total size of DeepSeek-V3 models on Hugging Face is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.
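The parameter accounting above can be checked directly. A minimal sketch, using only the figures stated in the repo description:

```python
# Parameter accounting for the DeepSeek-V3 checkpoint on Hugging Face:
# 671B main-model weights plus 14B Multi-Token Prediction (MTP) module
# weights, in billions of parameters.
MAIN_MODEL_B = 671
MTP_MODULE_B = 14

total_b = MAIN_MODEL_B + MTP_MODULE_B
print(f"Total checkpoint size: {total_b}B parameters")  # 685B
```

This is why some headlines report "685 billion parameters" (the full checkpoint) while the paper cites 671B (the main model alone).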
China's DeepSeek releases AI model upgrade, intensifies
1 day ago · The new model, DeepSeek-V3-0324, was made available through AI development platform Hugging Face, marking the company’s latest push to establish itself in the rapidly evolving AI market.
DeepSeek-V3-0324 Released: Free for Commercial Use, Runs on …
1 day ago · DeepSeek quietly released its latest large language model, DeepSeek-V3-0324, causing a stir in the AI industry. This massive 641 GB model appeared on the Hugging Face model hub with almost no prior announcement, continuing the company's understated yet impactful release style. Performance leaps rivaling Claude Sonnet 3.5 make this release particularly noteworthy.
DeepSeek-V3: the model everyone is talking about
Jan 2, 2025 · Awesome exploration of scaling test-time compute with open models by Hugging Face. "Check out this plot where the tiny 1B and 3B Llama Instruct models outperform their much larger 8B and 70B siblings on the challenging MATH …
DeepSeek-V3, a Chinese AI Dark Horse, Makes a Stunning Debut: …
1 day ago · Chinese AI startup DeepSeek quietly released its large language model, DeepSeek-V3-0324, sending ripples through the AI industry. The model, weighing in at 641GB, appeared on Hugging Face, showcasing DeepSeek's understated yet impactful style—a release with only an empty README file and model weights. Licensed under MIT, the model is free for commercial use and can run on consumer-grade ...