Open to Collab
s3nh (PRO)
252 followers · 104 following
s3nhxx
s3nh
AI & ML interests
Quantization, LLMs, Deep Learning for good. Follow me if you like my work. Patreon.com/s3nh
Recent Activity
Reacted to codelion's post with 🔥 (1 day ago):
Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!
Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning
We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m
Reacted to giux78's post with 🔥 (7 days ago):
Together with @mferraretto and @efederici we released #Nesso-4B, a new model specialized for agentic workflows. https://huggingface.co/mii-llm/nesso-4B
#Nesso-4B is a fine-tuned version of Qwen-4B, trained on a highly curated and balanced dataset designed specifically for multilingual agentic workflows and conversational use cases.
As shown in the video below, we simulate the new “cowork” from #Anthropic without any data sharing, all running on a consumer device. The model can be used to build agentic behavior in #privateAI environments.
Not every problem requires super intelligence: in many cases, intelligence at the edge is more than enough.
#Nesso4B #AgenticAI #PrivateAI #EdgeAI #OnDeviceAI
Reacted to AdinaY's post with 🔥 (7 days ago):
GLM just entered the OCR field🔥 https://huggingface.co/zai-org/GLM-OCR
✨ 0.9B
✨ MIT licensed
✨ Multimodal GLM-V architecture
✨ #1 on OmniDocBench v1.5 (94.62)
s3nh's datasets (1)
s3nh/alpaca-dolly-instruction-only-polish • Updated May 2, 2023 • 23.7k • 33 • 6