Open to Collab

s3nh PRO

s3nhxx

AI & ML interests

Quantization, LLMs, Deep Learning for good. Follow me if you like my work. Patreon.com/s3nh

Recent Activity

reacted to codelion's post with 🔥 1 day ago

Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m

reacted to giux78's post with 🔥 7 days ago

Together with @mferraretto and @efederici we released #Nesso-4B, a new model specialized for agentic workflows. https://huggingface.co/mii-llm/nesso-4B

#Nesso-4B is a fine-tuned version of Qwen-4B, trained on a highly curated and balanced dataset designed specifically for multilingual agentic workflows and conversational use cases.

As shown in the video below, we simulate the new “cowork” from #Anthropic without any data sharing, all running on a consumer device. The model can be used to build agentic behavior in #privateAI environments.

Not every problem requires super intelligence: in many cases, intelligence at the edge is more than enough.

#Nesso4B #AgenticAI #PrivateAI #EdgeAI #OnDeviceAI

reacted to AdinaY's post with 🔥 7 days ago

GLM just entered the OCR field🔥 https://huggingface.co/zai-org/GLM-OCR
✨ 0.9B
✨ MIT licensed
✨ Multimodal GLM-V architecture
✨ #1 on OmniDocBench v1.5 (94.62)

s3nh's datasets (1)

s3nh/alpaca-dolly-instruction-only-polish

Viewer • Updated May 2, 2023 • 23.7k • 33 • 6