Ilyas Moutawwakil (IlyasMoutawwakil)
145 followers · 60 following
AI & ML interests
Optimization, LLMs, Hardware, Backends, ..
Recent Activity
Liked a Space 18 days ago: Qwen/Qwen3-TTS
Replied to their post 19 days ago.
Posted an update 19 days ago:
Transformers v5 just landed! 🚀 It significantly unifies and reduces modeling code across architectures, while opening the door to a whole new class of performance optimizations.

My favorite new feature? 🤔 The new dynamic weight loader + converter. Here's why 👇

Over the last few months, the core Transformers maintainers built an incredibly fast weight loader, capable of converting tensors on the fly while loading them in parallel threads. This means we're no longer constrained by how parameters are laid out inside the safetensors weight files. In practice, this unlocks two big things:

- Much more modular modeling code. You can now clearly see how architectures build on top of each other (DeepSeek v2 → v3, Qwen v2 → v3 → MoE, etc.). This makes shared bottlenecks obvious and lets us optimize the right building blocks once, for all model families.
- Performance optimizations beyond what torch.compile can do alone. torch.compile operates on the computation graph, but it can't change parameter layouts. With the new loader, we can restructure weights at load time: fusing MoE expert projections, merging attention QKV projections, and enabling more compute-dense kernels that simply weren't possible before.

Personally, I'm honored to have contributed in this direction, including the work on optimizing MoE implementations and making modeling code more torch-exportable, so these optimizations can be ported cleanly across runtimes.

Overall, Transformers v5 is a strong signal of where the community and industry are converging: Modularity and Performance, without sacrificing Flexibility. Transformers v5 makes its signature from_pretrained an entrypoint where you can mix and match:

- Parallelism
- Quantization
- Custom kernels
- Flash/Paged attention
- Continuous batching
- ...

Kudos to everyone involved! I highly recommend:

- Release notes: https://github.com/huggingface/transformers/releases/tag/v5.0.0
- Blog post: https://huggingface.co/blog/transformers-v5
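The load-time weight restructuring the post describes can be hard to picture in the abstract. Below is a minimal, self-contained PyTorch sketch, not the Transformers implementation and with all names made up, of the general idea behind merging attention Q/K/V projections: three checkpoint matrices are concatenated into one fused linear layer so a single larger, more compute-dense matmul replaces three smaller ones.

```python
# Conceptual sketch only: illustrates fusing Q/K/V projection weights at load
# time, not how Transformers v5 actually performs the conversion.
import torch

hidden = 64

# Pretend these three matrices came straight out of a safetensors checkpoint,
# one per projection, each of shape (hidden, hidden).
w_q = torch.randn(hidden, hidden)
w_k = torch.randn(hidden, hidden)
w_v = torch.randn(hidden, hidden)

# Build a single fused projection whose weight is the row-wise concatenation
# of the three original weights: shape (3 * hidden, hidden).
fused = torch.nn.Linear(hidden, 3 * hidden, bias=False)
with torch.no_grad():
    fused.weight.copy_(torch.cat([w_q, w_k, w_v], dim=0))

# One matmul now produces Q, K and V together; split recovers the three parts.
x = torch.randn(2, hidden)
q, k, v = fused(x).split(hidden, dim=-1)

# Numerically identical to running the three separate projections.
assert torch.allclose(q, x @ w_q.T, atol=1e-5)
assert torch.allclose(k, x @ w_k.T, atol=1e-5)
assert torch.allclose(v, x @ w_v.T, atol=1e-5)
```

The same trick applies per expert in an MoE block: stacking the expert projection weights lets a batched or grouped matmul replace many small ones.

As for the "mix and match through from_pretrained" point, here is a quick sketch of what that looks like in practice. It is not taken from the v5 release notes; the kwargs shown (device_map, attn_implementation, quantization_config) are the long-standing public ones and the checkpoint is just an example, so check the v5 docs for any renames before copying it.

```python
# Hedged example: combining several loading-time options via from_pretrained.
# Requires accelerate (for device_map) and bitsandbytes (for 4-bit loading).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # example checkpoint; any causal LM works

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",              # spread weights across available devices
    attn_implementation="sdpa",     # or "flash_attention_2" if flash-attn is installed
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # 4-bit weights
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Transformers v5 just landed!", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```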
IlyasMoutawwakil's datasets (8)
Sort: Recently updated
- IlyasMoutawwakil/OnnxRuntime-Encoder-Benchmark • Updated Sep 24, 2025 • 2
- IlyasMoutawwakil/ORT-Bert-Benchmark • Updated Sep 23, 2025 • 15
- IlyasMoutawwakil/OpenVINO-VLM-Benchmark • Updated Sep 22, 2025 • 22
- IlyasMoutawwakil/pytorch_gpt2 • Updated Sep 1, 2025
- IlyasMoutawwakil/benchmarks • Preview • Updated Dec 12, 2024 • 16
- IlyasMoutawwakil/OpenVINO-Benchmarks • Updated Nov 18, 2024 • 4
- IlyasMoutawwakil/optimum-benchmarks-ci • Preview • Updated Apr 10, 2024 • 9
- IlyasMoutawwakil/llm-race-dataset • Viewer • Updated Nov 23, 2023 • 4.38M • 34 • 1