Building on HF

Anurag

edwixx

ModelsLab

https://anuragkanade.com/

AI & ML interests

Machine Learning and Speech

Recent Activity

new activity about 15 hours ago

huggingface/InferenceSupport:edwixx/whisper-large-hebrew-finetune

reacted to sagar007's post with 🤝 about 16 hours ago

🚀 I built a Multimodal Vision-Language Model using Gemma-270M + CLIP!

Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!

🔧 What I Built:
A vision-language model that can understand images and answer questions about them, combining:
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency

📊 Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total)

📈 Benchmark Results:
- VQA Accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding

🔗 **Try it yourself:**
- 🤗 Model: https://huggingface.co/sagar007/multigemma
- 🎮 Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- 💻 GitHub: https://github.com/sagar431/multimodal-gemma-270m

Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD!

Would love to hear your feedback! 🙏

#multimodal #gemma #clip #llava #vision-language #pytorch

reacted to sagar007's post with 🔥 about 16 hours ago

edwixx's datasets (16)

edwixx/hindi-female-tts

Updated 15 days ago • 6

edwixx/karaoke_songs_long

Preview • Updated Nov 22, 2025 • 8

edwixx/triton-code-dataset

Preview • Updated Oct 29, 2025 • 1

edwixx/aesthetic-images

Viewer • Updated Aug 8, 2025 • 7.97k • 5 • 2

edwixx/Gujrati_Female_SPeech

Updated Aug 3, 2025 • 27

edwixx/NsFW-Dataset

Updated Jan 3, 2025 • 31 • 7

edwixx/international-multilingual

Updated Nov 21, 2024

edwixx/Gujarati40h

Updated Oct 18, 2024 • 4

edwixx/Speech-Dataset-International

Updated Oct 18, 2024 • 6

edwixx/brazilian-portuguese-TTS

Updated Oct 16, 2024 • 3 • 8

edwixx/QNA_dt

Preview • Updated Jul 31, 2024 • 38

edwixx/gujjudata

Preview • Updated Jul 23, 2024

edwixx/HiFiTTS-Modified

Updated Jun 25, 2024

edwixx/Tamil200hours

Updated May 7, 2024 • 41

edwixx/tamilVoice2

Updated Apr 29, 2024

edwixx/teluguVoice

Viewer • Updated Apr 25, 2024 • 1.89k • 10