Building on HF
12.8 TFLOPS
11
28
100
Anurag
edwixx
14 followers · 53 following
https://anuragkanade.com/
edwixxxx
anurag12-webster
anurag-kanade
AI & ML interests
Machine Learning and Speech
Recent Activity
New activity, about 15 hours ago
huggingface/InferenceSupport: edwixx/whisper-large-hebrew-finetune
Reacted to sagar007's post with 🤗, about 16 hours ago

I built a Multimodal Vision-Language Model using Gemma-270M + CLIP!

Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!

🧠 What I Built:
A vision-language model that can understand images and answer questions about them, combining:
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency

Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total)

https://huggingface.co/sagar007/multigemma

Benchmark Results:
- VQA Accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding

**Try it yourself:**
- 🤗 Model: https://huggingface.co/sagar007/multigemma
- 🎮 Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- 💻 GitHub: https://github.com/sagar431/multimodal-gemma-270m

Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD!

Would love to hear your feedback!

#multimodal #gemma #clip #llava #vision-language #pytorch
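The trainable-parameter share quoted in the post above can be sanity-checked with simple arithmetic. The following is a minimal Python sketch that uses only the figures stated in the post; the function name `trainable_share` is hypothetical and introduced purely for illustration.

```python
def trainable_share(trainable: float, total: float) -> float:
    """Return the percentage of parameters that are trainable."""
    return 100.0 * trainable / total


# Figures as quoted in the post: 18.6M trainable out of 539M total.
share = trainable_share(18.6e6, 539e6)
print(f"{share:.2f}% of parameters are trainable")
```

The result lands close to the roughly 3.4% reported in the post. The small difference is presumably due to rounding of the quoted parameter counts.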
Reacted to sagar007's post with 🔥, about 16 hours ago
Organizations
edwixx's datasets (16)
edwixx/hindi-female-tts • Updated 15 days ago • 6
edwixx/karaoke_songs_long • Preview • Updated Nov 22, 2025 • 8
edwixx/triton-code-dataset • Preview • Updated Oct 29, 2025 • 1
edwixx/aesthetic-images • Viewer • Updated Aug 8, 2025 • 7.97k • 5 • 2
edwixx/Gujrati_Female_SPeech • Updated Aug 3, 2025 • 27
edwixx/NsFW-Dataset • Updated Jan 3, 2025 • 31 • 7
edwixx/international-multilingual • Updated Nov 21, 2024
edwixx/Gujarati40h • Updated Oct 18, 2024 • 4
edwixx/Speech-Dataset-International • Updated Oct 18, 2024 • 6
edwixx/brazilian-portuguese-TTS • Updated Oct 16, 2024 • 3 • 8
edwixx/QNA_dt • Preview • Updated Jul 31, 2024 • 38
edwixx/gujjudata • Preview • Updated Jul 23, 2024
edwixx/HiFiTTS-Modified • Updated Jun 25, 2024
edwixx/Tamil200hours • Updated May 7, 2024 • 41
edwixx/tamilVoice2 • Updated Apr 29, 2024
edwixx/teluguVoice • Viewer • Updated Apr 25, 2024 • 1.89k • 10