Michael Anthony
MikeDoes
97 followers · 48 following
http://www.ai4privacy.com
MikeDoesDo
MikeDoes
AI & ML interests
Privacy, Large Language Model, Explainable
Recent Activity
Posted an update 1 day ago
State-of-the-art AI doesn't start with a model. It starts with the data.

Achieving near-perfect accuracy for PII and PHI anonymization is one of the toughest challenges in NLP. A model is only as good as the data it learns from, and providing this foundational layer is central to our mission.

The ai4privacy/pii-masking-400k dataset was built for this exact purpose: to serve as a robust, large-scale, open-source training ground for building high-precision privacy tools.

To see the direct impact of this data-first approach, look at the ner_deid_aipii model for Healthcare NLP by John Snow Labs. By training on our 400,000 labeled examples, the model achieved:
100% F1-score on EMAIL detection
99% F1-score on PHONE detection
97% F1-score on NAME detection

This is the result of combining a cutting-edge architecture with a comprehensive, high-quality dataset. We provide the open-source foundation so developers can build better, safer solutions.

Explore the dataset that helps power these next-generation privacy tools: https://huggingface.co/datasets/ai4privacy/pii-masking-400k

🚀 Stay updated on the latest in privacy-preserving AI. Follow us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/

#DataPrivacy #AI #OpenSource #Anonymization #MachineLearning #HealthcareAI #Ai4Privacy
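For readers less familiar with the metric quoted above, F1 is the harmonic mean of precision and recall, computed per entity type. Here is a minimal sketch of that calculation. The function name and the counts are made up for illustration and are not taken from the ner_deid_aipii evaluation.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1 for one entity type from raw detection counts.

    Uses the identity F1 = 2*TP / (2*TP + FP + FN), which equals the
    harmonic mean of precision and recall.
    """
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        # No entities predicted and none present: define F1 as 0.0 here.
        return 0.0
    return 2 * true_positives / denominator


# Illustrative counts only (hypothetical, not real evaluation data).
name_f1 = f1_score(true_positives=97, false_positives=3, false_negatives=3)
print(f"NAME F1: {name_f1:.2f}")
```

A score is usually reported separately for each label (EMAIL, PHONE, NAME), which is why the post lists three different numbers.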
Posted an update 7 days ago
Can you teach a giant like Google's Gemini to protect user privacy? A new step-by-step guide shows that the answer is a resounding "yes."

While powerful, large language models aren't specialized for privacy tasks. This tutorial by Analytics Vidhya walks through how to fine-tune Gemini into a dedicated tool for PII anonymization.

To teach the model this critical skill, the author needed a robust dataset with thousands of clear 'before' and 'after' examples. We're thrilled they chose the Ai4Privacy pii-masking-200k dataset for this task. Our data provided the high-quality, paired examples of masked and unmasked text needed to train Gemini to identify and hide sensitive information accurately.

This is a perfect example of how the community can use open-source data to add a crucial layer of safety to the world's most powerful models. Great work!

🔗 Check out the full tutorial here: https://www.analyticsvidhya.com/blog/2024/03/guide-to-fine-tuning-gemini-for-masking-pii-data/

🚀 Stay updated on the latest in privacy-preserving AI. Follow us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/

#DataPrivacy #AI #LLM #FineTuning #Anonymization #GoogleGemini #Ai4Privacy #World's largest open privacy masking dataset
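To make the idea of paired 'before' and 'after' examples concrete, here is a minimal sketch of how an unmasked sentence and a list of labeled character spans can be turned into its masked counterpart. The field names (`source_text`, `spans`, `label`, `start`, `end`), the label set, and the sample sentence are assumptions for illustration. They are not claimed to match the pii-masking-200k schema or the tutorial's code.

```python
from typing import Dict, List


def mask_text(source_text: str, spans: List[Dict]) -> str:
    """Replace each labeled character span with a [LABEL] placeholder."""
    masked = source_text
    # Apply replacements right to left so earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        placeholder = f"[{span['label']}]"
        masked = masked[: span["start"]] + placeholder + masked[span["end"]:]
    return masked


# Hypothetical training record: an unmasked sentence plus its PII spans.
example = {
    "source_text": "Contact Jane Doe at jane@example.com.",
    "spans": [
        {"label": "NAME", "start": 8, "end": 16},
        {"label": "EMAIL", "start": 20, "end": 36},
    ],
}

masked = mask_text(example["source_text"], example["spans"])
print(masked)  # Contact [NAME] at [EMAIL].
```

The pair (`source_text`, `masked`) is the kind of input and target a fine-tuning run would consume: the model sees the unmasked text and learns to produce the masked version.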
MikeDoes's Spaces (2)
Terminal Visualiser 💻 (Running): Create and download styled terminal screenshots
TKG Visualiser 🌍 (Running): Visualize workflows from TSV data