moderation-prompts - a h4c5 Collection

updated Apr 18, 2025

mmathys/openai-moderation-api-evaluation

Viewer • Updated Aug 28, 2023 • 1.68k • 244 • 35
Anthropic/hh-rlhf

Viewer • Updated May 26, 2023 • 169k • 18.3k • 1.66k
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Paper • 2406.18495 • Published Jun 26, 2024 • 13
ShieldGemma: Generative AI Content Moderation Based on Gemma

Paper • 2407.21772 • Published Jul 31, 2024 • 14
lmsys/lmsys-chat-1m

Viewer • Updated Jul 27, 2024 • 1M • 5.94k • 838
PKU-Alignment/BeaverTails

Viewer • Updated Oct 17, 2023 • 364k • 15.3k • 95
AgentPublic/camembert-base-toxic-fr-user-prompts

Text Classification • 0.1B • Updated May 30, 2024 • 85 • 7
OpenSafetyLab/Salad-Data

Viewer • Updated Mar 29, 2024 • 30.4k • 621 • 27
meta-llama/Llama-Guard-3-8B

Text Generation • 8B • Updated Oct 11, 2024 • 76.7k • 267
davanstrien/aart-ai-safety-dataset

Viewer • Updated Jan 9, 2024 • 3.27k • 12 • 2
walledai/AdvBench

Viewer • Updated Jul 4, 2024 • 520 • 9.58k • 85