Open Legal Data Collection A collection of our favorite open-source legal datasets on Hugging Face. • 2 items • Updated Oct 31, 2025 • 4
Article Australian-made LLM beats OpenAI and Google at legal retrieval Oct 23, 2025 • 26
Seq vs Seq: An Open Suite of Paired Encoders and Decoders Paper • 2507.11412 • Published Jul 15, 2025 • 30
Should We Still Pretrain Encoders with Masked Language Modeling? Paper • 2507.00994 • Published Jul 1, 2025 • 80
Article Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 Jul 1, 2025 • 132
Zeroshot Classifiers Collection These are my current best zero-shot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer and better. • 12 items • Updated Jan 6, 2025 • 147
Article Multi-Label Classification Model From Scratch: Step-by-Step Tutorial Jan 8, 2024 • 49
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28, 2024 • 66
Tajik Datasets Collection Datasets that have a Tajik subset or are entirely in Tajik • 13 items • Updated Feb 20, 2025 • 4
Open Australian Legal Models Collection A collection of open-source Australian legal language models • 6 items • Updated Jun 15, 2024 • 1
Open Australian Legal Data Collection A collection of open-source Australian legal datasets • 3 items • Updated Jun 15, 2024 • 5