ks-lit-3m: A 3.1-million-word Kashmiri text dataset for large language model pretraining Paper • 2601.01091 • Published 7 days ago
600k-ks-ocr: A large-scale synthetic dataset for optical character recognition in Kashmiri script Paper • 2601.01088 • Published 7 days ago
Omarrran/3.1Million_KASHMIRI_text_Pre_training_Dataset_for_LLM_2026_by_HNM Viewer • Updated 8 days ago • 1 • 17