babylm-base9m-roberta

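This model is a fine-tuned version of an unspecified base model on an unknown dataset. At the last logged evaluation (step 18000), it reaches a validation loss of 5.5776 and an accuracy of 0.1677 on the evaluation set.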

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 182
training_steps: 18200
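
To make the scheduler settings above concrete, here is a minimal sketch of a linear schedule with warmup: the learning rate rises linearly from 0 to the peak over the warmup steps, then decays linearly to 0 at the final step. The function name `linear_warmup_lr` is hypothetical, and the formula assumes the standard linear warmup and decay shape; the training run's actual scheduler code is not shown in this card.

```python
def linear_warmup_lr(step, peak_lr=5e-05, warmup_steps=182, total_steps=18200):
    """Learning rate at `step` for linear warmup followed by linear decay to zero.

    Defaults mirror the hyperparameters listed above; the function itself is
    an illustrative assumption, not the code used for training.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: scale linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Print the rate at the start, mid-warmup, peak, mid-decay, and end.
for step in (0, 91, 182, 9191, 18200):
    print(step, linear_warmup_lr(step))
```

Under these assumptions the peak rate of 5e-05 is reached at step 182, which is 1% of the 18200 training steps.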

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.8945	0.1052	200	6.4849	0.1015
6.1415	0.2103	400	6.1729	0.1288
6.0058	0.3155	600	6.0430	0.1356
5.9066	0.4206	800	5.9878	0.1416
5.8336	0.5258	1000	5.9424	0.1441
5.8452	0.6309	1200	5.9035	0.1455
5.7943	0.7361	1400	5.8767	0.1483
5.7741	0.8412	1600	5.8617	0.1487
5.7285	0.9464	1800	5.8425	0.1495
5.7183	1.0515	2000	5.8324	0.1509
5.6107	2.1030	4000	5.7540	0.1565
5.5729	3.1546	6000	5.7043	0.1599
5.5388	4.2061	8000	5.6663	0.1614
5.4963	5.2576	10000	5.6391	0.1642
5.4550	6.3091	12000	5.6120	0.1661
5.4238	7.3607	14000	5.5991	0.1665
5.4392	8.4122	16000	5.5841	0.1675
5.4325	9.4637	18000	5.5776	0.1677
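
The table can be cross-checked with a little arithmetic. The sketch below derives the approximate number of optimizer steps per epoch from the Step and Epoch columns and converts validation loss to perplexity. It assumes the loss is a mean cross-entropy in nats, and the dataset-size estimate further assumes a single device with no gradient accumulation; neither assumption is confirmed by this card. The helper names are hypothetical.

```python
import math

# (step, epoch, validation loss) copied from three rows of the results table.
rows = [(200, 0.1052, 6.4849), (2000, 1.0515, 5.8324), (18000, 9.4637, 5.5776)]


def steps_per_epoch(step, epoch):
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return step / epoch


def perplexity(loss):
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


for step, epoch, loss in rows:
    print(step, round(steps_per_epoch(step, epoch)), round(perplexity(loss), 1))

# Assumption: one device and no gradient accumulation, so one step = 16 examples.
approx_examples = round(steps_per_epoch(18000, 9.4637)) * 16
print("approximate training examples per epoch:", approx_examples)
```

The early rows give slightly different estimates because the Epoch column is rounded to four decimal places; the later rows are the more reliable ones.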

Format: Safetensors
Model size: 98.6M params
Tensor type: F32
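
As a rough sanity check on these figures, the weight storage can be estimated from the parameter count and tensor type. This is a back-of-the-envelope sketch: it assumes 4 bytes per F32 parameter and ignores file metadata, so the actual file size will differ slightly.

```python
# Values taken from the model card: 98.6M parameters stored as F32.
params = 98.6e6
bytes_per_param = 4  # F32 is a 32-bit float, so 4 bytes per parameter

# Estimated size of the raw weights in megabytes (1 MB = 1e6 bytes).
size_mb = params * bytes_per_param / 1e6
print(f"approximate weight size: {size_mb:.1f} MB")
```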