babylm-base9m-roberta

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.5708
  • Accuracy: 0.1684
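
Since the card does not include usage instructions, here is a minimal sketch for loading the checkpoint as a masked language model with Transformers. The Hub id below is an assumption based on the model name; replace it with the actual repository path.

```python
# Minimal usage sketch. The repository id is hypothetical, inferred from the
# model name; adjust it to the actual Hub path of this checkpoint.
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

model_id = "babylm-base9m-roberta"  # hypothetical Hub id; replace as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# RoBERTa-style models use <mask> as the mask token.
fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill("The child picked up the <mask>."))
```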

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 182
  • training_steps: 18200
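
The exact training script is not part of this card, but the hyperparameters above map onto Hugging Face TrainingArguments roughly as follows; this is a sketch, with dataset and model setup omitted.

```python
# Hedged sketch: how the listed hyperparameters would look as TrainingArguments.
# The output_dir is an assumption; data loading and model setup are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="babylm-base9m-roberta",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",            # AdamW with betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_steps=182,
    max_steps=18200,
)
```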

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Accuracy |
|:-------------:|:------:|:-----:|:---------------:|:--------:|
| 6.8945        | 0.1052 | 200   | 6.4849          | 0.1015   |
| 6.1415        | 0.2103 | 400   | 6.1729          | 0.1288   |
| 6.0058        | 0.3155 | 600   | 6.0430          | 0.1356   |
| 5.9066        | 0.4206 | 800   | 5.9878          | 0.1416   |
| 5.8336        | 0.5258 | 1000  | 5.9424          | 0.1441   |
| 5.8452        | 0.6309 | 1200  | 5.9035          | 0.1455   |
| 5.7943        | 0.7361 | 1400  | 5.8767          | 0.1483   |
| 5.7741        | 0.8412 | 1600  | 5.8617          | 0.1487   |
| 5.7285        | 0.9464 | 1800  | 5.8425          | 0.1495   |
| 5.7183        | 1.0515 | 2000  | 5.8324          | 0.1509   |
| 5.6107        | 2.1030 | 4000  | 5.7540          | 0.1565   |
| 5.5729        | 3.1546 | 6000  | 5.7043          | 0.1599   |
| 5.5388        | 4.2061 | 8000  | 5.6663          | 0.1614   |
| 5.4963        | 5.2576 | 10000 | 5.6391          | 0.1642   |
| 5.455         | 6.3091 | 12000 | 5.6120          | 0.1661   |
| 5.4238        | 7.3607 | 14000 | 5.5991          | 0.1665   |
| 5.4392        | 8.4122 | 16000 | 5.5841          | 0.1675   |
| 5.4325        | 9.4637 | 18000 | 5.5776          | 0.1677   |
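
Assuming the reported validation loss is mean cross-entropy in nats over masked tokens, it can be read as a perplexity of roughly exp(5.5776) ≈ 264 at the final step:

```python
import math

# Perplexity = exp(mean cross-entropy loss), assuming the reported
# validation loss is mean cross-entropy in nats.
final_val_loss = 5.5776
print(math.exp(final_val_loss))  # ≈ 264.4
```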

Framework versions

  • Transformers 4.50.3
  • Pytorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.4
Model size

  • 98.6M parameters (F32 tensors, safetensors format)
