# eb2d7c7ec26b3d1aa8bc3af5dbdd92d7
This model is a fine-tuned version of google-bert/bert-large-cased-whole-word-masking-finetuned-squad on the contemmcm/cls_20newsgroups dataset. It achieves the following results on the evaluation set (a minimal inference sketch follows the metrics):
- Loss: 0.7067
- Data Size: 1.0
- Epoch Runtime: 71.2702 s
- Accuracy: 0.8508
- F1 Macro: 0.8501
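
The sketch below shows one way to run inference with this checkpoint; it is an assumption, not code from this card. The repo id is a placeholder for wherever the model is hosted, and the input text is purely illustrative.

```python
# Minimal inference sketch (assumed usage; the repo id is a placeholder
# and the example text is invented for illustration).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "eb2d7c7ec26b3d1aa8bc3af5dbdd92d7"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

text = "The new GPU drivers finally fixed my rendering problem."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # one of the 20 newsgroup labels
```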
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
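
As a rough guide, these settings map onto Hugging Face `TrainingArguments` as sketched below. This is a reconstruction under stated assumptions, not the card's actual training script: the output directory is a placeholder, and dataset loading, tokenization, and the `Trainer` call are omitted. With 4 devices, the per-device batch size of 8 yields the reported total batch size of 32.

```python
# Sketch of TrainingArguments mirroring the hyperparameters above
# (an assumption; the original training script is not part of this card).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-large-cased-wwm-20newsgroups",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # x4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```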
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | Accuracy | F1 Macro |
|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 3.0937 | 0 | 4.6194 | 0.0441 | 0.0046 |
| No log | 1 | 499 | 3.0602 | 0.0078 | 5.3616 | 0.0680 | 0.0188 |
| 0.0306 | 2 | 998 | 3.0156 | 0.0156 | 6.1620 | 0.0502 | 0.0080 |
| 0.0550 | 3 | 1497 | 2.7450 | 0.0312 | 7.5877 | 0.1124 | 0.0419 |
| 0.0991 | 4 | 1996 | 2.1955 | 0.0625 | 10.8277 | 0.2528 | 0.1609 |
| 1.8231 | 5 | 2495 | 1.5582 | 0.125 | 14.3258 | 0.4554 | 0.3792 |
| 1.2726 | 6 | 2994 | 1.3613 | 0.25 | 22.7187 | 0.4771 | 0.4055 |
| 1.0460 | 7 | 3493 | 1.0068 | 0.5 | 38.7059 | 0.6963 | 0.6689 |
| 0.7414 | 8 | 3992 | 0.7690 | 1.0 | 71.8667 | 0.7732 | 0.7451 |
| 0.6381 | 9 | 4491 | 0.6742 | 1.0 | 71.0718 | 0.8238 | 0.8118 |
| 0.5692 | 10 | 4990 | 0.6492 | 1.0 | 70.9770 | 0.8309 | 0.8256 |
| 0.5074 | 11 | 5489 | 0.6485 | 1.0 | 71.1081 | 0.8485 | 0.8492 |
| 0.4786 | 12 | 5988 | 0.6285 | 1.0 | 71.0364 | 0.8523 | 0.8479 |
| 0.4709 | 13 | 6487 | 0.6108 | 1.0 | 71.1770 | 0.8644 | 0.8634 |
| 0.4551 | 14 | 6986 | 0.7001 | 1.0 | 71.0054 | 0.8493 | 0.8488 |
| 0.4681 | 15 | 7485 | 0.5862 | 1.0 | 70.9093 | 0.8692 | 0.8686 |
| 0.4464 | 16 | 7984 | 0.6488 | 1.0 | 70.9935 | 0.8589 | 0.8583 |
| 0.4770 | 17 | 8483 | 0.6646 | 1.0 | 71.1286 | 0.8632 | 0.8602 |
| 0.4779 | 18 | 8982 | 0.6808 | 1.0 | 71.0146 | 0.8586 | 0.8567 |
| 0.5596 | 19 | 9481 | 0.7067 | 1.0 | 71.2702 | 0.8508 | 0.8501 |
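
The Accuracy and F1 Macro columns are standard multi-class metrics. The sketch below shows one way such a `compute_metrics` function could be written for a Hugging Face `Trainer` using scikit-learn; this is an assumption, since the actual evaluation code is not part of this card.

```python
# Hypothetical compute_metrics for a Hugging Face Trainer (an assumption;
# the evaluation code behind this card's numbers is not shown).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, predictions),
        "f1_macro": f1_score(labels, predictions, average="macro"),
    }
```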
### Framework versions
- Transformers 4.57.0
- PyTorch 2.8.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1
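
A quick way to compare a local environment against the versions listed above (a convenience snippet, not part of the original card):

```python
# Print installed versions to compare with those listed above.
import datasets
import tokenizers
import torch
import transformers

for name, module in [
    ("Transformers", transformers),
    ("PyTorch", torch),
    ("Datasets", datasets),
    ("Tokenizers", tokenizers),
]:
    print(f"{name}: {module.__version__}")
```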