readme: include number of training epochs
README.md CHANGED

@@ -22,12 +22,13 @@ Preliminary Historic Multilingual and Monolingual ByT5 Models. Following languag
 
 More details can be found in [our GitHub repository](https://github.com/stefan-it/hmByT5).
 
-
 # Pretraining
 
 We use the official JAX/FLAX example in Hugging Face Transformers to pretrain a ByT5 model on a single v3-8 TPU.
 Details about the training can be found [here](https://github.com/stefan-it/hmByT5/tree/main/hmbyt5-flax).
 
+The model was trained for 0.5 epoch.
+
 # Evaluation on Downstream Tasks (NER)
 
 We evaluated the hmByT5 model on downstream tasks:
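Since the README points to the JAX/FLAX pretraining setup but does not show usage, here is a minimal sketch of how a ByT5-style checkpoint like this one can be loaded with Hugging Face Transformers. It assumes the checkpoint works with the standard T5/ByT5 classes; the model id below is a placeholder (`google/byt5-small`), not the hmByT5 repository id, so substitute the actual checkpoint name.

```python
# Minimal sketch, assuming the checkpoint loads with the standard T5/ByT5 classes.
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Placeholder id -- replace with the actual hmByT5 model repository id.
model_id = "google/byt5-small"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# ByT5 operates directly on UTF-8 bytes, so no subword vocabulary is involved.
inputs = tokenizer("Some historic text to encode.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that a checkpoint pretrained for only 0.5 epoch is an intermediate artifact; the sketch above only illustrates loading and running it, not expected generation quality.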