2025-05-08 16:35:44,100 ----------------------------------------------------------------------------------------------------
2025-05-08 16:35:44,100 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): ModernBertModel(
      (embeddings): ModernBertEmbeddings(
        (tok_embeddings): Embedding(50369, 1024, padding_idx=50283)
        (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (drop): Dropout(p=0.0, inplace=False)
      )
      (layers): ModuleList(
        (0): ModernBertEncoderLayer(
          (attn_norm): Identity()
          (attn): ModernBertAttention(
            (Wqkv): Linear(in_features=1024, out_features=3072, bias=False)
            (rotary_emb): ModernBertRotaryEmbedding()
            (Wo): Linear(in_features=1024, out_features=1024, bias=False)
            (out_drop): Identity()
          )
          (mlp_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (mlp): ModernBertMLP(
            (Wi): Linear(in_features=1024, out_features=5248, bias=False)
            (act): GELUActivation()
            (drop): Dropout(p=0.0, inplace=False)
            (Wo): Linear(in_features=2624, out_features=1024, bias=False)
          )
        )
        (1-27): 27 x ModernBertEncoderLayer(
          (attn_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (attn): ModernBertAttention(
            (Wqkv): Linear(in_features=1024, out_features=3072, bias=False)
            (rotary_emb): ModernBertRotaryEmbedding()
            (Wo): Linear(in_features=1024, out_features=1024, bias=False)
            (out_drop): Identity()
          )
          (mlp_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (mlp): ModernBertMLP(
            (Wi): Linear(in_features=1024, out_features=5248, bias=False)
            (act): GELUActivation()
            (drop): Dropout(p=0.0, inplace=False)
            (Wo): Linear(in_features=2624, out_features=1024, bias=False)
          )
        )
      )
      (final_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=2048, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2025-05-08 16:35:44,100 ----------------------------------------------------------------------------------------------------
2025-05-08 16:35:44,100 MultiCorpus: 14987 train + 3466 dev + 3684 test sentences
 - CONLL_03_ENGLISH Corpus: 14987 train + 3466 dev + 3684 test sentences - /home/stefan/.flair/datasets/conll_03_english
2025-05-08 16:35:44,100 ----------------------------------------------------------------------------------------------------
2025-05-08 16:35:44,100 Train: 14987 sentences
2025-05-08 16:35:44,100 (train_with_dev=False, train_with_test=False)
2025-05-08 16:35:44,100 ----------------------------------------------------------------------------------------------------
2025-05-08 16:35:44,100 Training Params:
2025-05-08 16:35:44,100  - optimizer: ""
2025-05-08 16:35:44,100  - learning_rate: "2e-05"
2025-05-08 16:35:44,100  - mini_batch_size: "16"
2025-05-08 16:35:44,100  - max_epochs: "10"
2025-05-08 16:35:44,100  - shuffle: "True"
2025-05-08 16:35:44,100 ----------------------------------------------------------------------------------------------------
2025-05-08 16:35:44,100 Plugins:
2025-05-08 16:35:44,100  - TensorboardLogger
2025-05-08 16:35:44,100  - LinearScheduler | warmup_fraction: '0.1'
2025-05-08 16:35:44,100 ----------------------------------------------------------------------------------------------------
2025-05-08 16:35:44,100 Final evaluation on model from best epoch (best-model.pt)
2025-05-08 16:35:44,100  - metric: "('micro avg', 'f1-score')"
2025-05-08 16:35:44,100 ----------------------------------------------------------------------------------------------------
2025-05-08 16:35:44,100 Computation:
2025-05-08 16:35:44,100  - compute on device: cuda:0
2025-05-08 16:35:44,100  - embedding storage: none
2025-05-08 16:35:44,100 ----------------------------------------------------------------------------------------------------
2025-05-08 16:35:44,100 Model training base path: "flair-ner-conll03_english-modern_bert_large_tokenizer_fix-bs16-e10-cs0-lr2e-05-2"
2025-05-08 16:35:44,100 ----------------------------------------------------------------------------------------------------
2025-05-08 16:35:44,100 ----------------------------------------------------------------------------------------------------
2025-05-08 16:35:44,100 Logging anything other than scalars to TensorBoard is currently not supported.
2025-05-08 16:35:56,453 epoch 1 - iter 93/937 - loss 51.00117641 - time (sec): 12.35 - samples/sec: 1732.91 - lr: 0.000002 - momentum: 0.000000
2025-05-08 16:36:08,727 epoch 1 - iter 186/937 - loss 39.31915124 - time (sec): 24.63 - samples/sec: 1689.29 - lr: 0.000004 - momentum: 0.000000
2025-05-08 16:36:21,012 epoch 1 - iter 279/937 - loss 28.34790128 - time (sec): 36.91 - samples/sec: 1696.57 - lr: 0.000006 - momentum: 0.000000
2025-05-08 16:36:33,294 epoch 1 - iter 372/937 - loss 21.50534985 - time (sec): 49.19 - samples/sec: 1692.10 - lr: 0.000008 - momentum: 0.000000
2025-05-08 16:36:45,533 epoch 1 - iter 465/937 - loss 17.43941279 - time (sec): 61.43 - samples/sec: 1676.80 - lr: 0.000010 - momentum: 0.000000
2025-05-08 16:36:57,843 epoch 1 - iter 558/937 - loss 14.61276887 - time (sec): 73.74 - samples/sec: 1671.44 - lr: 0.000012 - momentum: 0.000000
2025-05-08 16:37:10,132 epoch 1 - iter 651/937 - loss 12.61916178 - time (sec): 86.03 - samples/sec: 1664.38 - lr: 0.000014 - momentum: 0.000000
2025-05-08 16:37:22,323 epoch 1 - iter 744/937 - loss 11.09955475 - time (sec): 98.22 - samples/sec: 1660.09 - lr: 0.000016 - momentum: 0.000000
2025-05-08 16:37:34,541 epoch 1 - iter 837/937 - loss 9.91430791 - time (sec): 110.44 - samples/sec: 1656.84 - lr: 0.000018 - momentum: 0.000000
2025-05-08 16:37:46,710 epoch 1 - iter 930/937 - loss 8.93883090 - time (sec): 122.61 - samples/sec: 1657.42 - lr: 0.000020 - momentum: 0.000000
2025-05-08 16:37:47,602 ----------------------------------------------------------------------------------------------------
2025-05-08 16:37:47,602 EPOCH 1 done: loss 8.8805 - lr: 0.000020
2025-05-08 16:37:54,144 DEV : loss 0.1118580624461174 - f1-score (micro avg) 0.9036
2025-05-08 16:37:54,170 saving best model
2025-05-08 16:37:54,784 ----------------------------------------------------------------------------------------------------
2025-05-08 16:38:06,997 epoch 2 - iter 93/937 - loss 0.11137172 - time (sec): 12.21 - samples/sec: 1652.27 - lr: 0.000020 - momentum: 0.000000
2025-05-08 16:38:19,287 epoch 2 - iter 186/937 - loss 0.11052976 - time (sec): 24.50 - samples/sec: 1627.98 - lr: 0.000020 - momentum: 0.000000
2025-05-08 16:38:31,513 epoch 2 - iter 279/937 - loss 0.10145703 - time (sec): 36.73 - samples/sec: 1641.78 - lr: 0.000019 - momentum: 0.000000
2025-05-08 16:38:43,868 epoch 2 - iter 372/937 - loss 0.09382547 - time (sec): 49.08 - samples/sec: 1653.09 - lr: 0.000019 - momentum: 0.000000
2025-05-08 16:38:56,121 epoch 2 - iter 465/937 - loss 0.09283270 - time (sec): 61.34 - samples/sec: 1656.97 - lr: 0.000019 - momentum: 0.000000
2025-05-08 16:39:08,413 epoch 2 - iter 558/937 - loss 0.09469460 - time (sec): 73.63 - samples/sec: 1657.89 - lr: 0.000019 - momentum: 0.000000
2025-05-08 16:39:20,753 epoch 2 - iter 651/937 - loss 0.09219174 - time (sec): 85.97 - samples/sec: 1655.83 - lr: 0.000018 - momentum: 0.000000
2025-05-08 16:39:33,056 epoch 2 - iter 744/937 - loss 0.09000350 - time (sec): 98.27 - samples/sec: 1655.73 - lr: 0.000018 - momentum: 0.000000
2025-05-08 16:39:45,277 epoch 2 - iter 837/937 - loss 0.09094705 - time (sec): 110.49 - samples/sec: 1651.81 - lr: 0.000018 - momentum: 0.000000
2025-05-08 16:39:57,613 epoch 2 - iter 930/937 - loss 0.08852163 - time (sec): 122.83 - samples/sec: 1653.86 - lr: 0.000018 - momentum: 0.000000
2025-05-08 16:39:58,500 ----------------------------------------------------------------------------------------------------
2025-05-08 16:39:58,501 EPOCH 2 done: loss 0.0885 - lr: 0.000018
2025-05-08 16:40:03,946 DEV : loss 0.06839137524366379 - f1-score (micro avg) 0.9372
2025-05-08 16:40:03,971 saving best model
2025-05-08 16:40:04,951 ----------------------------------------------------------------------------------------------------
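A sketch of a Flair fine-tuning script that matches the parameters logged above. The checkpoint name is not in the log, so `answerdotai/ModernBERT-large` is an assumption inferred from the printed ModernBertModel architecture, and `first_last` subtoken pooling is an assumption inferred from the 2048-dim input (2 x 1024) of the final linear layer:

```python
# Sketch of the logged training run; assumptions are marked in comments.
PARAMS = {
    "learning_rate": 2e-5,   # logged: learning_rate "2e-05"
    "mini_batch_size": 16,   # logged: mini_batch_size "16"
    "max_epochs": 10,        # logged: max_epochs "10"
    "warmup_fraction": 0.1,  # logged: LinearScheduler | warmup_fraction '0.1'
}

def build_and_train(base_path: str = "flair-ner-conll03-modernbert-large"):
    # Imports kept inside the function so the sketch can be read (and the
    # parameters inspected) without flair installed.
    from flair.datasets import CONLL_03
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    corpus = CONLL_03()  # cached under ~/.flair/datasets/conll_03_english
    label_dict = corpus.make_label_dictionary(label_type="ner")

    embeddings = TransformerWordEmbeddings(
        model="answerdotai/ModernBERT-large",  # assumption, see lead-in
        layers="-1",
        subtoken_pooling="first_last",  # assumption: yields 2048-dim features
        fine_tune=True,
    )
    tagger = SequenceTagger(
        hidden_size=256,  # unused when use_rnn=False
        embeddings=embeddings,
        tag_dictionary=label_dict,
        tag_type="ner",
        use_crf=False,
        use_rnn=False,  # log shows embeddings -> LockedDropout -> Linear(2048, 17)
    )
    # fine_tune() installs the LinearScheduler with warmup seen in the log.
    trainer = ModelTrainer(tagger, corpus)
    trainer.fine_tune(base_path, **PARAMS)
```

Call `build_and_train()` to launch training; the base path here is a shortened stand-in for the logged one.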
2025-05-08 16:40:17,146 epoch 3 - iter 93/937 - loss 0.04928881 - time (sec): 12.19 - samples/sec: 1647.64 - lr: 0.000018 - momentum: 0.000000
2025-05-08 16:40:29,461 epoch 3 - iter 186/937 - loss 0.04589421 - time (sec): 24.51 - samples/sec: 1663.15 - lr: 0.000017 - momentum: 0.000000
2025-05-08 16:40:41,705 epoch 3 - iter 279/937 - loss 0.04343733 - time (sec): 36.75 - samples/sec: 1646.12 - lr: 0.000017 - momentum: 0.000000
2025-05-08 16:40:53,964 epoch 3 - iter 372/937 - loss 0.04578370 - time (sec): 49.01 - samples/sec: 1650.85 - lr: 0.000017 - momentum: 0.000000
2025-05-08 16:41:06,206 epoch 3 - iter 465/937 - loss 0.04201880 - time (sec): 61.25 - samples/sec: 1646.44 - lr: 0.000017 - momentum: 0.000000
2025-05-08 16:41:18,565 epoch 3 - iter 558/937 - loss 0.04253207 - time (sec): 73.61 - samples/sec: 1648.62 - lr: 0.000016 - momentum: 0.000000
2025-05-08 16:41:30,845 epoch 3 - iter 651/937 - loss 0.04646610 - time (sec): 85.89 - samples/sec: 1646.48 - lr: 0.000016 - momentum: 0.000000
2025-05-08 16:41:43,169 epoch 3 - iter 744/937 - loss 0.04955133 - time (sec): 98.22 - samples/sec: 1653.16 - lr: 0.000016 - momentum: 0.000000
2025-05-08 16:41:55,483 epoch 3 - iter 837/937 - loss 0.04703390 - time (sec): 110.53 - samples/sec: 1656.33 - lr: 0.000016 - momentum: 0.000000
2025-05-08 16:42:07,683 epoch 3 - iter 930/937 - loss 0.04786307 - time (sec): 122.73 - samples/sec: 1654.62 - lr: 0.000016 - momentum: 0.000000
2025-05-08 16:42:08,556 ----------------------------------------------------------------------------------------------------
2025-05-08 16:42:08,556 EPOCH 3 done: loss 0.0477 - lr: 0.000016
2025-05-08 16:42:14,012 DEV : loss 0.07307213544845581 - f1-score (micro avg) 0.9533
2025-05-08 16:42:14,038 saving best model
2025-05-08 16:42:15,034 ----------------------------------------------------------------------------------------------------
2025-05-08 16:42:28,642 epoch 4 - iter 93/937 - loss 0.01898962 - time (sec): 13.61 - samples/sec: 1568.87 - lr: 0.000015 - momentum: 0.000000
2025-05-08 16:42:40,917 epoch 4 - iter 186/937 - loss 0.02010164 - time (sec): 25.88 - samples/sec: 1612.31 - lr: 0.000015 - momentum: 0.000000
2025-05-08 16:42:53,288 epoch 4 - iter 279/937 - loss 0.02142412 - time (sec): 38.25 - samples/sec: 1606.07 - lr: 0.000015 - momentum: 0.000000
2025-05-08 16:43:05,506 epoch 4 - iter 372/937 - loss 0.02282148 - time (sec): 50.47 - samples/sec: 1613.79 - lr: 0.000015 - momentum: 0.000000
2025-05-08 16:43:17,846 epoch 4 - iter 465/937 - loss 0.02263261 - time (sec): 62.81 - samples/sec: 1632.46 - lr: 0.000014 - momentum: 0.000000
2025-05-08 16:43:30,220 epoch 4 - iter 558/937 - loss 0.02330124 - time (sec): 75.18 - samples/sec: 1637.30 - lr: 0.000014 - momentum: 0.000000
2025-05-08 16:43:42,488 epoch 4 - iter 651/937 - loss 0.02443078 - time (sec): 87.45 - samples/sec: 1632.91 - lr: 0.000014 - momentum: 0.000000
2025-05-08 16:43:54,794 epoch 4 - iter 744/937 - loss 0.02445322 - time (sec): 99.76 - samples/sec: 1637.75 - lr: 0.000014 - momentum: 0.000000
2025-05-08 16:44:06,909 epoch 4 - iter 837/937 - loss 0.02507178 - time (sec): 111.87 - samples/sec: 1633.75 - lr: 0.000014 - momentum: 0.000000
2025-05-08 16:44:19,212 epoch 4 - iter 930/937 - loss 0.02554490 - time (sec): 124.18 - samples/sec: 1634.63 - lr: 0.000013 - momentum: 0.000000
2025-05-08 16:44:20,092 ----------------------------------------------------------------------------------------------------
2025-05-08 16:44:20,092 EPOCH 4 done: loss 0.0255 - lr: 0.000013
2025-05-08 16:44:25,557 DEV : loss 0.09235326200723648 - f1-score (micro avg) 0.9594
2025-05-08 16:44:25,582 saving best model
2025-05-08 16:44:26,579 ----------------------------------------------------------------------------------------------------
2025-05-08 16:44:38,913 epoch 5 - iter 93/937 - loss 0.01416651 - time (sec): 12.33 - samples/sec: 1603.37 - lr: 0.000013 - momentum: 0.000000
2025-05-08 16:44:51,263 epoch 5 - iter 186/937 - loss 0.01363371 - time (sec): 24.68 - samples/sec: 1650.67 - lr: 0.000013 - momentum: 0.000000
2025-05-08 16:45:03,521 epoch 5 - iter 279/937 - loss 0.01315713 - time (sec): 36.94 - samples/sec: 1650.89 - lr: 0.000013 - momentum: 0.000000
2025-05-08 16:45:15,869 epoch 5 - iter 372/937 - loss 0.01589097 - time (sec): 49.29 - samples/sec: 1657.14 - lr: 0.000012 - momentum: 0.000000
2025-05-08 16:45:28,072 epoch 5 - iter 465/937 - loss 0.01531867 - time (sec): 61.49 - samples/sec: 1651.82 - lr: 0.000012 - momentum: 0.000000
2025-05-08 16:45:40,485 epoch 5 - iter 558/937 - loss 0.01951086 - time (sec): 73.91 - samples/sec: 1646.68 - lr: 0.000012 - momentum: 0.000000
2025-05-08 16:45:52,827 epoch 5 - iter 651/937 - loss 0.01969603 - time (sec): 86.25 - samples/sec: 1652.73 - lr: 0.000012 - momentum: 0.000000
2025-05-08 16:46:05,052 epoch 5 - iter 744/937 - loss 0.01837435 - time (sec): 98.47 - samples/sec: 1650.35 - lr: 0.000012 - momentum: 0.000000
2025-05-08 16:46:17,313 epoch 5 - iter 837/937 - loss 0.01937513 - time (sec): 110.73 - samples/sec: 1652.26 - lr: 0.000011 - momentum: 0.000000
2025-05-08 16:46:29,551 epoch 5 - iter 930/937 - loss 0.01967155 - time (sec): 122.97 - samples/sec: 1651.51 - lr: 0.000011 - momentum: 0.000000
2025-05-08 16:46:30,427 ----------------------------------------------------------------------------------------------------
2025-05-08 16:46:30,427 EPOCH 5 done: loss 0.0196 - lr: 0.000011
2025-05-08 16:46:35,889 DEV : loss 0.08506825566291809 - f1-score (micro avg) 0.9651
2025-05-08 16:46:35,914 saving best model
2025-05-08 16:46:36,899 ----------------------------------------------------------------------------------------------------
2025-05-08 16:46:49,267 epoch 6 - iter 93/937 - loss 0.00097797 - time (sec): 12.37 - samples/sec: 1688.57 - lr: 0.000011 - momentum: 0.000000
2025-05-08 16:47:01,644 epoch 6 - iter 186/937 - loss 0.00406677 - time (sec): 24.74 - samples/sec: 1655.65 - lr: 0.000011 - momentum: 0.000000
2025-05-08 16:47:15,115 epoch 6 - iter 279/937 - loss 0.00554206 - time (sec): 38.22 - samples/sec: 1619.79 - lr: 0.000010 - momentum: 0.000000
2025-05-08 16:47:27,338 epoch 6 - iter 372/937 - loss 0.00611831 - time (sec): 50.44 - samples/sec: 1622.06 - lr: 0.000010 - momentum: 0.000000
2025-05-08 16:47:39,647 epoch 6 - iter 465/937 - loss 0.00675677 - time (sec): 62.75 - samples/sec: 1629.17 - lr: 0.000010 - momentum: 0.000000
2025-05-08 16:47:51,891 epoch 6 - iter 558/937 - loss 0.00685979 - time (sec): 74.99 - samples/sec: 1632.98 - lr: 0.000010 - momentum: 0.000000
2025-05-08 16:48:04,124 epoch 6 - iter 651/937 - loss 0.00811317 - time (sec): 87.22 - samples/sec: 1624.14 - lr: 0.000010 - momentum: 0.000000
2025-05-08 16:48:16,545 epoch 6 - iter 744/937 - loss 0.00856449 - time (sec): 99.64 - samples/sec: 1631.43 - lr: 0.000009 - momentum: 0.000000
2025-05-08 16:48:28,801 epoch 6 - iter 837/937 - loss 0.00890904 - time (sec): 111.90 - samples/sec: 1637.56 - lr: 0.000009 - momentum: 0.000000
2025-05-08 16:48:41,002 epoch 6 - iter 930/937 - loss 0.00869569 - time (sec): 124.10 - samples/sec: 1637.91 - lr: 0.000009 - momentum: 0.000000
2025-05-08 16:48:41,887 ----------------------------------------------------------------------------------------------------
2025-05-08 16:48:41,887 EPOCH 6 done: loss 0.0087 - lr: 0.000009
2025-05-08 16:48:47,351 DEV : loss 0.1142197698354721 - f1-score (micro avg) 0.9642
2025-05-08 16:48:47,377 ----------------------------------------------------------------------------------------------------
2025-05-08 16:48:59,721 epoch 7 - iter 93/937 - loss 0.00487326 - time (sec): 12.34 - samples/sec: 1630.63 - lr: 0.000009 - momentum: 0.000000
2025-05-08 16:49:11,966 epoch 7 - iter 186/937 - loss 0.00628660 - time (sec): 24.59 - samples/sec: 1654.42 - lr: 0.000008 - momentum: 0.000000
2025-05-08 16:49:24,089 epoch 7 - iter 279/937 - loss 0.00580646 - time (sec): 36.71 - samples/sec: 1636.76 - lr: 0.000008 - momentum: 0.000000
2025-05-08 16:49:36,346 epoch 7 - iter 372/937 - loss 0.00641392 - time (sec): 48.97 - samples/sec: 1647.65 - lr: 0.000008 - momentum: 0.000000
2025-05-08 16:49:48,730 epoch 7 - iter 465/937 - loss 0.00541592 - time (sec): 61.35 - samples/sec: 1649.42 - lr: 0.000008 - momentum: 0.000000
2025-05-08 16:50:01,050 epoch 7 - iter 558/937 - loss 0.00500581 - time (sec): 73.67 - samples/sec: 1656.15 - lr: 0.000008 - momentum: 0.000000
2025-05-08 16:50:13,357 epoch 7 - iter 651/937 - loss 0.00504434 - time (sec): 85.98 - samples/sec: 1645.32 - lr: 0.000007 - momentum: 0.000000
2025-05-08 16:50:25,648 epoch 7 - iter 744/937 - loss 0.00508162 - time (sec): 98.27 - samples/sec: 1649.88 - lr: 0.000007 - momentum: 0.000000
2025-05-08 16:50:37,948 epoch 7 - iter 837/937 - loss 0.00514329 - time (sec): 110.57 - samples/sec: 1653.64 - lr: 0.000007 - momentum: 0.000000
2025-05-08 16:50:50,233 epoch 7 - iter 930/937 - loss 0.00507451 - time (sec): 122.86 - samples/sec: 1654.76 - lr: 0.000007 - momentum: 0.000000
2025-05-08 16:50:51,087 ----------------------------------------------------------------------------------------------------
2025-05-08 16:50:51,087 EPOCH 7 done: loss 0.0052 - lr: 0.000007
2025-05-08 16:50:56,556 DEV : loss 0.13016767799854279 - f1-score (micro avg) 0.9601
2025-05-08 16:50:56,581 ----------------------------------------------------------------------------------------------------
2025-05-08 16:51:08,853 epoch 8 - iter 93/937 - loss 0.00129312 - time (sec): 12.27 - samples/sec: 1665.82 - lr: 0.000006 - momentum: 0.000000
2025-05-08 16:51:21,108 epoch 8 - iter 186/937 - loss 0.00142083 - time (sec): 24.53 - samples/sec: 1627.81 - lr: 0.000006 - momentum: 0.000000
2025-05-08 16:51:33,398 epoch 8 - iter 279/937 - loss 0.00195167 - time (sec): 36.82 - samples/sec: 1628.48 - lr: 0.000006 - momentum: 0.000000
2025-05-08 16:51:47,081 epoch 8 - iter 372/937 - loss 0.00191288 - time (sec): 50.50 - samples/sec: 1599.88 - lr: 0.000006 - momentum: 0.000000
2025-05-08 16:51:59,448 epoch 8 - iter 465/937 - loss 0.00166481 - time (sec): 62.87 - samples/sec: 1610.97 - lr: 0.000006 - momentum: 0.000000
2025-05-08 16:52:11,719 epoch 8 - iter 558/937 - loss 0.00161414 - time (sec): 75.14 - samples/sec: 1617.95 - lr: 0.000005 - momentum: 0.000000
2025-05-08 16:52:23,970 epoch 8 - iter 651/937 - loss 0.00157594 - time (sec): 87.39 - samples/sec: 1625.93 - lr: 0.000005 - momentum: 0.000000
2025-05-08 16:52:36,105 epoch 8 - iter 744/937 - loss 0.00154393 - time (sec): 99.52 - samples/sec: 1628.32 - lr: 0.000005 - momentum: 0.000000
2025-05-08 16:52:48,417 epoch 8 - iter 837/937 - loss 0.00203973 - time (sec): 111.83 - samples/sec: 1631.67 - lr: 0.000005 - momentum: 0.000000
2025-05-08 16:53:00,806 epoch 8 - iter 930/937 - loss 0.00201068 - time (sec): 124.22 - samples/sec: 1635.01 - lr: 0.000004 - momentum: 0.000000
2025-05-08 16:53:01,700 ----------------------------------------------------------------------------------------------------
2025-05-08 16:53:01,700 EPOCH 8 done: loss 0.0020 - lr: 0.000004
2025-05-08 16:53:07,171 DEV : loss 0.13071465492248535 - f1-score (micro avg) 0.963
2025-05-08 16:53:07,196 ----------------------------------------------------------------------------------------------------
2025-05-08 16:53:19,384 epoch 9 - iter 93/937 - loss 0.00241956 - time (sec): 12.19 - samples/sec: 1618.47 - lr: 0.000004 - momentum: 0.000000
2025-05-08 16:53:31,629 epoch 9 - iter 186/937 - loss 0.00180043 - time (sec): 24.43 - samples/sec: 1619.30 - lr: 0.000004 - momentum: 0.000000
2025-05-08 16:53:43,993 epoch 9 - iter 279/937 - loss 0.00146007 - time (sec): 36.80 - samples/sec: 1610.43 - lr: 0.000004 - momentum: 0.000000
2025-05-08 16:53:56,261 epoch 9 - iter 372/937 - loss 0.00125696 - time (sec): 49.06 - samples/sec: 1630.15 - lr: 0.000004 - momentum: 0.000000
2025-05-08 16:54:08,611 epoch 9 - iter 465/937 - loss 0.00108053 - time (sec): 61.41 - samples/sec: 1652.78 - lr: 0.000003 - momentum: 0.000000
2025-05-08 16:54:20,898 epoch 9 - iter 558/937 - loss 0.00138047 - time (sec): 73.70 - samples/sec: 1649.28 - lr: 0.000003 - momentum: 0.000000
2025-05-08 16:54:33,205 epoch 9 - iter 651/937 - loss 0.00118774 - time (sec): 86.01 - samples/sec: 1649.73 - lr: 0.000003 - momentum: 0.000000
2025-05-08 16:54:45,518 epoch 9 - iter 744/937 - loss 0.00108115 - time (sec): 98.32 - samples/sec: 1655.74 - lr: 0.000003 - momentum: 0.000000
2025-05-08 16:54:57,805 epoch 9 - iter 837/937 - loss 0.00100400 - time (sec): 110.61 - samples/sec: 1653.22 - lr: 0.000002 - momentum: 0.000000
2025-05-08 16:55:10,050 epoch 9 - iter 930/937 - loss 0.00119770 - time (sec): 122.85 - samples/sec: 1652.52 - lr: 0.000002 - momentum: 0.000000
2025-05-08 16:55:10,937 ----------------------------------------------------------------------------------------------------
2025-05-08 16:55:10,937 EPOCH 9 done: loss 0.0012 - lr: 0.000002
2025-05-08 16:55:16,406 DEV : loss 0.13402138650417328 - f1-score (micro avg) 0.9658
2025-05-08 16:55:16,432 saving best model
2025-05-08 16:55:17,424 ----------------------------------------------------------------------------------------------------
2025-05-08 16:55:29,704 epoch 10 - iter 93/937 - loss 0.00164854 - time (sec): 12.28 - samples/sec: 1574.90 - lr: 0.000002 - momentum: 0.000000
2025-05-08 16:55:41,942 epoch 10 - iter 186/937 - loss 0.00084318 - time (sec): 24.52 - samples/sec: 1599.16 - lr: 0.000002 - momentum: 0.000000
2025-05-08 16:55:54,292 epoch 10 - iter 279/937 - loss 0.00062474 - time (sec): 36.87 - samples/sec: 1628.86 - lr: 0.000002 - momentum: 0.000000
2025-05-08 16:56:06,581 epoch 10 - iter 372/937 - loss 0.00047798 - time (sec): 49.16 - samples/sec: 1629.04 - lr: 0.000001 - momentum: 0.000000
2025-05-08 16:56:18,756 epoch 10 - iter 465/937 - loss 0.00038804 - time (sec): 61.33 - samples/sec: 1636.96 - lr: 0.000001 - momentum: 0.000000
2025-05-08 16:56:32,377 epoch 10 - iter 558/937 - loss 0.00039768 - time (sec): 74.95 - samples/sec: 1618.78 - lr: 0.000001 - momentum: 0.000000
2025-05-08 16:56:44,569 epoch 10 - iter 651/937 - loss 0.00034346 - time (sec): 87.14 - samples/sec: 1629.88 - lr: 0.000001 - momentum: 0.000000
2025-05-08 16:56:56,802 epoch 10 - iter 744/937 - loss 0.00053786 - time (sec): 99.38 - samples/sec: 1636.84 - lr: 0.000000 - momentum: 0.000000
2025-05-08 16:57:09,165 epoch 10 - iter 837/937 - loss 0.00068597 - time (sec): 111.74 - samples/sec: 1634.46 - lr: 0.000000 - momentum: 0.000000
2025-05-08 16:57:21,552 epoch 10 - iter 930/937 - loss 0.00067756 - time (sec): 124.13 - samples/sec: 1637.73 - lr: 0.000000 - momentum: 0.000000
2025-05-08 16:57:22,414 ----------------------------------------------------------------------------------------------------
2025-05-08 16:57:22,414 EPOCH 10 done: loss 0.0007 - lr: 0.000000
2025-05-08 16:57:27,886 DEV : loss 0.13694016635417938 - f1-score (micro avg) 0.9645
2025-05-08 16:57:29,653 ----------------------------------------------------------------------------------------------------
2025-05-08 16:57:29,654 Loading model from best epoch ...
2025-05-08 16:57:29,959 --------------------------------------------------
2025-05-08 16:57:29,959 - Loading SequenceTagger
2025-05-08 16:57:31,867 - Predicts 17 classes: ['O', 'S-LOC', 'B-LOC', 'E-LOC', 'I-LOC', 'S-PER', 'B-PER', 'E-PER', 'I-PER', 'S-ORG', 'B-ORG', 'E-ORG', 'I-ORG', 'S-MISC', 'B-MISC', 'E-MISC', 'I-MISC']
2025-05-08 16:57:31,940 --------------------------------------------------
2025-05-08 16:57:31,941 - Model license: No license information available
2025-05-08 16:57:31,941 --------------------------------------------------
2025-05-08 16:57:36,988 Results:
- F-score (micro) 0.9234
- F-score (macro) 0.9111
- Accuracy 0.8908

By class:
              precision    recall  f1-score   support

         ORG     0.8935    0.9145    0.9039      1661
         LOC     0.9315    0.9376    0.9346      1668
         PER     0.9811    0.9635    0.9722      1617
        MISC     0.8178    0.8504    0.8338       702

   micro avg     0.9194    0.9274    0.9234      5648
   macro avg     0.9060    0.9165    0.9111      5648
weighted avg     0.9204    0.9274    0.9238      5648

2025-05-08 16:57:36,988 ----------------------------------------------------------------------------------------------------
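For reference, a minimal sketch of loading the saved checkpoint for inference through Flair's standard load/predict API. The path is the training base path from the log plus the `best-model.pt` file the trainer writes on each new best dev score; the helper name `tag` is illustrative:

```python
# Minimal inference sketch; checkpoint path taken from the log above.
MODEL_PATH = (
    "flair-ner-conll03_english-modern_bert_large_tokenizer_fix"
    "-bs16-e10-cs0-lr2e-05-2/best-model.pt"
)

def tag(text: str):
    # Imports inside the function so the sketch is readable without flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(MODEL_PATH)
    sentence = Sentence(text)
    tagger.predict(sentence)
    # The BIOES tags listed in the log ('S-LOC', 'B-PER', ...) are decoded
    # into spans carrying plain labels (LOC, PER, ORG, MISC).
    return [(span.text, span.get_label("ner").value) for span in sentence.get_spans("ner")]
```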