676bc3461109c31fa2f4d651f5f73e52
This model is a fine-tuned version of albert/albert-xxlarge-v1 on the nyu-mll/glue dataset. It achieves the following results on the evaluation set (the values match the last logged row, epoch 26, of the training results table):
- Loss: 0.3856
- Data Size: 1.0
- Epoch Runtime: 14.0362
- Mse: 0.3858
- Mae: 0.4768
- R2: 0.8274
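Mse, Mae, and R2 are regression metrics. The card does not say which implementation produced them, so the following is only a minimal sketch of their conventional definitions in plain Python. The function names and the toy inputs are illustrative assumptions, not part of the original training code.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only (not taken from the evaluation set).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(mse(y_true, y_pred))  # 0.125
print(mae(y_true, y_pred))  # 0.25
print(r2(y_true, y_pred))   # 0.9
```

Under these definitions, R2 equals 1 minus Mse divided by the variance of the labels, which is why the two metrics move in opposite directions across the results table.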
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: constant
- num_epochs: 50
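The totals in the list above follow from the per-device batch size and the device count. As a rough sketch, the settings can be collected in a dictionary keyed by `transformers.TrainingArguments` parameter names. That mapping is an assumption about how the run was configured, as is a gradient accumulation factor of 1, which the card does not list.

```python
# Values copied from the hyperparameter list above. The key names follow
# transformers.TrainingArguments parameters (assumed, not confirmed by the card).
training_args = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

NUM_DEVICES = 4  # num_devices from the list above


def total_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Effective batch size per optimizer step across all devices.

    grad_accum_steps=1 is an assumption; the card does not report it.
    """
    return per_device * num_devices * grad_accum_steps


print(total_batch_size(training_args["per_device_train_batch_size"], NUM_DEVICES))  # 32
print(total_batch_size(training_args["per_device_eval_batch_size"], NUM_DEVICES))   # 32
```

Both results agree with the reported total_train_batch_size and total_eval_batch_size of 32. With Transformers installed, a dictionary like this could be unpacked into `TrainingArguments(output_dir=..., **training_args)`, where the output directory is a placeholder.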
Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Mse | Mae | R2 |
|---|---|---|---|---|---|---|---|---|
| No log | 0.0 | 0 | 6.9710 | 0 | 1.4291 | 6.9724 | 2.2175 | -2.1190 |
| No log | 1.0 | 179 | 3.4947 | 0.0078 | 1.8154 | 3.4958 | 1.5549 | -0.5638 |
| No log | 2.0 | 358 | 2.3162 | 0.0156 | 1.8269 | 2.3172 | 1.3166 | -0.0365 |
| No log | 3.0 | 537 | 2.3915 | 0.0312 | 2.0984 | 2.3922 | 1.2941 | -0.0701 |
| No log | 4.0 | 716 | 1.9281 | 0.0625 | 2.7151 | 1.9288 | 1.1885 | 0.1372 |
| No log | 5.0 | 895 | 0.9311 | 0.125 | 3.4240 | 0.9315 | 0.7854 | 0.5833 |
| 0.1241 | 6.0 | 1074 | 0.8397 | 0.25 | 5.3528 | 0.8401 | 0.7577 | 0.6242 |
| 0.7157 | 7.0 | 1253 | 0.4542 | 0.5 | 8.8830 | 0.4543 | 0.5296 | 0.7968 |
| 0.4617 | 8.0 | 1432 | 0.5203 | 1.0 | 16.1616 | 0.5205 | 0.5758 | 0.7672 |
| 0.2733 | 9.0 | 1611 | 0.3796 | 1.0 | 16.0812 | 0.3798 | 0.4806 | 0.8301 |
| 0.2126 | 10.0 | 1790 | 0.4184 | 1.0 | 15.9094 | 0.4186 | 0.5001 | 0.8127 |
| 0.1354 | 11.0 | 1969 | 0.4583 | 1.0 | 15.9263 | 0.4585 | 0.5339 | 0.7949 |
| 0.0944 | 12.0 | 2148 | 0.3701 | 1.0 | 15.9811 | 0.3703 | 0.4641 | 0.8344 |
| 0.075 | 13.0 | 2327 | 0.3573 | 1.0 | 16.0454 | 0.3575 | 0.4602 | 0.8401 |
| 0.063 | 14.0 | 2506 | 0.3639 | 1.0 | 15.9814 | 0.3640 | 0.4587 | 0.8372 |
| 0.063 | 15.0 | 2685 | 0.3605 | 1.0 | 15.9348 | 0.3607 | 0.4583 | 0.8387 |
| 0.0508 | 16.0 | 2864 | 0.3564 | 1.0 | 16.0274 | 0.3565 | 0.4605 | 0.8405 |
| 0.0438 | 17.0 | 3043 | 0.3613 | 1.0 | 16.0141 | 0.3615 | 0.4557 | 0.8383 |
| 0.032 | 18.0 | 3222 | 0.3584 | 1.0 | 16.1577 | 0.3586 | 0.4589 | 0.8396 |
| 0.0425 | 19.0 | 3401 | 0.3999 | 1.0 | 16.1464 | 0.4000 | 0.4877 | 0.8210 |
| 0.0372 | 20.0 | 3580 | 0.3543 | 1.0 | 15.9491 | 0.3545 | 0.4557 | 0.8414 |
| 0.0421 | 21.0 | 3759 | 0.3642 | 1.0 | 16.1302 | 0.3643 | 0.4606 | 0.8370 |
| 0.0306 | 22.0 | 3938 | 0.3514 | 1.0 | 15.9895 | 0.3516 | 0.4460 | 0.8427 |
| 0.0261 | 23.0 | 4117 | 0.3589 | 1.0 | 14.0872 | 0.3590 | 0.4551 | 0.8394 |
| 0.0235 | 24.0 | 4296 | 0.3653 | 1.0 | 14.1242 | 0.3655 | 0.4629 | 0.8365 |
| 0.024 | 25.0 | 4475 | 0.3707 | 1.0 | 14.0228 | 0.3709 | 0.4701 | 0.8341 |
| 0.0175 | 26.0 | 4654 | 0.3856 | 1.0 | 14.0362 | 0.3858 | 0.4768 | 0.8274 |
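Two patterns in the table can be checked with a short script. The Data Size column doubles each epoch until it reaches 1.0 at epoch 8, and the lowest validation loss is not in the final row. The sketch below copies the full-data rows from the table. The helper names are hypothetical, and the doubling rule is inferred from the logged values rather than documented by the card.

```python
# (epoch, validation_loss, r2) copied from the rows of the table where Data Size is 1.0.
ROWS = [
    (8, 0.5203, 0.7672), (9, 0.3796, 0.8301), (10, 0.4184, 0.8127),
    (11, 0.4583, 0.7949), (12, 0.3701, 0.8344), (13, 0.3573, 0.8401),
    (14, 0.3639, 0.8372), (15, 0.3605, 0.8387), (16, 0.3564, 0.8405),
    (17, 0.3613, 0.8383), (18, 0.3584, 0.8396), (19, 0.3999, 0.8210),
    (20, 0.3543, 0.8414), (21, 0.3642, 0.8370), (22, 0.3514, 0.8427),
    (23, 0.3589, 0.8394), (24, 0.3653, 0.8365), (25, 0.3707, 0.8341),
    (26, 0.3856, 0.8274),
]


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


def data_fraction(epoch):
    """Inferred schedule: 0 at epoch 0, then doubling up to 1.0 at epoch 8.

    This rule is a guess fitted to the Data Size column, not a documented setting.
    """
    if epoch <= 0:
        return 0.0
    return min(1.0, 2.0 ** (epoch - 8))


print(best_row(ROWS))    # (22, 0.3514, 0.8427)
print(data_fraction(1))  # 0.0078125, logged as 0.0078
print(data_fraction(5))  # 0.125
```

By validation loss, epoch 22 (0.3514, R2 0.8427) is the best logged row, while the headline results correspond to epoch 26 (0.3856, R2 0.8274). The table also stops at epoch 26 although num_epochs is 50, and the card does not state why.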
Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1
Model tree for contemmcm/676bc3461109c31fa2f4d651f5f73e52
Base model
albert/albert-xxlarge-v1