676bc3461109c31fa2f4d651f5f73e52

This model is a fine-tuned version of albert/albert-xxlarge-v1 on the nyu-mll/glue dataset. Its evaluation-set results are reported per epoch in the training results table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: constant
num_epochs: 50
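
The total batch sizes listed above follow from the per-device sizes and the device count. A minimal sketch of that arithmetic, assuming no gradient accumulation (the card does not list a gradient_accumulation_steps value, so the default of 1 used here is an assumption, and the helper name is hypothetical):

```python
# Hyperparameters copied from the list above.
hparams = {
    "learning_rate": 5e-05,
    "train_batch_size": 8,   # per device
    "eval_batch_size": 8,    # per device
    "num_devices": 4,
    "seed": 42,
}


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Number of examples consumed per optimizer step across all devices.

    grad_accum_steps defaults to 1, which is an assumption: the card
    does not report a gradient accumulation setting.
    """
    return per_device * num_devices * grad_accum_steps


total_train = effective_batch_size(hparams["train_batch_size"], hparams["num_devices"])
total_eval = effective_batch_size(hparams["eval_batch_size"], hparams["num_devices"])

print(f"total_train_batch_size: {total_train}")
print(f"total_eval_batch_size: {total_eval}")
```

Both values come out to 32, matching the total_train_batch_size and total_eval_batch_size reported above.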

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Mse	Mae	R2
No log	0	0	6.9710	0	1.4291	6.9724	2.2175	-2.1190
No log	1	179	3.4947	0.0078	1.8154	3.4958	1.5549	-0.5638
No log	2	358	2.3162	0.0156	1.8269	2.3172	1.3166	-0.0365
No log	3	537	2.3915	0.0312	2.0984	2.3922	1.2941	-0.0701
No log	4	716	1.9281	0.0625	2.7151	1.9288	1.1885	0.1372
No log	5	895	0.9311	0.125	3.4240	0.9315	0.7854	0.5833
0.1241	6	1074	0.8397	0.25	5.3528	0.8401	0.7577	0.6242
0.7157	7	1253	0.4542	0.5	8.8830	0.4543	0.5296	0.7968
0.4617	8.0	1432	0.5203	1.0	16.1616	0.5205	0.5758	0.7672
0.2733	9.0	1611	0.3796	1.0	16.0812	0.3798	0.4806	0.8301
0.2126	10.0	1790	0.4184	1.0	15.9094	0.4186	0.5001	0.8127
0.1354	11.0	1969	0.4583	1.0	15.9263	0.4585	0.5339	0.7949
0.0944	12.0	2148	0.3701	1.0	15.9811	0.3703	0.4641	0.8344
0.075	13.0	2327	0.3573	1.0	16.0454	0.3575	0.4602	0.8401
0.063	14.0	2506	0.3639	1.0	15.9814	0.3640	0.4587	0.8372
0.063	15.0	2685	0.3605	1.0	15.9348	0.3607	0.4583	0.8387
0.0508	16.0	2864	0.3564	1.0	16.0274	0.3565	0.4605	0.8405
0.0438	17.0	3043	0.3613	1.0	16.0141	0.3615	0.4557	0.8383
0.032	18.0	3222	0.3584	1.0	16.1577	0.3586	0.4589	0.8396
0.0425	19.0	3401	0.3999	1.0	16.1464	0.4000	0.4877	0.8210
0.0372	20.0	3580	0.3543	1.0	15.9491	0.3545	0.4557	0.8414
0.0421	21.0	3759	0.3642	1.0	16.1302	0.3643	0.4606	0.8370
0.0306	22.0	3938	0.3514	1.0	15.9895	0.3516	0.4460	0.8427
0.0261	23.0	4117	0.3589	1.0	14.0872	0.3590	0.4551	0.8394
0.0235	24.0	4296	0.3653	1.0	14.1242	0.3655	0.4629	0.8365
0.024	25.0	4475	0.3707	1.0	14.0228	0.3709	0.4701	0.8341
0.0175	26.0	4654	0.3856	1.0	14.0362	0.3858	0.4768	0.8274
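
The Mse, Mae, and R2 columns are standard regression metrics. The sketch below shows how they are conventionally defined, using a small made-up set of targets and predictions (hypothetical values, not data from this model). It also uses the identity R2 = 1 - MSE / Var(y) to back out the target variance implied by two rows of the table; treating the reported R2 as following this standard definition is an assumption.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def implied_target_variance(mse_value, r2_value):
    """Rearranges R2 = 1 - MSE / Var(y) to recover Var(y)."""
    return mse_value / (1.0 - r2_value)


# Hypothetical toy data, for illustration only.
y_true = [0.0, 1.0, 2.0, 3.0, 4.0]
y_pred = [0.5, 1.0, 1.5, 3.5, 4.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))

# (Mse, R2) pairs copied from the table: epoch 0 and epoch 22.
var_epoch_0 = implied_target_variance(6.9724, -2.1190)
var_epoch_22 = implied_target_variance(0.3516, 0.8427)
print(round(var_epoch_0, 3), round(var_epoch_22, 3))
```

Both rows imply nearly the same target variance (about 2.235), which suggests the Mse and R2 columns are internally consistent. A negative R2, as in the first rows, means the predictions have a larger squared error than always predicting the mean of the targets. Among the logged epochs, epoch 22 has the lowest validation loss (0.3514) and the highest R2 (0.8427).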

Safetensors: model size 0.2B params, tensor type F32