Pretraining Datasets
wikimedia/wikipedia Viewer • Updated Jan 9, 2024 • 61.6M • 97.7k • 1.16k
togethercomputer/RedPajama-Data-V2 Updated Nov 21, 2024 • 6.92k • 399
Skywork/SkyPile-150B Viewer • Updated Dec 7, 2023 • 1.76M • 21.1k • 403
Awesome Instruction Tuning Dataset
Open-Orca/OpenOrca Viewer • Updated Feb 19, 2025 • 2.94M • 16.1k • 1.51k
glaiveai/glaive-code-assistant Viewer • Updated Sep 27, 2023 • 136k • 428 • 99
silk-road/alpaca-data-gpt4-chinese Viewer • Updated May 23, 2023 • 52k • 867 • 103
anon8231489123/ShareGPT_Vicuna_unfiltered Updated Apr 12, 2023 • 136k • 851