DLINNLP Telegram 1781
Soumith Chintala (creator of PyTorch) lays out the fundamentals of how to train on 10K GPUs
x.com/soumithchintala/status/1841498799652708712

Very short TL;DR (I recommend everyone read the original, it's not long)

1. Maximize batch size and GPU utilization: 3D parallelism + gradient checkpointing
1. Overlap communication with compute, e.g. while the (N-1)th layer is computing its backward pass, all GPUs holding the Nth layer can all-reduce its gradients
1. Optimize for your GPU cluster network topology
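
To make the overlap idea concrete, here is a toy simulation in plain Python, not real GPU code. The function names (`backward_layer`, `all_reduce`, `backward_overlapped`) and the sleep durations are made up for illustration; `time.sleep` stands in for both the backward compute and the network transfer. In a real setup this scheduling is typically handled by the framework (for example, PyTorch's DistributedDataParallel all-reduces gradient buckets while backward is still running).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def backward_layer(layer_idx, compute_s=0.02):
    """Hypothetical stand-in for one layer's backward pass; returns a fake gradient."""
    time.sleep(compute_s)
    return [float(layer_idx)] * 4


def all_reduce(grad, world_size=4, comm_s=0.02):
    """Hypothetical stand-in for an averaging all-reduce.

    Every simulated rank is assumed to hold the same gradient,
    so the mean across ranks equals the input.
    """
    time.sleep(comm_s)
    return [g * world_size / world_size for g in grad]


def backward_sequential(num_layers):
    """Baseline: finish each layer's all-reduce before computing the next layer."""
    reduced = {}
    for layer in reversed(range(num_layers)):
        reduced[layer] = all_reduce(backward_layer(layer))
    return reduced


def backward_overlapped(num_layers):
    """Launch layer N's all-reduce in the background, then start layer N-1's backward."""
    pending = {}
    with ThreadPoolExecutor(max_workers=1) as comm:  # one simulated communication stream
        for layer in reversed(range(num_layers)):
            grad = backward_layer(layer)
            pending[layer] = comm.submit(all_reduce, grad)  # returns immediately
        return {layer: fut.result() for layer, fut in pending.items()}


if __name__ == "__main__":
    for fn in (backward_sequential, backward_overlapped):
        start = time.perf_counter()
        fn(8)
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```

Both functions return the same reduced gradients; the overlapped version just hides most of the communication time behind the compute of the next layer.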

1. Failure recovery: at 10K GPU scale, things fail all the time (GPUs, NICs, cables, etc.)
1. At 10K scale, bit flips actually become a problem and can cause loss explosions. Save your model state as frequently and as quickly as you can. To speed it up, save it in shards: copy to CPU memory first, then write to disk in a separate thread
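
The checkpointing advice (snapshot to CPU memory, then write shards to disk in a separate thread) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the method from the original thread: the state is a plain dict of lists, `copy.deepcopy` stands in for the device-to-host copy, and all names (`shard_state`, `write_shards`, `async_checkpoint`, `load_checkpoint`) and the file naming scheme are hypothetical. In a real job each rank would usually write its own shard rather than one process writing all of them.

```python
import copy
import pickle
import threading
from pathlib import Path


def shard_state(state, num_shards):
    """Split a flat {name: values} dict into num_shards dicts, round-robin by sorted key."""
    shards = [{} for _ in range(num_shards)]
    for i, name in enumerate(sorted(state)):
        shards[i % num_shards][name] = state[name]
    return shards


def write_shards(shards, ckpt_dir, step):
    """Write each shard to its own file; rename at the end so readers never see partial files."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    for rank, shard in enumerate(shards):
        final = ckpt_dir / f"step{step}-shard{rank}.pkl"
        tmp = final.with_suffix(".tmp")
        with open(tmp, "wb") as f:
            pickle.dump(shard, f)
        tmp.replace(final)


def async_checkpoint(state, ckpt_dir, step, num_shards=2):
    """Snapshot the state in memory (fast, blocking), then write to disk in the background."""
    snapshot = copy.deepcopy(state)  # assumption: stands in for the GPU-to-CPU copy
    shards = shard_state(snapshot, num_shards)
    writer = threading.Thread(target=write_shards, args=(shards, ckpt_dir, step))
    writer.start()
    return writer  # caller joins before exiting or before reusing the same step number


def load_checkpoint(ckpt_dir, step, num_shards=2):
    """Merge all shards of one step back into a single state dict."""
    state = {}
    for rank in range(num_shards):
        with open(Path(ckpt_dir) / f"step{step}-shard{rank}.pkl", "rb") as f:
            state.update(pickle.load(f))
    return state
```

Training only blocks for the in-memory snapshot; because the writer thread works on that private copy, later updates to the live state do not leak into the checkpoint on disk.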

tgoop.com/dlinnlp/1781

BY DL in NLP

