Notice: file_put_contents(): Write of 3318 bytes failed with errno=28 No space left on device in /var/www/tgoop/post.php on line 50

Warning: file_put_contents(): Only 12288 of 15606 bytes written, possibly out of free disk space in /var/www/tgoop/post.php on line 50
DL in NLP@dlinnlp P.1630
DLINNLP Telegram 1630
Current Best Practices for Training LLMs from Scratch

Забавный документ от Wandb в котором описывают важные вещи для тренировки LLM:
1. Scaling laws, compute-optimal training
1. Data Parallelism, Tensor Parallelism, Pipeline parallelism
1. Data clearning, deduplication, and upsampling
1. Hyperparameters (high-level)
1. Evaluation
1. Instruction tuning, RLHF

Гайд оч классный, местами немного устаревший, например не обсуждает zero-redundancy opitmizers (DeepSpeed) или Chinchilla trap — если вам надо деплоить, модели выгодно тренировать на большем числе токенов чем оптимально. В общем даёт неплохой high-level overview.
10👍5



tgoop.com/dlinnlp/1630
Create:
Last Update:

Current Best Practices for Training LLMs from Scratch

Забавный документ от Wandb в котором описывают важные вещи для тренировки LLM:
1. Scaling laws, compute-optimal training
1. Data Parallelism, Tensor Parallelism, Pipeline parallelism
1. Data clearning, deduplication, and upsampling
1. Hyperparameters (high-level)
1. Evaluation
1. Instruction tuning, RLHF

Гайд оч классный, местами немного устаревший, например не обсуждает zero-redundancy opitmizers (DeepSpeed) или Chinchilla trap — если вам надо деплоить, модели выгодно тренировать на большем числе токенов чем оптимально. В общем даёт неплохой high-level overview.

BY DL in NLP


Share with your friend now:
tgoop.com/dlinnlp/1630

View MORE
Open in Telegram


Telegram News

Date: |

While some crypto traders move toward screaming as a coping mechanism, many mental health experts have argued that “scream therapy” is pseudoscience. Scientific research or no, it obviously feels good. The visual aspect of channels is very critical. In fact, design is the first thing that a potential subscriber pays attention to, even though unconsciously. When choosing the right name for your Telegram channel, use the language of your target audience. The name must sum up the essence of your channel in 1-3 words. If you’re planning to expand your Telegram audience, it makes sense to incorporate keywords into your name. Judge Hui described Ng as inciting others to “commit a massacre” with three posts teaching people to make “toxic chlorine gas bombs,” target police stations, police quarters and the city’s metro stations. This offence was “rather serious,” the court said. Commenting about the court's concerns about the spread of false information related to the elections, Minister Fachin noted Brazil is "facing circumstances that could put Brazil's democracy at risk." During the meeting, the information technology secretary at the TSE, Julio Valente, put forward a list of requests the court believes will disinformation.
from us


Telegram DL in NLP
FROM American