КПД@quant_prune_distill P.249


So the 🦙3 release was not an "April 18th Fools'" joke after all. It came out exactly 9 months after version 2.

What we know so far.

Training
1️⃣ 15T training tokens (7x more than Llama-2)
2️⃣ 8K context window
3️⃣ 95% of the training data is in English; the remaining 5% is spread across 30 other languages
4️⃣ Instruction fine-tuning combines SFT, DPO, and PPO
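The post only names the methods. For reference, here is a minimal sketch of the standard DPO loss for one preference pair, assuming you already have summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. This is not Meta's training code; the function names and the `beta=0.1` default are illustrative assumptions.

```python
import math
from typing import Iterable, Tuple


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    Arguments are summed log-probabilities of the chosen / rejected response
    under the trainable policy and the frozen reference (e.g. SFT) model.
    """
    chosen_margin = policy_chosen - ref_chosen        # log(pi / pi_ref) for the preferred answer
    rejected_margin = policy_rejected - ref_rejected  # same for the dispreferred answer
    # The loss falls as the policy favours the chosen answer more than the reference does.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


def batch_dpo_loss(pairs: Iterable[Tuple[float, float, float, float]],
                   beta: float = 0.1) -> float:
    """Mean DPO loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    losses = [dpo_loss(*p, beta=beta) for p in pairs]
    return sum(losses) / len(losses)
```

When the policy still equals the reference, every pair gives `log 2`; unlike PPO, no separate reward model or sampling loop is needed, which is the usual argument for putting DPO next to PPO in the pipeline.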

Model
1️⃣ The architecture is unchanged (still dense, no MoE)
2️⃣ The 8B model now uses GQA too (in Llama 2 only the larger models did)
3️⃣ The tokenizer vocabulary grew to 128K tokens
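To make the GQA and vocabulary points concrete, a pure-Python sketch: query heads are split into groups that share one key/value head, which shrinks the KV cache. The model shapes in the demo (32 layers, 32 query heads, 8 KV heads, head_dim 128, hidden size 4096, vocab 128,256 vs 32,000, untied embeddings) are my assumptions based on the published configs, not numbers from the post; the helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # one head: [seq_len][head_dim]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention for a single head with a causal mask."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # position i may only look at positions 0..i
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(i + 1)]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(i + 1)) for c in range(len(v[0]))])
    return out


def grouped_query_attention(qs: List[Matrix], ks: List[Matrix],
                            vs: List[Matrix]) -> List[Matrix]:
    """GQA: n_q query heads share n_kv key/value heads (n_kv must divide n_q)."""
    n_q, n_kv = len(qs), len(ks)
    assert n_q % n_kv == 0, "query heads must split evenly into KV groups"
    group = n_q // n_kv
    # query head h reads key/value head h // group
    return [causal_attention(qs[h], ks[h // group], vs[h // group]) for h in range(n_q)]


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size for one sequence: one K and one V per layer, fp16/bf16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def embedding_params(vocab_size: int, hidden: int, tied: bool = False) -> int:
    """Input embedding plus output (LM head) matrix; a single matrix if tied."""
    return vocab_size * hidden * (1 if tied else 2)


if __name__ == "__main__":
    # Assumed 8B shape: 32 layers, 8 KV heads (GQA) vs 32 KV heads (plain MHA).
    gqa = kv_cache_bytes(32, 8, 128, 8192)
    mha = kv_cache_bytes(32, 32, 128, 8192)
    print(gqa / 2**30, mha / 2**30)  # 1.0 4.0 (GiB at the full 8K context)
    # Assumed vocab sizes: 128,256 (Llama 3) vs 32,000 (Llama 2), hidden size 4096.
    print(embedding_params(128_256, 4096), embedding_params(32_000, 4096))
```

Under these assumptions GQA cuts the 8K-context KV cache by 4x, while the bigger vocabulary costs roughly 1B parameters in embeddings alone, a noticeable slice of an "8B" model.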

Metrics
1️⃣ The 8B model beats similarly sized models (Mistral, Gemma) on benchmarks
2️⃣ The 70B model beats Gemini Pro 1.0 and 1.5, Mixtral 8x22B, and Claude 3 Sonnet

During development, Meta assembled its own set of 1,800 diverse instructions and used it for evaluation.

What else is promised
1️⃣ A 400B model that is still training. Metrics for its April 15 checkpoint have already been shown.
2️⃣ A technical report.
3️⃣ An even longer context window.

[Blog: "Introducing Meta Llama 3: The most capable openly available LLM to date"]
[Collection on the Hub]


www.tgoop.com/quant_prune_distill/249

1.46K views, edited Apr 18, 2024 at 16:23


Create: 2024-04-18


BY КПД
