Microsoft just casually shared their new Phi-3 LLMs less than a week after the Llama 3 release. Based on the benchmarks in the technical report (https://arxiv.org/abs/2404.14219), even the smallest Phi-3 model beats Llama 3 8B despite being less than half its size.

Phi-3 has "only" been trained on 5x fewer tokens than Llama 3 (3.3 trillion instead of 15 trillion)

Phi-3-mini has "only" 3.8 billion parameters, less than half the size of Llama 3 8B.

Despite being small enough to be deployed on a phone (according to the report), it matches the performance of the much larger Mixtral 8x7B and GPT-3.5. (Phi-3-mini can be quantized to 4 bits, so it only requires ≈1.8 GB of memory.)
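
That memory figure checks out as a quick back-of-the-envelope calculation (weights only; runtime overhead for activations and the KV cache comes on top):

params = 3.8e9          # Phi-3-mini parameter count
bits_per_param = 4      # 4-bit quantization
total_bytes = params * bits_per_param / 8
print(f"{total_bytes / 1e9:.2f} GB")     # 1.90 GB in decimal units
print(f"{total_bytes / 2**30:.2f} GiB")  # 1.77 GiB, i.e. the ≈1.8 GB cited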

What is the secret sauce? According to the technical report, it's dataset quality over quantity: "heavily filtered web data and synthetic data".

Alongside the 4K context-window version, there's also a Phi-3-mini-128K model that supports up to 128K tokens.
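
If you want to try it, here's a minimal generation sketch using Hugging Face transformers. The model ID "microsoft/Phi-3-mini-4k-instruct" is an assumption on my part; swap in the 128K variant's ID for long-context use:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Why does data quality matter for LLMs?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))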

Fun fact: Phi-3 uses the same tokenizer as Llama 2, with a vocabulary size of 32,064.
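
Easy to verify yourself if you're curious (same assumed model ID as in the sketch above):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
print(len(tok))  # full vocabulary size, including special/padding tokens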