How big do LLMs need to be to reason? Microsoft released Orca 2 this week, a 13B Llama-based LLM trained on complex tasks and reasoning. Orca's performance comes from its use of synthetically generated data from bigger LLMs. I took a deeper look at the paper and extracted the implementation details and other insights.
Implementation:
1. Constructed a new dataset (Orca 2) with ~817K samples, using prompts from FLAN and GPT-4 to generate reasoning responses guided by detailed system prompts.
2. Grouped prompts into categories based on similarity to assign tailored system prompts that demonstrate different reasoning techniques.
3. Replaced the original system prompt with a more generic one, so the model learns the underlying reasoning strategy ("prompt erasing"; see the sketch after this list).
4. Used progressive learning: first fine-tune Llama on FLAN-v2 (1 epoch), then retrain on 5M ChatGPT samples from Orca 1 (3 epochs), then combine 1M GPT-4 samples from Orca 1 with the ~800K new Orca 2 samples for the final stage (4 epochs).
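Prompt erasing (step 3) is the key trick: the teacher generates the response under a detailed, task-specific system prompt, but the student is trained against a generic one, so it has to internalize the reasoning strategy instead of relying on the cue. A minimal sketch of how such a training example could be assembled; the prompt strings and record layout are illustrative assumptions, not the exact ones used by Microsoft:

```python
# Illustrative sketch of "prompt erasing": the assistant response was produced
# by the teacher under DETAILED_SYSTEM_PROMPT, but the student only ever sees
# GENERIC_SYSTEM_PROMPT at training time. Prompt texts are assumptions.

DETAILED_SYSTEM_PROMPT = (
    "You are a careful assistant. Break the problem into steps, "
    "explain each step, then give the final answer."
)
GENERIC_SYSTEM_PROMPT = "You are Orca, an AI language model. Answer the user's question."

def build_training_example(user_prompt: str, teacher_response: str) -> dict:
    """Pair the teacher's step-by-step response with a generic system prompt."""
    return {
        "system": GENERIC_SYSTEM_PROMPT,   # the detailed prompt is "erased" here
        "user": user_prompt,
        "assistant": teacher_response,     # still contains the full reasoning trace
    }
```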
Insights:
• Imitation learning can improve capabilities given enough data.
• Reasoning and longer generations to reach the correct answer help smaller models compete with bigger LLMs.
• Prompt erasing helped Orca 2 "learn" to reason.
• Lowest hallucination rate on summarization among comparable models.
• Used packing for training, concatenating multiple examples into one sequence (see the sketch after this list).
• Masked the user and system inputs (the prompt) and computed the loss only on the generated tokens.
• Trained on 32 A100 GPUs for 80 hours.
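A rough sketch of the two training details above, packing plus prompt masking; the sequence length, the -100 ignore index (the usual Hugging Face/PyTorch convention), and the helper names are my assumptions, not the paper's actual code:

```python
# Illustrative packing + loss masking: concatenate several (prompt, response)
# examples into one fixed-length sequence and mask prompt tokens so the loss
# is only computed on the model's generations.

from typing import List, Tuple

IGNORE_INDEX = -100   # ignored by PyTorch's CrossEntropyLoss
MAX_LEN = 4096        # packed sequence length (assumed)

def pack_examples(examples: List[Tuple[List[int], List[int]]]):
    """examples: list of (prompt_token_ids, response_token_ids).

    Returns a list of (input_ids, labels) pairs, each holding one packed
    sequence with prompt positions masked out of the loss.
    """
    sequences = []
    input_ids, labels = [], []
    for prompt_ids, response_ids in examples:
        example_len = len(prompt_ids) + len(response_ids)
        # start a new packed sequence if this example would overflow the current one
        if input_ids and len(input_ids) + example_len > MAX_LEN:
            sequences.append((input_ids, labels))
            input_ids, labels = [], []
        input_ids += prompt_ids + response_ids
        labels += [IGNORE_INDEX] * len(prompt_ids) + response_ids  # loss on generation only
    if input_ids:
        sequences.append((input_ids, labels))
    return sequences
```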
Paper: https://huggingface.co/papers/2311.11045
Model: https://huggingface.co/microsoft/Orca-2-13b