Today I'm sharing a great list of key NLP papers published _after_ BERT, as picked by Ilya Gusev. For those who don't know him, he is one of the most respected NLP experts. Ilya put this list together in response to a user's question in one of the specialized chats. I don't know whether he plans to publish it anywhere else, but I'm sure the list deserves attention, so I decided to repost it here.

- GPT-2, Radford et al., Language Models are Unsupervised Multitask Learners, https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- GPT-3, Brown et al., Language Models are Few-Shot Learners, https://arxiv.org/abs/2005.14165
- LaBSE, Feng et al., Language-agnostic BERT Sentence Embedding, https://arxiv.org/abs/2007.01852
- CLIP, Radford et al., Learning Transferable Visual Models From Natural Language Supervision, https://arxiv.org/abs/2103.00020
- RoPE, Su et al., RoFormer: Enhanced Transformer with Rotary Position Embedding, https://arxiv.org/abs/2104.09864 (a minimal sketch follows the list)
- LoRA, Hu et al., LoRA: Low-Rank Adaptation of Large Language Models, https://arxiv.org/abs/2106.09685 (a minimal sketch follows the list)
- InstructGPT, Ouyang et al., Training language models to follow instructions with human feedback, https://arxiv.org/abs/2203.02155
- Scaling laws, Hoffmann et al., Training Compute-Optimal Large Language Models, https://arxiv.org/abs/2203.15556
- FlashAttention, Dao et al., FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, https://arxiv.org/abs/2205.14135
- NLLB, NLLB team, No Language Left Behind: Scaling Human-Centered Machine Translation, https://arxiv.org/abs/2207.04672
- Q8, Dettmers et al., LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale, https://arxiv.org/abs/2208.07339 (a minimal sketch follows the list)
- Self-instruct, Wang et al., Self-Instruct: Aligning Language Models with Self-Generated Instructions, https://arxiv.org/abs/2212.10560
- Alpaca, Taori et al., Alpaca: A Strong, Replicable Instruction-Following Model, https://crfm.stanford.edu/2023/03/13/alpaca.html
- LLaMA, Touvron et al., LLaMA: Open and Efficient Foundation Language Models, https://arxiv.org/abs/2302.13971
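
A few of these ideas are easier to grasp in code, so here are tiny sketches for the RoPE, LoRA, and LLM.int8() entries. They are minimal illustrations of the core idea, not the papers' reference implementations; all names and sizes are illustrative. First, RoPE: each pair of channels is rotated by an angle proportional to the token position, so query-key dot products depend only on relative position (this sketch uses the split-halves pairing convention; some implementations interleave the pairs instead):

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embedding for x of shape (seq_len, dim); a minimal sketch."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-2.0 * np.arange(half) / dim)   # per-pair frequencies theta_i
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half): position * theta_i
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied to each channel pair (x1_i, x2_i)
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```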
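
LoRA freezes the pretrained weight and trains only a low-rank additive update, cutting the trainable parameters in that layer from d_out * d_in to r * (d_in + d_out). A sketch of the forward pass (per the paper, B is zero-initialized, so training starts exactly at the pretrained model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 512, 512, 8, 16     # illustrative sizes
W = rng.normal(size=(d_out, d_in))          # pretrained weight, frozen
A = 0.01 * rng.normal(size=(r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # Frozen path plus scaled low-rank update: W x + (alpha / r) * B A x
    return W @ x + (alpha / r) * (B @ (A @ x))
```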
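
Finally, the basic building block of LLM.int8() is per-row absmax quantization to int8; the paper's key addition, a mixed-precision decomposition that keeps outlier feature dimensions in fp16, is omitted in this sketch:

```python
import numpy as np

def absmax_quantize(x):
    """Row-wise absmax quantization to int8; returns codes and per-row scales."""
    scale = 127.0 / np.max(np.abs(x), axis=1, keepdims=True)
    return np.round(x * scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) / scale
```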

P.S. Colleagues from that chat also suggested adding the Flan-T5 paper to the list, and I agree: https://arxiv.org/abs/2210.11416


