Zamba2-Instruct - a family of instruction-tuned models built on the hybrid Mamba2+Transformers architecture for NLP tasks.

The family includes two models: Zamba2-Instruct-1.2B and Zamba2-Instruct-2.7B.
The family's high performance relative to comparable Transformers-only models is achieved by concatenating the model's input embeddings with the input to the shared attention block, and by applying LoRA projection matrices to the shared MLP layer.
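Conceptually, the shared attention block sees the current hidden state concatenated with the original embeddings, while one shared MLP is cheaply specialized per block via low-rank (LoRA) projections. A minimal PyTorch sketch of this idea (class name, dimensions, and rank are illustrative assumptions, not Zamba2's actual implementation):

import torch
import torch.nn as nn

class SharedAttentionBlock(nn.Module):
    # Illustrative sketch only, not the real Zamba2 code.
    def __init__(self, d_model: int, n_heads: int, lora_rank: int = 8):
        super().__init__()
        # Attention consumes the hidden state concatenated with the
        # original input embeddings, hence the 2 * d_model width.
        self.attn = nn.MultiheadAttention(2 * d_model, n_heads, batch_first=True)
        self.proj = nn.Linear(2 * d_model, d_model)
        # One MLP is shared across invocations of the block...
        self.mlp = nn.Linear(d_model, d_model)
        # ...and adapted cheaply by a low-rank (LoRA) projection pair.
        self.lora_down = nn.Linear(d_model, lora_rank, bias=False)
        self.lora_up = nn.Linear(lora_rank, d_model, bias=False)

    def forward(self, hidden: torch.Tensor, embeddings: torch.Tensor) -> torch.Tensor:
        # Concatenate hidden states with the input embeddings along features.
        x = torch.cat([hidden, embeddings], dim=-1)
        attn_out, _ = self.attn(x, x, x)
        h = self.proj(attn_out)
        # Shared MLP plus the per-block low-rank correction.
        return self.mlp(h) + self.lora_up(self.lora_down(h))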
The models were fine-tuned (SFT+DPO) on instruct-oriented datasets (ultrachat_200k, Infinity-Instruct, ultrafeedback_binarized, orca_dpo_pairs, and OpenHermesPreferences).
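The DPO stage optimizes the standard direct-preference objective over chosen/rejected answer pairs. A minimal PyTorch sketch of that loss (the beta value is an assumption; in practice training would go through a library such as TRL):

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Direct Preference Optimization loss over per-sequence log-probs.
    # Margin: how much more the policy prefers the chosen answer over
    # the rejected one, relative to the frozen reference model.
    margin = (policy_chosen_logps - ref_chosen_logps) \
           - (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(beta * margin).mean()

# Example: a batch of 4 preference pairs
# loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))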
In tests, Zamba2-Instruct demonstrated impressive text generation speed and efficient memory use, beating models with larger parameter counts on MT-Bench (Zamba2-Instruct-2.7B outperformed Mistral-7B-Instruct-v0.1, and Zamba2-Instruct-1.2B outperformed Gemma2-2B-Instruct).
⚠️ To run on CPU, pass use_mamba_kernels=False when loading the model with AutoModelForCausalLM.from_pretrained.
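A minimal sketch of such a CPU load (the float32 dtype choice is an assumption):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# CPU run: disable the CUDA-only Mamba kernels and use float32.
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba2-2.7B-instruct",
    use_mamba_kernels=False,
    device_map="cpu",
    torch_dtype=torch.float32,
)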
# Clone repo
git clone https://github.com/Zyphra/transformers_zamba2.git
cd transformers_zamba2
# Install the repository & accelerate:
pip install -e .
pip install accelerate
# Inference:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)
user_turn_1 = "user_prompt1."
assistant_turn_1 = "assistant_prompt."
user_turn_2 = "user_prompt2."
sample = [
    {'role': 'user', 'content': user_turn_1},
    {'role': 'assistant', 'content': assistant_turn_1},
    {'role': 'user', 'content': user_turn_2},
]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print(tokenizer.decode(outputs[0]))
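Note that num_beams=1 with do_sample=False gives deterministic greedy decoding; for more varied responses, set do_sample=True (optionally with temperature or top_p).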
@ai_machinelearning_big_data
#AI #ML #SLM #Zamba2 #Instruct