tgoop.com/pytorch_howsam/671
Timing comparison of a BPE tokenizer between two libraries, Hugging Face Tokenizers and OpenAI tiktoken, on the validation split of the TinyStories dataset:
import time
import tiktoken
from datasets import load_dataset
from tokenizers import Tokenizer

# Load the TinyStories validation split
dataset = load_dataset("roneneldan/TinyStories")
texts = dataset["validation"]["text"]
# Load the GPT-2 tokenizer for both libraries
tiktokenizer = tiktoken.get_encoding("gpt2") # tiktoken
hf_tokenizer = Tokenizer.from_pretrained("gpt2") # Hugging Face tokenizers
# Measure tiktoken speed
start_time = time.time()
tiktoken_results = [tiktokenizer.encode(text) for text in texts]
tiktoken_time = time.time() - start_time
# Measure tokenizers speed
start_time = time.time()
hf_results = [hf_tokenizer.encode(text).ids for text in texts]
hf_time = time.time() - start_time
# Print results
print(f"tiktoken Time: {tiktoken_time:.4f} seconds")
print(f"tokenizers Time: {hf_time:.4f} seconds")
tiktoken Time: 2.6481 seconds
tokenizers Time: 16.7744 seconds
BY PyTorch Howsam