DAVID_RANDOM Telegram 574
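I was planning to take on Llama 4 400B around the May Day holiday, but it turned out not to be much of a challenge. With a q4 model on a 7970X (150 GB/s), CPU-only gets 108 t/s prefill and 13.8 t/s decode; offloading the dense layers using 8 GB of VRAM gives 27 t/s; filling the 96 GB of dual-GPU VRAM reaches 30.8 t/s.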

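However, prefill with llama.cpp's override tensors appears to run purely on the GPU, reading the model from system memory over PCIe, so there is still room for optimization. At the very least it should not be worse than CPU-only.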
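
As a rough sanity check on the CPU-only decode figure, here is a back-of-the-envelope sketch of the memory-bandwidth ceiling. Only the 150 GB/s and 13.8 t/s numbers come from the post. The 17B active parameters per token and the ~4.5 bits per weight for a q4 quant are assumptions, and the function name is made up for illustration.

```python
def decode_upper_bound_tps(bandwidth_gb_s: float,
                           active_params_b: float,
                           bits_per_weight: float) -> float:
    """Tokens/s ceiling if every active weight is read from memory once per token."""
    gb_read_per_token = active_params_b * bits_per_weight / 8  # billions of params -> GB
    return bandwidth_gb_s / gb_read_per_token


# From the post: 7970X memory bandwidth and measured CPU-only decode speed.
BANDWIDTH_GB_S = 150.0
MEASURED_DECODE_TPS = 13.8

# Assumptions, not from the post: active parameters per token (MoE) and
# average bits per weight of the q4 quantization.
ASSUMED_ACTIVE_PARAMS_B = 17.0
ASSUMED_BITS_PER_WEIGHT = 4.5

bound = decode_upper_bound_tps(BANDWIDTH_GB_S,
                               ASSUMED_ACTIVE_PARAMS_B,
                               ASSUMED_BITS_PER_WEIGHT)
efficiency = MEASURED_DECODE_TPS / bound

print(f"bandwidth-bound ceiling: {bound:.1f} t/s")
print(f"measured / ceiling: {efficiency:.0%}")
```

Under these assumptions the measured decode speed sits fairly close to the bandwidth ceiling, which would be consistent with CPU-only decode being memory-bound. The estimate ignores KV-cache reads and any weights shared across tokens.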



tgoop.com/david_random/574

BY David's random thoughts







