V2EX Hottest Topics @v2ex_hot P.3618
V2EX - Hottest Topics
C++: how to optimize matrix multiplication (gemm)
#v2ex

Avafly:

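I've recently been hand-writing model inference in C, and gemm is essentially the core computation, so I decided to try optimizing it myself as a learning exercise.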
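For reference, this is what GEMM computes in the BLAS sense (the `SGEMM` routine computes C := alpha*op(A)*op(B) + beta*C); in inference it is typically used with alpha = 1 and beta = 0, which is the assumption below. The FLOP count is the usual yardstick when comparing a hand-written kernel with OpenBLAS, and the ratio of work to data shows why blocking pays off: for square matrices each loaded element can, in principle, be reused about 2n/3 times.

```latex
% A is M x K, B is K x N, C is M x N
C \leftarrow \alpha\, A B + \beta\, C, \qquad
C_{ij} = \sum_{p=0}^{K-1} A_{ip}\, B_{pj} \quad (\alpha = 1,\ \beta = 0)

\text{FLOPs} = 2MNK \quad (K \text{ multiplies} + K \text{ adds per output element})

\text{GFLOP/s} = \frac{2MNK}{t \cdot 10^{9}}, \qquad
M = N = K = 1024 \;\Rightarrow\; 2 \cdot 1024^{3} = 2^{31} \approx 2.15 \times 10^{9}\ \text{FLOPs}

\text{arithmetic intensity} = \frac{2MNK}{MK + KN + MN}
\;\overset{M=N=K=n}{=}\; \frac{2n^{3}}{3n^{2}} = \frac{2n}{3}\ \text{FLOPs per element}
```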

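The most basic matrix multiplication can be written with three nested for loops. After applying SIMD, blocking, and parallelization, my version is much faster than the naive one, but it is still noticeably slower than OpenBLAS, and I'm not sure what to try next. Is there any way to improve it further? Thanks.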
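The three-loop baseline and the three techniques named above can be illustrated with a minimal C sketch. It is not the code from the linked repository: the function names `gemm_naive`, `gemm_blocked`, `max_abs_diff` and the tile sizes `BLOCK_M`/`BLOCK_N`/`BLOCK_K` are hypothetical, and it assumes row-major `float` matrices computing C = A·B (no alpha/beta). Blocking keeps a tile of B resident in cache, the i-k-j order inside a tile makes the innermost loop contiguous so the compiler can auto-vectorize it into SIMD instructions (e.g. with `-O3 -march=native`), and independent row blocks of C are split across threads with OpenMP (`-fopenmp`).

```c
#include <stddef.h>

/* Hypothetical tile sizes for illustration only; real values must be tuned
   to the target CPU's L1/L2 cache sizes. */
#define BLOCK_M 64
#define BLOCK_N 256
#define BLOCK_K 128

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Naive GEMM: C = A * B, row-major, A is M x K, B is K x N, C is M x N.
   The innermost loop walks down a column of B (stride N), which is
   cache-unfriendly and hard to vectorize. */
void gemm_naive(size_t M, size_t N, size_t K,
                const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < M; ++i)
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < K; ++p)
                acc += A[i * K + p] * B[p * N + j];
            C[i * N + j] = acc;
        }
}

/* Blocked GEMM with i-k-j order inside each tile: the innermost loop reads
   a row of B and updates a row of C contiguously, so it can be auto-vectorized.
   Each thread owns a disjoint range of rows of C, so there are no write races.
   Without -fopenmp the pragma is ignored and the code runs serially. */
void gemm_blocked(size_t M, size_t N, size_t K,
                  const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < M * N; ++i)
        C[i] = 0.0f;

    #pragma omp parallel for schedule(static)
    for (size_t i0 = 0; i0 < M; i0 += BLOCK_M) {
        size_t i_end = min_sz(i0 + BLOCK_M, M);
        for (size_t p0 = 0; p0 < K; p0 += BLOCK_K) {
            size_t p_end = min_sz(p0 + BLOCK_K, K);
            for (size_t j0 = 0; j0 < N; j0 += BLOCK_N) {
                size_t j_end = min_sz(j0 + BLOCK_N, N);
                for (size_t i = i0; i < i_end; ++i) {
                    float *c_row = &C[i * N];
                    for (size_t p = p0; p < p_end; ++p) {
                        const float a = A[i * K + p];      /* broadcast scalar */
                        const float *b_row = &B[p * N];
                        for (size_t j = j0; j < j_end; ++j) /* contiguous: SIMD-friendly */
                            c_row[j] += a * b_row[j];
                    }
                }
            }
        }
    }
}

/* Largest element-wise absolute difference, for checking an optimized
   kernel against the naive reference. */
float max_abs_diff(const float *x, const float *y, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = x[i] > y[i] ? x[i] - y[i] : y[i] - x[i];
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Comparing an optimized kernel against the naive one with sizes that are not multiples of the tile sizes (for example M = 70, N = 300, K = 130) is a cheap way to catch edge-tile bugs. A small tolerance is generally needed, because FMA contraction or a different summation order can change the last bits of the result.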

Code: https://github.com/Avafly/optimize-gemm
