Mmul TB 4_16_16

Top: -

Par: 24 lines

Problem Statement

Given a \(4 \times 16\) matrix \(A\) and a \(16 \times 16\) matrix \(B\), compute the matrix product \(C = A \times B^T\) (shape: \(4 \times 16\)). The layouts of \(A, B, C\) are as follows.

A: ((4:2), (2:1, 4_PE:1, 2_W:1))
B: ((16:2), (2:1, 4_PE:1, 2_W:1))
C: ((4:2), (2:1, 4_PE:1, 2_W:1))

The values of \(A, B, C\) are as follows.

A:

[[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [ 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], [ 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], [ 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]]

B:

[[100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115], [116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131], [132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147], [148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163], [164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179], [180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195], [196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211], [212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227], [228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243], [244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259], [260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275], [276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291], [292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307], [308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323], [324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339], [340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355]]

C:
import numpy as np

A = np.array([
    [  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
    [ 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
    [ 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47],
    [ 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63],
])
B = np.array([
    [100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115],
    [116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131],
    [132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147],
    [148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163],
    [164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179],
    [180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195],
    [196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211],
    [212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227],
    [228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243],
    [244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259],
    [260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275],
    [276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291],
    [292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307],
    [308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323],
    [324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339],
    [340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355],
])

# C = A x B^T, shape 4 x 16
A @ B.T

[[ 13240, 15160, 17080, 19000, 20920, 22840, 24760, 26680, 28600, 30520, 32440, 34360, 36280, 38200, 40120, 42040], [ 40760, 46776, 52792, 58808, 64824, 70840, 76856, 82872, 88888, 94904,100920,106936,112952,118968,124984,131000], [ 68280, 78392, 88504, 98616,108728,118840,128952,139064,149176,159288,169400,179512,189624,199736,209848,219960], [ 95800,110008,124216,138424,152632,166840,181048,195256,209464,223672,237880,252088,266296,280504,294712,308920]]
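As a quick spot check of the result above, the top-left entry can be worked out by hand. Because the product uses \(B^T\), each entry of \(C\) is the dot product of a row of \(A\) with a row of \(B\). The zero-based index notation \(C_{i,j}\) is introduced here only for illustration and does not appear in the problem statement.

\[
C_{0,0} = \sum_{k=0}^{15} A_{0,k} B_{0,k} = \sum_{k=0}^{15} k\,(100 + k) = 100 \cdot 120 + 1240 = 13240
\]

The same computation with row 1 of \(B\) gives \(C_{0,1} = 116 \cdot 120 + 1240 = 15160\), which agrees with the second entry of the first row.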

Explanation

This problem builds on Mmul TB 4_8_8 and combines the techniques from Mmul TB 4_8_16 and Mmul TB 4_16_8.

It uses gmmul twice and gmfma twice.
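One way to read the count of two multiplies and two multiply-accumulates is as a blocked matrix product. The NumPy sketch below is only an illustration of that arithmetic, under the assumption (suggested by the base problem name 4_8_8, not stated on this page) that one matrix operation covers a \(4 \times 8\) by \(8 \times 8\) tile. It does not model the actual gmmul/gmfma instructions or the register layouts, and the names `T`, `mul_count` and `fma_count` are hypothetical.

```python
import numpy as np

# Same inputs as in the problem statement, generated instead of typed out.
A = np.arange(64, dtype=np.int64).reshape(4, 16)          # values 0..63
B = np.arange(100, 356, dtype=np.int64).reshape(16, 16)   # values 100..355

T = 8  # assumed tile size (hypothetical, inferred from "4_8_8")

C = np.zeros((4, 16), dtype=np.int64)
mul_count = 0  # plain multiplies (the role gmmul is assumed to play)
fma_count = 0  # multiply-accumulates (the role gmfma is assumed to play)

for n in range(0, 16, T):  # two blocks of 8 output columns (8 rows of B each)
    # First half of the inner dimension: a plain multiply starts the sum.
    acc = A[:, 0:T] @ B[n:n + T, 0:T].T
    mul_count += 1
    # Second half of the inner dimension: accumulate on top of it.
    acc = acc + A[:, T:16] @ B[n:n + T, T:16].T
    fma_count += 1
    C[:, n:n + T] = acc

# The blocked result matches the direct product.
assert np.array_equal(C, A @ B.T)
```

Splitting the output columns in two corresponds to the wider \(B\) of the 4_8_16 variant, and splitting the inner dimension in two corresponds to the longer dot products of the 4_16_8 variant, which is why each output block needs one multiply followed by one accumulate.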

Sample answer (1/4)
gbfn $lm16v4 $nowrite gmwrite $aluf $ly0
gbfn $lm32v4 $nowrite gmwrite $aluf $ly4
gbfn $lm0v4 $nowrite gmmul $ly $aluf $ls0v4

Testcases

testcase.vsm
