High performance unstructured spmm computation using tensor cores

P Okanovic; G Kwasniewski; Paolo Sylos Labini; M Besta; T Hoefler

doi:10.1109/SC41406.2024.00060

Conference proceeding

Open access

Peer reviewed

Proceedings of SC24: The International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, Georgia, November 17-22, 2024, pp.1-14

2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024 (Atlanta, 17/11/2024–22/11/2024)

2024

DOI: https://doi.org/10.1109/SC41406.2024.00060

Handle: https://hdl.handle.net/10863/51378

Keywords: Mathematics of computing; Matrix Multiplication; SpMM; Tensor Cores

Abstract

High-performance sparse matrix-matrix (SpMM) multiplication is paramount for science and industry, as the ever-increasing sizes of data prohibit using dense data structures. Yet, existing hardware, such as Tensor Cores (TC), is ill-suited for SpMM, as it imposes strict constraints on data structures that cannot be met by unstructured sparsity found in many applications. To address this, we introduce (S)parse (Ma)trix Matrix (T)ensor Core-accelerated (SMaT): a novel SpMM library that utilizes TCs for unstructured sparse matrices. Our block-sparse library leverages the low-level CUDA MMA (matrix-matrix-accumulate) API, maximizing the performance offered by modern GPUs. Algorithmic optimizations, such as sparse matrix permutation, further improve performance by minimizing the number of non-zero blocks. The evaluation on NVIDIA A100 shows that SMaT outperforms SotA libraries (DASP, cuSPARSE, and Magicube) by up to 125x (on average 2.6x). SMaT can be used to accelerate many workloads in scientific computing, large model training, inference, and others.
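The abstract's point that permuting a sparse matrix can reduce the number of non-zero blocks can be illustrated with a small NumPy sketch. This is not SMaT's implementation: the function names, the 2x2 tile size, and the row-sorting heuristic are assumptions chosen only for illustration, and the dense tile product merely stands in for a Tensor Core MMA operation.

```python
import numpy as np

def count_nonzero_blocks(A, bh, bw):
    """Count the bh x bw tiles of A that hold at least one non-zero."""
    m, n = A.shape
    assert m % bh == 0 and n % bw == 0, "sketch assumes tile-aligned shapes"
    tiles = A.reshape(m // bh, bh, n // bw, bw)
    return int(np.count_nonzero(np.any(tiles != 0, axis=(1, 3))))

def block_spmm(A, B, bh, bw):
    """Compute C = A @ B tile by tile, skipping all-zero tiles of A.
    Each dense tile product stands in for one Tensor Core MMA call."""
    m, n = A.shape
    C = np.zeros((m, B.shape[1]), dtype=np.result_type(A, B))
    for i in range(0, m, bh):
        for k in range(0, n, bw):
            tile = A[i:i + bh, k:k + bw]
            if np.any(tile):
                C[i:i + bh] += tile @ B[k:k + bw]
    return C

def sort_rows_by_pattern(A):
    """Hypothetical heuristic (not the paper's algorithm): order rows
    lexicographically by their non-zero pattern so that rows with
    similar patterns fall into the same row of tiles."""
    pattern = (A != 0).astype(np.uint8)
    # np.lexsort treats the LAST key as primary, so reverse the columns.
    return np.lexsort(pattern.T[::-1])

# Rows 0 and 2 touch columns 0-1; rows 1 and 3 touch columns 2-3.
A = np.array([[1., 2., 0., 0.],
              [0., 0., 3., 4.],
              [5., 0., 0., 0.],
              [0., 0., 0., 6.]])
B = np.arange(8, dtype=float).reshape(4, 2)

blocks_before = count_nonzero_blocks(A, 2, 2)
perm = sort_rows_by_pattern(A)
blocks_after = count_nonzero_blocks(A[perm], 2, 2)

# Multiply with the permuted matrix, then undo the row permutation.
C_perm = block_spmm(A[perm], B, 2, 2)
C = np.empty_like(C_perm)
C[perm] = C_perm

print(blocks_before, blocks_after)
```

In this toy case, grouping rows with matching patterns halves the number of tiles that must be multiplied, while the result still equals the plain product `A @ B` once the row permutation is undone.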

Files and links (3)

pdf: High_Performance_Unstructured_SpMM_Computation_Using_Tensor_Cores (1.69 MB, Open Access)

pdf: 2408.11551v1 (1.33 MB, Open Access)

url: https://ieeexplore.ieee.org/abstract/document/10793184

Details

Title: High performance unstructured spmm computation using tensor cores
Creators: P Okanovic - ETH Zurich
G Kwasniewski - ETH Zurich
Paolo Sylos Labini
M Besta - ETH Zurich
T Hoefler - ETH Zurich
Publication Details: Proceedings of SC24: The International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, Georgia, November 17-22, 2024, pp.1-14
ISBN: 9798350352917
Conference: 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024 (Atlanta, 17/11/2024–22/11/2024)
Publisher: IEEE
Piscataway, NJ
Format: Online
Number of pages: 14
Identifiers: 979-8-3503-5291-7
(UNIBZ)93984453
991007296253501241
Web of Science ID: 001414891300063
Scopus ID: 2-s2.0-85215004295
Academic Unit: Faculty of Engineering
Language: English
Resource Type: Conference proceeding
Author Names String: Okanovic P, Kwasniewski G, Sylos Labini P, Besta M, Vella F, Hoefler T
