High-performance and programmable attentional graph neural networks with global tensor formulations

M Besta; P Renc; R Gerstenberger; Paolo Sylos Labini; A Ziogas; T Chen; L Gianinazzi; F Scheidl; K Szenes; A Carigiet; P Iff; G Kwasniewski; R Kanakagiri; C Ge; S Jaeger; J Wąs; F Vella; T Hoefler

doi:10.1145/3581784.3607067

Conference proceeding

Open access

Peer reviewed

SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-16

2023 International Conference for High Performance Computing, Networking, Storage, and Analysis (Denver, CO, 12–17 November 2023)

2023

DOI: https://doi.org/10.1145/3581784.3607067

Handle: https://hdl.handle.net/10863/51766

Keywords: Graph Attention Models; Graph neural networks; Sparse-Dense Tensor Operations

Abstract

Graph attention models (A-GNNs), a type of Graph Neural Networks (GNNs), have been shown to be more powerful than simpler convolutional GNNs (C-GNNs). However, A-GNNs are more complex to program and difficult to scale. To address this, we develop a novel mathematical formulation, based on tensors that group all the feature vectors, targeting both training and inference of A-GNNs. The formulation enables straightforward adoption of communication-minimizing routines, it fosters optimizations such as vectorization, and it enables seamless integration with established linear algebra DSLs or libraries such as GraphBLAS. Our implementation uses a data redistribution scheme explicitly developed for sparse-dense tensor operations used heavily in GNNs, and fusing optimizations that further minimize memory usage and communication cost. We ensure theoretical asymptotic reductions in communicated data compared to the established message-passing GNN paradigm. Finally, we provide excellent scalability and speedups of even 4-5x over modern libraries such as Deep Graph Library.
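To illustrate what a formulation "based on tensors that group all the feature vectors" can look like, the following is a minimal sketch of one GAT-style attention layer written in two ways: as a per-vertex message-passing loop, and as a handful of whole-matrix operations. It is not the authors' implementation or their exact formulation. The function names, the additive attention form with vectors `a_src` and `a_dst`, the LeakyReLU slope, and the use of a dense adjacency mask in place of true sparse kernels are all assumptions made for illustration. Every vertex is assumed to have at least one neighbor (for example, a self-loop).

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    # Elementwise LeakyReLU; the slope value is an illustrative assumption.
    return np.where(x > 0, x, slope * x)


def gat_layer_local(A, H, W, a_src, a_dst):
    """Message-passing view: each vertex attends over its own neighbors."""
    Z = H @ W
    out = np.zeros_like(Z)
    for i in range(A.shape[0]):
        nbrs = np.nonzero(A[i])[0]               # neighbors of vertex i
        e = leaky_relu(Z[i] @ a_src + Z[nbrs] @ a_dst)
        e = np.exp(e - e.max())                  # numerically stable softmax
        alpha = e / e.sum()
        out[i] = alpha @ Z[nbrs]                 # weighted sum of neighbor features
    return out


def gat_layer_global(A, H, W, a_src, a_dst):
    """Global view: the same layer as operations on whole matrices.

    A is an (n, n) 0/1 adjacency matrix, H is (n, k), W is (k, d),
    a_src and a_dst are (d,). A dense mask stands in for the sparse
    pattern here; a real implementation would use sparse kernels.
    """
    Z = H @ W                                    # project all feature vectors at once
    s = Z @ a_src                                # per-vertex source score, shape (n,)
    t = Z @ a_dst                                # per-vertex target score, shape (n,)
    scores = leaky_relu(s[:, None] + t[None, :])
    scores = np.where(A > 0, scores, -np.inf)    # keep scores only on edges
    scores = scores - scores.max(axis=1, keepdims=True)
    P = np.exp(scores)                           # non-edges become exactly 0
    P = P / P.sum(axis=1, keepdims=True)         # row-wise softmax over neighbors
    return P @ Z                                 # aggregate neighbor features
```

Both functions compute the same output for the same inputs; the global version simply exposes the layer as a masked score computation followed by a matrix product, which is the kind of structure that sparse-dense tensor routines can target.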

Files and links (2)

pdf: 3581784.3607067 (2.65 MB), Open Access

url: https://dl.acm.org/doi/10.1145/3581784.3607067

Details

Title: High-performance and programmable attentional graph neural networks with global tensor formulations
Creators: M Besta - ETH Zurich
P Renc - AGH University of Krakow
R Gerstenberger - ETH Zurich
Paolo Sylos Labini
A Ziogas - ETH Zurich
T Chen - ETH Zurich
L Gianinazzi - ETH Zurich
F Scheidl - ETH Zurich
K Szenes - ETH Zurich
A Carigiet - ETH Zurich
P Iff - ETH Zurich
G Kwasniewski
R Kanakagiri - University of Illinois Urbana-Champaign
C Ge - ETH Zurich
S Jaeger - ETH Zurich
J Wąs - AGH University of Krakow
F Vella - University of Trento
T Hoefler - ETH Zurich
Publication Details: SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-16
Editor(s): Arnold D
ISBN: 9798400701092
Conference: 2023 International Conference for High Performance Computing, Networking, Storage, and Analysis (Denver, CO, 12–17 November 2023)
Publisher: IEEE
Format: Online
Number of pages: 16
Identifiers: 979-8-4007-0109-2
(UNIBZ)93984941
991007295553701241
Scopus ID: n.a.
Copyright: Free Access
Academic Unit: Faculty of Engineering
Language: English
Resource Type: Conference proceeding
Author Names String: Besta M, Renc P, Gerstenberger R, Sylos Labini P, Ziogas A, Chen T, Gianinazzi L, Scheidl F, Szenes K, Carigiet A, Iff P, Kwasniewski G, Kanakagiri R, Ge C, Jaeger S, Wąs J, Vella F, Hoefler T
Additional Description: Editors/Supervisors: Arnold D
