Higher-Order and Tuple-Based Massively-Parallel Prefix Sums
Date
2016-05
Authors
Maleki, Sepideh
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Prefix sums are an important parallel primitive, especially in massively-parallel programs. This thesis discusses two orthogonal generalizations thereof, which we call higher-order and tuple-based prefix sums. Moreover, it describes and evaluates SAM, a GPU-friendly algorithm for computing prefix sums and other scans that supports higher orders and tuple values. Our templated CUDA implementation unifies all of these compu- tations in a single 100-statement kernel. SAM is communication-efficient in the sense that it minimizes main-memory accesses. When computing prefix sums of a million or more values, it outperforms Thrust and CUDPP on both a Titan X and a K40 GPU. On very large inputs, it is even faster than CUB on the Titan X. SAM outperforms CUB by up to a factor of 2.9 on higher-order prefix sums and by up to a factor of 2.6 on tuple-based prefix sums.
Description
Keywords
prefixsum, GPU, tuplebased, higherorder, scan
Citation
Maleki, S. (2016). Higher-order and tuple-based massively-parallel prefix sums (Unpublished thesis). Texas State University, San Marcos, Texas.