Higher-Order and Tuple-Based Massively-Parallel Prefix Sums

Date

2016-05

Authors

Maleki, Sepideh

Abstract

Prefix sums are an important parallel primitive, especially in massively-parallel programs. This thesis discusses two orthogonal generalizations thereof, which we call higher-order and tuple-based prefix sums. Moreover, it describes and evaluates SAM, a GPU-friendly algorithm for computing prefix sums and other scans that supports higher orders and tuple values. Our templated CUDA implementation unifies all of these computations in a single 100-statement kernel. SAM is communication-efficient in the sense that it minimizes main-memory accesses. When computing prefix sums of a million or more values, it outperforms Thrust and CUDPP on both a Titan X and a K40 GPU. On very large inputs, it is even faster than CUB on the Titan X. SAM outperforms CUB by up to a factor of 2.9 on higher-order prefix sums and by up to a factor of 2.6 on tuple-based prefix sums.
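
The abstract does not define the two generalizations, so the following is only a sequential reference sketch under stated assumptions, not the SAM algorithm and not the thesis's CUDA kernel. It assumes that an order-k prefix sum means applying an inclusive prefix sum k times, and that a tuple-based prefix sum with tuple size s sums each element only with the elements that sit a multiple of s positions before it (s interleaved, independent prefix sums). The function name `generalized_prefix_sum` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Sequential reference sketch (NOT SAM, and not parallel).
// Assumptions, not taken from the abstract:
//   - order:      number of times the inclusive prefix sum is applied
//   - tuple_size: stride between the elements that are summed together
// Both parameter names are hypothetical.
std::vector<long long> generalized_prefix_sum(std::vector<long long> values,
                                              std::size_t order,
                                              std::size_t tuple_size) {
    if (tuple_size == 0) {
        return values;  // degenerate stride: return the input unchanged
    }
    for (std::size_t pass = 0; pass < order; ++pass) {
        // One inclusive pass: each element accumulates the running sum of
        // the element tuple_size positions earlier, so the tuple_size
        // interleaved sequences are scanned independently.
        for (std::size_t i = tuple_size; i < values.size(); ++i) {
            values[i] += values[i - tuple_size];
        }
    }
    return values;
}
```

Under these assumptions, `generalized_prefix_sum({1, 2, 3, 4}, 1, 1)` is the conventional inclusive prefix sum, while `order = 2` scans the result a second time and `tuple_size = 2` scans the even-indexed and odd-indexed elements separately. A GPU implementation such as the one the abstract describes would have to parallelize these passes and limit main-memory traffic, which this sketch makes no attempt to do.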

Keywords

prefix sum, GPU, tuple-based, higher-order, scan

Citation

Maleki, S. (2016). Higher-order and tuple-based massively-parallel prefix sums (Unpublished thesis). Texas State University, San Marcos, Texas.
