Burtscher, MartinMaleki, Sepideh2016-06-172016-06-172016-05Maleki, S. (2016). Higher-order and tuple-based massively-parallel prefix sums (Unpublished thesis). Texas State University, San Marcos, Texas.https://hdl.handle.net/10877/6056Prefix sums are an important parallel primitive, especially in massively-parallel programs. This thesis discusses two orthogonal generalizations thereof, which we call higher-order and tuple-based prefix sums. Moreover, it describes and evaluates SAM, a GPU-friendly algorithm for computing prefix sums and other scans that supports higher orders and tuple values. Our templated CUDA implementation unifies all of these compu- tations in a single 100-statement kernel. SAM is communication-efficient in the sense that it minimizes main-memory accesses. When computing prefix sums of a million or more values, it outperforms Thrust and CUDPP on both a Titan X and a K40 GPU. On very large inputs, it is even faster than CUB on the Titan X. SAM outperforms CUB by up to a factor of 2.9 on higher-order prefix sums and by up to a factor of 2.6 on tuple-based prefix sums.Text73 pages1 file (.pdf)enprefixsumGPUtuplebasedhigherorderscanHigher-Order and Tuple-Based Massively-Parallel Prefix SumsThesis