cuTranspose: a library to transpose 3D arrays in Nvidia CUDA GPUs

cuTranspose is a library for transposing 3D arrays on Nvidia CUDA GPUs. It is written in CUDA C and all its functionality is exposed through C functions. The library is based on the transpositions described in this article: Jose L. Jodra, Ibai Gurrutxaga and Javier Muguerza. "Efficient 3D Transpositions in Graphics Processing Units", International Journal of Parallel Programming, 43:4, pp. 876-891, 2015. Please cite us in your publications if you use cuTranspose.

The latest version of the library is located at http://www.aldapa.eus/res/cuTranspose/.

This document shows how to build and use this library.

Installation

To build this library you need the Nvidia CUDA SDK and the CMake build system installed. The build configuration is specified through CMake. The most important configuration elements are:

We recommend NOT building the library inside the source code tree, so you should create a separate build folder. For example, on a Linux system you can type the following commands from the top of the source tree:

mkdir build
cd build
ccmake ..
make

These commands build the code and create 3 files in the build folder.

Using the library

The library has a single C function that performs every kind of 3D transposition. These transpositions are named xzy, yxz, yzx, zxy and zyx. As an example, let A be a 3D array of nx*ny*nz points. The element A(i,j,k) is stored at offset (i + j*nx + k*nx*ny). After a yzx transposition, the transposed array A' has size ny*nz*nx, the same element is stored in A'(j,k,i), and its new offset is (j + k*ny + i*ny*nz). For more information see the article mentioned in the introduction.
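
The offset formulas above can be checked on the CPU with a small reference routine. The sketch below is not part of cuTranspose: the function names offset_xyz, offset_yzx and transpose_yzx_reference are hypothetical, and float is assumed as the element type only for illustration. It copies every element of A to the position given by the yzx formula, which can be useful for validating the output of a GPU transposition on small arrays.

```c
#include <stddef.h>

/* Offset of A(i,j,k) in the original nx*ny*nz array (x is the fastest index). */
size_t offset_xyz(size_t i, size_t j, size_t k, size_t nx, size_t ny)
{
    return i + j * nx + k * nx * ny;
}

/* Offset of the same element in A', stored as A'(j,k,i), after a yzx
 * transposition. A' has size ny*nz*nx, so y is now the fastest index. */
size_t offset_yzx(size_t i, size_t j, size_t k, size_t ny, size_t nz)
{
    return j + k * ny + i * ny * nz;
}

/* Hypothetical CPU reference for the yzx transposition (not a cuTranspose
 * function). Both buffers must hold nx*ny*nz elements and must not overlap.
 * float is an assumed element type. */
void transpose_yzx_reference(float *out, const float *in,
                             size_t nx, size_t ny, size_t nz)
{
    for (size_t k = 0; k < nz; ++k)
        for (size_t j = 0; j < ny; ++j)
            for (size_t i = 0; i < nx; ++i)
                out[offset_yzx(i, j, k, ny, nz)] =
                    in[offset_xyz(i, j, k, nx, ny)];
}
```

For instance, with nx=2, ny=3 and nz=4, the element A(1,0,0) sits at offset 1 in the input and moves to offset 0 + 0*3 + 1*3*4 = 12 in the transposed array.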

The function that performs the 3D transposition is named cut_transpose3d and its prototype is

int cut_transpose3d( data_t*       output,
                     const data_t* input,
                     const int*    size,
                     const int*    permutation,
                     int           elements_per_thread )

The return value is 0 for a successful execution and -1 otherwise. The meaning of each parameter is explained below:

Copyright license

cuTranspose is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

cuTranspose is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with cuTranspose. If not, see http://www.gnu.org/licenses/.

Copyright 2016 Ibai Gurrutxaga, Javier Muguerza, Jose L. Jodra.