update tutorial accordingly
grlee77 committed Dec 15, 2014
1 parent 6a59ca4 commit 7391cbe
Showing 1 changed file with 19 additions and 17 deletions.
36 changes: 19 additions & 17 deletions doc/source/tutorial.rst
@@ -20,7 +20,7 @@ Transferring Data
The next step in most programs is to transfer data onto the device.
In PyCuda, you will mostly transfer data from :mod:`numpy` arrays
on the host. (But indeed, everything that satisfies the Python buffer
-interface will work, even a :class:`str`.) Let's make a 4x4 array
+interface will work, even a :class:`str`.) Let's make a 4x4 array
of random numbers::

    import numpy
@@ -31,7 +31,7 @@ devices only support single precision::

    a = a.astype(numpy.float32)

-Finally, we need somewhere to transfer data to, so we need to
+Finally, we need somewhere to transfer data to, so we need to
allocate memory on the device::

    a_gpu = cuda.mem_alloc(a.nbytes)
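
As a rough check on the allocation size, the sketch below uses only the standard-library :mod:`struct` module (not PyCuda) to work out how many bytes a 4x4 array occupies in single and in double precision. The names ``n_elements``, ``nbytes_float32`` and ``nbytes_float64`` are illustrative and do not come from the tutorial.

```python
import struct

n_elements = 4 * 4  # the 4x4 array from the text

# "f" is a C float (single precision), "d" is a C double.
nbytes_float32 = n_elements * struct.calcsize("f")
nbytes_float64 = n_elements * struct.calcsize("d")

print(nbytes_float32, nbytes_float64)
```

After the cast to ``numpy.float32``, ``a.nbytes`` should correspond to the single-precision figure, which is the size handed to ``cuda.mem_alloc``.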
@@ -56,8 +56,8 @@ code, and feed it into the constructor of a
      }
      """)

-If there aren't any errors, the code is now compiled and loaded onto the
-device. We find a reference to our :class:`pycuda.driver.Function` and call
+If there aren't any errors, the code is now compiled and loaded onto the
+device. We find a reference to our :class:`pycuda.driver.Function` and call
it, specifying *a_gpu* as the argument, and a block size of 4x4::

    func = mod.get_function("doublify")
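
To make the effect of a single 4x4 block concrete, here is a host-side sketch in plain Python that emulates what such a launch would do. The kernel body is not visible in this hunk, so the indexing ``idx = tx + ty * 4`` and the doubling step are assumptions, and ``doublify_block`` is a hypothetical helper, not a PyCuda function.

```python
def doublify_block(a, block=(4, 4, 1)):
    """Emulate one thread block on the host; `a` is a flat list of 16 floats."""
    block_x, block_y, _ = block
    for ty in range(block_y):          # plays the role of threadIdx.y
        for tx in range(block_x):      # plays the role of threadIdx.x
            idx = tx + ty * 4          # assumed kernel indexing
            a[idx] *= 2                # assumed kernel body
    return a

data = [float(i) for i in range(16)]
result = doublify_block(list(data))
print(result)
```

Each of the 16 emulated threads touches exactly one element, which is why a 4x4 block is enough for a 4x4 array.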
@@ -81,9 +81,9 @@ This will print something like this::
     [-0.37920788 -0.59378809  1.36134958  1.56078029]
     [ 0.14413041 -1.46224082  0.60812396  1.43176913]
     [ 0.78825873  0.31750482  1.10785341 -0.22268796]]
-It worked! That completes our walkthrough. Thankfully, PyCuda takes
-over from here and does all the cleanup for you, so you're done.
+
+It worked! That completes our walkthrough. Thankfully, PyCuda takes
+over from here and does all the cleanup for you, so you're done.
Stick around for some bonus material in the next section, though.

(You can find the code for this demo as :file:`examples/demo.py` in the PyCuda
@@ -109,13 +109,15 @@ to argument types (as designated by Python's standard library :mod:`struct`
module), and then called. This also avoids having to assign explicit argument
sizes using the `numpy.number` classes::

-   func.prepare("P", block=(4,4,1))
-   func.prepared_call((1, 1), a_gpu)
+   grid = (1, 1)
+   block = (4, 4, 1)
+   func.prepare("P")
+   func.prepared_call(grid, block, a_gpu)
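
The format string passed to ``prepare`` uses :mod:`struct` type characters, so their meaning can be explored without a GPU. The sketch below is standard-library only; the ``"Pif"`` signature (a pointer, an int and a float) and the address ``0x1000`` are made up for illustration and do not appear in the tutorial.

```python
import struct

# "P" is a pointer (void *), the type used for a_gpu in the text.
pointer_size = struct.calcsize("P")

# Hypothetical signature: pointer, int, float. In native mode struct
# inserts any alignment padding between members automatically.
packed = struct.pack("Pif", 0x1000, 3, 2.5)
unpacked = struct.unpack("Pif", packed)

print(pointer_size, len(packed), unpacked)
```

A kernel taking only a device pointer therefore needs nothing more than ``"P"``, which matches the call shown here.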

Bonus: Abstracting Away the Complications
-----------------------------------------
-Using a :class:`pycuda.gpuarray.GPUArray`, the same effect can be
+
+Using a :class:`pycuda.gpuarray.GPUArray`, the same effect can be
achieved with much less writing::

    import pycuda.gpuarray as gpuarray
@@ -144,7 +146,7 @@ length arrays::
            int datalen, __padding; // so 64-bit ptrs can be aligned
            float *ptr;
        };

        __global__ void double_array(DoubleOperation *a) {
            a = &a[blockIdx.x];
            for (int idx = threadIdx.x; idx < a->datalen; idx += blockDim.x) {
@@ -164,14 +166,14 @@ two arrays are instantiated::
        def __init__(self, array, struct_arr_ptr):
            self.data = cuda.to_device(array)
            self.shape, self.dtype = array.shape, array.dtype
-           cuda.memcpy_htod(int(struct_arr_ptr), numpy.int32(array.size))
-           cuda.memcpy_htod(int(struct_arr_ptr) + 8, numpy.intp(int(self.data)))
+           cuda.memcpy_htod(int(struct_arr_ptr), numpy.getbuffer(numpy.int32(array.size)))
+           cuda.memcpy_htod(int(struct_arr_ptr) + 8, numpy.getbuffer(numpy.intp(int(self.data))))
        def __str__(self):
            return str(cuda.from_device(self.data, self.shape, self.dtype))

    struct_arr = cuda.mem_alloc(2 * DoubleOpStruct.mem_size)
    do2_ptr = int(struct_arr) + DoubleOpStruct.mem_size

    array1 = DoubleOpStruct(numpy.array([1, 2, 3], dtype=numpy.float32), struct_arr)
    array2 = DoubleOpStruct(numpy.array([0, 4], dtype=numpy.float32), do2_ptr)
    print("original arrays", array1, array2)
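
The two ``memcpy_htod`` calls write the element count at the start of each struct and the data pointer 8 bytes further in. The following host-only sketch mimics that layout with a ``bytearray`` standing in for device memory. It assumes ``mem_size`` is 8 bytes plus the size of a pointer (the class header is not shown in this hunk), and ``write_struct`` together with the addresses ``0x1000`` and ``0x2000`` are hypothetical.

```python
import struct

POINTER_SIZE = struct.calcsize("P")
MEM_SIZE = 8 + POINTER_SIZE  # assumed: datalen + padding, then the pointer

def write_struct(device_mem, struct_offset, datalen, data_ptr):
    """Host-side stand-in for the two memcpy_htod calls."""
    struct.pack_into("i", device_mem, struct_offset, datalen)       # offset 0
    struct.pack_into("P", device_mem, struct_offset + 8, data_ptr)  # offset 8

device_mem = bytearray(2 * MEM_SIZE)           # stands in for cuda.mem_alloc
write_struct(device_mem, 0, 3, 0x1000)         # first struct, made-up address
write_struct(device_mem, MEM_SIZE, 2, 0x2000)  # second struct, made-up address

# Read each struct back as (datalen, __padding, ptr).
first = struct.unpack_from("iiP", device_mem, 0)
second = struct.unpack_from("iiP", device_mem, MEM_SIZE)
print(first, second)
```

The second struct starts exactly ``MEM_SIZE`` bytes after the first, which is the same arithmetic used to compute ``do2_ptr``.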
@@ -185,7 +187,7 @@ only the second::
    func = mod.get_function("double_array")
    func(struct_arr, block = (32, 1, 1), grid=(2, 1))
    print("doubled arrays", array1, array2)

    func(numpy.intp(do2_ptr), block = (32, 1, 1), grid=(1, 1))
    print("doubled second only", array1, array2, "\n")
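
The two launches can be traced on the host with a small emulation: one block per struct, with the threads of a block striding over that struct's data. This is a plain-Python sketch, not PyCuda code. The loop body (doubling each element) is not shown in the hunk and is assumed, and ``double_array`` here is a hypothetical stand-in operating on Python lists.

```python
def double_array(structs, grid_x, block_x=32):
    """Emulate the launch: one block per struct, threads stride over the data."""
    for block_idx in range(grid_x):
        a = structs[block_idx]                    # a = &a[blockIdx.x]
        for thread_idx in range(block_x):
            # for (idx = threadIdx.x; idx < datalen; idx += blockDim.x)
            for idx in range(thread_idx, len(a), block_x):
                a[idx] *= 2                       # assumed loop body

array1 = [1.0, 2.0, 3.0]
array2 = [0.0, 4.0]

double_array([array1, array2], grid_x=2)   # grid=(2, 1): both structs
after_first = (list(array1), list(array2))

double_array([array2], grid_x=1)           # grid=(1, 1): second struct only
after_second = (list(array1), list(array2))

print(after_first, after_second)
```

Because the stride equals the block size, arrays shorter than 32 elements are covered by the first few threads while the remaining threads do nothing.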
