C++ - OpenCL struct values correct on CPU but not on GPU
I have a struct in a file which is included by both the host code and the kernel:
typedef struct { float x, y, z, dir_x, dir_y, dir_z; int radius; } workliststruct;
I'm building this struct in the C++ host code and passing it to the OpenCL kernel via a buffer.
If I choose a CPU device for the computation, I get the following result:
printf ( "item:[%f,%f,%f][%f,%f,%f]%d,%d\n", item.x, item.y, item.z, item.dir_x, item.dir_y, item.dir_z , item.radius ,sizeof(float));
host:
item:[20.169043,7.000000,34.933712][0.000000,-3.000000,0.000000]1,4
device (cpu):
item:[20.169043,7.000000,34.933712][0.000000,-3.000000,0.000000]1,4
And if I choose a GPU device (AMD) for the computation, weird things happen:
host:
item:[58.406261,57.786015,58.137501][2.000000,2.000000,2.000000]2,4
device (gpu):
item:[58.406261,2.000000,0.000000][0.000000,0.000000,0.000000]0,0
Notably, even sizeof(float) comes out as garbage on the GPU.
I assume there is a problem with the layout of floats on the different devices.
Note: the struct is contained in an array of structs of this type, and every struct in the array is garbage on the GPU.
Does anyone have an idea why this is the case, and how I can predict it?
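A side note on the printf call itself: in OpenCL C, sizeof yields a size_t, whose width depends on the device's address bits (it may be 64-bit), so passing it to %d is a format/argument mismatch. The sketch below is host-side C++ and only illustrates keeping every argument in step with its conversion, casting sizeof to int for %d; the same cast can be written in the kernel. The helper name format_item is hypothetical, and this is not claimed to be the cause of the garbage shown above.

```cpp
#include <cstdio>
#include <string>

// Same layout as workliststruct in the question.
typedef struct {
    float x, y, z, dir_x, dir_y, dir_z;
    int radius;
} workliststruct;

// Hypothetical helper: formats one item like the question's printf, but
// every argument matches its conversion: %f for the floats (promoted to
// double when passed through varargs on the host), %d for the int radius,
// and sizeof cast to int so that it matches %d as well.
std::string format_item(const workliststruct& item) {
    char buf[512];  // large enough for typical values; snprintf truncates safely otherwise
    std::snprintf(buf, sizeof buf,
                  "item:[%f,%f,%f][%f,%f,%f]%d,%d",
                  item.x, item.y, item.z,
                  item.dir_x, item.dir_y, item.dir_z,
                  item.radius, static_cast<int>(sizeof(float)));
    return std::string(buf);
}
```

For example, an item {1.5f, 7.0f, 2.0f, 0.0f, -3.0f, 0.0f, 1} formats as item:[1.500000,7.000000,2.000000][0.000000,-3.000000,0.000000]1,4.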
EDIT: I added a %d at the end and passed 1 to it; the result is: 1065353216
EDIT: here are the 2 structs which I'm using:
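The number 1065353216 is 0x3F800000, the IEEE 754 bit pattern of the float 1.0f. One way to read this is that %d consumed bits that belong to a float 1.0 rather than the int 1, i.e. printf and its arguments were out of step. The hypothetical helper float_bits below is a small sketch that checks this on the host; it uses memcpy rather than a pointer cast to avoid undefined behaviour.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the raw 32 bits of a float as an unsigned
// integer. memcpy is the well-defined way to reinterpret the bytes.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
```

float_bits(1.0f) is 0x3F800000, which printed with %d (or %u) is exactly 1065353216.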
typedef struct {
    float x, y, z,        //base coordinates
          dir_x, dir_y, dir_z; //direction
    int radius;           //radius
} workliststruct;

typedef struct {
    float base_x, base_y, base_z; //base point
    float radius;                 //radius
    float dir_x, dir_y, dir_z;    //initial direction
} returnstruct;
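Both structs contain only 4-byte scalars, and OpenCL C guarantees that int and float are 32-bit, so no padding is expected on either side. A minimal sketch of pinning that down on the host with static_assert is below; the expected offsets assume a 32-bit int and float on the host, which is typical but not guaranteed by C++, so the first assertion checks it. Mirroring the same numbers on the device (for example with a small test kernel that writes sizeof values into a buffer) is left as an assumption here.

```cpp
#include <cstddef>

// The two structs from the question, one field group per line.
typedef struct {
    float x, y, z;              // base coordinates
    float dir_x, dir_y, dir_z;  // direction
    int radius;                 // radius
} workliststruct;

typedef struct {
    float base_x, base_y, base_z;  // base point
    float radius;                  // radius
    float dir_x, dir_y, dir_z;     // initial direction
} returnstruct;

// Compile-time layout expectations: 7 scalars of 4 bytes each, no padding.
static_assert(sizeof(float) == 4 && sizeof(int) == 4, "expects 32-bit float and int");
static_assert(offsetof(workliststruct, dir_x) == 12, "workliststruct::dir_x offset");
static_assert(offsetof(workliststruct, radius) == 24, "workliststruct::radius offset");
static_assert(sizeof(workliststruct) == 28, "workliststruct size");
static_assert(offsetof(returnstruct, radius) == 12, "returnstruct::radius offset");
static_assert(offsetof(returnstruct, dir_x) == 16, "returnstruct::dir_x offset");
static_assert(sizeof(returnstruct) == 28, "returnstruct size");
```

If the host build compiles, any remaining difference must come from the device side (or from how the values are printed), not from host-side padding.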
I tested some other things, and it looks like the problem is printf. The values seem to be right: I passed the arguments to the return struct, read them back, and these values are correct.
I don't want to post all of the related code; it's a few hundred lines. If no one has an idea, I will compress it a bit.
Ah, and for printing I'm using #pragma OPENCL EXTENSION cl_amd_printf : enable
EDIT: it looks like the problem is printf. I don't use it anymore.
There is a simple method to check what happens:
1 - Create the host-side data and initialize it:
#include <iostream>
#include <vector>

int num_points = 128;
std::vector<workliststruct> works(num_points);
std::vector<returnstruct> returns(num_points);
for (workliststruct &work : works) {
    work = initializeitsomehow();  // placeholder: your own initialization
    std::cout << work.x << " " << work.y << " " << work.z << std::endl;
    std::cout << work.radius << std::endl;
}
// same stuff for returns ...
2 - Create the device-side buffers using the CL_MEM_COPY_HOST_PTR flag, then map them and check data consistency:
cl::Buffer dev_works(..., CL_MEM_COPY_HOST_PTR, (void*)&works[0]);
cl::Buffer dev_rets(..., CL_MEM_COPY_HOST_PTR, (void*)&returns[0]);
// map and check the data (queue: your cl::CommandQueue)
workliststruct *mapped_works = (workliststruct*)queue.enqueueMapBuffer(dev_works, ...);
returnstruct *mapped_rets = (returnstruct*)queue.enqueueMapBuffer(dev_rets, ...);
// output the values & unmap the buffers (queue.enqueueUnmapMemObject) ...
3 - Check data consistency on the device side, as you did previously.
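Steps 2 and 3 come down to comparing the original host array with what comes back from the device (for example the mapped pointer, or a buffer read back after the kernel copied it). A minimal sketch of that comparison is below; count_mismatches is a hypothetical helper, and exact equality is used on purpose because a plain copy should not change any value.

```cpp
#include <cstddef>
#include <cstdio>

typedef struct {
    float x, y, z, dir_x, dir_y, dir_z;
    int radius;
} workliststruct;

// Hypothetical helper: compares the expected host data with data read back
// from the device, prints the index of the first mismatch, and returns the
// number of mismatching elements. Exact == is intended for a pure copy
// (note that NaN fields would always be reported as mismatches).
std::size_t count_mismatches(const workliststruct* expected,
                             const workliststruct* actual, std::size_t n) {
    std::size_t bad = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const workliststruct& e = expected[i];
        const workliststruct& a = actual[i];
        bool same = e.x == a.x && e.y == a.y && e.z == a.z &&
                    e.dir_x == a.dir_x && e.dir_y == a.dir_y &&
                    e.dir_z == a.dir_z && e.radius == a.radius;
        if (!same) {
            if (bad == 0)
                std::printf("first mismatch at index %zu\n", i);
            ++bad;
        }
    }
    return bad;
}
```

Called as count_mismatches(&works[0], mapped_works, works.size()), a result of 0 means the buffer round trip preserved the data and any garbage is introduced later (or only by the printing).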
Also, make sure that the code (presumably a header) included in both the kernel and the host-side code is pure OpenCL C (the AMD compiler can "swallow" errors), and that you've added its directory to the include search path when building the OpenCL kernel (the "-I" flag at the clBuildProgram stage).
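One common way to keep a shared header pure OpenCL C is to branch on the __OPENCL_VERSION__ macro, which the OpenCL C compiler predefines, so that host-only includes never reach the kernel build. The sketch below assumes a hypothetical file name worklist.h and hypothetical type names wl_int/wl_float; the cl_int/cl_float typedefs from CL/cl.h are a common alternative on the host side.

```cpp
// worklist.h (hypothetical name): included by both the host code and the kernel.
#ifndef WORKLIST_H
#define WORKLIST_H

#ifdef __OPENCL_VERSION__
// Kernel side: OpenCL C guarantees 32-bit int and float.
typedef int   wl_int;
typedef float wl_float;
#else
// Host side: pin the integer width explicitly.
#include <stdint.h>
typedef int32_t wl_int;
typedef float   wl_float;
#endif

// Plain C only (no C++ features), so the OpenCL C compiler accepts it too.
typedef struct {
    wl_float x, y, z;              // base coordinates
    wl_float dir_x, dir_y, dir_z;  // direction
    wl_int radius;                 // radius
} workliststruct;

#endif // WORKLIST_H
```

With the header's directory passed via "-I" to clBuildProgram, the kernel source can #include it as well.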
EDITED: at every step, please collect the return codes (or catch exceptions). Besides that, the "-Werror" flag at the clBuildProgram stage can be helpful.
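Collecting return codes is easiest with a small wrapper around every call. The sketch below is a hypothetical check helper; to stay compilable without CL/cl.h it uses std::int32_t as a stand-in for cl_int (a 32-bit signed integer) and 0 for CL_SUCCESS. For example, -11 is CL_BUILD_PROGRAM_FAILURE, after which clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG shows why the build failed. With the C++ bindings, enabling exceptions (__CL_ENABLE_EXCEPTIONS for cl.hpp) makes failing calls throw cl::Error instead.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Stand-in for cl_int so the sketch compiles without the OpenCL headers.
using status_t = std::int32_t;
constexpr status_t kSuccess = 0;  // value of CL_SUCCESS

// Hypothetical helper: throws with the call name and numeric code if an
// OpenCL call did not return CL_SUCCESS.
// Usage: check(clBuildProgram(...), "clBuildProgram");
void check(status_t err, const char* what) {
    if (err != kSuccess)
        throw std::runtime_error(std::string(what) + " failed with error " +
                                 std::to_string(err));
}
```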