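I've been looking into CUDA JIT compilation lately. Instead of the CUDA runtime API + NVCC, I'm trying to compile kernels dynamically with the CUDA Driver API + NVRTC. I've run into a problem: CUDA kernels have no built-in operator overloads for vector types such as float4, so I can't write float4 a = b + c; the way I would in OpenCL. I have to add the .x, .y, .z and .w components one by one, which is rather clumsy. As far as I know, though, with the runtime API we can define __device__ float4 operator+(const float4& a, const float4& b){....} to overload the "+" operator.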
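So the question is: when a kernel is compiled dynamically from a string, how can the kernel function inside that string see the overloaded operator? Does anyone here have experience with this?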
Here is part of my implementation:
Operator.hpp:
static __host__ __device__ __inline__ float4 operator + (const float4 & a, const float4 & b)
{
    return make_float4(a.x+b.x, a.y+b.y, a.z+b.z, a.w+b.w);
}
main.cpp:
std::string c = " \n\
#include <Operator.hpp> \n\
extern \"C\" __global__ \n\
void add(float4 *x, float4 *y, float4 *out) \n\
{ \n\
size_t tid = threadIdx.x; \n\
float4 dst; \n\
dst = x[tid] + y[tid]; \n\
out[tid] = dst; \n\
} \n";
nvrtcProgram prog;
std::vector<const char *> headers;
std::vector<const char *> includeNames;
headers.push_back("C:\\Users\\Admin\\Desktop\\CUDA_Halloworld\\");
includeNames.push_back("Operator.hpp");
NVRTC_SAFE_CALL(nvrtcCreateProgram(&prog, // prog
c.c_str(), // buffer
"add.cu", // name
1, // numHeaders
&headers[0], // headers
&includeNames[0])); // includeNames
.....the rest is the code that generates the PTX, then creates and launches the kernel, so I won't paste it here.
PS: The nvrtcCreateProgram interface does seem to provide a way to include specified headers, but my compilation fails with the following errors:
Operator.hpp(1): error: this declaration has no storage class or type specifier
Operator.hpp(1): error: A namespace scope variable without memory space annotations (__device__/__constant__/__shared__/__managed__) is considered a host variable, and host variables are not allowed in JIT mode. Consider using -default-device flag to process unannotated namespace scope variables as __device__ variables in JIT mode
Operator.hpp(1): error: expected a ";"
Operator.hpp(1): error: incorrectly formed universal character name
Operator.hpp(1): error: unrecognized token
Operator.hpp(1): error: unrecognized token
Operator.hpp(1): error: unrecognized token
Operator.hpp(1): error: unrecognized token
At end of source: warning: parsing restarts here after previous syntax error
8 errors detected in the compilation of "add.cu".
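A possible cause, offered as a guess rather than a confirmed diagnosis: every error points at line 1 of Operator.hpp, and "incorrectly formed universal character name" is what a compiler would typically report for the `\U` in `C:\Users`. According to the NVRTC documentation, the `headers` argument of nvrtcCreateProgram takes the source text of each header, not a directory, so the directory string above is probably being parsed as if it were the header itself. Below is a minimal sketch of two ways around this, untested against this setup. `loadTextFile` and `includePathOption` are hypothetical helper names made up for illustration and are not part of NVRTC, and the location of Operator.hpp is assumed to be the directory from the post. The NVRTC calls are shown as comments so the helpers compile on their own.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an NVRTC function): read a whole text file into a string.
std::string loadTextFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Hypothetical helper: build the NVRTC compile option that adds a directory
// to the include search path.
std::string includePathOption(const std::string& dir) {
    return "--include-path=" + dir;
}

// Approach 1: hand NVRTC the header's source text.
// (Assumption: Operator.hpp lives in the directory used in the post.)
//   std::string headerSrc =
//       loadTextFile("C:\\Users\\Admin\\Desktop\\CUDA_Halloworld\\Operator.hpp");
//   const char* headers[]      = { headerSrc.c_str() };  // header source text
//   const char* includeNames[] = { "Operator.hpp" };     // name used by #include
//   nvrtcCreateProgram(&prog, c.c_str(), "add.cu", 1, headers, includeNames);
//
// Approach 2: pass no headers and let NVRTC search the directory at compile time.
//   nvrtcCreateProgram(&prog, c.c_str(), "add.cu", 0, nullptr, nullptr);
//   std::string opt = includePathOption("C:\\Users\\Admin\\Desktop\\CUDA_Halloworld");
//   const char* opts[] = { opt.c_str() };
//   nvrtcCompileProgram(prog, 1, opts);
```

In approach 1 the `std::string` holding the header text has to stay alive until nvrtcCreateProgram returns, since only a pointer to its buffer is passed in.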
Any help or discussion would be much appreciated. Thanks!