Is this a memory allocation error?

sisiy

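Is this a memory allocation error?
« on: October 13, 2019, 02:17:43 pm »
Is this a memory allocation error? I copied the code from the sample program, so why does the allocation fail? My kernel converts a binary number to a decimal number.
« Last edit: October 13, 2019, 02:18:55 pm by sisiy »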

Re: Is this a memory allocation error?
« Reply #1 on: October 13, 2019, 02:28:55 pm »
Hello.
The error "an illegal memory access was encountered" means that your kernel failed because it accessed an invalid memory address.

Please check the following, based on your screenshots and the partial code:

(1) Whether the kernel argument "a" is a valid pointer. Error code 700 indicates that the address (the value of your pointer a) is invalid.

(2) If "a" is invalid, for example NULL, check your host code and find out why the allocation for "a" failed.

(3) Please note that you have provided only part of the code. We cannot analyze your host code for you in this situation.

mao@GPUWorld

Re: Is this a memory allocation error?
« Reply #2 on: October 13, 2019, 02:38:43 pm »
Since you have two kernels, we cannot tell at the moment which one actually failed, so you have to check both of them. (Your screenshots are misleading: the kernel names in your error logs are inconsistent.)

The second kernel accesses only one element of buffer "a", which is fine. Just check the allocation of "a" itself.

The first kernel, however, accesses several elements of three buffers. You have to check all three of them: their allocations and their capacities. If the kernel uses more elements than these buffers hold, going beyond their capacities, it will fail too.

You have to check all of the conditions above.
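
The checks above can be sketched in CUDA roughly as follows. This is only an illustration under assumptions, not the poster's code: the kernel `doubleKernel` and the wrapper `doubleOnDevice` are hypothetical names. The sketch checks the result of every runtime call, passes the buffer capacity to the kernel, and guards each access against that capacity. It assumes n > 0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each element and never touches memory past n.
__global__ void doubleKernel(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                 // bounds guard: surplus threads do nothing
        data[i] *= 2;
    }
}

// Hypothetical host wrapper: takes a host pointer, works on a device copy,
// and writes the results back. Assumes n > 0.
cudaError_t doubleOnDevice(int* host, int n)
{
    int* dev = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(int);

    // Each step runs only if everything before it succeeded.
    cudaError_t status = cudaMalloc((void**)&dev, bytes);
    if (status == cudaSuccess)
        status = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (status == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;   // round up to cover all n elements
        doubleKernel<<<blocks, threads>>>(dev, n);  // device pointer, plus its capacity
        status = cudaGetLastError();                // launch errors
    }
    if (status == cudaSuccess)
        status = cudaDeviceSynchronize();           // errors raised while the kernel ran
    if (status == cudaSuccess)
        status = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    if (status != cudaSuccess)
        fprintf(stderr, "doubleOnDevice failed: %s\n", cudaGetErrorString(status));

    cudaFree(dev);   // freeing a null pointer is a no-op
    return status;
}
```

Calling `doubleOnDevice(v, 5)` on a host array `int v[5]` should leave every element doubled and return cudaSuccess on a working GPU.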

Re: Is this a memory allocation error?
« Reply #3 on: October 13, 2019, 03:17:07 pm »
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>
#include <math.h>


cudaError_t convertBinaryToDecimal(int* a, unsigned int size);


__global__ void convertBinaryToDecimalKernel(int* a)
{
   long n = a[0];
   int decimalNumber = 0, i = 0, remainder;
   while (n != 0)
   {
      remainder = n % 10;
      n /= 10;
      decimalNumber += remainder * powf(2, i);
      ++i;
   }
   a[0] = decimalNumber;
}

int main()
{
    const int arraySize = 5;

   int a[arraySize] = { 110, 101, 100, 111, 100 };

   printf("{110, 101, 100, 111, 100} = {%d,%d,%d,%d,%d}\n",
      a[0], a[1], a[2], a[3], a[4]);

    // Add vectors in parallel.
   cudaError_t cudaStatus = convertBinaryToDecimal(a, arraySize);

    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "convertBinaryToDecimal failed!");
      fprintf(stderr, "Failed to allocate  (error code %s)!\n", cudaGetErrorString(cudaStatus));
        return 1;
    }
   
   printf("{110, 11, 10, 11, 10} = {%d,%d,%d,%d,%d}\n",
      a[0], a[1], a[2], a[3], a[4]);


    // cudaDeviceReset must be called before exiting in order for profiling and
    // tracing tools such as Nsight and Visual Profiler to show complete traces.
    cudaStatus = cudaDeviceReset();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaDeviceReset failed!");
        return 1;
    }

    return 0;
}

// Host wrapper function for the binary-to-decimal conversion
cudaError_t convertBinaryToDecimal(int* a,  unsigned int size)
{
   int* dev_a = 0;

   cudaError_t cudaStatus;

   // Choose which GPU to run on, change this on a multi-GPU system.
   cudaStatus = cudaSetDevice(0);
   if (cudaStatus != cudaSuccess) {
      fprintf(stderr, "cudaSetDevice failed!  Do you have a CUDA-capable GPU installed?");
      goto Error;
   }

   // Allocate GPU buffers for three vectors (one input, one output)    .
   cudaStatus = cudaMalloc((void**)& dev_a, size * sizeof(int));
   if (cudaStatus != cudaSuccess) {
      fprintf(stderr, "cudaMalloc failed!");
      goto Error;
   }


   // Copy input vectors from host memory to GPU buffers.
   cudaStatus = cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);
   if (cudaStatus != cudaSuccess) {
      fprintf(stderr, "cudaMemcpy failed!");
      goto Error;
   }

   // Launch a kernel on the GPU with one thread for each element.
   convertBinaryToDecimalKernel<<<1, 1>>>(a);

   // Check for any errors launching the kernel
   cudaStatus = cudaGetLastError();
   if (cudaStatus != cudaSuccess) {
      fprintf(stderr, "addKernel launch failed: %s\n", cudaGetErrorString(cudaStatus));
      goto Error;
   }

   // cudaDeviceSynchronize waits for the kernel to finish, and returns
   // any errors encountered during the launch.
   cudaStatus = cudaDeviceSynchronize();
   if (cudaStatus != cudaSuccess) {
      fprintf(stderr, "cudaDeviceSynchronize returned error code %d after launching convertBinaryToDecimalKernel!\n", cudaStatus);
      goto Error;
   }

   // Copy output vector from GPU buffer to host memory.
   cudaStatus = cudaMemcpy(a, dev_a, size * sizeof(int), cudaMemcpyDeviceToHost);
   if (cudaStatus != cudaSuccess) {
      fprintf(stderr, "cudaMemcpy failed!");
      goto Error;
   }

Error:
   cudaFree(dev_a);
   return cudaStatus;
   
}
« Last edit: October 13, 2019, 03:25:21 pm by gcshang »

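Re: Is this a memory allocation error?
« Reply #4 on: October 13, 2019, 03:26:53 pm »
"cudaDeviceSynchronize waits for the kernel to finish, and returns": this is the call that returns the error value. Aren't the host and the device supposed to run asynchronously? Why do we need to synchronize and wait? The error appears as soon as the kernel is called.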

Re: Is this a memory allocation error?
« Reply #5 on: October 13, 2019, 03:32:06 pm »
   // Copy input vectors from host memory to GPU buffers.
   cudaStatus = cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);
   if (cudaStatus != cudaSuccess) {
      fprintf(stderr, "cudaMemcpy failed!");
      goto Error;
   }

The array a[] is defined on the host. I copied it to the device with cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice); so operating on it afterwards should be fine, right?

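Re: Is this a memory allocation error?
« Reply #6 on: October 14, 2019, 01:36:07 pm »
Hello. There is no such thing as being unable to use cudaDeviceSynchronize(), or the program "dying as soon as the kernel is used".

If the kernel failure shows up at synchronization, that only proves the kernel itself has a problem. The synchronization is not what makes it fail.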
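
To illustrate the point: a kernel launch returns to the host right away, so a fault that happens while the kernel runs can only be reported by a later call that waits for it, such as cudaDeviceSynchronize(). The following is a minimal sketch with hypothetical names (`writeKernel`, `launchWithBadPointer`), in which the kernel is deliberately given a null pointer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes through whatever pointer it is given.
__global__ void writeKernel(int* p)
{
    p[0] = 42;
}

// Launches the kernel with a deliberately invalid pointer and prints the
// status seen at each of the two checkpoints.
cudaError_t launchWithBadPointer()
{
    int* bad = nullptr;              // not a valid device address
    writeKernel<<<1, 1>>>(bad);      // returns to the host without waiting

    // Launch-time check: reports problems such as an invalid launch
    // configuration. The kernel has usually not finished at this point.
    cudaError_t launchStatus = cudaGetLastError();
    printf("after launch: %s\n", cudaGetErrorString(launchStatus));

    // Execution-time check: the host waits here, and a fault raised inside
    // the kernel is reported by this call.
    cudaError_t syncStatus = cudaDeviceSynchronize();
    printf("after sync:   %s\n", cudaGetErrorString(syncStatus));
    return syncStatus;
}
```

On a typical setup the first line reports no error and the second reports the illegal memory access, which matches what the original poster saw. Removing the synchronization would not remove the fault. It would only move the report to the next blocking call, for example the cudaMemcpy back to the host.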

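Your code has at least the following problem:
(1) The wrapper function receives a host memory address:
Code:
cudaError_t convertBinaryToDecimal(int* a,  unsigned int size)
(2) Inside the wrapper, a device memory buffer is allocated correctly:
Code:
cudaStatus = cudaMalloc((void**)& dev_a, size * sizeof(int));
(3) Both steps above are fine. However, your kernel does not use the buffer that was allocated in device memory. It tries to use host memory directly instead:
Code:
convertBinaryToDecimalKernel <<<1, 1 >>> (a);

This is the root cause of your kernel failure. You have not understood the basic principle of why a kernel works on device memory while the CPU works on host memory. You copied the rough skeleton, which looks right in form, without understanding which address should actually be passed to the kernel.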

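Suggested solutions:
(1) Change a to dev_a in the kernel launch. Or:
(2) Read the series 《阅读CUDA 100天》 by Sisiy on this forum and learn to use the managed memory (unified memory) it describes, which avoids this kind of problem automatically.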
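
Option (2) can be sketched as follows. This is a minimal sketch under assumptions, not code taken from the series mentioned above: `binaryDigitsToDecimal`, `convertKernel`, and `convertManaged` are hypothetical names, and the kernel is extended to one thread per element. With cudaMallocManaged, a single pointer is valid on both the host and the device, so there is no separate dev_a to confuse with a. The sketch assumes 0 < n <= 1024 and that every input consists only of the digits 0 and 1.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reads the decimal digits of n as binary digits, e.g. 110 -> 6.
// Assumes every digit of n is 0 or 1.
__host__ __device__ int binaryDigitsToDecimal(int n)
{
    int result = 0;
    int weight = 1;                  // 2^i kept as an integer instead of powf(2, i)
    while (n != 0) {
        result += (n % 10) * weight;
        n /= 10;
        weight *= 2;
    }
    return result;
}

// One thread per element, guarded against the buffer capacity.
__global__ void convertKernel(int* a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = binaryDigitsToDecimal(a[i]);
}

// Converts the n entries of `values` in place through a managed buffer.
// Assumes 0 < n <= 1024 so that a single block is enough.
cudaError_t convertManaged(int* values, int n)
{
    int* a = nullptr;
    cudaError_t status = cudaMallocManaged((void**)&a, n * sizeof(int));
    if (status != cudaSuccess) return status;

    for (int i = 0; i < n; ++i) a[i] = values[i];   // the host writes the buffer directly

    convertKernel<<<1, n>>>(a, n);                  // the same pointer is used on the device
    status = cudaGetLastError();
    if (status == cudaSuccess)
        status = cudaDeviceSynchronize();           // wait before the host reads a again
    if (status == cudaSuccess)
        for (int i = 0; i < n; ++i) values[i] = a[i];

    cudaFree(a);
    return status;
}
```

For the code posted in this thread, option (1) is the smaller change: launch the kernel as `convertBinaryToDecimalKernel<<<1, 1>>>(dev_a);`.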

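Re: Is this a memory allocation error?
« Reply #7 on: October 15, 2019, 09:18:22 am »
1. Thank you very much for the answer. The problem is solved!
2. I have one more question. The binary-to-decimal conversion now runs in parallel at the kernel level, but the algorithm inside the kernel is still serial. How can I use CUDA so that the decoding algorithm itself is also parallel internally?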

Re: Is this a memory allocation error?
« Reply #8 on: October 15, 2019, 09:20:23 am »
__global__ void convertBinaryToDecimalKernel(int* a)
{
   long n = a[0];
   int decimalNumber = 0, i = 0, remainder;
   while (n != 0)
   {
      remainder = n % 10;
      n /= 10;
      decimalNumber += remainder * powf(2, i);
      ++i;
   }
   a[0] = decimalNumber;
}

Can this while loop be parallelized? Does CUDA have a library for this kind of task, or is there a more efficient algorithm?
Thanks

sisiy

Re: Is this a memory allocation error?
« Reply #9 on: October 15, 2019, 01:10:59 pm »
New questions belong in a new thread. I have posted this one for you here: https://bbs.gpuworld.cn/index.php?topic=73257.new#new