Topics only - 奈奈

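I am implementing a GPU version of the distance transform (the distance from each non-zero pixel to the nearest zero pixel). The CPU version produces the following result: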
CPU version...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 2 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 2 2 3 3 3 3 2 2 2 2 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 3 3 3 3 4 4 3 3 3 3 2 2 2 2 2 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 2 3 3 3 4 4 4 4 4 4 4 4 3 3 3 3 3 2 2 2 2 2 2 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 3 3 4 4 4 5 5 5 5 5 5 4 4 4 4 4 3 3 3 3 3 3 2 2 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 2 2 3 3 4 4 4 4 5 5 5 6 6 6 6 5 5 5 5 5 4 4 4 4 4 4 3 3 2 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 2 2 2 3 3 4 4 5 5 5 5 6 6 6 7 7 6 6 6 6 6 5 5 5 5 5 5 4 4 3 2 1 1 0 0 0 0 0 0 0
0 0 0 0 0 1 1 2 2 3 3 3 4 4 5 5 6 6 6 6 7 7 7 7 7 7 7 7 6 6 6 6 6 6 5 4 4 3 2 2 1 0 0 0 0 0 0 0
0 0 0 0 1 1 2 2 3 3 4 4 4 5 5 6 6 7 7 7 7 8 8 8 8 8 8 7 7 7 7 6 6 5 5 4 3 3 2 2 1 0 0 0 0 0 0 0
0 0 0 0 1 2 2 3 3 4 4 5 5 5 6 6 7 7 8 8 8 8 9 9 9 9 8 8 7 6 6 6 5 5 4 4 3 2 2 1 1 0 0 0 0 0 0 0
0 0 0 1 1 2 3 3 4 4 5 5 6 6 6 7 7 8 8 8 8 8 8 8 8 8 8 8 7 6 5 5 5 4 4 3 3 2 1 1 0 0 0 0 0 0 0 0
0 0 1 1 2 2 3 4 4 5 5 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 6 5 4 4 4 3 3 2 2 1 0 0 0 0 0 0 0 0 0
0 0 1 2 2 3 3 4 5 5 6 6 7 7 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 4 3 3 3 2 2 1 1 0 0 0 0 0 0 0 0 0
0 0 1 2 2 3 3 4 4 5 5 6 6 6 6 6 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 3 2 2 2 1 1 0 0 0 0 0 0 0 0 0 0
0 0 1 1 2 2 3 3 4 4 5 5 6 6 5 5 5 4 4 4 4 4 4 4 4 4 4 4 4 4 5 4 3 2 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 2 2 3 3 4 4 5 5 5 5 4 4 4 3 3 3 3 3 3 3 3 3 3 3 4 4 4 3 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 2 2 3 3 4 4 5 4 4 4 3 3 3 2 2 2 2 2 2 2 2 2 3 3 4 4 3 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 2 2 3 3 4 4 4 3 3 3 2 2 2 1 1 1 1 1 1 1 2 2 3 4 4 3 2 2 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 2 2 3 3 3 3 3 2 2 2 1 1 1 0 0 0 0 0 1 1 2 3 4 4 3 3 2 2 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 2 2 2 3 2 2 2 1 1 1 0 0 0 0 0 0 0 0 1 2 3 4 4 4 3 3 2 2 2 2 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 2 2 2 1 1 1 0 0 0 0 0 0 0 0 1 1 1 2 3 4 5 4 4 3 3 3 3 2 2 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 2 2 2 3 4 5 5 4 4 4 4 3 3 2 2 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 2 2 2 2 3 3 3 4 5 5 5 5 5 4 4 3 2 2 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 3 3 3 3 3 3 4 4 4 5 6 6 6 5 5 4 3 2 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 3 3 3 4 4 4 4 4 4 5 5 5 6 7 6 6 5 4 3 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 3 3 4 4 4 5 5 5 5 5 5 6 6 6 7 7 6 5 4 3 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 3 4 4 5 5 5 6 6 6 6 6 6 7 7 7 6 6 5 4 3 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 3 3 4 4 4 5 5 6 6 7 7 7 7 7 7 7 6 5 5 4 3 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 2 3 3 3 4 4 5 5 6 6 7 7 6 6 6 6 6 5 4 4 3 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 3 3 4 4 5 5 6 6 6 6 5 5 5 5 5 4 3 3 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 3 3 4 4 5 5 5 5 5 5 4 4 4 4 4 3 2 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 4 4 4 4 4 4 4 4 3 3 3 3 3 2 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 3 2 2 2 2 2 2 2 2 1 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 2 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

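In the GPU version, however, the outermost few layers still come out correct when I print them.
After the first pass, before entering the while loop: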
GPU version...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Each 1 marks a pixel whose thread took part in the main computation during this pass.

After the first iteration of the while loop:
GPU version...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 2 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 2 2 0 0 0 0 2 2 2 2 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 0 0 0 0 0 0 0
0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 1 0 0 0 0 0 0 0
0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 1 0 0 0 0 0 0 0
0 0 0 0 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 1 1 0 0 0 0 0 0 0
0 0 0 1 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 0 0 0 0 0 0 0 0
0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 1 0 0 0 0 0 0 0 0 0
0 0 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 1 1 0 0 0 0 0 0 0 0 0
0 0 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 1 1 0 0 0 0 0 0 0 0 0 0
0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 0 0 0 0 0 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 2 2 2 1 1 1 1 1 1 1 2 2 0 0 0 0 2 2 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 2 2 2 1 1 1 0 0 0 0 0 1 1 2 0 0 0 0 0 2 2 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 2 2 2 0 2 2 2 1 1 1 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 2 2 2 2 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 2 2 2 1 1 1 0 0 0 0 0 0 0 0 1 1 1 2 0 0 0 0 0 0 0 0 0 2 2 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 2 2 2 0 0 0 0 0 0 0 0 0 0 2 2 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 2 2 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 0 2 2 2 2 2 2 2 2 1 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 2 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Each 2 marks a thread that took part in the computation during this iteration, i.e. an active thread.

But the final GPU result is:
max dt value:1 !
max dt value:2 !
GPU version...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 2 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 2 2 3 3 3 3 2 2 2 2 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 3 3 3 3 4 4 3 3 3 3 2 2 2 2 2 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 2 3 3 3 4 4 4 4 4 4 4 4 3 3 3 3 3 2 2 2 2 2 2 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 3 3 4 4 4 5 5 5 5 5 5 4 4 4 4 4 3 3 3 3 3 3 2 2 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 2 2 3 3 4 4 4 4 5 5 5 6 6 6 6 5 5 5 5 5 4 4 4 4 4 4 3 3 2 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 2 2 2 3 3 4 4 5 5 5 5 6 6 6 7 7 6 6 6 6 6 5 5 5 5 5 5 4 4 3 2 1 1 0 0 0 0 0 0 0
0 0 0 0 0 1 1 2 2 3 3 3 4 4 5 5 6 6 6 6 7 8 7 7 7 7 7 7 6 6 6 6 6 6 5 4 4 3 2 2 1 0 0 0 0 0 0 0
0 0 0 0 1 1 2 2 3 3 4 4 4 5 5 6 6 7 7 7 7 8 9 8 8 8 8 7 7 7 7 6 6 5 5 4 3 3 2 2 1 0 0 0 0 0 0 0
0 0 0 0 1 2 2 3 3 4 4 5 5 5 6 6 7 7 9 9 9 9 9 9 9 9 8 8 7 6 6 6 5 5 4 4 3 2 2 1 1 0 0 0 0 0 0 0
0 0 0 1 1 2 3 3 4 4 5 5 6 6 6 7 7 8 8 8 8 8 8 8 8 8 8 8 7 6 5 5 5 4 4 3 3 2 1 1 0 0 0 0 0 0 0 0
0 0 1 1 2 2 3 4 4 5 5 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 6 5 4 4 4 3 3 2 2 1 0 0 0 0 0 0 0 0 0
0 0 1 2 2 3 3 4 5 5 6 6 7 7 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 4 3 3 3 2 2 1 1 0 0 0 0 0 0 0 0 0
0 0 1 2 2 3 3 4 4 5 5 6 6 6 6 6 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 3 2 2 2 1 1 0 0 0 0 0 0 0 0 0 0
0 0 1 1 2 2 3 3 4 4 5 5 6 6 5 5 5 4 4 4 4 4 4 4 4 4 4 4 4 4 5 4 3 2 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 2 2 3 3 4 4 5 5 5 5 4 4 4 3 3 3 3 3 3 3 3 3 3 3 4 4 4 3 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 2 2 3 3 4 4 5 4 4 4 3 3 3 2 2 2 2 2 2 2 2 2 3 3 4 4 3 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 2 2 3 3 4 4 4 3 3 3 2 2 2 1 1 1 1 1 1 1 2 2 3 4 4 3 2 2 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 2 2 3 3 3 3 3 2 2 2 1 1 1 0 0 0 0 0 1 1 2 3 4 4 3 3 2 2 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 2 2 2 3 2 2 2 1 1 1 0 0 0 0 0 0 0 0 1 2 3 4 4 4 3 3 2 2 2 2 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 2 2 2 1 1 1 0 0 0 0 0 0 0 0 1 1 1 2 3 4 5 4 4 3 3 3 3 2 2 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 2 2 2 3 4 5 5 4 4 4 4 3 3 2 2 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 2 2 2 2 3 3 3 4 5 5 5 5 5 4 4 3 2 2 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 3 3 3 3 3 3 4 4 4 5 6 6 6 5 5 4 3 2 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 3 3 3 4 4 4 4 4 4 5 5 5 6 7 6 6 5 4 3 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 3 3 4 4 4 5 5 5 5 5 5 6 6 6 7 7 6 5 4 3 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 3 4 4 5 5 5 6 6 6 6 6 6 7 7 7 6 6 5 4 3 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 3 3 4 4 4 5 5 6 6 7 7 7 7 7 7 7 6 5 5 4 3 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 2 3 3 3 4 4 5 5 6 6 7 7 6 6 6 6 6 5 4 4 3 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 3 3 4 4 5 5 6 6 6 6 5 5 5 5 5 4 3 3 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 3 3 4 4 5 5 5 5 5 5 4 4 4 4 4 3 2 2 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 4 4 4 4 4 4 4 4 3 3 3 3 3 2 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 3 2 2 2 2 2 2 2 2 1 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 2 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

As you can see, the results of the last few GPU threads differ from the CPU version. The GPU code is as follows:
texture<uchar, 2,cudaReadModeElementType>  texbw;
#define THREADS_SIZE 34

__device__ uint count=0;
__global__ void dtgpu_min(size_t width,size_t height,uchar* dev_dtimg)
{
   uchar tmp_dt=0;//initialized to 0
   int x = threadIdx.x + blockIdx.x * blockDim.x;//coordinate along the width (x)
   int y = threadIdx.y + blockIdx.y * blockDim.y;//coordinate along the height (y)
   int offset = x + y * blockDim.x * gridDim.x;

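   /////First iteration: assume there is always a stone, so the image is never all black
   uchar center,up,down,left,right,upleft,upright,downright,downleft;
   //with a 2D texture the user does not need to handle image borders
   center= tex2D(texbw,x,y);//center pixel of this thread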
   up= tex2D(texbw,x,y-1);
   down= tex2D(texbw,x,y+1);
   left= tex2D(texbw,x-1,y);
   right= tex2D(texbw,x+1,y);
   upleft= tex2D(texbw,x-1,y-1);
   upright= tex2D(texbw,x+1,y-1);
   downleft= tex2D(texbw,x-1,y+1);
   downright= tex2D(texbw,x+1,y+1);
   if(center!=0)
   {
      int neighbor8=up*down*left*right*upleft*upright*downright*downleft;
      if(neighbor8==0)
      {
         tmp_dt=1;
         dev_dtimg[offset]=1;//first iteration, i.e. the outermost ring of the stone
          __threadfence();
      }
   }
   if(offset==0)
   {
      atomicAdd(&count, 1);//let a single thread update the global maximum distance
      printf("max dt value:%d !\n",int(count));
   }

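   bool has255=true;//assume an unprocessed image always contains 255
   while(has255)
   {
      has255=false;//assume no 255 remains after this iteration
      //////n-th iteration
      uchar lastdis=tmp_dt;//distance this thread computed in the previous iteration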
      if((center!=0 && lastdis==0)&&(x>0 && x<width-1)&&(y>0 && y<height-1))
      {
         //threads that have work to do
         uchar min=255;
         up=dev_dtimg[offset-width];
         if(up!=0 && up<min)
         {
            min=up;
         }
         down=dev_dtimg[offset+width];
         if(down!=0 && down<min)
         {
            min=down;
         }
         left=dev_dtimg[offset-1];
         if(left!=0 && left<min)
         {
            min=left;
         }
         right=dev_dtimg[offset+1];
         if(right!=0 && right<min)
         {
            min=right;
         }
         upleft=dev_dtimg[offset-width-1];
         if(upleft!=0 && upleft<min)
         {
            min=upleft;
         }
         upright=dev_dtimg[offset-width+1];
         if(upright!=0 && upright<min)
         {
            min=upright;
         }
         downright=dev_dtimg[offset+width+1];
         if(downright!=0 && downright<min)
         {
            min=downright;
         }
         downleft=dev_dtimg[offset+width-1];
         if(downleft!=0 && downleft<min)
         {
            min=downleft;
         }
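         has255=true;//a 255 was still found in this iteration, so another iteration is needed
         tmp_dt=min+1;
      }
      //prepare for the next iteration: write the temporary dt value to the global distance transform result
      dev_dtimg[offset]=tmp_dt;
      __threadfence();
      //update the global maximum distance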
      if(offset==0)
      {
         int maxdis=atomicAdd(&count, 1);//let a single thread update the global maximum distance
         printf("max dt value:%d !\n",int(count));
      }
      //start the next iteration
   }
}

int main()
{
   cv::Mat testimg = cv::imread("/media/root/Ubuntu43/xrt/imgs/watershed-min/0.bmp",-1);
   cv::cvtColor(testimg,testimg,cv::COLOR_BGR2GRAY);
   //binary image
   cv::Mat bwimg;
   cv::threshold(testimg,bwimg,100,255,cv::THRESH_BINARY);
   int rows=bwimg.rows;
   int cols=bwimg.cols;
   int imgsize=sizeof(uchar)*rows*cols;
   //distance transform result
   cv::Mat dtimg=cv::Mat::zeros(rows,cols,CV_8UC1);
   //GPU version: the image is not padded; the 2D texture handles image borders automatically
   dim3 threads(THREADS_SIZE, 1);
   dim3 blocks(2,rows);//68/64=2

   //the binary image bwimg uses 2D texture memory (texbw), read-only
   cudaChannelFormatDesc desc = cudaCreateChannelDesc<uchar>();
   cudaArray *dev_Src;
   cudaMallocArray(&dev_Src,&desc,cols,rows);
   cudaMemcpyToArray(dev_Src,0,0,bwimg.data,imgsize,cudaMemcpyHostToDevice);
   texbw.filterMode =cudaFilterModePoint;
   texbw.addressMode[0] = cudaAddressModeWrap;
   texbw.addressMode[1] = cudaAddressModeWrap;
   cudaBindTextureToArray(&texbw,dev_Src,&desc);

   //the distance transform result uses global memory, read/write
   uchar *dev_dtimg;
   cudaMalloc((void**)&dev_dtimg, imgsize);
   dtgpu_min<<<blocks,threads>>>(cols,rows,dev_dtimg);

   cudaMemcpy(dtimg.data,dev_dtimg,imgsize,cudaMemcpyDeviceToHost);
   cudaThreadSynchronize();
   
   cudaUnbindTexture(texbw);// unbind the texture and free memory
   cudaFree(dev_Src);
   cudaFree(dev_dtimg);

   return 0;
}
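(1) I thought about why the global maximum distance should print 9 while mine prints 2. I believe my approach has the following problem: for example, on first entering the while loop, the offset 0 thread adds 1 to the global count, and while it is doing so the active threads may already have computed 1, 2, or 3. The other threads that have work to do do not wait for the offset 0 thread to finish incrementing count from 0 to 1. Is that right?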
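For the counter in (1), here is a minimal sketch of one possible arrangement, assuming a separate kernel launch per distance layer is acceptable. The host owns the counter and advances it once per completed launch, so it cannot drift away from the layers. The names `dt_layer`, `dt_by_layers`, and `check` are hypothetical, plain global memory replaces the texture, and out-of-image neighbours are skipped; none of this is taken from the code above.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: stop on any CUDA runtime error.
static void check(cudaError_t e)
{
    if (e != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        std::exit(1);
    }
}

// One launch assigns exactly one distance layer: it reads prev and writes next.
__global__ void dt_layer(const unsigned char* src, const unsigned char* prev,
                         unsigned char* next, int width, int height,
                         int layer, int* changed)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int i = y * width + x;
    unsigned char out = prev[i];
    if (src[i] != 0 && out == 0) {  // foreground pixel with no distance yet
        bool hit = false;
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = x + dx, ny = y + dy;
                // Assumption: neighbours outside the image are ignored.
                if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                int n = ny * width + nx;
                // Layer 1 touches the background; layer d > 1 touches layer d - 1.
                hit = hit || ((layer == 1) ? (src[n] == 0) : (prev[n] == layer - 1));
            }
        }
        if (hit) {
            out = (unsigned char)layer;
            atomicExch(changed, 1);
        }
    }
    next[i] = out;
}

// Returns the distance image; *max_dist receives the largest distance written.
std::vector<unsigned char> dt_by_layers(const std::vector<unsigned char>& bw,
                                        int width, int height, int* max_dist)
{
    size_t bytes = (size_t)width * height;
    assert(width > 0 && height > 0 && bw.size() == bytes);
    unsigned char *d_src, *d_prev, *d_next;
    int* d_changed;
    check(cudaMalloc(&d_src, bytes));
    check(cudaMalloc(&d_prev, bytes));
    check(cudaMalloc(&d_next, bytes));
    check(cudaMalloc(&d_changed, sizeof(int)));
    check(cudaMemcpy(d_src, bw.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemset(d_prev, 0, bytes));

    dim3 threads(16, 16);
    dim3 blocks((width + 15) / 16, (height + 15) / 16);
    *max_dist = 0;
    for (int layer = 1; layer <= 255; ++layer) {
        check(cudaMemset(d_changed, 0, sizeof(int)));
        dt_layer<<<blocks, threads>>>(d_src, d_prev, d_next, width, height, layer, d_changed);
        check(cudaGetLastError());
        int changed = 0;
        // This blocking copy returns only after the kernel has finished,
        // so every thread of every block is done with this layer here.
        check(cudaMemcpy(&changed, d_changed, sizeof(int), cudaMemcpyDeviceToHost));
        std::swap(d_prev, d_next);  // d_prev now holds the newest image
        if (!changed) break;
        *max_dist = layer;          // the counter advances once per completed layer
    }
    std::vector<unsigned char> out(bytes);
    check(cudaMemcpy(out.data(), d_prev, bytes, cudaMemcpyDeviceToHost));
    cudaFree(d_src); cudaFree(d_prev); cudaFree(d_next); cudaFree(d_changed);
    return out;
}
```

Calling `dt_by_layers(pixels, cols, rows, &max_dist)` on a row-major binary image would return the distance image and leave the number of completed layers in `max_dist`. The cost of this design is one launch and one small copy per layer.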
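(2) Is the reason the last few GPU threads (the places where the output is 8 and 9) differ slightly from the CPU version also that not all the 6s are finished before every thread enters the next while iteration together? Possibly some thread T should have written 7, but in its 3x3 neighborhood 4 threads have already written 7, one thread has written 9, and two threads Tm and Tn are still writing 6. T is faster than Tm and Tn, so when it looks around its neighborhood the minimum it finds is 7, and it outputs 8 as its own result. In other words, thread T did not wait for Tm or Tn to finish writing 6. Is that right? How can I make all threads of all blocks wait? Is ending the kernel really the only way? If so, I will have to rework this approach substantially.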
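On whether ending the kernel is the only grid-wide barrier asked about in (2): CUDA's cooperative groups provide `grid.sync()`, which waits for all threads of all blocks inside a single launch. It only works when the kernel is started with `cudaLaunchCooperativeKernel`, the device reports `cudaDevAttrCooperativeLaunch`, and the grid is small enough for all blocks to be resident at once. What follows is only a sketch under those assumptions. `dt_grid_sync`, `dt_single_launch`, and `check` are hypothetical names, global memory replaces the texture, and out-of-image neighbours are skipped. Depending on the toolkit version, building may also require relocatable device code.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Hypothetical helper: stop on any CUDA runtime error.
static void check(cudaError_t e)
{
    if (e != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        std::exit(1);
    }
}

// All layers in one launch. changed must point to 256 zeroed ints, one flag per layer.
__global__ void dt_grid_sync(const unsigned char* src, unsigned char* buf_a,
                             unsigned char* buf_b, int width, int height, int* changed)
{
    cg::grid_group grid = cg::this_grid();
    unsigned char* prev = buf_a;
    unsigned char* next = buf_b;
    int total = width * height;
    for (int layer = 1; layer <= 255; ++layer) {
        for (int i = (int)grid.thread_rank(); i < total; i += (int)grid.size()) {
            int x = i % width, y = i / width;
            unsigned char out = prev[i];
            if (src[i] != 0 && out == 0) {  // foreground pixel with no distance yet
                bool hit = false;
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx) {
                        int nx = x + dx, ny = y + dy;
                        if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                        int n = ny * width + nx;
                        hit = hit || ((layer == 1) ? (src[n] == 0) : (prev[n] == layer - 1));
                    }
                if (hit) { out = (unsigned char)layer; atomicExch(&changed[layer], 1); }
            }
            next[i] = out;
        }
        grid.sync();                     // every thread of every block finished this layer
        if (changed[layer] == 0) break;  // same value for all threads, so all leave together
        unsigned char* t = prev; prev = next; next = t;
    }
}

std::vector<unsigned char> dt_single_launch(const std::vector<unsigned char>& bw,
                                            int width, int height, int* max_dist)
{
    size_t bytes = (size_t)width * height;
    assert(width > 0 && height > 0 && bw.size() == bytes);
    int dev = 0, coop = 0, sms = 0, per_sm = 0;
    const int block = 256;
    check(cudaGetDevice(&dev));
    check(cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev));
    if (!coop) { std::fprintf(stderr, "cooperative launch not supported\n"); std::exit(1); }
    check(cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev));
    check(cudaOccupancyMaxActiveBlocksPerMultiprocessor(&per_sm, dt_grid_sync, block, 0));
    int grid = sms * per_sm;  // all blocks must be resident at the same time

    unsigned char *d_src, *d_a, *d_b;
    int* d_changed;
    check(cudaMalloc(&d_src, bytes));
    check(cudaMalloc(&d_a, bytes));
    check(cudaMalloc(&d_b, bytes));
    check(cudaMalloc(&d_changed, 256 * sizeof(int)));
    check(cudaMemcpy(d_src, bw.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemset(d_a, 0, bytes));
    check(cudaMemset(d_changed, 0, 256 * sizeof(int)));

    void* args[] = { &d_src, &d_a, &d_b, &width, &height, &d_changed };
    check(cudaLaunchCooperativeKernel((void*)dt_grid_sync, dim3(grid), dim3(block), args, 0, 0));
    check(cudaDeviceSynchronize());

    int flags[256];
    check(cudaMemcpy(flags, d_changed, sizeof(flags), cudaMemcpyDeviceToHost));
    *max_dist = 0;
    for (int l = 1; l <= 255 && flags[l]; ++l) *max_dist = l;
    // buf_b always holds the result: the last layer either copies prev into next
    // unchanged (both buffers equal) or is layer 255, which writes buf_b.
    std::vector<unsigned char> out(bytes);
    check(cudaMemcpy(out.data(), d_b, bytes, cudaMemcpyDeviceToHost));
    cudaFree(d_src); cudaFree(d_a); cudaFree(d_b); cudaFree(d_changed);
    return out;
}
```

One flag per layer avoids having to reset a shared flag between layers, which would itself need another barrier. Whether this is faster than one launch per layer depends on the image size and the device, so it is worth measuring both.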
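(3) I want to output the maximum value 9 as a global variable. I looked at the example you mentioned from the manual, which sums an array of length m*n: each block sums m numbers, then each block sends its thread 0 to write the block's partial sum to global memory. There are n blocks in total. Counting with atomicInc on the global variable count effectively ranks the thread 0s by speed: the slowest one must get n-1 when it counts, so whichever thread 0 gets n-1 belongs to the slowest block. That slowest block then performs the final sum over the n partial sums. I wanted to use this example, but it does not seem to fit, because the number of active threads in each while iteration is not fixed, so I cannot identify the slowest thread or block.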
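For the global maximum in (3), one alternative that does not depend on finding the slowest block is `atomicMax`. Each block reduces its share of the finished distance image in shared memory, and its thread 0 merges the block maximum into a single global integer. The sketch below assumes it runs as a separate kernel after the distance image is complete. `max_reduce`, `device_max`, and `check` are hypothetical names, and the block size is fixed at 256.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: stop on any CUDA runtime error.
static void check(cudaError_t e)
{
    if (e != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        std::exit(1);
    }
}

// Assumes blockDim.x == 256 and *result initialised to 0 before the launch.
__global__ void max_reduce(const unsigned char* data, int n, int* result)
{
    __shared__ int smax[256];
    int tid = threadIdx.x;
    int local = 0;
    // Grid-stride loop: each thread scans several elements.
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += blockDim.x * gridDim.x)
        local = max(local, (int)data[i]);
    smax[tid] = local;
    __syncthreads();
    // Tree reduction inside the block.
    for (int s = blockDim.x / 2; s > 0; s /= 2) {
        if (tid < s) smax[tid] = max(smax[tid], smax[tid + s]);
        __syncthreads();
    }
    // The order in which blocks arrive does not matter for a maximum.
    if (tid == 0) atomicMax(result, smax[0]);
}

// Copies the values to the device and returns their maximum.
int device_max(const std::vector<unsigned char>& host)
{
    int n = (int)host.size();
    assert(n > 0);
    unsigned char* d_data;
    int* d_result;
    check(cudaMalloc(&d_data, n));
    check(cudaMalloc(&d_result, sizeof(int)));
    check(cudaMemcpy(d_data, host.data(), n, cudaMemcpyHostToDevice));
    check(cudaMemset(d_result, 0, sizeof(int)));
    int blocks = (n + 255) / 256;
    if (blocks > 64) blocks = 64;
    max_reduce<<<blocks, 256>>>(d_data, n, d_result);
    check(cudaGetLastError());
    int result = 0;
    check(cudaMemcpy(&result, d_result, sizeof(int), cudaMemcpyDeviceToHost));
    cudaFree(d_data); cudaFree(d_result);
    return result;
}
```

Because a maximum is independent of the order of updates, no block needs to know whether it is the last one, which is the part of the summation example that did not carry over.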
