Two-dimensional arrays in CUDA

I’m practicing with this simple code, which takes two two-dimensional arrays and adds them element by element with CUDA. In the end, the result in c is not what I expect. Also, I was wondering whether I can use vector instead of C-style arrays.

#include <iostream>
using namespace std; 
#define N 2   
__global__ void MatAdd(double** a, double** b,
                       double** c)
{
    int i = threadIdx.x;
    int j = threadIdx.y;
    c[i][j] = a[i][j] + b[i][j];
}

int main()
{

    
    double a[2][2]= {{1.0,2.0},{3.0,4.0}};
    double b[2][2]= {{1.0,2.0},{3.0,4.0}};
    double c[2][2]; // it will be the result! 
    double**  a_d; 
    double**  b_d;
    double**  c_d; 
    int d_size = N * N * sizeof(double);
    int numBlocks = 1;
        dim3 threadsPerBlock(N, N);
        
        cudaMalloc(&a_d, d_size);
        
        cudaMalloc(&b_d, d_size);
        
        cudaMalloc(&c_d, d_size);
        
        cudaMemcpy(a_d, a, d_size, cudaMemcpyHostToDevice);
    
        cudaMemcpy(b_d, b, d_size, cudaMemcpyHostToDevice);
        
        cudaMemcpy(c_d, c, d_size, cudaMemcpyHostToDevice);
        
        MatAdd<<<numBlocks, threadsPerBlock>>>(a_d, b_d, c_d);
        
        //cudaDeviceSynchronize();
        cudaMemcpy(c, c_d, d_size, cudaMemcpyDeviceToHost);
     
     for (int i=0; i<N; i++){
        for(int j=0; j<N; j++){
            
            cout<<c[i][j]<<endl;    
        }
     
    }
    return 0; 
    
   
}

Answer

You should not use the double** type in this case. Instead, use a flattened array: a single double*-typed variable that holds all the values of a given matrix contiguously.

The heart of the problem is the following line (and the similar ones after it):

cudaMemcpy(a_d, a, d_size, cudaMemcpyHostToDevice);

Here you assume that a and a_d have compatible types, but they do not. A double**-typed variable is a pointer to one or more pointers in memory (typically an array of pointers, each referencing a separate array of double values), whereas a double*-typed variable or a static 2D C array refers to a single contiguous block of memory. The copy therefore fills a_d with raw double values, and the kernel then treats those values as row pointers when it evaluates a[i][j].

Note that you can access a given (i,j) cell of a matrix with matrix[N*i+j], where N is the number of columns, assuming matrix is a flattened matrix of type double* stored in row-major order.
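Putting this together, a minimal sketch of the program with flattened matrices could look like the following. The wrapper matAddOnDevice is a hypothetical helper, not part of the original code, and the error handling is deliberately basic. On the question about vector: std::vector<double> is fine for host storage, since data() returns a pointer to contiguous doubles, but it cannot be used inside the kernel.

```cuda
#include <iostream>
#include <vector>
#include <cuda_runtime.h>

constexpr int N = 2;  // the matrices are N x N, as in the question

// Each thread adds one cell; matrices are flat, row-major arrays of N*N doubles.
__global__ void MatAdd(const double* a, const double* b, double* c)
{
    int i = threadIdx.y;  // row
    int j = threadIdx.x;  // column
    if (i < N && j < N)
        c[N * i + j] = a[N * i + j] + b[N * i + j];
}

// Hypothetical helper: copy inputs to the GPU, run the kernel, copy the result back.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t matAddOnDevice(const double* a, const double* b, double* c)
{
    const size_t bytes = N * N * sizeof(double);
    double *a_d = nullptr, *b_d = nullptr, *c_d = nullptr;

    cudaError_t err = cudaMalloc(&a_d, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&b_d, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&c_d, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(a_d, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(b_d, b, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        MatAdd<<<1, dim3(N, N)>>>(a_d, b_d, c_d);
        err = cudaGetLastError();  // reports launch failures
    }
    // This copy runs after the kernel on the default stream and blocks the host.
    if (err == cudaSuccess) err = cudaMemcpy(c, c_d, bytes, cudaMemcpyDeviceToHost);

    cudaFree(a_d);  // cudaFree on a null pointer performs no operation
    cudaFree(b_d);
    cudaFree(c_d);
    return err;
}

int main()
{
    // Host matrices stored flat in row-major order.
    std::vector<double> a = {1.0, 2.0, 3.0, 4.0};
    std::vector<double> b = {1.0, 2.0, 3.0, 4.0};
    std::vector<double> c(N * N);

    cudaError_t err = matAddOnDevice(a.data(), b.data(), c.data());
    if (err != cudaSuccess) {
        std::cerr << "CUDA error: " << cudaGetErrorString(err) << '\n';
        return 1;
    }

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            std::cout << c[N * i + j] << ' ';
        std::cout << '\n';
    }
    return 0;
}
```

On a machine with a working CUDA device, this should print the rows 2 4 and 6 8. The copy of c to the device before the launch is dropped, since the kernel overwrites every cell of c anyway.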