ECE408@UIUC Parallel Programming: fixing the C++ error free(): invalid next size (normal)

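Background: an ECE408 project that implements a serial CNN convolution layer

The assignment asks for a serial implementation, presumably to teach us that running a CNN serially takes a very long time and is hopeless for training (just kidding).

The approach is purely serial (the buggy version follows):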


void conv_forward_cpu(float *output, const float *input, const float *mask, const int B, const int M, const int C, const int H, const int W, const int K, const int S)
{
  /*
    Modify this function to implement the forward pass described in Chapter 16.
    The code in 16 is for a single image.
    We have added an additional dimension to the tensors to support an entire mini-batch
    The goal here is to be correct, not fast (this is the CPU implementation.)

    Function parameters:
    output - output
    input - input
    mask - kernel
    B - batch_size (number of images in x)
    M - number of output feature maps
    C - number of input feature maps
    H - input height dimension
    W - input width dimension
    K - kernel height and width (K x K)
    S - stride step length
    */

  const int H_out = (H - K)/S + 1;
  const int W_out = (W - K)/S + 1;
  

  // We have some nice #defs for you below to simplify indexing. Feel free to use them, or create your own.
  // An example use of these macros:
  // float a = in_4d(0,0,0,0)
  // out_4d(0,0,0,0) = a
  #define out_4d(i3, i2, i1, i0) output[(i3) * (M * H_out * W_out) + (i2) * (H_out * W_out) + (i1) * (W_out) + i0]
  #define in_4d(i3, i2, i1, i0) input[(i3) * (C * H * W) + (i2) * (H * W) + (i1) * (W) + i0]
  #define mask_4d(i3, i2, i1, i0) mask[(i3) * (C * K * K) + (i2) * (K * K) + (i1) * (K) + i0]
  for (int b = 0; b < B; b++){
    for (int m = 0; m < M; m++){
      for (int h = 0; h < H; h++){          // BUG: the bound should be H_out, not H
        for (int w = 0; w < W; w++){        // BUG: the bound should be W_out, not W
          out_4d(b,m,h,w) = 0;
          for (int c = 0; c < C; c++){
            for (int p = 0; p < K; p++){
              for (int q = 0; q < K; q++){
                out_4d(b,m,h,w) += in_4d(b,c,(h*S+p),(w*S+q))*mask_4d(m,c,p,q);
              }
            }
          }
        }
      }
    }
  }
  #undef out_4d
  #undef in_4d
  #undef mask_4d
  return;
}

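The error: free(): invalid next size (normal)

The problem boils down to a size mismatch: the loops write more elements than were allocated for output. The out-of-bounds writes overwrite the allocator's bookkeeping for the neighboring heap chunk, so when the buffer is later freed, free() finds an invalid size and aborts.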
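To make the mismatch concrete, here is a small sketch of the index arithmetic. It assumes the output buffer holds exactly B * M * H_out * W_out floats, which is what the strides in the out_4d macro suggest; the helper names output_elems and max_out_index are made up for this illustration and are not part of the course code.

```cpp
#include <cassert>
#include <cstddef>

// Number of floats in the output tensor: B * M * H_out * W_out.
// (Assumption: the caller allocates exactly this many floats.)
std::size_t output_elems(int B, int M, int H, int W, int K, int S) {
  assert(B >= 1 && M >= 1 && S >= 1 && K >= 1 && K <= H && K <= W);
  const std::size_t H_out = (H - K) / S + 1;
  const std::size_t W_out = (W - K) / S + 1;
  return static_cast<std::size_t>(B) * M * H_out * W_out;
}

// Largest flat index that out_4d(b, m, h, w) touches when the h loop runs
// to h_bound - 1 and the w loop runs to w_bound - 1.
std::size_t max_out_index(int B, int M, int H, int W, int K, int S,
                          int h_bound, int w_bound) {
  assert(B >= 1 && M >= 1 && h_bound >= 1 && w_bound >= 1);
  const std::size_t H_out = (H - K) / S + 1;
  const std::size_t W_out = (W - K) / S + 1;
  const std::size_t b = B - 1, m = M - 1, h = h_bound - 1, w = w_bound - 1;
  return b * (M * H_out * W_out) + m * (H_out * W_out) + h * W_out + w;
}
```

For a hypothetical 5 x 5 input with K = 3 and S = 1 (one image, one feature map), the output has 9 elements, so valid indices are 0 to 8. Looping h and w up to H and W reaches index 16, well past the end; looping up to H_out and W_out stops at index 8.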

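The key is to understand correctly what convolution means in this implementation.

Consider the convolution figure again: the matrix that the serial loops walk over should be the smaller matrix produced by the convolution (the output), not the input. The loops should therefore become: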


  // Height and width of the output (result) matrix
  const int H_out = (H - K)/S + 1;
  const int W_out = (W - K)/S + 1;

  for (int b = 0; b < B; b++){
    for (int m = 0; m < M; m++){
      for (int h = 0; h < H_out; h++){
        for (int w = 0; w < W_out; w++){
          out_4d(b,m,h,w) = 0;
          for (int c = 0; c < C; c++){
            for (int p = 0; p < K; p++){
              for (int q = 0; q < K; q++){
                out_4d(b,m,h,w) += in_4d(b,c,(h*S+p),(w*S+q))*mask_4d(m,c,p,q);
              }
            }
          }
        }
      }
    }
  }

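With these bounds every write stays inside the allocated output buffer, so the size mismatch behind the free() error goes away.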
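For a quick sanity check outside the course framework, the corrected loops can be wrapped in a standalone function. This is only a sketch: the name conv_forward_ref and the use of std::vector are my own choices, not the project's interface, and the tensor layouts are assumed to match the post's macros (input B x C x H x W, mask M x C x K x K, output B x M x H_out x W_out).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial forward convolution over a mini-batch with the corrected loop bounds.
// Hypothetical reference helper; returns a freshly sized output tensor.
std::vector<float> conv_forward_ref(const std::vector<float>& input,
                                    const std::vector<float>& mask,
                                    int B, int M, int C, int H, int W,
                                    int K, int S) {
  const int H_out = (H - K) / S + 1;
  const int W_out = (W - K) / S + 1;
  assert(input.size() == static_cast<std::size_t>(B) * C * H * W);
  assert(mask.size() == static_cast<std::size_t>(M) * C * K * K);

  // The output is sized from H_out and W_out, and the loops use the same bounds.
  std::vector<float> output(static_cast<std::size_t>(B) * M * H_out * W_out, 0.0f);
  for (int b = 0; b < B; b++)
    for (int m = 0; m < M; m++)
      for (int h = 0; h < H_out; h++)
        for (int w = 0; w < W_out; w++) {
          float acc = 0.0f;
          for (int c = 0; c < C; c++)
            for (int p = 0; p < K; p++)
              for (int q = 0; q < K; q++) {
                const std::size_t in_idx =
                    ((static_cast<std::size_t>(b) * C + c) * H + (h * S + p)) * W + (w * S + q);
                const std::size_t mk_idx =
                    ((static_cast<std::size_t>(m) * C + c) * K + p) * K + q;
                acc += input[in_idx] * mask[mk_idx];
              }
          output[((static_cast<std::size_t>(b) * M + m) * H_out + h) * W_out + w] = acc;
        }
  return output;
}
```

As a usage example, a single 3 x 3 image holding the values 1 to 9, convolved with a 2 x 2 mask of ones at stride 1, yields the four window sums 12, 16, 24 and 28.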

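In summary: when writing code like this, carefully check what the boundary conditions are and which matrix the loops are actually walking over.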
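One way to have the program check those boundaries for you during development is to route every access through an index helper that asserts each coordinate against its dimension. The helper idx4 below is hypothetical (it is not one of the course macros), and the asserts disappear when compiling with -DNDEBUG, so treat it as a debugging aid rather than a safety guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Flat offset into a d3 x d2 x d1 x d0 tensor stored in row-major order.
// Each coordinate is checked against its own dimension before use.
inline std::size_t idx4(int i3, int i2, int i1, int i0,
                        int d3, int d2, int d1, int d0) {
  assert(0 <= i3 && i3 < d3);
  assert(0 <= i2 && i2 < d2);
  assert(0 <= i1 && i1 < d1);
  assert(0 <= i0 && i0 < d0);
  return ((static_cast<std::size_t>(i3) * d2 + i2) * d1 + i1) * d0 + i0;
}
```

Writing output[idx4(b, m, h, w, B, M, H_out, W_out)] in place of out_4d(b, m, h, w) would make the buggy loops stop at the first iteration where w reaches W_out, pointing straight at the faulty bound instead of failing later inside free().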

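Another interesting point is why it is free() that reports the error; there is an interesting post about this.

A related question: why a free() error rather than a segmentation fault?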

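A segmentation fault is raised only when a program touches an address that is not mapped into its address space, or violates the permissions of the page it touches. Here the stray writes land just past the end of the output buffer, in heap memory that still belongs to the process, so nothing faults at the moment of the write. The corruption is only noticed later, when free() validates the allocator's bookkeeping for the neighboring chunk. In that sense it is a more general, library-level check rather than page-level protection by the operating system.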
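The "silent write, late detection" behavior can be imitated safely with a guard region. The sketch below is an analogy only, not how glibc lays out its heap: it appends sentinel values after a payload, lets a loop write too far, and checks the sentinels afterwards, much as the allocator checks its own metadata when free() is called. The function name overflow_detected and the sentinel value are invented for this demo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if a loop that writes `writes` consecutive floats, starting at
// index 0, spilled past a payload of `payload` floats into the guard region.
bool overflow_detected(std::size_t payload, std::size_t writes, std::size_t guard) {
  assert(writes <= payload + guard);  // keep the demo itself inside the vector
  const float kSentinel = -12345.0f;
  std::vector<float> buf(payload + guard, kSentinel);

  // The "buggy loop": every write succeeds silently, even past the payload.
  for (std::size_t i = 0; i < writes; i++) buf[i] = 0.0f;

  // Detection happens only afterwards, when someone inspects the guard slots.
  for (std::size_t i = payload; i < payload + guard; i++)
    if (buf[i] != kSentinel) return true;
  return false;
}
```

With a payload of 9 floats, 17 writes trample the guard and are reported, while 9 writes leave it intact. To catch such bugs at the faulty write itself rather than at free(), compiling with AddressSanitizer (for example g++ -g -fsanitize=address) is a common option.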

Jade Hong's Technical Blogger
