ECE408@UIUC Parallel Programming: fixing the C++ error free(): invalid next size (normal)
Background: an ECE408 project that asks for a serial CNN convolution layer.

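We have to write a serial implementation, apparently to teach us that running a CNN serially takes forever and is useless for training (just kidding).
The code is purely serial. Here is the buggy version: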
    void conv_forward_cpu(float *output, const float *input, const float *mask,
                          const int B, const int M, const int C,
                          const int H, const int W, const int K, const int S)
    {
        /*
        Modify this function to implement the forward pass described in Chapter 16.
        The code in 16 is for a single image.
        We have added an additional dimension to the tensors to support an entire mini-batch
        The goal here is to be correct, not fast (this is the CPU implementation.)

        Function parameters:
        output - output
        input - input
        mask - kernel
        B - batch_size (number of images in x)
        M - number of output feature maps
        C - number of input feature maps
        H - input height dimension
        W - input width dimension
        K - kernel height and width (K x K)
        S - stride step length
        */

        const int H_out = (H - K)/S + 1;
        const int W_out = (W - K)/S + 1;

        // We have some nice #defs for you below to simplify indexing. Feel free to use them, or create your own.
        // An example use of these macros:
        // float a = in_4d(0,0,0,0)
        // out_4d(0,0,0,0) = a

        #define out_4d(i3, i2, i1, i0) output[(i3) * (M * H_out * W_out) + (i2) * (H_out * W_out) + (i1) * (W_out) + i0]
        #define in_4d(i3, i2, i1, i0) input[(i3) * (C * H * W) + (i2) * (H * W) + (i1) * (W) + i0]
        #define mask_4d(i3, i2, i1, i0) mask[(i3) * (C * K * K) + (i2) * (K * K) + (i1) * (K) + i0]

        for (int b = 0; b < B; b++) {
            for (int m = 0; m < M; m++) {
                for (int h = 0; h < H; h++) {          // BUG: loops over the input height
                    for (int w = 0; w < W; w++) {      // BUG: loops over the input width
                        out_4d(b,m,h,w) = 0;
                        for (int c = 0; c < C; c++) {
                            for (int p = 0; p < K; p++) {
                                for (int q = 0; q < K; q++) {
                                    out_4d(b,m,h,w) += in_4d(b,c,(h*S+p),(w*S+q)) * mask_4d(m,c,p,q);
                                }
                            }
                        }
                    }
                }
            }
        }

        #undef out_4d
        #undef in_4d
        #undef mask_4d

        return;
    }
The error message: free(): invalid next size (normal)

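The problem boils down to a heap buffer overflow: the loops write past the end of the allocated output buffer and overwrite the allocator's bookkeeping for the neighboring chunk. When free() later inspects that chunk, the size it finds no longer matches what was allocated, and the program aborts with an invalid free() error.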
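To make the overflow concrete, here is a rough sketch of the index arithmetic behind out_4d. It assumes the caller allocates exactly B * M * H_out * W_out floats for the output, which the post does not show; the helper names output_size and max_output_index are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Number of floats the output buffer holds.
// Assumption: the caller allocates exactly B * M * H_out * W_out floats.
std::size_t output_size(int B, int M, int H, int W, int K, int S) {
    const std::size_t H_out = static_cast<std::size_t>((H - K) / S + 1);
    const std::size_t W_out = static_cast<std::size_t>((W - K) / S + 1);
    return static_cast<std::size_t>(B) * M * H_out * W_out;
}

// Largest flat index touched through out_4d when h runs over [0, h_limit)
// and w runs over [0, w_limit). The formula mirrors the out_4d macro.
std::size_t max_output_index(int B, int M, int H, int W, int K, int S,
                             int h_limit, int w_limit) {
    const std::size_t H_out = static_cast<std::size_t>((H - K) / S + 1);
    const std::size_t W_out = static_cast<std::size_t>((W - K) / S + 1);
    const std::size_t b = static_cast<std::size_t>(B - 1);
    const std::size_t m = static_cast<std::size_t>(M - 1);
    const std::size_t h = static_cast<std::size_t>(h_limit - 1);
    const std::size_t w = static_cast<std::size_t>(w_limit - 1);
    return b * (M * H_out * W_out) + m * (H_out * W_out) + h * W_out + w;
}
```

For a single 5 x 5 image with K = 3 and S = 1, the output is 3 x 3, so the buffer has 9 floats. Looping h and w up to H and W reaches flat index 16, well past the end, while looping up to H_out and W_out stops at index 8.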
The key is to understand correctly what the convolution in this implementation actually iterates over.

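Going back to the convolution figure: the matrix the serial loops walk over should be the smaller matrix produced by the convolution, not the input. So the loops should be changed to: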
    // These are height and width of the result matrix
    const int H_out = (H - K)/S + 1;
    const int W_out = (W - K)/S + 1;

    for (int b = 0; b < B; b++) {
        for (int m = 0; m < M; m++) {
            for (int h = 0; h < H_out; h++) {
                for (int w = 0; w < W_out; w++) {
                    out_4d(b,m,h,w) = 0;
                    for (int c = 0; c < C; c++) {
                        for (int p = 0; p < K; p++) {
                            for (int q = 0; q < K; q++) {
                                out_4d(b,m,h,w) += in_4d(b,c,(h*S+p),(w*S+q)) * mask_4d(m,c,p,q);
                            }
                        }
                    }
                }
            }
        }
    }
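With these bounds every write stays inside the allocated output buffer, so what free() sees matches what was allocated and the error goes away.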
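For anyone who wants to try the corrected loops outside the course harness, here is a self-contained sketch. It keeps the same tensor layout as the macros but swaps them for small index lambdas, and it accumulates into a local variable before storing; both are my own choices, not the course template. The names conv_forward_cpu_fixed and run_small_example are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial forward convolution with the corrected loop bounds.
// Layout matches the out_4d / in_4d / mask_4d macros from the post.
void conv_forward_cpu_fixed(float *output, const float *input, const float *mask,
                            const int B, const int M, const int C,
                            const int H, const int W, const int K, const int S) {
    const int H_out = (H - K) / S + 1;
    const int W_out = (W - K) / S + 1;

    auto out_idx = [=](int b, int m, int h, int w) -> std::size_t {
        return ((static_cast<std::size_t>(b) * M + m) * H_out + h) * W_out + w;
    };
    auto in_idx = [=](int b, int c, int h, int w) -> std::size_t {
        return ((static_cast<std::size_t>(b) * C + c) * H + h) * W + w;
    };
    auto mask_idx = [=](int m, int c, int p, int q) -> std::size_t {
        return ((static_cast<std::size_t>(m) * C + c) * K + p) * K + q;
    };

    for (int b = 0; b < B; b++)
        for (int m = 0; m < M; m++)
            for (int h = 0; h < H_out; h++)        // output height, not H
                for (int w = 0; w < W_out; w++) {  // output width, not W
                    float acc = 0.0f;
                    for (int c = 0; c < C; c++)
                        for (int p = 0; p < K; p++)
                            for (int q = 0; q < K; q++)
                                acc += input[in_idx(b, c, h * S + p, w * S + q)]
                                     * mask[mask_idx(m, c, p, q)];
                    output[out_idx(b, m, h, w)] = acc;
                }
}

// Tiny example: one 3 x 3 image of ones, one 2 x 2 mask of ones, stride 1.
// Each output element is the sum of four ones.
std::vector<float> run_small_example() {
    const int B = 1, M = 1, C = 1, H = 3, W = 3, K = 2, S = 1;
    const int H_out = (H - K) / S + 1;
    const int W_out = (W - K) / S + 1;
    std::vector<float> input(B * C * H * W, 1.0f);
    std::vector<float> mask(M * C * K * K, 1.0f);
    std::vector<float> output(B * M * H_out * W_out, -1.0f);
    conv_forward_cpu_fixed(output.data(), input.data(), mask.data(),
                           B, M, C, H, W, K, S);
    return output;
}
```

The output vector is sized from H_out and W_out, the same quantities that bound the loops, which is the whole point of the fix.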
To sum up: when writing code like this, check the loop bounds carefully and be clear about which matrix the loops are actually walking over.
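One way to have the bounds checked for you while debugging is to wrap the flat buffer in a small accessor that throws on a bad index. This is only a sketch: Checked4D and loops_stay_in_bounds are hypothetical helpers, not part of the course code, and the check costs time on every access.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical debugging helper: a 4-D view over a flat buffer that throws
// instead of silently writing outside the buffer.
struct Checked4D {
    std::vector<float> data;
    int d3, d2, d1, d0;

    Checked4D(int n3, int n2, int n1, int n0)
        : data(static_cast<std::size_t>(n3) * n2 * n1 * n0, 0.0f),
          d3(n3), d2(n2), d1(n1), d0(n0) {}

    float &at(int i3, int i2, int i1, int i0) {
        if (i3 < 0 || i3 >= d3 || i2 < 0 || i2 >= d2 ||
            i1 < 0 || i1 >= d1 || i0 < 0 || i0 >= d0) {
            throw std::out_of_range("Checked4D index out of range");
        }
        return data[((static_cast<std::size_t>(i3) * d2 + i2) * d1 + i1) * d0 + i0];
    }
};

// Returns true if writing a 1 x 1 x H_out x W_out output with the given
// loop limits never leaves the buffer.
bool loops_stay_in_bounds(int H_out, int W_out, int h_limit, int w_limit) {
    Checked4D out(1, 1, H_out, W_out);
    try {
        for (int h = 0; h < h_limit; h++)
            for (int w = 0; w < w_limit; w++)
                out.at(0, 0, h, w) = 0.0f;
    } catch (const std::out_of_range &) {
        return false;  // the first bad index is reported right away
    }
    return true;
}
```

With a 3 x 3 output, limits of 5 and 5 (the input size) fail on the first out-of-range index, while limits of 3 and 3 pass. Compiling with AddressSanitizer (-fsanitize=address in GCC and Clang) is another option that reports the first bad write without changing the code.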
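Another interesting point is why it is free() that reports the error; there is an interesting post about that.
A related question: why is this a free() error and not a segmentation fault?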

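My first guess was that the code runs on a server whose operating system may not enforce the same protections we are used to on our own Linux, Windows, or macOS machines. The more precise reason is that a segmentation fault is only raised when an access touches memory that is not mapped into the process. Here the out-of-bounds writes land in heap memory that is still mapped, so the hardware sees nothing wrong, and the damage is only noticed later by the allocator's own consistency checks inside free(). In that sense it is a more general, software-level kind of protection.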
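To illustrate why the failure shows up at free() rather than at the bad write, here is a deliberately simplified toy model. It is not glibc's real chunk layout: ToyHeap, its one-word header, and free_check_ok are all invented for this sketch. The only idea borrowed from the real allocator is that freeing a chunk involves looking at the size stored in the chunk right after it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only, NOT glibc's real layout. Each chunk is one header word
// holding the payload length, followed by that many payload words.
struct ToyHeap {
    std::vector<std::size_t> words;  // the whole arena is valid, "mapped" memory

    // Append a chunk and return the index of its header word.
    std::size_t allocate(std::size_t payload_words) {
        const std::size_t header = words.size();
        words.push_back(payload_words);
        words.insert(words.end(), payload_words, 0);
        return header;
    }

    // Unchecked store relative to a chunk's payload, like a raw pointer write.
    // Nothing stops offset from running into the next chunk's header.
    void write(std::size_t header, std::size_t offset, std::size_t value) {
        words[header + 1 + offset] = value;
    }

    // Before releasing a chunk, inspect the header of the chunk after it
    // and check that the size stored there is plausible.
    bool free_check_ok(std::size_t header) const {
        const std::size_t next = header + 1 + words[header];
        if (next >= words.size()) return true;  // last chunk, nothing to inspect
        const std::size_t next_size = words[next];
        return next_size > 0 && next + next_size < words.size();
    }
};

// Allocate two 4-word chunks, store n_writes zeros starting at the first
// chunk's payload, then report whether freeing the first chunk would pass.
bool toy_free_ok_after_writes(std::size_t n_writes) {
    ToyHeap heap;
    const std::size_t a = heap.allocate(4);
    heap.allocate(4);  // neighbor whose header sits right after a's payload
    for (std::size_t i = 0; i < n_writes; i++) heap.write(a, i, 0);
    return heap.free_check_ok(a);
}
```

Four writes stay inside the first chunk and the check passes. A fifth write is still inside the arena, so nothing faults, but it zeroes the neighbor's size word, and only the later check at free time notices.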