Jade Hong's Technical Blogger

Posts

ECE408@UIUC MP5: Implementing Parallel Reduction Tree Addition in CUDA C++

October 21, 2023

How to understand a reduction tree, in one sentence: use a tree structure to spread the work across many threads and thereby lower the time complexity.

Properties of a reduction tree: summing a sequence of length N takes N-1 additions no matter what, so every optimization relies on the commutative and associative laws of addition.

Advantage of implementing a reduction tree on a parallel device: many additions can run at the same time, so adjacent elements are added pairwise in parallel and the partial sums are merged level by level. In effect this forms a tree, the reduction tree, in which the number of sequential steps drops from O(N) to O(log(N)) when enough threads are available. The total number of additions is still N-1.

CUDA C++ implementation

int main() {
  /* Some launch code..... */
  //@@ Initialize the grid and block dimensions here
  dim3 DimGrid(numOutputElements, 1, 1);
  dim3 DimBlock(BLOCK_SIZE, 1, 1);
  //@@ Launch the GPU Kernel here
  total<<<DimGrid, DimBlock>>>(deviceInput, deviceOutput, numInputElements);
  cudaDeviceSynchronize();
  cudaMemcpy(hostOutput, deviceOutput, numOutputElements * sizeof(float), cudaMemcpyDeviceToHost);
  /********************************************************...
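The level-by-level merging described in this post can be checked on the CPU before writing any kernel. The following is a minimal sketch, not the post's `total` kernel (which the excerpt does not show): `tree_reduce` and `ReductionResult` are hypothetical names introduced here. Each pass of the outer loop is one tree level, and every iteration of the inner loop is independent, which is the part a GPU would hand to separate threads.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: the outcome of a simulated tree reduction.
struct ReductionResult {
    float sum;              // total of all input elements
    std::size_t levels;     // tree levels, i.e. sequential steps on a parallel device
    std::size_t additions;  // additions performed across all levels
};

// CPU simulation of a reduction tree (takes the vector by value so the
// caller's data is left untouched).
ReductionResult tree_reduce(std::vector<float> data) {
    ReductionResult r{0.0f, 0, 0};
    const std::size_t n = data.size();
    if (n == 0) return r;

    // The stride doubles at every level: 1, 2, 4, ...
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        // These additions do not depend on each other, so a GPU could
        // run all of them in the same step.
        for (std::size_t i = 0; i + stride < n; i += 2 * stride) {
            data[i] += data[i + stride];
            ++r.additions;
        }
        ++r.levels;
    }
    r.sum = data[0];
    return r;
}
```

For 8 elements this takes 3 levels and 7 additions: the amount of work matches the serial sum, and only the number of sequential steps shrinks.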


ECE438@UIUC: VSCode C/C++ Build Environment Errors in WSL

October 21, 2023

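Problem: the compile path is wrong, and VSCode cannot find the standard library headers.

Attempts to solve it:

Default location of the C/C++ compilers in WSL:

Official documentation: https://code.visualstudio.com/docs/cpp/config-wsl
The documentation suggests locating gcc and g++.

[Failed] Looked up the path of gcc: g++ did not exist, and gcc was not a real file, only a shell.
[Succeeded] Ran the automatic update to install gcc and gdb, which resolved the problem.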


ECE408@UIUC: Fixing the Error free(): invalid next size (normal)

October 12, 2023

Background: the ECE408 project asks for a serial implementation of a CNN convolution layer. The point of the serial version is to teach us that running a CNN serially takes a very long time and has no training efficiency (just kidding).

The implementation is purely serial (the following is the buggy version):

void conv_forward_cpu(float *output, const float *input, const float *mask,
                      const int B, const int M, const int C,
                      const int H, const int W, const int K, const int S)
{
  /*
  Modify this function to implement the forward pass described in Chapter 16.
  The code in 16 is for a single image.
  We have added an additional dimension to the tensors to support an entire mini-batch
  The goal here is to be correct, not fast (this is the CPU implementation.)

  Function parameters:
  output - output
  input - input
  k - kernel
  B - batch_size (number of images in x)
  M - number of output feature maps
  C - number of input feature maps
  H - input height dimension
  W - input width dimension
  ...
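The excerpt is cut off before the bug is shown, so the following is only a sketch of a bounds-safe serial forward pass, not the post's fix. glibc prints `free(): invalid next size` when heap metadata has been corrupted, most often by a write past the end of a heap buffer, so the loop bounds and index arithmetic are the first things to check. The sketch assumes row-major layouts (input B x C x H x W, mask M x C x K x K, output B x M x H_out x W_out) and no padding. The names `conv_out_dim` and `conv_forward_cpu_sketch` are hypothetical.

```cpp
#include <cstddef>

// Assumed output size along one dimension: no padding, stride S.
inline int conv_out_dim(int in_dim, int K, int S) {
    return (in_dim - K) / S + 1;
}

// Serial convolution forward pass. The caller must allocate output with
// B * M * H_out * W_out floats; sizing it from H and W, or looping up to
// H and W, is a typical way to end up writing out of bounds.
void conv_forward_cpu_sketch(float *output, const float *input, const float *mask,
                             const int B, const int M, const int C,
                             const int H, const int W, const int K, const int S) {
    const int H_out = conv_out_dim(H, K, S);
    const int W_out = conv_out_dim(W, K, S);

    for (int b = 0; b < B; ++b) {
        for (int m = 0; m < M; ++m) {
            for (int h = 0; h < H_out; ++h) {        // bound is H_out, not H
                for (int w = 0; w < W_out; ++w) {    // bound is W_out, not W
                    float acc = 0.0f;
                    for (int c = 0; c < C; ++c) {
                        for (int p = 0; p < K; ++p) {
                            for (int q = 0; q < K; ++q) {
                                // h * S + p never exceeds H - 1 for h < H_out.
                                const std::size_t in_idx =
                                    ((static_cast<std::size_t>(b) * C + c) * H + (h * S + p)) * W + (w * S + q);
                                const std::size_t mask_idx =
                                    ((static_cast<std::size_t>(m) * C + c) * K + p) * K + q;
                                acc += input[in_idx] * mask[mask_idx];
                            }
                        }
                    }
                    const std::size_t out_idx =
                        ((static_cast<std::size_t>(b) * M + m) * H_out + h) * W_out + w;
                    output[out_idx] = acc;
                }
            }
        }
    }
}
```

Running a small case such as this under Valgrind or with `-fsanitize=address` usually points at the exact out-of-bounds write, whereas the `free()` message only appears later, when the damaged block is released.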


Posts

ECE438@UIUC MP1: OpenSSL Cannot open source file /ssl.h

ECE408@UIUC MP5: Implementing Parallel Reduction Tree Addition in CUDA C++

ECE438@UIUC: VSCode C/C++ Build Environment Errors in WSL

ECE408@UIUC: Fixing the Error free(): invalid next size (normal)