Vaibhav Gurunathan

GPU Programming - Speed up a Convolutional Neural Network

This was my project for Parallel Programming with GPUs (Fall 2024). The assignment was to implement a CNN in CUDA (convolution layers, pooling, and activations) and make it faster than a CPU version. The hard part was not the math but the memory access: the GPU only does useful parallel work when its threads are not stalled waiting on memory. Most of my time went into shared memory tiling, so that each block reuses the input values it loads, and into making sure global memory reads were coalesced. I will add the actual speedup numbers and layer details when I have them handy.
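
To make the two techniques concrete, here is a minimal sketch of shared memory tiling with coalesced loads for a single-channel 3x3 convolution with zero padding. It is an illustration under stated assumptions, not the kernel from the project: the names conv2d_tiled, d_filter, and tiled_conv_max_error, the 16x16 tile size, the 3x3 filter, and the test data are all hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// All sizes below are assumptions chosen for illustration.
constexpr int K = 3;                   // filter width and height
constexpr int R = K / 2;               // halo radius around each output tile
constexpr int TILE = 16;               // output tile edge; block is TILE x TILE threads
constexpr int IN_TILE = TILE + 2 * R;  // input tile edge including the halo

__constant__ float d_filter[K * K];    // small read-only filter in constant memory

__global__ void conv2d_tiled(const float* in, float* out, int h, int w) {
    __shared__ float tile[IN_TILE][IN_TILE];
    const int out_x0 = blockIdx.x * TILE;
    const int out_y0 = blockIdx.y * TILE;

    // Cooperative load: consecutive thread ids read consecutive columns of a
    // row, so global reads within a row are contiguous (coalescing-friendly).
    const int tid = threadIdx.y * TILE + threadIdx.x;
    for (int i = tid; i < IN_TILE * IN_TILE; i += TILE * TILE) {
        const int ty = i / IN_TILE, tx = i % IN_TILE;
        const int gy = out_y0 + ty - R, gx = out_x0 + tx - R;
        const bool inside = gy >= 0 && gy < h && gx >= 0 && gx < w;
        tile[ty][tx] = inside ? in[gy * w + gx] : 0.0f;  // zero padding
    }
    __syncthreads();  // the whole tile must be loaded before anyone reads it

    const int x = out_x0 + threadIdx.x, y = out_y0 + threadIdx.y;
    if (x < w && y < h) {
        float acc = 0.0f;
        for (int fy = 0; fy < K; ++fy)
            for (int fx = 0; fx < K; ++fx)
                acc += d_filter[fy * K + fx] * tile[threadIdx.y + fy][threadIdx.x + fx];
        out[y * w + x] = acc;  // adjacent threads write adjacent addresses
    }
}

// Runs the kernel on synthetic data and returns the largest absolute
// difference from a plain CPU loop, or -1.0f if a CUDA call fails.
float tiled_conv_max_error(int h, int w) {
    std::vector<float> in(h * w), ref(h * w), got(h * w);
    float filt[K * K];
    for (int i = 0; i < h * w; ++i) in[i] = float((i * 7) % 13) - 6.0f;
    for (int i = 0; i < K * K; ++i) filt[i] = 0.1f * float(i + 1);

    // CPU reference with the same zero padding.
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int fy = 0; fy < K; ++fy)
                for (int fx = 0; fx < K; ++fx) {
                    const int iy = y + fy - R, ix = x + fx - R;
                    if (iy >= 0 && iy < h && ix >= 0 && ix < w)
                        acc += filt[fy * K + fx] * in[iy * w + ix];
                }
            ref[y * w + x] = acc;
        }

    const size_t bytes = size_t(h) * w * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1.0f; }
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_filter, filt, sizeof(filt));

    const dim3 block(TILE, TILE);
    const dim3 grid((w + TILE - 1) / TILE, (h + TILE - 1) / TILE);
    conv2d_tiled<<<grid, block>>>(d_in, d_out, h, w);
    // The blocking copy waits for the kernel and surfaces any launch error.
    const cudaError_t err = cudaMemcpy(got.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1.0f;

    float max_err = 0.0f;
    for (int i = 0; i < h * w; ++i) max_err = std::fmax(max_err, std::fabs(got[i] - ref[i]));
    return max_err;
}
```

Calling tiled_conv_max_error(37, 53) from a main function (compiled with nvcc) exercises partial tiles at the image edges. The point of the design is that each input value is read from global memory about once per block and then reused from shared memory by up to nine threads, instead of being fetched from global memory for every output that needs it.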