GitHub - forresti/convolution: Communication-Minimizing 2D Convolution in GPU Registers

This fast convolution implementation accompanies the following paper

Forrest N. Iandola, David Sheffield, Michael Anderson, Phitchaya Mangpo Phothilimthana, and Kurt Keutzer. 
"Communication-Minimizing 2D Convolution in GPU Registers."
To appear in IEEE International Conference on Image Processing (ICIP), 2013.

Paper (PDF)

We accelerate 2D convolution by reducing memory communication, prefetching data to registers, unrolling loops, and utilizing the texture cache. Our techniques draw on Vasily Volkov's Better Performance at Lower Occupancy line of work.

Recommended System Configuration

- Ubuntu 12.04 - NVIDIA Fermi or Kepler GPU (compute capability 2.0 or higher) - CUDA 5.0 or higher - Install **OpenCV** for file I/O

Getting started

``` C make ./main ```

Code structure

The crux of our convolution implementation (convolution.cu)

``` C //memoryLayout = {global, global_register, texCache, or texCache_register} //filter dims and size to load = {2x2, 3x3, ...} __global__ convolutionKernel_memoryLayout_filter3x3_size4x4(...) //one of many implementations that we generate load 4x4 window to registers

compute 4 convolution windows, 3x3 each.

save results to global memory


<h5>Autotuning</h5>
`doCodegen.sh` produces code for many convolution filter sizes and many levels of unrolling. 
Feel free to modify `doCodegen.sh `and `codeGen_*.sh` to explore alternative filter sizes, unrolling levels, and memory layouts. 
Dive into this at your own risk though -- some 1337ness with bash may be required.

<h3>FAQ</h3>
<h5>Can I use this code as a library in my own system?</h5>
This code is designed as a research prototype. 
Feel free to try to use it as a library, but our main goal is to illustrate the performance of our communication-minimizing convolution technique.

<h5>How do I set the convolution filter?</h5>
Our autotuner uses a hard-coded blur kernel. To plug in your own filter with no computational performance penalty, use constant memory:

``` C  
//adapted from NVIDIA_CUDA-5.0_Samples/3_Imaging/convolutionSeparable/convolutionSeparable.cu
#define KERNEL_LENGTH 9
__constant__ float c_Kernel[KERNEL_LENGTH]; //constant memory on GPU (global variable in C  )

extern "C" void initialize_constant_memory() //call this before launching a convolution on your GPU
{
    float h_Kernel[9] = {-1, 0, 1, -1, 0, 1, -1, 0, 1}; //example filter (you can set this dynamically)
    CHECK_CUDART(cudaMemcpyToSymbol(c_Kernel, h_Kernel, KERNEL_LENGTH * sizeof(float)));
}

//code from our autotuner, with the addition of a user-specified kernel:
__device__ void convolutionDevice_size4x4_kernel3x3(float in[4][4], float *out, int startX, int startY)
{
    float tmp=
        in[startY 0][startX 0] * c_Kernel[0]   //c_Kernel[] replaces hard-coded 'filter' parameter in provided code.
        in[startY 0][startX 1] * c_Kernel[1]  
        in[startY 0][startX 2] * c_Kernel[2]  

        in[startY 1][startX 0] * c_Kernel[3]  
        in[startY 1][startX 1] * c_Kernel[4]  
        in[startY 1][startX 2] * c_Kernel[5]  

        in[startY 2][startX 0] * c_Kernel[6]  
        in[startY 2][startX 1] * c_Kernel[7]  
        in[startY 2][startX 2] * c_Kernel[8]
    *out = tmp;
}

What data formats are supported?

Our code supports 1-channel 32-bit floating-point images. We find that this format is very common for feature extraction algorithms like HOG and SIFT. You may be able to implement our techniques for other formats, such as 8-bit multi-channel images.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
results		results
BSD_LICENSE.md		BSD_LICENSE.md
Lena.pgm		Lena.pgm
Makefile		Makefile
README.md		README.md
codeGen_convRunner.sh		codeGen_convRunner.sh
codeGen_convolution.sh		codeGen_convolution.sh
codeGen_convolution_headers.sh		codeGen_convolution_headers.sh
convRunner.cu		convRunner.cu
convRunner.h		convRunner.h
convolution.cu		convolution.cu
convolution.h		convolution.h
doCodegen.sh		doCodegen.sh
helpers.cpp		helpers.cpp
helpers.h		helpers.h
main.cpp		main.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

This fast convolution implementation accompanies the following paper

Recommended System Configuration

Getting started

Code structure

The crux of our convolution implementation (convolution.cu)

What data formats are supported?

About

Releases

Packages

Languages

forresti/convolution

Folders and files

Latest commit

History

Repository files navigation

This fast convolution implementation accompanies the following paper

Recommended System Configuration

Getting started

Code structure

The crux of our convolution implementation (convolution.cu)

What data formats are supported?

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages