PPoPP 2016
Sat 12 - Wed 16 March 2016 Barcelona, Spain
Sun 13 Mar 2016 09:00 - 10:30 at Ibiza - CUDA tuning and new GPU trends
Sun 13 Mar 2016 11:00 - 12:30 at Ibiza - CUDA tuning and new GPU trends


CUDA (Compute Unified Device Architecture) is an architecture and software paradigm designed for general-purpose computing. It was introduced in late 2006 for programming many-core GPU architectures using SIMD extensions of the C language, and it is available for Windows, Linux and MacOS users. A compiler generates executable code for the GPU, which the CPU sees as a many-core co-processor/accelerator.

Since its inception, CUDA has achieved extraordinary speed-up factors in a wide range of grand-challenge applications and has steadily grown in popularity within the High Performance Computing community. CUDA now runs on more than half a billion GPUs and is taught at more than 800 universities worldwide. It also shares a range of computational interfaces with two competitors: OpenCL, championed by the Khronos Group, and DirectCompute, led by Microsoft. Third-party wrappers are also available for Python, Perl, Java, Fortran, Ruby, Lua, Haskell, MATLAB and IDL, among many others.

This tutorial will review current achievements of many-core GPUs, recent and future hardware enhancements, and emerging challenges to leverage GPUs as accelerators within general-purpose exascale computing. Examples and case studies will be given of new software features (dynamic parallelism, unified memory) and hardware mechanisms (Hyper-Q, 3D-DRAM).


This tutorial aims to provide a solid background on new CUDA features and optimizations, with an emphasis on how to maximize performance on present and future GPU platforms for scientific computing.


The tutorial is organized into two parts. First, we briefly describe the CUDA architecture through its hardware generations (Kepler, Maxwell, Pascal) and explain basic optimizations. Second, we spend more time showing how CUDA applications will need to be transformed in the near future to take advantage of new features such as dynamic parallelism and Hyper-Q (Kepler), unified memory (Maxwell) and 3D-DRAM (Pascal).


This tutorial is open to programmers already familiar with GPGPU and CUDA C computing who want to learn the new capabilities and the optimizations to be performed on typical CUDA C kernels for scientific codes in many application areas. Future trends will also be introduced. Previous experience in parallel programming (Pthreads, MPI, OpenMP, OpenCL, …) is not required.


Manuel Ujaldón

  • CUDA Fellow @ Nvidia Corporation.
  • Associate Professor @ Univ. of Malaga (Spain).
  • Conjoint Senior Lecturer @ Univ. of Newcastle (Australia).

Manuel Ujaldon is Prof. of Computer Architecture at the Univ. of Malaga (Spain) and CUDA Fellow at Nvidia.

He worked on parallelizing compilers in the 1990s, finishing his PhD in 1996 by developing a data-parallel compiler for sparse matrix and irregular applications. During this period, he was part of the HPF and MPI Forums and worked as a postdoc in the CS Dept. of the Univ. of Maryland (USA).

He joined the GPGPU movement early, in 2003, using Cg, and wrote the first book in Spanish about programming GPUs for general-purpose computing. He adopted CUDA when it was first released and has since focused on image processing and biomedical applications. Over the past five years, he has published more than 50 papers in journals and international conferences in these two areas.

Dr. Ujaldon has received several distinctions from Nvidia: Academic Partnership (2008-2011), Teaching Center (since 2011), Research Center (since 2012), and finally CUDA Fellow. Over the past four years, he has taught around 60 Nvidia-sponsored courses on CUDA programming worldwide, including keynotes and tutorials at ACM/IEEE conferences and academic programs at universities in 18 different countries.

For more information, you can visit:

Research interests:

  • GPGPU Computing on Image Processing (feature extraction, segmentation, classifiers).

  • Biomedical Applications (large-scale image analysis, biomarkers, bio-inspired algorithms and genome-wide association studies).

  • Evolutionary Computation (Ant Colony Optimization).

List of publications: http://cms.ac.uma.es/ujaldon/index.php/en/research-papers

Sun 13 Mar


09:00 - 10:30
CUDA tuning and new GPU trends (Tutorials at Ibiza)
Manuel Ujaldon, Universidad de Málaga, Spain
11:00 - 12:30
CUDA tuning and new GPU trends (Tutorials at Ibiza)
Manuel Ujaldon, Universidad de Málaga, Spain