The Model of Computation of CUDA and its Formal Semantics

We formalize the model of computation of modern graphics cards based on the specification of Nvidia's Compute Unified Device Architecture (CUDA). CUDA programs are executed by thousands of threads concurrently and have access to several different types of memory with distinct access patterns and latencies. The underlying hardware uses a single instruction, multiple threads (SIMT) execution model that groups threads into warps. All threads of the same warp execute the program in lockstep. If threads of the same warp execute a data-dependent control flow instruction, control flow may diverge, and the different execution paths are executed sequentially. Once all paths complete, all threads execute in parallel again.

We present an operational semantics for a significant subset of CUDA's memory operations and programming instructions, including shared and non-shared memory operations, atomic operations on shared memory, cached memory access, recursive function calls, control flow instructions, and thread synchronization.

Based on this formalization, we prove that CUDA's SIMT execution model is safe: every thread executes only the instructions it is allowed to execute, does not continue execution after processing the last instruction of the program, and does not skip any instruction it should have executed. On the other hand, we demonstrate that CUDA's inability to handle control flow instructions individually for each thread can cause unexpected program behavior, in the sense that a liveness property is violated.
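To make the divergence behavior concrete, here is a minimal CUDA sketch (not taken from the report; the kernel name divergeKernel and the variable names are illustrative) in which even- and odd-numbered threads of one warp take different sides of a data-dependent branch. Under the SIMT model described above, the two paths are serialized within the warp, and the threads reconverge once both paths have completed.

    __global__ void divergeKernel(int *out)
    {
        int tid = threadIdx.x;
        if (tid % 2 == 0)
            out[tid] = 2 * tid;   // path A: even-numbered threads
        else
            out[tid] = 3 * tid;   // path B: odd-numbered threads
        // Reconvergence point: from here on, the whole warp
        // executes in lockstep again.
    }

Launched on a single warp of 32 threads, for example as divergeKernel<<<1, 32>>>(d_out), the hardware executes path A and path B one after the other before the warp reconverges; no thread skips an instruction of the path it belongs to, which is the safety property the report proves.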
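The liveness violation mentioned at the end of the abstract can be illustrated by an intra-warp spin lock built from an atomic operation on shared memory. The following is a hypothetical sketch in the spirit of such a counterexample, not code from the report. Because the warp cannot branch per thread, the thread that wins the lock and the threads that spin on it end up on divergent, serialized paths; if the scheduler keeps replaying the spinning path, the lock holder never executes its release, and the warp loops forever.

    __global__ void spinLockKernel(int *counter)
    {
        __shared__ int lock;               // lock word in shared memory
        if (threadIdx.x == 0)
            lock = 0;
        __syncthreads();                   // all threads see the initialized lock

        bool done = false;
        while (!done)
        {
            if (atomicCAS(&lock, 0, 1) == 0)   // try to acquire the lock
            {
                *counter += 1;                 // critical section
                atomicExch(&lock, 0);          // release the lock
                done = true;
            }
            // Losing threads loop and retry; under lockstep execution this
            // spinning path can starve the lock holder indefinitely.
        }
    }

Whether this kernel terminates depends on which divergent path the warp scheduler replays first; per-thread progress is exactly the liveness guarantee that the lockstep SIMT model fails to provide.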

Metadata
Author: Axel Habermaier
URN: urn:nbn:de:bvb:384-opus4-12564
Frontdoor URL: https://opus.bibliothek.uni-augsburg.de/opus4/1580
Series (Serial Number): Reports / Technische Berichte der Fakultät für Angewandte Informatik der Universität Augsburg (2011-14)
Type: Report
Language: English
Publishing Institution: Universität Augsburg
Release Date: 2011/10/17
Tags (German): Speichermodell (memory model); GPU; SIMT
Tags (English): formal semantics; CUDA; GPU; SIMT; memory model
GND Keywords: Formale Semantik (formal semantics); CUDA <Informatik> (computer science)
Institutes: Fakultät für Angewandte Informatik; Fakultät für Angewandte Informatik / Institut für Informatik
Dewey Decimal Classification: 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing; computer science
Licence: Deutsches Urheberrecht (German copyright law)