The Model of Computation of CUDA and its Formal Semantics

We formalize the model of computation of modern graphics cards based on the specification of Nvidia's Compute Unified Device Architecture (CUDA). CUDA programs are executed by thousands of threads concurrently and have access to several different types of memory with distinct access patterns and latencies. The underlying hardware uses a single instruction, multiple threads (SIMT) execution model that groups threads into warps. All threads of the same warp execute the program in lockstep. If threads of the same warp execute a data-dependent control flow instruction, control flow may diverge, and the different execution paths are executed sequentially. Once all paths complete, all threads execute in parallel again.

We present an operational semantics for a significant subset of CUDA's memory operations and programming instructions, including shared and non-shared memory operations, atomic operations on shared memory, cached memory access, recursive function calls, control flow instructions, and thread synchronization.

Based on this formalization, we prove that CUDA's SIMT execution model is safe: every thread executes only the instructions it is allowed to execute, does not continue execution after processing the last instruction of the program, and does not skip any instruction it should have executed. On the other hand, we demonstrate that CUDA's inability to handle control flow instructions individually for each thread can cause unexpected program behavior, in the sense that a liveness property is violated.
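To make the divergence behavior concrete, here is a minimal CUDA sketch (not taken from the report; the kernel name divergeKernel and the variable names are illustrative) in which even- and odd-numbered threads of one warp take different sides of a data-dependent branch. Under the SIMT model described above, the two paths are serialized within the warp, and the threads reconverge once both paths have completed.

    __global__ void divergeKernel(int *out)
    {
        int tid = threadIdx.x;
        if (tid % 2 == 0)
            out[tid] = 2 * tid;   // path A: even-numbered threads
        else
            out[tid] = 3 * tid;   // path B: odd-numbered threads
        // Reconvergence point: from here on, the whole warp
        // executes in lockstep again.
    }

Launched on a single warp of 32 threads, for example as divergeKernel<<<1, 32>>>(d_out), the hardware executes path A and path B one after the other before the warp reconverges; no thread skips an instruction of the path it belongs to, which is the safety property the report proves.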
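The liveness violation mentioned at the end of the abstract can be illustrated by an intra-warp spin lock built from an atomic operation on shared memory. The following is a hypothetical sketch in the spirit of such a counterexample, not code from the report. Because the warp cannot branch per thread, the thread that wins the lock and the threads that spin on it end up on divergent, serialized paths; if the scheduler keeps replaying the spinning path, the lock holder never executes its release, and the warp loops forever.

    __global__ void spinLockKernel(int *counter)
    {
        __shared__ int lock;               // lock word in shared memory
        if (threadIdx.x == 0)
            lock = 0;
        __syncthreads();                   // all threads see the initialized lock

        bool done = false;
        while (!done)
        {
            if (atomicCAS(&lock, 0, 1) == 0)   // try to acquire the lock
            {
                *counter += 1;                 // critical section
                atomicExch(&lock, 0);          // release the lock
                done = true;
            }
            // Losing threads loop and retry; under lockstep execution this
            // spinning path can starve the lock holder indefinitely.
        }
    }

Whether this kernel terminates depends on which divergent path the warp scheduler replays first; per-thread progress is exactly the liveness guarantee that the lockstep SIMT model fails to provide.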

Metadata
Author: Axel Habermaier
URN: urn:nbn:de:bvb:384-opus4-12564
Frontdoor URL: https://opus.bibliothek.uni-augsburg.de/opus4/1580
Series (Serial Number): Reports / Technische Berichte der Fakultät für Angewandte Informatik der Universität Augsburg (2011-14)
Type: Report
Language: English
Publishing Institution: Universität Augsburg
Release Date: 2011/10/17
Tags (German): Speichermodell (memory model); GPU; SIMT
Tags (English): formal semantics; CUDA; GPU; SIMT; memory model
GND Keywords: Formale Semantik (formal semantics); CUDA <Informatik> (computer science)
Institutes: Fakultät für Angewandte Informatik; Fakultät für Angewandte Informatik / Institut für Informatik
Dewey Decimal Classification: 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing; computer science
Licence: Deutsches Urheberrecht (German copyright law)