TY - RPRT A1 - Habermaier, Axel T1 - The Model of Computation of CUDA and its Formal Semantics N2 - We formalize the model of computation of modern graphics cards based on the specification of Nvidia's Compute Unified Device Architecture (CUDA). CUDA programs are executed by thousands of threads concurrently and have access to several different types of memory with unique access patterns and latencies. The underlying hardware uses a single instruction, multiple threads execution model that groups threads into warps. All threads of the same warp execute the program in lockstep. If threads of the same warp execute a data-dependent control flow instruction, control flow might diverge and the different execution paths are executed sequentially. Once all paths complete execution, all threads are executed in parallel again. An operational semantics of a significant subset of CUDA's memory operations and programming instructions is presented, including shared and non-shared memory operations, atomic operations on shared memory, cached memory access, recursive function calls, control flow instructions, and thread synchronization. Based on this formalization we prove that CUDA's single instruction, multiple threads execution model is safe: For all threads it is provably true that a thread only executes the instructions it is allowed to execute, that it does not continue execution after processing the last instruction of the program, and that it does not skip any instructions it should have executed. On the other hand, we demonstrate that CUDA's inability to handle control flow instructions individually for each thread can cause unexpected program behavior in the sense that a liveness property is violated. T3 - Reports / Technische Berichte der Fakultät für Angewandte Informatik der Universität Augsburg - 2011-14 KW - Formale Semantik KW - CUDA KW - Speichermodell KW - GPU KW - SIMT KW - formal semantics KW - CUDA KW - GPU KW - SIMT KW - memory model Y1 - 2011 UR - https://opus.bibliothek.uni-augsburg.de/opus4/frontdoor/index/index/docId/1580 UR - https://nbn-resolving.org/urn:nbn:de:bvb:384-opus4-12564 ER -