Real-time analysis of MPI programs for NoC-based many-cores using time division multiplexing
- Worst-case execution time (WCET) analysis is crucial for designing hard real-time systems. While the WCET of tasks in a single core system can be upper bounded in isolation, the tasks in a many-core system are subject to shared memory interferences which impose high overestimation of the WCET bounds. However, many-core-based massively parallel applications will enter the area of real-time systems in the years ahead. Explicit message-passing and a clear separation of computation and communication facilitates WCET analysis for those programs.
A standard programming model for message-based communication is the message passing interface (MPI). It provides an application independent interface for different standard communication operations (e.g. broadcast, gather, ...). Thereby, it uses efficient communication patterns with deterministic behaviour. In applying these known structures, we target to provide a WCET analysis for communication that is reusable for different applications if theWorst-case execution time (WCET) analysis is crucial for designing hard real-time systems. While the WCET of tasks in a single core system can be upper bounded in isolation, the tasks in a many-core system are subject to shared memory interferences which impose high overestimation of the WCET bounds. However, many-core-based massively parallel applications will enter the area of real-time systems in the years ahead. Explicit message-passing and a clear separation of computation and communication facilitates WCET analysis for those programs.
A standard programming model for message-based communication is the message passing interface (MPI). It provides an application independent interface for different standard communication operations (e.g. broadcast, gather, ...). Thereby, it uses efficient communication patterns with deterministic behaviour. In applying these known structures, we target to provide a WCET analysis for communication that is reusable for different applications if the communication is executed on the same underlying platform. Hence, the analysis must be performed once per hardware platform and can be reused afterwards with only adapting several parameters such as the number of nodes participating in that communication. Typically, the processing elements of many-core platforms are connected via a Network-on-Chip (NoC) and apply techniques such as time-division multiplexing (TDM) to provide guaranteed services for the network. Hence, the hardware and the applied technique for guaranteed service needs to facilitate this reusability of the analysis as well.
In this work we review different general-purpose TDM schedules that enable a WCET approximation independent of the placement of tasks on processing elements of a many-core which uses a NoC with torus topology. Furthermore, we provide two new schedules that show a similar performance as the state-of-the-art schedules but additionally serve situations where the presented state-of-the-art schedules perform poorly. Based on these schedules a procedure for the WCET analysis of the communication patterns used in MPI is proposed. Finally, we show how to apply the results of the analysis to calculate the WCET upper bound for a complete MPI program.
Detailed insights in the performance of the applied TDM schedules are provided by comparing the schedules to each other in terms of timing. Additionally, we discuss the exhibited timing of the general-purpose schedules compared to a state-of-the-art application specific TDM schedule to put in relation both types of schedules. We apply the proposed procedure to several standard types of communication provided in MPI and compare different patterns that are used to implement a specific communication. Our evaluation investigates the communications’ building blocks of the timing bounds and shows the tremendous impact of choosing the appropriate communication pattern. Finally, a case study demonstrates the application of the presented procedure to a complete MPI program.
With the method proposed in this work it is possible to perform a reusable WCET timing analysis for the communication in a NoC that is independent of the placement of tasks on the chip. Moreover, as the applied schedules are not optimized for a specific application but can be used for all applications in the same way, there are only marginal changes in the timing of the communication when the software is adapted or updated. Thus, there is no need to perform the timing analysis from scratch in such cases.…