• Deutsch
Login

Open Access

  • Home
  • Search
  • Browse
  • Publish/report a document
  • Help

Refine

Has Fulltext

  • yes (1) (remove)

Author

  • Haas, Florian (1)

Year of publication

  • 2019 (1)

Document Type

  • Doctoral Thesis (1)

Language

  • English (1)

Keywords

  • Fehlertoleranz (1) (remove)

Institute

  • Fakultät für Angewandte Informatik (1)
  • Institut für Informatik (1)
  • Lehrstuhl für Systemnahe Informatik und Kommunikationssysteme (1)

1 search hit

  • 1 to 1
  • 10
  • 20
  • 50
  • 100
Fault-tolerant Execution of Parallel Applications on x86 Multi-core Processors with Hardware Transactional Memory (2019)
Haas, Florian
To satisfy the enduring demand for increasing computational power, the processor manufacturers try to raise the performance per Watt of a chip, which can be achieved by minimizing the structure sizes of electronic circuits. However, the technological limits are about to be reached, and the further reduction of the supply voltages and rising frequencies will lead to increased error rates due to transient faults, which result from the miniaturization of transistors, and from the growing number of transistors on a chip. To mitigate such errors, techniques from dependable server systems and safety-critical embedded systems become attractive in commodity systems as well. However, the typically used cycle-by-cycle lockstep execution of a redundant processor is hardly feasible on a complex out-of-order CPU. Approaches that enable a loose coupling have been proposed, integrated in hardware as well as software-only approaches. Software mechanisms allow to run specific applications redundantly to detect errors on a COTS (commercial-off-the-shelf) processor without hardware modifications. In recent years, transactional memory gained interest in the research of fault-tolerant systems. Its property of isolation and the integrated checkpointing mechanism to guarantee atomicity spawned multiple approaches to utilize transactional memory for fault tolerance. With the availability of first hardware implementations, for example TSX in the more expensive processors of the Intel x86 Core family, software mechanisms that rely on hardware transactional for checkpointing became feasible. This thesis investigates a fail-operational execution with transactional memory on a COTS Intel CPU, based on loosely-coupled redundant execution. An instrumentation mechanism, which was developed as an optimization pass for the LLVM compilation toolchain, and a support library for POSIX compatible systems provide the functionality for error detection and recovery. The feasibility and the effectiveness of the approach were evaluated with benchmarks of the SPEC2017 benchmark suite. Results show that a fault-tolerant redundant execution can be achieved on an x86 CPU, and that specific enhancements to the hardware could further improve the overall performance. Multi-threaded applications require further consideration for redundant execution, since indeterminism can occur between redundant pairs of threads, due to diverging synchronization, for example on mutual exclusion. An interface to the Pthread synchronization functions is described, as well as an error recovery mechanism, to enable the redundant and fail-operational execution of multi-threaded applications. This was evaluated by means of benchmarks of the PARSEC suite to prove that redundant multi-threading is feasible on a COTS CPU. The impact of the additional layer on performance and speedup is shown to be minimal.
  • 1 to 1

OPUS4 Logo

  • Contact
  • Imprint
  • Sitelinks