Human motion analysis on visual data: shaping 2D human pose estimation into a tool for sport and medical applications

Whenever images or videos of humans are recorded, the intention is often to capture the behavior, the physical actions or the general body motion of the depicted persons. With the omnipresence of standalone or integrated cameras in all areas of daily life, creating such recordings has become trivial. But the extraction of information about human motion from visual data remains a difficult task that normally requires other persons to watch and understand the recordings. At the same time, a wide range of applications can benefit from human motion understanding in visual data. Surveillance systems, human-machine interfaces or the assessment of physical capabilities of humans are just some exemplary application domains that rely on the automatic retrieval of human motion information. In this work, we investigate different approaches and techniques for automated human motion analysis in video data. We build upon the huge progress in computer vision research over the past decade.
Specifically, we propose to use the long-standing task of 2D human pose estimation (HPE) as an ideal computer vision building block for human motion representation and understanding. The first main topic of this thesis covers solutions for motion analysis that build directly upon 2D human pose estimates from videos. Video-based performance analysis in professional sports serves as our main testing ground. In particular, we consider the automated analysis of swimming motion in specialized swimming channels. We present crucial changes to existing 2D HPE models in order to include domain-dependent prior information about expected human motion and to ensure a temporally coherent motion representation. We also tackle the specific task of motion event detection, where we locate characteristic moments during swimming and athletics motion at the video frame level. We present two different solutions that are designed for visually challenging environments and crowded scenes. We additionally consider an application for motion analysis in the context of rehabilitation medicine, where we detect speech impairments purely from facial motion. Our second topic is the more general task of reconstructing human posture and motion in 3D space from 2D poses alone. Providing access to 3D motion information is an even stronger foundation for subsequent domain-independent motion analysis. We consider two challenges in the 2D-to-3D pose estimation approach. First, we provide a theoretical analysis of existing 3D HPE models that are particularly relevant for domains with limited annotated 3D pose data. We identify commonly used camera models as a severe structural limitation on the maximal precision of 3D pose estimates. The second challenge is the computational demand of current 3D HPE models on video data.
We leverage the state of the art in Transformer architectures and develop a flexible model whose inherent trade-off between 3D pose precision and overall processing efficiency can be freely adjusted. This enables effective 3D HPE in videos on consumer hardware.

Metadata
Author:Moritz Einfalt
URN:urn:nbn:de:bvb:384-opus4-1137421
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/113742
Advisor:Rainer Lienhart
Type:Doctoral Thesis
Language:English
Year of first Publication:2024
Publishing Institution:Universität Augsburg
Granting Institution:Universität Augsburg, Fakultät für Angewandte Informatik
Date of final exam:2024/04/26
Release Date:2024/08/09
Tag:Computer Vision; 2D Human Pose Estimation; 3D Human Pose Estimation; Human Pose Estimation in Sports
GND-Keyword:Bildverarbeitung (image processing); Merkmalsextraktion (feature extraction); Bewegungsablauf (motion sequence); Maschinelles Lernen (machine learning); Objektverfolgung (object tracking)
Pagenumber:xiv, 259
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Maschinelles Lernen und Maschinelles Sehen
Dewey Decimal Classification:0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing; computer science
Licence (German):German copyright law with Print on Demand