Virtual Reconstruction of Hand-Torn Documents using Discriminative Models
In numerous fields of computer vision, such as object detection, human pose estimation and image classification, machine learning has become an indispensable component for solving application-specific tasks. This thesis proposes and explores new ways of utilizing discriminative models for the virtual reconstruction of hand-torn documents. In this work, reassembling pieces into document pages is accomplished in a bottom-up fashion. We show that discriminative models are suitable for solving various key problems and discuss how they can be fused effectively into a graph-based algorithm. In essence, we use our models to infer different spatial configurations between pieces, which are encoded into the graph's link structure. In contrast to the widespread heuristic solutions, supervised learning has a solid theoretical foundation and thus enables a rigorous in-depth analysis of all key components of our proposed method.
We further investigate and thoroughly evaluate new methods for the representation of digital pieces. In order to deal properly with arbitrarily shaped pieces, we present a novel technique for the extraction of content-based features along their outer boundary. Our method allows an effortless integration of widely used features and therefore enables a highly discriminative, multimodal representation. We further propose a new color coding scheme based on the Fisher vector, which is extremely robust in the presence of noise and is thus ideally suited for real-world applications. In addition, we introduce two novel, fully annotated datasets. In order to obtain a ground truth, human experts were asked to reassemble all digitized pieces into pages. This not only lays the basis for supervised learning from annotated examples but also provides the means for a rigorous evaluation. Inspired by existing benchmarks in the aforementioned domains, we introduce two novel performance measures that quantitatively assess the quality of reconstruction results. We extensively evaluate our proposed method and demonstrate its general applicability on three different datasets, where we achieve state-of-the-art results.
Author: | Fabian Richter |
---|---|
URN: | urn:nbn:de:bvb:384-opus4-33918 |
Frontdoor URL: | https://opus.bibliothek.uni-augsburg.de/opus4/3391 |
Advisor: | Rainer Lienhart |
Type: | Doctoral Thesis |
Language: | English |
Publishing Institution: | Universität Augsburg |
Granting Institution: | Universität Augsburg, Fakultät für Angewandte Informatik |
Date of final exam: | 2015/11/16 |
Release Date: | 2015/12/30 |
Tag: | document reconstruction; computer vision; machine learning; graph algorithms; structural support vector machines |
GND-Keyword: | Bilderkennung; Dokument; Rekonstruktion; Maschinelles Lernen |
Institutes: | Fakultät für Angewandte Informatik; Fakultät für Angewandte Informatik / Institut für Informatik |
Dewey Decimal Classification: | 0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik |
Licence (German): | Deutsches Urheberrecht mit Print on Demand |