Detection of Control Structures in Spoken Utterances (Journal Article)
Authors: Sebastian Weigelt, Tobias Hey, and Vanessa Steurer
Journal: International Journal of Semantic Computing
Abstract: State-of-the-art intelligent assistant systems such as Siri and Cortana do not consider control structures in user input. They react reliably to ordinary commands, but their architectures are not designed to cope with queries that require complex control flow. We propose a system to overcome these limitations. Our approach explicitly models if-then-else, loop, and concurrency constructs in spoken utterances, bridging the gap between linguistic and programmatic semantics.
To demonstrate our concept, we apply a rule-based approach. We have implemented three prototypes, all of which use keyphrases to discover potential control structures; the full extent of each structure, however, is determined differently depending on its type. For conditionals we use chunk and part-of-speech (POS) tags provided by natural language processing tools; for loops and concurrency we use an action extraction approach based on semantic role labeling (SRL). Additionally, we use coreference information to determine the extent of the respective structure.
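To make the keyphrase idea concrete, the following is a minimal Python sketch of conditional detection (using NLTK for tokenization and POS tagging). The keyphrase lists, the clause-boundary heuristic, and all function names are illustrative assumptions, not the paper's implementation, which additionally uses chunk tags and coreference information:

import nltk

# One-time setup (uncomment on first run):
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

COND_KEYPHRASES = {"if", "when", "whenever"}   # hypothetical keyphrase list
ELSE_KEYPHRASES = {"else", "otherwise"}        # hypothetical keyphrase list

def clause_boundary(tagged, start):
    # Prefer an explicit "then" or comma as the clause boundary; otherwise
    # fall back to POS tags and split at the first base-form verb (VB),
    # which typically opens an imperative command.
    for i in range(start + 1, len(tagged)):
        if tagged[i][0] in {"then", ","}:
            return i, True   # boundary token is a separator, skip it
    for i in range(start + 2, len(tagged)):
        if tagged[i][1] == "VB":
            return i, False  # boundary token belongs to the then-branch
    return len(tagged), True

def detect_conditional(utterance):
    """Return (condition, then_part, else_part) or None if no keyphrase is found."""
    tagged = nltk.pos_tag(nltk.word_tokenize(utterance.lower()))
    starts = [i for i, (tok, _) in enumerate(tagged) if tok in COND_KEYPHRASES]
    if not starts:
        return None
    start = starts[0]
    split, is_sep = clause_boundary(tagged, start)
    then_start = split + 1 if is_sep else split
    else_at = next((i for i in range(then_start, len(tagged))
                    if tagged[i][0] in ELSE_KEYPHRASES), None)
    condition = " ".join(tok for tok, _ in tagged[start + 1:split])
    then_part = " ".join(tok for tok, _ in tagged[then_start:else_at])
    else_part = (" ".join(tok for tok, _ in tagged[else_at + 1:])
                 if else_at is not None else None)
    return condition, then_part, else_part

print(detect_conditional("if the door is open then close it otherwise wait"))
# -> ('the door is open', 'close it', 'wait')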
The explicit modeling of conditionals, loops, and concurrent sections allows us to evaluate the accuracy of our approaches independently of each other and of other language understanding tasks. We conducted two user studies in the domain of humanoid robotics. The first focused on conditionals; our prototype achieves F1 scores from 0.783 (automatic speech recognition) to 0.898 (manual transcripts) on unrestricted utterances. In the second, the prototypes for loop and concurrency detection also proved useful: F1 scores range from 0.588 (automatic speech recognition) to 0.814 (manual transcripts) for loops, and from 0.622 (automatic speech recognition) to 0.842 (manual transcripts) for concurrent sections.
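For reference, the reported F1 scores combine precision and recall in the usual harmonic mean. A quick sketch; the counts below are invented for illustration and are not taken from the studies:

def f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2 * P * R / (P + R), with P = tp/(tp+fp) and R = tp/(tp+fn).
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(round(f1(tp=90, fp=20, fn=30), 3))  # hypothetical counts -> 0.783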
@article{doi:10.1142/S1793351X18400159,
author = {Weigelt, Sebastian and Hey, Tobias and Steurer, Vanessa},
title = {Detection of Control Structures in Spoken Utterances},
journal = {International Journal of Semantic Computing},
volume = {12},
number = {03},
pages = {335--360},
year = {2018},
doi = {10.1142/S1793351X18400159},
URL = {https://doi.org/10.1142/S1793351X18400159},
}