Title
Visualization Model for Learning of Pronunciation with an Approach from Human Computer Interaction
Abstract
The fields of Human-Computer Interaction (HCI) and Information Visualization (InfoVis) can complement one another, rather than being treated as two separate disciplines, to provide the visual and interactive aspects needed for improved information visualization. In this light, a model is proposed that allows both researchers and users to analyze a pronunciation-learning task, based on an analysis of three reference models for information visualization: those of Card, Mackinlay and Shneiderman; Chi; and Wünsche. These models are linear and, although they offer approaches focused on data representation, they recognize the role of the user in their different stages. All three highlight aspects that are essential objects of study in HCI, such as representation, interaction and perception, in order to create a suitable representation of the information: one that enables detailed understanding to be acquired and developed and experiences to be communicated, and with which the user can interact more easily and get the most from the visualization tool.

This article addresses both research fields. Information visualization contributes the visual representation of the acoustic signal of the voice, based on multidimensional data, while HCI contributes user-oriented visual and interactive aspects. The proposed model seeks to integrate the two fields so that acoustic voice signal data are represented on a two-dimensional plane and the user can visually grasp as many aspects of the voice as possible, recognizing the quality of a pronunciation through information succinctly encoded by graphical attributes.

The model comprises stages and interaction mechanisms that combine to produce four views. Each view has a different makeup, structuring the data and displaying them through graphical attributes such as color, position, shape, size, text, orientation and texture. These attributes support analysis and evaluation, and show how the connections between the various visual components can be represented in order to establish relationships between data. The main objective of the model is to represent a large number of aspects of the voice, so that during pronunciation training a person can understand pronunciation quality visually. The model represents the results of comparing an input signal recorded by a microphone against a set of correctly pronounced signals.

The views constitute the final stage of the model. Data are presented so that the user can see and interpret representations of phoneme pronunciations, learn about different aspects of the voice and recognize pronunciation quality visually. Each view conveys different information. The first reveals information rapidly, using facial gestures to represent moods: a happy face for good pronunciation and a sad face for poor pronunciation, making use of facial attributes such as the eyes, eyebrows, nose and mouth, whose size, shape, position or orientation encode the values of voice characteristics. The second view permits a quick inspection of the data, comparing correct pronunciation prototypes with the test signal and employing graphical attributes such as color and shape to convey degrees of similarity between phoneme characteristics. The third view illustrates similarities between data, associating them with a color, position and shape through the Self-Organizing Map visualization technique.
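As an illustration of how the third view could be built, the following is a minimal sketch of a Self-Organizing Map trained on phoneme data, assuming the data arrive as fixed-length acoustic feature vectors; the grid size, learning schedule and function names are illustrative assumptions, not details taken from the paper.

```python
# Minimal SOM sketch for the third view: similar feature vectors are
# mapped to nearby cells of a 2-D grid. All hyperparameters are
# illustrative assumptions, not values from the paper.
import numpy as np

def train_som(data, grid=(10, 10), epochs=50, lr0=0.5, sigma0=3.0):
    """Train a 2-D SOM on an (n_samples, n_features) array."""
    rng = np.random.default_rng(0)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                  indexing="ij"), axis=-1)
    steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            # Best-matching unit: cell whose weight vector is closest to x.
            d = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # Decay the learning rate and neighbourhood radius over time.
            frac = t / steps
            lr = lr0 * (1 - frac)
            sigma = sigma0 * (1 - frac) + 1e-3
            # A Gaussian neighbourhood pulls cells near the BMU toward x.
            dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
            g = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
            weights += lr * g * (x - weights)
            t += 1
    return weights

def project(weights, x):
    """Map one feature vector to its best-matching grid cell."""
    d = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(d), d.shape)
```

Projecting both the correct-pronunciation prototypes and the test signal onto the trained grid gives each one a two-dimensional position, so cells can be colored by phoneme and the learner can see how close the test utterance falls to the region of the target phoneme.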
Finally, the fourth view shows a diagonal line indicating the correlation between a set of phonemes and the phoneme actually pronounced. Each view demands a different level of interpretation and understanding, depending on the experience of the user: from a first view suited to a child to a fourth view aimed at a person skilled in signal modeling. Together, the four views offer different visual explorations and levels of interpretation, so that the user can understand aspects of the voice and gain knowledge about the quality of the pronunciation.

The model involves interaction with the user and takes account of key aspects that can make the information easy to decode, providing a better understanding of the task. The views also allow the user to learn about the different aspects of the acoustic signal of the voice through various combinations of visualization, and they aid understanding of the data by taking full advantage of visual perception skills to discover patterns. The proposed model likewise considers design aspects of the display interface that could reduce the cognitive effort required to understand the graphic representation, so that users can devote their cognitive ability to understanding what is being represented. Information visualization and HCI are two areas of research that can support one another in creating a user-oriented visual representation, such that as much information as possible can be gained and interpreted with minimum effort.
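One plausible reading of the fourth view's diagonal is a scatter of the test signal's features against each prototype's features, where points hugging the identity diagonal indicate a strong correlation with that phoneme. The sketch below follows that reading; the function name, feature vectors and phoneme labels are illustrative assumptions, since the paper's abstract does not specify an implementation.

```python
# Hedged sketch of the fourth view: one scatter panel per candidate
# phoneme, with the identity diagonal marking a perfect match.
import numpy as np
import matplotlib.pyplot as plt

def correlation_view(test_vec, prototypes, labels):
    """Plot test features vs. each 1-D prototype; report Pearson r."""
    fig, axes = plt.subplots(1, len(prototypes),
                             figsize=(4 * len(prototypes), 4))
    for ax, proto, label in zip(np.atleast_1d(axes), prototypes, labels):
        # Pearson correlation between test and prototype feature vectors.
        r = np.corrcoef(test_vec, proto)[0, 1]
        ax.scatter(proto, test_vec)
        lo = min(proto.min(), test_vec.min())
        hi = max(proto.max(), test_vec.max())
        ax.plot([lo, hi], [lo, hi])  # identity diagonal = perfect match
        ax.set_title(f"/{label}/  r = {r:.2f}")
        ax.set_xlabel("prototype features")
        ax.set_ylabel("test features")
    fig.tight_layout()
    return fig
```

The closer the pronounced phoneme's points lie to the diagonal, and the higher the reported r, the better the match to that prototype, which is the judgment the fourth view is meant to support visually.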
Year
2014
DOI
10.1145/2662253.2662288
Venue
Interacción
Field
Pronunciation, Reference model, Information visualization, Gesture, Computer science, Visualization, Visual analytics, Human–computer interaction, Multimedia, Perception, Visual perception
DocType
Conference
Citations
2
PageRank
0.43
References
0
Authors
3
Name                 Order  Citations  PageRank
Sandra P. Cano       1      30         14.40
Gloria Inés Alvarez  2      23         5.01
César A. Collazos    3      280        76.06