Archived Projects

Caterpillar Passage
A contemporary reading passage for the assessment of motor speech disorders

Kate Connaghan, Diana Franco, Rupal Patel

Evaluation of motor speech function includes a standard reading passage for assessing connected speech. A review of the salient characteristics of motor speech disorders and common assessment protocols revealed the need for a novel reading passage tailored specifically to differentiate among types of motor speech disorders. Based on these findings, we designed a novel reading passage called The Caterpillar. The passage was developed to provide a contemporary, easily read, contextual speech sample with specific tasks (e.g., prosodic contrasts, words of increasing length and complexity) targeted to inform the assessment of motor speech disorders. Analysis of performance across a subset of segmental and prosodic variables illustrated that The Caterpillar passage showed promise for extracting individual profiles of impairment that could augment current assessment protocols and inform treatment planning in motor speech disorders.

Development of Prosodic Control in Children
Acquisition of the question-statement and emphatic stress contrasts

Maria Grigos, Mariam Syeda, Rupal Patel

This project was aimed at understanding how prosodic control develops in childhood. Kinematic and acoustic measures were analyzed for speech samples produced by typically developing children aged 4, 7 and 11. Children were asked to produce prosodic contrasts such as questions versus statements and sentences that varied in contrastive stress. This work had implications for building developmental models of prosodic control and for understanding prosodic control in speech impairment.

A real-time, large-button AAC system that supports continuous motion input on a number pad

Karl Wiegand, Rupal Patel

DigitCHAT is a prototype AAC system that allows for real-time communication via continuous motion input. The system uses large, visually separated buttons to assist users who may have difficulties making precise movements, especially on mobile devices with smaller screens. The buttons are organized in a telephone number pad to provide familiarity and reduce the amount of time required to learn the layout. Message construction proceeds on a word-by-word basis, but most likely words are spoken aloud immediately to increase communication speed. DigitCHAT could potentially be used on commercially available mobile phones, lowering costs and social barriers to assistive communication. We are currently conducting research on the potential applications of this work and how to improve both its usability and accuracy.

DIVA Prosody
An extension of the DIVA model of speech production to include prosodic control

Erin Archibald, Kate Connaghan, Rupal Patel

We have conducted a set of psychophysical and fMRI experiments that aim to extend the DIVA (Directions Into Velocities of Articulators) model of speech production to include prosodic control. Near real-time perturbations shift the fundamental frequency (F0) on the stressed word within phrases. Acoustic analyses examine whether participants compensate using F0 cues alone or in conjunction with intensity and duration changes. In addition, an intensity perturbation study is being conducted on neurologically normal individuals. This, combined with the pitch perturbation protocol, will yield a more in-depth understanding of prosodic control for implementation in the DIVA model. Our experiments have been leveraged to identify the neural correlates of prosodic compensations and to inform neural modeling simulations within the DIVA model.

An image-based communication aid with context-sensitive vocabulary prediction

Rajiv Radhakrishnan, Katherine Schooley, Rupal Patel

We developed an image-based communication aid that enables individuals with severe communication impairments to interact efficiently and naturally with others. Within iconCHAT, we implemented a novel method of message construction based on semantic frames. We conducted usability tests with typically developing children to compare our method with conventional methods that focus on syntactic conventions. We also leveraged the user's physical and social context to improve communication efficiency when using iconCHAT. The user's geographical location and time of day are captured using sensors, and this information is used to predict likely vocabulary choices given past usage in the sensed context.
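The context-based prediction described above can be illustrated with a minimal sketch. It is not the iconCHAT implementation: the class name, the (location, time of day) context key, and the simple frequency ranking are all assumptions chosen for illustration, and the usage data is made up.

```python
from collections import Counter, defaultdict


class ContextVocabularyPredictor:
    """Hypothetical predictor: rank icons by past usage in the same sensed context."""

    def __init__(self):
        # Maps a (location, time_of_day) context to a count of icons chosen there.
        self._usage = defaultdict(Counter)

    def record(self, location, time_of_day, icon):
        """Store one observed icon selection for the given context."""
        self._usage[(location, time_of_day)][icon] += 1

    def predict(self, location, time_of_day, top_n=3):
        """Return up to top_n icons, most frequently used in this context first."""
        counts = self._usage.get((location, time_of_day), Counter())
        # Sort by descending count, then alphabetically so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [icon for icon, _ in ranked[:top_n]]


# Made-up usage history for illustration only.
predictor = ContextVocabularyPredictor()
for icon in ["juice", "sandwich", "juice", "napkin"]:
    predictor.record("cafeteria", "noon", icon)
predictor.record("classroom", "morning", "pencil")

print(predictor.predict("cafeteria", "noon", top_n=2))  # ['juice', 'napkin']
```

A context with no recorded history returns an empty list, so a real system would presumably fall back to a default vocabulary in that case.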

Incorporating prosodic modifications from speech produced in noise into speech synthesis

Michael Everett, Eldar Sadikov, Rupal Patel

Individuals with severe speech impairments often rely on assistive devices that use computer generated synthetic speech to communicate. In everyday noise situations such as classrooms, restaurants, and busy streets, speech synthesizers are difficult to understand. Current synthesizers are merely output devices that are unaware of their acoustic environment. We built an adaptive speech synthesizer that can alter its speaking style in response to ambient noise.

Peer Attitudes Toward Personalized TTS
Intelligibility of synthesized speech vocoded using dysarthric vocalizations

Michael Everett, Anna Roden, Rupal Patel

Speakers with dysarthria who use assistive communication aids rely on Text-to-Speech (TTS) synthesis as a primary means of interaction. Although many devices offer a finite set of voices, clinicians tend to use the adult male voice given its superior intelligibility. Vocoding techniques were used to build a prototype synthesizer that incorporates the residual prosodic cues in dysarthric vocalizations to convey speaker identity. A group of typically developing children was recruited to assess the intelligibility of the personalized synthesizer compared to an unmodified version of the synthesizer. Word intelligibility was assessed using the Map task. Additionally, children were asked to complete an attitude survey in order to determine the effect of personalization on social acceptability and technology adoption.

Perception of Typical Children's Prosody
Identifying prosodic contrasts in utterances produced by children

Julie Brayton, Rupal Patel

While children at different stages in development may attempt to signal prosodic contrasts such as question versus statement or contrastive stress, it is unclear whether the attempted acoustic signals are sufficient for listeners to perceive their intentions. This study aimed to determine if adult listeners can accurately identify prosodic contrasts produced by 4-, 7-, and 11-year-old children despite differences in the acoustic cues used by each age group.

Prosodic Strategies
Comparing local vs. global strategies for treating childhood motor speech disorders

Kate Connaghan, Rupal Patel

Prosodic modulation strategies are commonly used to improve intelligibility in motor speech disorders (dysarthria and apraxia of speech). These strategies may be implemented globally, across the entire utterance (e.g., slowed rate, increased loudness), or locally, with a focus on specific words or phrases (e.g., emphatic stress). A common goal of these strategies is to increase vowel space area (VSA): reduced VSA due to vowel centralization has been documented in a number of populations with dysarthria, and VSA is positively correlated with intelligibility. To date, however, little is known about the impact of such strategies when used by children with motor speech impairment.

Using an audio-visual elicitation technique, this project investigates the impact of prosodic modulation on vowel acoustics and perceptual intelligibility in this population. These findings will inform both our understanding of the underlying nature of childhood motor speech disorders and our understanding of the role of prosodic modulation in their treatment. Preliminary research conducted in our lab suggests that children with motor speech programming deficits improve vowel intelligibility when implementing some prosodic modulation strategies. This research is ongoing, with data collection continuing with children with dysarthria secondary to conditions such as Down syndrome, cerebral palsy, and traumatic brain injury, and with children with childhood apraxia of speech (CAS).
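For readers unfamiliar with vowel space area, one common way to compute it is as the area of the polygon formed by the corner vowels in the F1-F2 plane, using the shoelace formula. The sketch below assumes that definition; the project description does not state how VSA was calculated in this work, and the formant values are invented for illustration.

```python
def vowel_space_area(corner_vowels):
    """Polygon area (Hz^2) from (F1, F2) points listed in order around the perimeter."""
    n = len(corner_vowels)
    if n < 3:
        raise ValueError("need at least three corner vowels")
    twice_area = 0.0
    for i in range(n):
        f1_a, f2_a = corner_vowels[i]
        f1_b, f2_b = corner_vowels[(i + 1) % n]  # wrap around to close the polygon
        twice_area += f1_a * f2_b - f1_b * f2_a  # shoelace cross term
    return abs(twice_area) / 2.0


# Invented (F1, F2) values in Hz for four corner vowels, in perimeter order.
habitual = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]

# Simulate vowel centralization by pulling each vowel halfway toward the centroid.
c1 = sum(f1 for f1, _ in habitual) / len(habitual)
c2 = sum(f2 for _, f2 in habitual) / len(habitual)
centralized = [(c1 + 0.5 * (f1 - c1), c2 + 0.5 * (f2 - c2)) for f1, f2 in habitual]

print(vowel_space_area(habitual), vowel_space_area(centralized))
```

Halving every vowel's distance from the centroid reduces the area to one quarter, which shows why even modest centralization produces a marked drop in VSA.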

Prosody of Speech in Noise
The use of interactive dialog to assess speech modifications in noisy environments

Kevin Schell, Elyes Yaich, Rupal Patel

When people talk in noisy situations, they adapt many aspects of their speaking style. Little is known about the role of semantic information in these acoustic modifications. We studied whether healthy speakers differentially modify semantically salient versus non-salient words in noise using acoustic and perceptual measures. Participants were immersed in varying noise environments while playing an interactive computer game to elicit naturalistic dialog.

Rate-Prosody Interaction
The effect of rate manipulation on prosodic cues in dysarthria

Katherine Alexander, Pamela Campellone, Rupal Patel

Traditional dysarthria interventions focus on improving speech sound articulation through rate reduction. While reduced rate has been noted to improve speech intelligibility, less is known regarding how it impacts the prosodic features of speech, including fundamental frequency (F0), intensity, and duration. A group of healthy controls and speakers with dysarthria were asked to produce short utterances at both their habitual rate and at a slow rate. Speech stimuli consisted of questions, statements and contrastive stress elicited in a naturalistic game. Acoustic analyses were conducted to determine the effect of rate reduction on the amount of prosodic variation in the speech signal. This line of work has implications for the design of effective assessments and interventions for speakers with dysarthria.

An icon-based AAC system for users with neurological impairments and severely limited mobility

Karl Wiegand, Rupal Patel

Icon-based assistive communication devices typically present users with arrays of semantic concepts that are concatenated to formulate messages. For users with motor impairment, navigating through these multilayered hierarchical arrays is slow and fatiguing. RSVP-iconCHAT is a system that leverages Rapid Serial Visual Presentation (RSVP) and frame-based semantics toward the design of a small-footprint, icon-based communication system that can be controlled with a single input signal without sacrificing vocabulary size.

Initial comparisons of message construction speed and complexity with a traditional, mouse-controlled array system showed that message complexity was comparable in both systems and construction speed was only twice as slow using a one-key system. We are currently replacing the one-key input with EEG-BCI detection of P300 brainwaves to further reduce motor fatigue and increase communication speed.

A continuous motion overlay module for icon-based assistive communication

Karl Wiegand, Ingrid Villalta, Rupal Patel

Individuals with severe speech impairments rely on augmentative and alternative communication (AAC) systems to convey their needs and desires. For touch-screen devices, users construct messages by touching words, letters, or icons. Since many of these users also have accompanying motor impairments, repetitive and precise movements can be slow and effortful. SymbolPath aims to enhance message formulation ease and speed by using continuous motion icon selection rather than discrete input.

SymbolPath is an overlay module that can be integrated with existing icon-based AAC systems to enable continuous motion icon selection. Message formulation using SymbolPath consists of drawing a continuous path through a set of desired icons. The system then determines the most likely subset of desired icons on that path and rearranges them to form a meaningful and grammatically complete message.

A systematically designed icon library for daily communication

Ryan Ma, Ingrid Villalta, Rupal Patel

Over the past few decades, display technologies have become ubiquitous, which has contributed to a dramatic shift away from textual interfaces and toward graphical user interfaces. These trends may have serendipitous consequences for assistive communication, in which users rely on graphical symbols to convey their needs and desires. While numerous assistive communication symbol libraries exist, they vary in iconicity, style and breadth, and few have been systematically designed. We sought to address the need for a systematically designed, contemporary and aesthetically appealing symbol library of line drawings for everyday concepts.

The Symbonyms symbol set was designed to adhere to the principles of simplicity and learnability. Rather than relying on color and cartoon metaphors, Symbonyms are black-and-white line drawings that use color sparingly. Line weight, composition, and perspective are carefully considered design elements. Although the symbol library is continually being refined, an initial user study with the first 180 symbols was conducted to assess comprehensibility. User responses provided feedback for iterative design modifications. We are currently exploring ways to scale the design and assessment process using crowd-sourcing methods.

Tongue Twisters
Comparing speech errors produced by people with and without dysarthria

Heather Kember, Kate Connaghan, Anthony Formicola, Tara Srirangarajan, Rupal Patel

Although speech errors have been used as evidence for mapping the architecture of the healthy speech system, few studies have used them to investigate disordered speech. Speech errors can be produced spontaneously but can also be induced in a laboratory setting. This project takes a methodology used extensively with healthy talkers, tongue twisters, and applies it to a new population: people with dysarthria. Given that diagnostic tools aim to define the constraints of the individual’s speech production system, tongue twisters may be an effective way to differentiate between communication disorders. One notable finding from recent tongue twister studies of healthy speakers is that words spoken with prominence are less error prone. Similar findings from individuals with dysarthria may provide the basis for speech interventions. We aim to compare speech errors produced by speakers with and without dysarthria, and to assess whether prominence protects against error in individuals with dysarthria.

Using Machine Learning to Classify Prosody
Statistical classification and clustering of prosodic control in dysarthric speakers

Tom DiCicco, Rupal Patel

Interfaces that utilize the (albeit limited) speech and sound-producing abilities of speakers with dysarthria would enable them to engage in richer interactions with the individuals around them. Before attempting to develop such interfaces, a greater understanding of the abilities of dysarthric speakers to consistently produce speech sounds was required. To gain this understanding, we created two projects. In the first project, we applied several machine classification techniques (k-nearest neighbor and support vector machines with various kernel functions) to assess the ability of speakers to control certain prosodic features, specifically pitch and duration. In the second project, we used statistical clustering techniques with the goal of grouping and visualizing the data in a manner that provided novel information to clinicians. We used a standard k-means clustering algorithm in which the optimal number of clusters was estimated using the gap statistic. We also incorporated supervised clustering in order to make use of known class labels.
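As a rough illustration of the first project's approach, the sketch below classifies a vocalization from its pitch and duration with a k-nearest-neighbor vote. It is a generic textbook version, not the project's code: the function name, the "low"/"high" target labels, the feature standardization step, and all data values are assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train: list of ((pitch_hz, duration_s), label) pairs.
    query: (pitch_hz, duration_s) tuple to classify.
    """
    dims = len(query)
    count = len(train)
    # Standardize each feature so pitch (Hz) does not swamp duration (s).
    means = [sum(x[d] for x, _ in train) / count for d in range(dims)]
    stds = [
        math.sqrt(sum((x[d] - means[d]) ** 2 for x, _ in train) / count) or 1.0
        for d in range(dims)
    ]

    def scale(x):
        return tuple((x[d] - means[d]) / stds[d] for d in range(dims))

    scaled_query = scale(query)
    neighbors = sorted(
        train, key=lambda item: math.dist(scale(item[0]), scaled_query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Invented productions: (mean pitch in Hz, duration in s) with hypothetical target labels.
train = [
    ((110, 0.30), "low"), ((120, 0.35), "low"), ((115, 0.32), "low"),
    ((200, 0.60), "high"), ((210, 0.55), "high"), ((190, 0.65), "high"),
]

print(knn_classify(train, (118, 0.33)))  # low
print(knn_classify(train, (205, 0.58)))  # high
```

If a speaker's productions for different targets can be separated this way at well above chance, that suggests the speaker controls those prosodic features consistently enough for an interface to exploit.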

An adaptive Text-To-Speech (TTS) synthesizer that conveys user identity

Rupal Patel

The goal of this project is to advance computerized speech synthesis methods so that they can better approximate the unique vocal characteristics of individual human speakers. To date, even state-of-the-art text-to-speech (TTS) synthesis cannot capture the flexibility of the natural human voice. While voice quality may not matter for many TTS applications, it is essential for assistive communication aids, which are meant to be an extension of the user. Over two million Americans have severe speech and motor impairments that require the use of an assistive communication aid with TTS-based output.

Synthetic voices on commercially available devices are not representative of the user along basic dimensions such as age, gender, rate of speech, and voice quality. This mismatch draws unnecessary attention, detracts from the spoken message, and impedes social integration. This project aims to harness the residual vocal control in the productions of individuals with severe speech impairment in order to adapt a text-to-speech synthesizer such that the resultant voice resembles that of the user.

Wizard of Oz Games to Assess Prosody
Harnessing residual vocal control in children with severe speech impairment

Alexia Salata, Bethany Schroeder, Elyes Yaich, Rupal Patel

This study examined whether children with severe speech impairment due to cerebral palsy can reliably and consistently control prosodic features and how they differ from healthy peers. An interactive computer game was used to engage children in producing vocalizations that drove the movement of characters on the screen. Perceptual and acoustic measures were taken to identify reliable prosodic cues. This work guided the development of therapies that target prosodic features to improve the effectiveness of natural communication, and the design of novel communication devices.

Word-Level Stress Contrasts in Dysarthria
Prosodic control of contrastive stress in severe speech impairment

Pamela Campellone, Rupal Patel

Despite severe speech impairment, many individuals continue to use vocalizations when interacting with familiar communication partners, suggesting that reliable information is embedded in the speech signal. Previous work assumed that most of this information is encoded in speech sound segments. Our approach focused on identifying consistent prosodic features such as fundamental frequency (perceived as pitch), intensity (perceived as loudness), and duration. Using acoustic and perceptual measures, this study examined whether people with severe speech impairments could successfully manipulate syllable-level prosody in sentences that vary in contrastive stress.