Speech Processing & Conversational AI

A Peek Into the Startup Scene

Wednesday, March 7th, 2pm @EEB 132

Three distinguished alumni of the Ming Hsieh Department of Electrical Engineering return home to share their experiences in the startup world. Each presentation will give attendees insights into what they do and how they got there. Presentations will be followed by a discussion addressing the challenges that startups face. 

Dr. Samuel Kim – Senior Speech Specialist at Gridspace


The Gridspace Sift conversational search system is designed to search through billions of minutes of long-form, conversational speech data. The core technology supports complex searches that combine semantic and signal information, subject to constraints on time, logical structure, and metadata. The system uses specialized word and signal embeddings and transforms compound queries (Boolean, set, and temporal operators) into matrix operations executed in a highly distributed manner.


Samuel Kim is a speech scientist at Gridspace. His research focuses on understanding human behaviors and perceptions in human-computer and human-human interactions. To that end, he studies signal processing and machine learning algorithms for video, audio, and text, with applications to classification, detection, and information retrieval. He received his Ph.D. in Electrical Engineering from the University of Southern California, Los Angeles, in 2010 and has since worked at several research institutes and startups.

Dr. Kyu Jeong Han – Principal Machine Learning Scientist at Capio Inc.


Capio has lately been one of the leading research forces in industry, competing with IBM and Microsoft to bring the word error rate of conversational speech recognition systems closer to that of human transcribers. In this talk, we share the latest results of the Capio 2017 conversational speech recognition system on the industry-standard Switchboard/CallHome test sets and how they square off against other state-of-the-art systems. In addition, perspectives for comparing and contrasting human and machine transcriptions will be presented, leading to an open discussion of how close machines have come to human ability in transcribing conversational speech.
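For context, word error rate (the metric behind these human-versus-machine comparisons) is the word-level Levenshtein edit distance between a hypothesis and a reference transcript, divided by the reference length. A minimal sketch (example sentences are illustrative, not from the Switchboard/CallHome sets):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table: d[i][j] = edits to turn ref[:i] into hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```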

Dr. Kyu Han is a Principal Machine Learning Scientist at Capio Inc., a Silicon Valley startup focused on bringing human-quality speech recognition to analytics, transcription, and human-computer interaction products. He has more than a decade of experience in researching and developing speech technologies. Before joining Capio, Dr. Han held research positions at the IBM T. J. Watson Research Center and the Ford Research and Innovation Center.

Dr. Jangwon Kim – VP of Research at Canary Speech


The speech signal carries rich information about the speaker's state. At Canary Speech, we develop healthcare applications that use speech and language technologies to improve the quality of healthcare services. Specifically, we design speech data collection protocols, build mobile and web applications, and build back-end classification, regression, and ranking models for the automatic assessment (or early warning) of cognitive diseases and impairment, motor control disorders, and mental state. This talk will describe the speech and language technology used in our system, as well as various issues and challenges of R&D in the healthcare industry.
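The back-end models described above can be sketched in miniature (this is a generic illustration, not Canary Speech's system; the features, labels, and model are synthetic assumptions): a classifier maps per-utterance acoustic features to a risk score.

```python
import numpy as np

# Synthetic stand-ins for per-utterance acoustic features
# (e.g. mean pitch, speaking rate, pause ratio) and binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # 200 utterances, 3 features each
w_true = np.array([1.5, -2.0, 0.5])      # hidden "true" weighting
y = (X @ w_true + rng.normal(scale=0.3, size=200) > 0).astype(float)

# Logistic regression fit by plain gradient descent on the logistic loss.
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
    w -= 0.1 * X.T @ (p - y) / len(y)    # gradient step

probs = 1.0 / (1.0 + np.exp(-(X @ w)))
accuracy = np.mean((probs > 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```

A real system would replace the synthetic matrix with extracted acoustic and linguistic features, and the score would feed a clinician-facing early-warning threshold rather than a hard decision.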

Jangwon Kim is an expert in speech and language technology. He is interested in multimodal signal processing, speech recognition, machine learning for speech processing, natural language processing, and speech production. Applications of interest include robust Automatic Speech Recognition (ASR), affective computing, Human-Computer Interaction (HCI), computational paralinguistics, healthcare, security, and defense.

Jangwon Kim received his M.S. and Ph.D. in Electrical Engineering from USC in 2010 and 2015, respectively. From 2015 to 2016, he was a research scientist at Cobalt Speech and Language Inc., where he worked on robust speech recognition, keyword spotting, speaker recognition, telephone speech analytics (entrainment), and healthcare applications of speech processing. Currently, he is the Vice President of Research at Canary Speech LLC, a startup focused on healthcare applications of speech technology, language technology, and machine learning. He has also been a technical advisor at Ryencatchers since June 2017.

Kim was the winner of Interspeech challenges (machine learning challenges focused on speech signals) in 2012, 2014, and 2015, and received the Northern Digital Excellence Awards in 2014. He has served on program committees and as a reviewer for a variety of speech research venues, e.g., Speech Prosody, ACII, ISSP, IEEE TASLP, IEEE TAC, and Interspeech. He has published over 30 papers in the areas of speech processing, speech production, HCI, machine learning for speech signals, speech recognition, speaker recognition, and healthcare.