Beyond Automatic Speech Recognition: Who Said What in Massive Spoken Conversation Data Collections

D. Rao and B. McMahan
Joostware AI Research Corporation, California, United States

Keywords: Speaker Recognition, Spoken Conversation Understanding, Deep Learning, Metadata, Analysis

From meetings to court hearings to triage data gathered by SIGINT, the amount of spoken conversational data generated grows every day. To help meet the rising need to process and navigate these massive collections, our latest solution, DeepListener, focuses on understanding spoken conversational data. For both audio and video, DeepListener provides a system for ingesting, organizing, and querying large volumes of conversational data with both efficiency and precision. With DeepListener, we go beyond automatic speech recognition (ASR) to discover speakers, their affective states, and other valuable metadata. Through the application of machine learning, natural language processing, and deep learning techniques, this metadata is automatically extracted from conversational content, generating an additional layer of information on top of the transcript data and enabling a more advanced level of analysis and archival. Developed by Joostware, an AI research consulting company focused on human language technologies, DeepListener is currently being applied to build a radio/television news search service for journalists and fact-checkers through a grant from the Knight Foundation.