DEEP OC Speech to Text

By DEEP-Hybrid-DataCloud Consortium | Created: - Updated:

services, docker

License: Apache 2.0

Build Status

This is a plug-and-play tool to train and evaluate a speech to text tool using deep neural networks. The network architecture is based in one of the tutorials provided by Tensorflow1. The architecture used in this tutorial is based on some described in the paper Convolutional Neural Networks for Small-footprint Keyword Spotting1. There are lots of different approaches to building neural network models to work with audio including recurrent networks or dilated convolutions. This model is based on the kind of convolutional network that will feel very familiar to anyone who's worked with image recognition. That may seem surprising at first though, since audio is inherently a one-dimensional continuous signal across time, not a 2D spatial problem. We define a window of time we believe our spoken words should fit into, and converting the audio signal in that window into an image. This is done by grouping the incoming audio samples into short segments, just a few milliseconds long, and calculating the strength of the frequencies across a set of bands. Each set of frequency strengths from a segment is treated as a vector of numbers, and those vectors are arranged in time order to form a two-dimensional array. This array of values can then be treated like a single-channel image and is known as a spectrogram. These spectograms will be the input for the training.

Run locally on your computer

Using Docker

You can run this module directly on your computer, assuming that you have Docker installed, by following these steps:

$ docker pull deephdc/deep-oc-speech-to-text-tf
$ docker run -ti -p 5000:5000 deephdc/deep-oc-speech-to-text-tf

Using udocker

If you do not have Docker available or you do not want to install it, you can use udocker within a Python virtualenv:

$ virtualenv udocker
$ source udocker/bin/activate
$ git clone
$ cd udocker
$ pip install .
$ udocker pull deephdc/deep-oc-speech-to-text-tf
$ udocker create deephdc/deep-oc-speech-to-text-tf
$ udocker run -p 5000:5000  deephdc/deep-oc-speech-to-text-tf

Once running, point your browser to and you will see the API documentation, where you can test the module functionality, as well as perform other actions (such as training).

For more information, refer to the user documentation.