Can’t you just tell me?
Let’s be frank: we are always running a lot of stuff. We constantly juggle multiple flows of data processing, on top of hundreds of browser tabs, email clients, and drafts; honestly, my head starts spinning just following this train of thought. This kind of ordinary, seemingly harmless multitasking always takes its toll on focus. When it comes to deep focus, I personally believe there is no workaround: one has to avoid context switching altogether. But for shallow focus, wouldn’t it be nice to check how the data processing is doing without hunting for the right terminal, finding it, and forgetting what you were doing before? Why can’t you just ask?
Is it a crazy idea, though? It has already been five years since making a personal assistant sounded like a crazy challenge. These days, with so many names that TVs and videos are forbidden to say aloud ("Alexa?", “Cortana?”, “Ok Google?"), making a simple voice assistant is as straightforward as ever (even if we may still be quite far from intelligent conversations). And as a matter of fact, in this post I show how to put together a voice assistant to get updates from a running script! No more context switching between browser, terminal, Illustrator, VNC, … (the list is long). All the related scripts are on the NeuroSnippets repository.
Hear no evil, speak no evil
As we are looking for a quick way to implement a voice assistant, we will rely on two packages that make the experience painless (or almost painless, when your voice is not recognized correctly). To vocalize the messages, we will use pyttsx3. To recognize text from sound, we will use SpeechRecognition. There are several examples online showing how to use this duo to make a virtual assistant that does all sorts of things (see the references at the end!).
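If you don’t have the two packages yet, they can be installed from PyPI (note that SpeechRecognition needs PyAudio to access the microphone, and on Linux pyttsx3 relies on a system speech engine such as espeak):

```shell
pip install pyttsx3 SpeechRecognition pyaudio
```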
For a change (not really), let’s start by importing the necessary packages and defining the two main functions:
```python
import speech_recognition as sr
import pyttsx3


def speak(text):
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


def get_audio():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)
        said = ""
        try:
            said = r.recognize_google(audio)
            print(said)
        except Exception as e:
            print("Exception: " + str(e))
    return said.lower()
```
The speak() function cannot be more straightforward than this: you give it some text, it initializes the engine, say() queues up the text, and runAndWait() makes you hear it! The other function, get_audio(), has a bit more going on: using the Microphone() as a source, a Recognizer() object temporarily records some sound and then tries to make sense of it through recognize_google(); if it manages to, we print the result on screen (useful for spotting mispronunciation cases) and return it as a lower-case string.
Next, we need to define some keywords and catchphrases:
```python
wake_word = "hey jarvis"
queries = ["what is the status", "what is going on",
           "how is going with", "any update with"]
quit = "thank you"
```
I couldn’t resist the Iron Man fashion. We can of course have multiple lists of queries, depending on the tasks we want to implement, but here we are keeping things simple. We can now pull everything together:
```python
while True:
    print("Listening...")
    text = get_audio()
    if text.count(wake_word) > 0:
        for phrase in queries:
            if phrase in text:
                filename = text.split(' ')[-1] + '.txt'
                try:
                    file_handle = open(filename, 'r')
                    last_update = file_handle.readlines()[-1]
                    file_handle.close()
                    speak("The current status is: " + last_update)
                except IOError:
                    speak("I cannot find " + filename)
        else:
            if quit in text:
                speak("Goodbye")
                break
```
The infinite loop avoids the need to restart the script each time. On each iteration, we record some sound and check for the wake_word: if we called Jarvis, the next step is checking whether the call was followed by any of the expected queries. This is an important point for how this works: implemented like this, Jarvis expects the whole request in one sentence (e.g. “Hey Jarvis, what is going on with …?"). If the sentence matches an expected one, its last word is retrieved and used to open a text file. Again, another important point for how this works: this simplification means that every time we ask for updates, the name of the output file for our target application needs to go last. Once that file is read, the last line is retrieved and we finally get the update we want. If by any chance we referred to a file that does not exist, Jarvis will let us know. After the for loop is done, if by any chance we said “Thank you”, an else statement will release Jarvis from his duty. We’re done!
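As a side note, the matching and file-reading logic can be distilled into two standalone helpers (parse_command and last_line are hypothetical names, shown here just to illustrate the idea, not part of the script in the repository); this makes the behaviour easy to test without a microphone:

```python
wake_word = "hey jarvis"
queries = ["what is the status", "what is going on",
           "how is going with", "any update with"]


def parse_command(text):
    # Return the target filename if the transcript contains the wake
    # word followed by one of the expected queries, otherwise None.
    text = text.lower()
    if wake_word not in text:
        return None
    for phrase in queries:
        if phrase in text:
            # By convention, the last word names the file to check.
            return text.split(' ')[-1] + '.txt'
    return None


def last_line(filename):
    # Return the last line of the status file, or None if the file
    # is missing or empty (instead of raising an exception).
    try:
        with open(filename, 'r') as file_handle:
            lines = file_handle.readlines()
            return lines[-1].strip() if lines else None
    except IOError:
        return None
```

For instance, parse_command("hey jarvis what is going on with pipeline") returns "pipeline.txt", while a sentence without the wake word returns None.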
Jarvis, can you hear me?
To test Jarvis, I wrote a short script that downloads some diffusion data using dipy and then processes everything using MRtrix3. Importantly, I made sure that at each step the script prints on screen what is going on:
```bash
#!/bin/bash

echo "Downloading data"
python -c "from dipy.data import fetch_sherbrooke_3shell;fetch_sherbrooke_3shell()"

echo "Converting files"
mrconvert -fslgrad ~/.dipy/sherbrooke_3shell/HARDI193.bvec \
    ~/.dipy/sherbrooke_3shell/HARDI193.bval ~/.dipy/sherbrooke_3shell/HARDI193.nii.gz \
    dwi.mif

echo "Creating a mask"
dwi2mask dwi.mif mask.mif

echo "Estimating response function"
dwi2response dhollander dwi.mif wm.txt gm.txt csf.txt

echo "Reconstructing fiber orientation distribution"
dwi2fod -mask mask.mif msmt_csd dwi.mif wm.txt wm.mif \
    gm.txt gm.mif csf.txt csf.mif

echo "Tracking"
tckgen -select 100000 -seed_image mask.mif wm.mif track.tck

echo "Done!"
```
To run it in a way that Jarvis can monitor it, we need to redirect the output to a text file:
```bash
./run.sh > pipeline.txt
```
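To double-check that the redirection is working, you can peek at the same file Jarvis will read; the last line is the latest status message:

```shell
tail -n 1 pipeline.txt
```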
It’s running! We can now put Jarvis to the test:
Now you can just say something like “Hey Jarvis, what is going on with the pipeline?” while browsing Twitter!