Working Out Loud! Speech to Text in Storyline 360

This blog post is my experimental exploration of integrating speech recognition in Storyline 360 using JavaScript. The content has been created with the assistance of ChatGPT, an AI language model, and as such, I cannot provide direct support or troubleshooting for the code presented. However, I hope this post inspires you to delve into the possibilities of speech recognition and sparks your creativity in building interactive e-learning experiences.

What was my goal?

This experiment aimed to see if I could populate a Storyline text variable by having learners record themselves speaking into their computer’s microphone instead of asking them to type in something from their keyboard.

What worked and what didn’t?

Issue #1: ChatGPT did not include the Storyline player calls (var player = GetPlayer(); and player.SetVar) when generating the JavaScript code – Fixed

Initially, there were some issues with ChatGPT not recognizing it had to pull and push information into Storyline variables when creating the JavaScript code. After a few corrections in my prompts and providing Storyline-specific information to ChatGPT, it understood what code it needed to properly push information recorded by the microphone into the SL_transcript variable.

Issue #2: Overwriting the SL_transcript variable – Fixed

I discovered that recording my voice would update the SL_transcript variable, but if I paused for a moment, it would overwrite what was already there. I wanted the script to allow me to pause and then just append text to what I had previously recorded.
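A minimal sketch of the append behavior, assuming the existing text is read back from Storyline before each write. The helper name appendToStorylineVar is hypothetical; GetVar and SetVar are the Storyline player methods, and the small fake player exists only so the sketch can be tried outside Storyline.

```javascript
// Hypothetical helper: append newly recognized text to a Storyline text
// variable instead of replacing what is already there.
function appendToStorylineVar(player, varName, newText) {
  var existing = player.GetVar(varName) || '';
  var addition = String(newText).trim();
  if (addition === '') {
    return existing; // nothing new to add
  }
  var combined = existing === '' ? addition : existing + ' ' + addition;
  player.SetVar(varName, combined);
  return combined;
}

// Stand-in for the object returned by GetPlayer(), used only to try the
// helper outside Storyline (an assumption made for this sketch).
function createFakePlayer() {
  var vars = {};
  return {
    GetVar: function (name) { return vars[name]; },
    SetVar: function (name, value) { vars[name] = value; }
  };
}

var demoPlayer = createFakePlayer();
appendToStorylineVar(demoPlayer, 'SL_transcript', 'Hello there.');
appendToStorylineVar(demoPlayer, 'SL_transcript', ' How are you? ');
console.log(demoPlayer.GetVar('SL_transcript')); // Hello there. How are you?
```

Inside a Storyline trigger, pass GetPlayer() in place of the fake player.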

Issue #3: Stopping the Recording – Unable to Fix

I originally had a Stop Recording button with JavaScript code, but I could never get it to work. Every version ChatGPT provided broke the prompt that asks for microphone access, and the recording kept running after I clicked the Stop Recording button. Because of limited time and my limited JavaScript knowledge, I stopped working on this and settled for the ability to append what was spoken.
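One possible cause, offered as an assumption rather than a confirmed diagnosis: if each Execute JavaScript trigger runs in its own function scope, the recognition variable created by the Start Recording trigger is not visible to a separate Stop Recording trigger. The sketch below keeps the recognizer on a shared object (window in a browser) so both triggers reach the same instance. The property names slRecognition and slRecognitionActive and both helper names are hypothetical, and this has not been tested in a published Storyline course.

```javascript
// Hypothetical helper for the Start Recording trigger.
// "scope" is window in a browser; createRecognizer builds and configures
// the recognizer (for example, new webkitSpeechRecognition()).
function startSharedRecognition(scope, createRecognizer) {
  if (!scope.slRecognition) {
    scope.slRecognition = createRecognizer();
    // Clear the flag whenever the browser ends the session on its own.
    scope.slRecognition.onend = function () {
      scope.slRecognitionActive = false;
    };
  }
  if (!scope.slRecognitionActive) {
    scope.slRecognition.start(); // calling start() twice in a row throws
    scope.slRecognitionActive = true;
  }
  return scope.slRecognition;
}

// Hypothetical helper for the Stop Recording trigger.
function stopSharedRecognition(scope) {
  if (scope.slRecognition && scope.slRecognitionActive) {
    scope.slRecognition.stop(); // stop listening; pending audio may still produce a result
    scope.slRecognitionActive = false;
    return true;
  }
  return false;
}
```

Under the same assumption about trigger scope, each trigger would need its own copy of the helper it calls. The shortest form of the Stop Recording trigger would be a single line: if (window.slRecognition) { window.slRecognition.stop(); }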

Issue #4: Real-time display of transcription as it was being spoken – Unable to Fix

I would have liked to display the text in real time as I was speaking, but I could not figure out how to make that work, and I am not sure it is possible from within Storyline at this time. I know it is possible through an API when the code runs on a regular web page. So, the text is generated after you finish speaking into the microphone and pause.
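For reference, the Web Speech API reports partial text while the speaker is still talking only when recognition.interimResults is set to true; it defaults to false. A minimal sketch of how a handler could combine final and partial text follows. The helper name buildLiveTranscript is hypothetical, and whether Storyline redraws the on-screen variable quickly enough to feel real-time has not been verified here.

```javascript
// Hypothetical helper: rebuild the full text from every result received so
// far in the session, final phrases first, then whatever is still partial.
function buildLiveTranscript(results) {
  var finalText = '';
  var interimText = '';
  for (var i = 0; i < results.length; i++) {
    var phrase = results[i][0].transcript;
    if (results[i].isFinal) {
      finalText += phrase + ' ';
    } else {
      interimText += phrase;
    }
  }
  // Collapse repeated whitespace left over from joining phrases.
  return (finalText + interimText).replace(/\s+/g, ' ').trim();
}

// Wiring for a Storyline trigger. The guard keeps this from running anywhere
// other than a browser that exposes GetPlayer and webkitSpeechRecognition.
if (typeof window !== 'undefined' &&
    typeof GetPlayer === 'function' &&
    window.webkitSpeechRecognition) {
  var player = GetPlayer();
  var recognition = new window.webkitSpeechRecognition();
  recognition.lang = 'en-US';
  recognition.continuous = true;
  recognition.interimResults = true; // ask for partial results as well
  recognition.onresult = function (event) {
    player.SetVar('SL_transcript', buildLiveTranscript(event.results));
  };
  recognition.start();
}
```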

It Starts with a Single Storyline Variable

To create this example, I created a single text variable in Storyline 360 named SL_transcript. I then added a Start Recording button with an Execute JavaScript trigger and attached the following code to it.

// Get the player object
var player = GetPlayer();

// Define the variable name in Storyline
var recognizedSpeechVar = "SL_transcript";

// Create a new SpeechRecognition object
var recognition = new webkitSpeechRecognition(); // Chrome uses the 'webkit' prefix

// Set the language for speech recognition
recognition.lang = 'en-US'; // Specify the language

// Enable continuous speech recognition
recognition.continuous = true;

// Create a new SpeechSynthesisUtterance object for updating the transcript
var utterance = new SpeechSynthesisUtterance();

// Variable to store the speech-to-text transcript
var transcript = '';

// Event handler for capturing interim results (partial transcriptions)
recognition.onresult = function(event) {
  var interimTranscript = '';
  for (var i = event.resultIndex; i < event.results.length; ++i) {
    if (event.results[i].isFinal) {
      transcript += event.results[i][0].transcript + ' ';
    } else {
      interimTranscript += event.results[i][0].transcript;
    }
  }
  player.SetVar(recognizedSpeechVar, transcript);
  utterance.text = interimTranscript;
  speechSynthesis.speak(utterance);
};

// Start speech recognition
recognition.start();

Explanation of JavaScript Code

// Get the player object
var player = GetPlayer();

This line retrieves the reference to the player object in the Storyline environment. It allows you to interact with the Storyline course and access its variables and functions.

// Define the variable name in Storyline
var recognizedSpeechVar = "SL_transcript";

This line assigns the name of the variable in Storyline where the speech-to-text transcript will be stored. In this case, it is set to “SL_transcript.”

// Create a new SpeechRecognition object
var recognition = new webkitSpeechRecognition();

This line creates a new instance of the SpeechRecognition object, which provides speech recognition capabilities. In this case, it uses the ‘webkit’ prefix because this code is intended to work with Chrome.
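Some browsers expose the constructor without the prefix, so a slightly more defensive version can check for both names before creating the object. This is a sketch, and the helper name getSpeechRecognitionCtor is hypothetical.

```javascript
// Hypothetical helper: return whichever constructor the browser exposes,
// or null when speech recognition is not available at all.
function getSpeechRecognitionCtor(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// In a Storyline trigger the global object is window.
var root = typeof window !== 'undefined' ? window : {};
var RecognitionCtor = getSpeechRecognitionCtor(root);
if (RecognitionCtor) {
  var recognizer = new RecognitionCtor();
  recognizer.lang = 'en-US';
} else {
  console.log('Speech recognition is not available in this browser.');
}
```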

// Set the language for speech recognition
recognition.lang = 'en-US';

This line sets the language for speech recognition. In this case, it is set to US English (‘en-US’).

// Enable continuous speech recognition
recognition.continuous = true;

This line enables continuous speech recognition, allowing the recognition to continue listening for speech even after the user pauses.

// Create a new SpeechSynthesisUtterance object for updating the transcript
var utterance = new SpeechSynthesisUtterance();

This line creates a new instance of the SpeechSynthesisUtterance object, which is used for speech synthesis. It allows you to generate speech from text.

// Variable to store the speech-to-text transcript
var transcript = '';

This line declares a variable called ‘transcript’ that will store the speech-to-text transcript. It is initially set to an empty string.

// Event handler for capturing interim results (partial transcriptions)
recognition.onresult = function(event) {
  var interimTranscript = '';
  for (var i = event.resultIndex; i < event.results.length; ++i) {
    if (event.results[i].isFinal) {
      transcript += event.results[i][0].transcript + ' ';
    } else {
      interimTranscript += event.results[i][0].transcript;
    }
  }
  player.SetVar(recognizedSpeechVar, transcript);
  utterance.text = interimTranscript;
  speechSynthesis.speak(utterance);
};

This block of code sets up an event handler for capturing speech recognition results. When the recognition engine detects speech, this event handler is called.

Inside the event handler is a loop that iterates over the speech recognition results. If the result is final (completed utterance), the transcript variable appends the new speech to the existing transcript with a space character. If the result is not final (interim result), the interimTranscript variable captures the partial transcriptions.

After updating the transcript, the code sets the value of the Storyline variable SL_transcript using player.SetVar(recognizedSpeechVar, transcript). Then, it assigns the partial transcriptions to the utterance.text property.

Finally, speechSynthesis.speak(utterance) triggers the speech synthesis engine to speak the partial transcriptions using the browser’s built-in text-to-speech capabilities.

// Start speech recognition
recognition.start();

This line starts the speech recognition process, initiating the microphone prompt and enabling the recognition of speech input.
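If the learner blocks the microphone prompt, nothing visible happens with the code above. A sketch of one way to surface that, assuming a second Storyline text variable named SL_status (hypothetical, not part of the original project) and two hypothetical helper names. The error codes shown are values the Web Speech API reports in event.error.

```javascript
// Hypothetical helper: turn a recognition error code into a short message.
function describeRecognitionError(code) {
  switch (code) {
    case 'not-allowed':
      return 'Microphone access was blocked. Allow it in the browser and try again.';
    case 'audio-capture':
      return 'No microphone was found.';
    case 'no-speech':
      return 'No speech was detected.';
    case 'network':
      return 'The speech service could not be reached.';
    default:
      return 'Speech recognition stopped: ' + code;
  }
}

// Hypothetical helper: write the message into the SL_status variable
// whenever the recognizer reports an error.
function attachErrorReporting(recognition, player) {
  recognition.onerror = function (event) {
    player.SetVar('SL_status', describeRecognitionError(event.error));
  };
}
```

In the trigger, attachErrorReporting(recognition, player) would be called before recognition.start().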

Final Result

Overall, I think this code worked and achieved the goal of converting what I said into my microphone to text and storing it in the SL_transcript text variable I created in Storyline 360. It appears to work in Google Chrome and Microsoft Edge at the moment.

I would have liked to figure out how to stop the recording, but that is something that others who read this post might be able to solve. In the meantime, I hope you found this useful.

Final Result (In All Its Glory)!

Turn on your audio for this demonstration, during which I will show you how this experiment turned out.

Comments

Gabriel Barrientos

May 31, 2024 at 9:43 am

Hi Richard! I think this is brilliant. Thank you so much for sharing your knowledge. I have a question regarding the visualization of the transcript. Did you create a box and then just print the transcript variable on it? I’d love to know how you made the speech appear on the screen. Thanks!

- Richard
  
  May 31, 2024 at 10:38 am
  
  Gabriel,
  
  Thanks for stopping by and commenting. The text that appears on the screen is a Storyline text variable. Hope that helps!
  
  Richard
  
Njoud

June 9, 2024 at 2:44 am

Thank you so much
It’s very helpful and your explanation is very clear

- Richard
  
  June 12, 2024 at 2:48 pm
  
  Njoud,
  You are welcome.
  Richard
  
Stephen

June 11, 2024 at 8:35 pm

Can you share a Storyline file? I can’t figure out why I can’t get it to work like yours. Thanks!

- Richard
  
  June 12, 2024 at 2:49 pm
  
  Stephen,
  
  I’ll do my best to see if I can find this older project. I am working on a few large projects, so it might be a few weeks before I can dig thru my archives. My apologies!
  
  Richard
  
Taner Yiğit

September 18, 2024 at 5:19 am

hi mr. richards,
Is there a way you can elaborate on showing the results on the screen?

Roland

October 1, 2024 at 7:47 am

Hi Richard,

Thanks for the inspiration. Would you happen to know how you can stop the recording function afterwards? The ‘abort’ or ‘stop’ executions in Javascript do not seem to work.

Regards,
Roland