Deep Chat Speech

Slug: deepchat


#Speech

Demo Video

#textToSpeech

  • Type: true | {
         voiceName?: string,
         lang?: string,
         pitch?: number,
         rate?: number,
         volume?: number
    }

When the chat receives a new text message - your device will automatically read it out.

voiceName is the name of the voice that will be used to read out the incoming message. Please note that different Operating Systems support different voices. Use the following code snippet to see the available voices for your device: window.speechSynthesis.getVoices()
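For instance, you can log each voice's name and language in the browser console (the list varies by OS and browser, and on some browsers it is populated asynchronously):

// Logs the name and language of every voice available on this device.
// On some browsers getVoices() returns an empty array until the
// "voiceschanged" event has fired at least once.
window.speechSynthesis.getVoices().forEach((voice) => {
  console.log(voice.name, voice.lang);
});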

lang is used to set the utterance language. See the following QA for the available options.

pitch sets the pitch at which the utterance will be spoken.

rate sets the speed at which the utterance will be spoken.

volume sets the volume at which the utterance will be spoken.

[!INFO] Text to speech uses the SpeechSynthesis Web API, whose support varies across devices.

[!INFO] Your mouse needs to be focused on the browser window for this to work.

#Example

<deep-chat textToSpeech='{"volume": 0.9}'></deep-chat>

#speechToText

Transcribe your voice into text and control the chat with commands.

  • Type: true | {
         webSpeech?: true | WebSpeechOptions,
         azure?: AzureOptions,
         textColor?: TextColor,
         displayInterimResults?: boolean,
         translations?: {[key: string]: string},
         commands?: Commands,
         button?: ButtonStyles,
         stopAfterSubmit?: boolean,
         submitAfterSilence?: SubmitAfterSilence
    }

webSpeech utilises Web Speech API to transcribe your speech.

azure utilises Azure Cognitive Speech Services API to transcribe your speech.

textColor is used to set the color of interim and final results text.

displayInterimResults controls whether interim results are displayed.

translations is a case-sensitive one-to-one mapping of words that will automatically be translated to others.

commands is used to set the phrases that will trigger various chat functionality.

button defines the styling used for the microphone button.

stopAfterSubmit is used to toggle whether the recording stops after a message has been submitted.

submitAfterSilence configures automated message submit functionality when the user stops speaking.


#Example

<deep-chat speechToText='{ "webSpeech": true, "translations": {"hello": "goodbye", "Hello": "Goodbye"}, "commands": {"resume": "resume", "settings": {"commandMode": "hello"}}, "button": {"position": "outside-left"} }' ></deep-chat>

[!INFO] If the microphone recorder is set - this will not be enabled.

[!INFO] Speech to text functionality is provided by the Speech To Element library.

[!CAUTION] Support for webSpeech varies across different browsers, please check the Can I use Speech Recognition API section. (The yellow bars indicate that it is supported)
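If you want to check support at runtime before enabling webSpeech, here is a minimal feature-detection sketch (Chromium-based browsers expose the API under the webkit prefix):

// true when the browser exposes the Speech Recognition API
const speechRecognitionSupported =
  'SpeechRecognition' in window || 'webkitSpeechRecognition' in window;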

#Types

Object types for speechToText:

#WebSpeechOptions

  • Type: {language?: string}

language is used to set the recognition language. See the following QA for the full list.


#Example

<deep-chat speechToText='{"webSpeech": {"language": "en-US"}}'></deep-chat>

[!NOTE] This service stops after a brief period of silence due to limitations in its API and not Deep Chat.

#AzureOptions

  • Type: {
         region: string,
         retrieveToken?: () => Promise<string>,
         subscriptionKey?: string,
         token?: string,
         language?: string,
         stopAfterSilenceMs?: number
    }
  • Default: {stopAfterSilenceMs: 25000 (25 seconds)}

This object requires the region property, along with one of retrieveToken, subscriptionKey or token:

region is the location/region of your Azure speech resource.

retrieveToken is a function used to retrieve a new token for the Azure speech resource. It is the recommended property to use as it can retrieve the token from a secure server that will hide your credentials. Check out the retrieval example below and starter server templates.

subscriptionKey is the subscription key for the Azure speech resource.

token is a temporary token for the Azure speech resource.

language is a BCP-47 string value to denote the recognition language. You can find the full list here.

stopAfterSilenceMs is the milliseconds of silence required for the microphone to automatically turn off.

[!INFO] To use the Azure Speech To Text service - please add the Speech SDK to your project. See EXAMPLES.
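For plain HTML pages, one way to load the SDK is via a CDN script tag; this is a sketch, so verify the exact bundle URL and pin a version you have tested (npm-based projects can install the microsoft-cognitiveservices-speech-sdk package instead):

<!-- Loads the Azure Speech SDK browser bundle before the deep-chat element is used -->
<script src="https://cdn.jsdelivr.net/npm/microsoft-cognitiveservices-speech-sdk/distrib/browser/microsoft.cognitiveservices.speech.sdk.bundle-min.js"></script>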

#Example

<deep-chat speechToText='{ "azure": { "subscriptionKey": "resource-key", "region": "resource-region", "language": "en-US", "stopAfterSilenceMs": 5000 } }' ></deep-chat>

Location of speech service credentials in Azure Portal:

[Image: Azure Credentials]

[!CAUTION] The subscriptionKey and token properties should be used for local/prototyping/demo purposes ONLY. When you are ready to deploy your application, please switch to the retrieveToken property. Check out the example below and the starter server templates.

#Retrieve token example

// chatElementRef is a reference to the deep-chat element
chatElementRef.speechToText = {
  region: 'resource-region',
  retrieveToken: async () => {
    return fetch('http://localhost:8080/token')
      .then((res) => res.text());
  },
};
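On the server side, the endpoint can exchange your subscription key for a short-lived token via Azure's issueToken endpoint. Below is a minimal sketch assuming Node (18+, for the global fetch) with Express; the /token path matches the example above, while the AZURE_SPEECH_REGION and AZURE_SPEECH_KEY environment variable names are illustrative:

// Issues short-lived Azure Speech tokens so the subscription key never
// reaches the browser. Tokens are valid for roughly 10 minutes.
const express = require('express');
const app = express();

app.get('/token', async (_req, res) => {
  const region = process.env.AZURE_SPEECH_REGION; // e.g. 'eastus'
  const response = await fetch(
    `https://${region}.api.cognitive.microsoft.com/sts/v1.0/issueToken`,
    {
      method: 'POST',
      headers: {'Ocp-Apim-Subscription-Key': process.env.AZURE_SPEECH_KEY},
    }
  );
  res.send(await response.text()); // plain-text token consumed by retrieveToken
});

app.listen(8080);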

#TextColor

  • Type: {interim?: string, final?: string}

This object is used to set the color of interim and final results text.

#Example

<deep-chat speechToText='{"textColor": {"interim": "green", "final": "blue"}}'></deep-chat>

#Commands

  • Type: {
         stop?: string,
         pause?: string,
         resume?: string,
         removeAllText?: string,
         submit?: string,
         commandMode?: string,
         settings?: {substrings?: boolean, caseSensitive?: boolean}
    }
  • Default: {settings: {substrings: true, caseSensitive: false}}

This object is used to set the phrases which will control chat functionality via speech.

stop is used to stop the speech service.

pause will temporarily stop the transcription and will re-enable it after the phrase for resume is spoken.

removeAllText is used to remove all input text.

submit will send the current input text.

commandMode is a phrase that is used to activate the command mode which will not transcribe any text and will wait for a command to be executed. To leave the command mode - you can use the phrase for the resume command.

substrings is used to toggle whether command phrases can be part of spoken words or if they are whole words. E.g. when this is set to true and your command phrase is “stop” - when you say “stopping” the command will be executed. However if it is set to false - the command will only be executed if you say “stop”.

caseSensitive is used to toggle if command phrases are case sensitive. E.g. if this is set to true and your command phrase is “stop” - when the service recognizes your speech as “Stop” it will not execute your command. On the other hand if it is set to false it will execute.

#Example

<deep-chat speechToText='{ "commands": { "stop": "stop", "pause": "pause", "resume": "resume", "removeAllText": "remove text", "submit": "submit", "commandMode": "command", "settings": { "substrings": true, "caseSensitive": false }}}' ></deep-chat>

#ButtonStyles

This object is used to define the styling for the microphone button.

It contains the same properties as the MicrophoneStyles object and an additional commandMode property which sets the button styling when the command mode is activated.

#Example

<deep-chat speechToText='{ "button": { "commandMode": { "svg": { "styles": { "default": { "filter": "brightness(0) saturate(100%) invert(70%) sepia(70%) saturate(4438%) hue-rotate(170deg) brightness(92%) contrast(98%)" }}}}, "active": { "svg": { "styles": { "default": { "filter": "brightness(0) saturate(100%) invert(10%) sepia(97%) saturate(7495%) hue-rotate(0deg) brightness(101%) contrast(107%)" }}}}, "default": { "svg": { "styles": { "default": { "filter": "brightness(0) saturate(100%) invert(77%) sepia(9%) saturate(7093%) hue-rotate(32deg) brightness(99%) contrast(83%)" }}}}}, "commands": { "removeAllText": "remove text", "commandMode": "command" } }' ></deep-chat>

[!TIP] You can use the CSSFilterConverter tool to generate filter values for the icon color.

#SubmitAfterSilence

  • Type: true | number

Automatically submit the input message after a period of silence.

This property accepts the value of true or a number representing the milliseconds of silence to wait before a message is submitted. If it is set to true, the default is 2000 milliseconds (2 seconds).

#Example

<deep-chat speechToText='{"submitAfterSilence": 3000}'></deep-chat>

[!CAUTION] When using the default Web Speech API - the recording will automatically stop after 5-7 seconds of silence, please take care when setting the ms property.

#Demo

This is the example used in the demo video. When replicating - make sure to add the Speech SDK to your project and add your resource properties.

<!-- This example is for Vanilla JS and should be tailored to your framework (see Examples) -->
<div style="display: flex">
  <deep-chat
    speechToText='{
      "azure": {"subscriptionKey": "resource-key", "region": "resource-region"},
      "commands": {
        "stop": "stop",
        "pause": "pause",
        "resume": "resume",
        "removeAllText": "remove text",
        "submit": "submit",
        "commandMode": "command"
      }}'
    errorMessages='{
      "overrides": {"speechToText": "Azure Speech To Text can not be used in this website as you need to set your credentials."}}'
    style="margin-right: 30px"
    demo="true"
  ></deep-chat>
  <deep-chat
    speechToText='{
      "azure": {"subscriptionKey": "resource-key", "region": "resource-region"},
      "commands": {
        "stop": "stop",
        "pause": "pause",
        "resume": "resume",
        "removeAllText": "remove text",
        "submit": "submit",
        "commandMode": "command"
      }}'
    errorMessages='{
      "overrides": {"speechToText": "Azure Speech To Text can not be used in this website as you need to set your credentials."}}'
    demo="true"
  ></deep-chat>
</div>