Besides plain text, prompts in OCP miniApps® can use Speech Synthesis Markup Language (SSML), which is supported by the Omilia TTS engine.
The SSML supported by the TTS engine is based on the W3C SSML specification; however, not all SSML elements or their attributes are supported. This document defines the SSML elements and attributes that can be used.
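Because only a subset of SSML is supported, it can help to check a prompt before using it. The following is a minimal client-side sketch, not part of the TTS engine. It assumes the supported set is exactly the four elements described in this document (speak, break, say-as, prosody), and the function name `unsupported_elements` is hypothetical. It expects a prompt with a single root element, such as one wrapped in speak.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: the supported set is the four elements this document describes.
SUPPORTED_ELEMENTS = {"speak", "break", "say-as", "prosody"}

def unsupported_elements(ssml: str) -> List[str]:
    """Return the sorted names of elements that are not in SUPPORTED_ELEMENTS."""
    # ET.fromstring raises ET.ParseError if the markup is not well-formed XML.
    root = ET.fromstring(ssml)
    return sorted({el.tag for el in root.iter() if el.tag not in SUPPORTED_ELEMENTS})

prompt = '<speak>Hi <break time="300ms"/> <emphasis>there</emphasis></speak>'
print(unsupported_elements(prompt))  # → ['emphasis']
```

An empty list means every element in the prompt belongs to the assumed supported set. Attributes are not checked in this sketch.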
Table of Supported SSML Elements
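The following SSML elements are currently supported:
| Element | Description |
|---|---|
| speak | The root element of SSML text |
| break | Inserts a pause or break in the speech output |
| say-as | Specifies how text should be pronounced or interpreted |
| prosody | Controls the rate and volume of the speech |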
Description of SSML elements
speak
The speak element is the root element of SSML text. No attributes of speak are currently supported.
The speak element is optional for the TTS engine, which adds it automatically if it is missing. Note that most other text-to-speech solutions require it.
Example
<speak>Hello world. How are you?</speak>
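If the same prompt text is also sent to a text-to-speech solution that requires the root element, the wrapping can be done on the client side. The following is a minimal sketch of that idea. It does not show how the TTS engine adds the element internally, and the helper name `ensure_speak_root` is hypothetical.

```python
import re

# Matches an opening <speak> tag at the start of the prompt (leading whitespace allowed).
_SPEAK_OPEN = re.compile(r"^\s*<speak[\s/>]")

def ensure_speak_root(prompt: str) -> str:
    """Wrap the prompt in <speak>...</speak> unless it already starts with <speak>."""
    if _SPEAK_OPEN.match(prompt):
        return prompt
    return f"<speak>{prompt}</speak>"

print(ensure_speak_root("Hello world. How are you?"))
# → <speak>Hello world. How are you?</speak>
print(ensure_speak_root("<speak>Hello world.</speak>"))
# → <speak>Hello world.</speak>
```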
break
The break element inserts pauses or breaks in the speech output of the TTS engine. The use of the break element is optional.
The <break> tag can take two attributes: time and strength. The time attribute specifies the duration of the pause in seconds or milliseconds, and the strength attribute specifies the relative strength of the pause. If no attributes are given, a medium strength pause (<break strength="medium"/>) will be assumed by default.
| Attributes | Description |
|---|---|
| time | The duration of the break in seconds or milliseconds (e.g. "1.5s" or "300ms") |
| strength | The relative strength of the pause. Valid values are: |
Example
<speak>Let me think... <break time="0.8s"/> Ok... <break strength="medium"/> I think, I am ready</speak>
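When break tags are generated programmatically, the time value can be checked against the seconds or milliseconds format described above before the tag is emitted. The following is a minimal sketch under that assumption. The helper names `duration_to_ms` and `break_tag` are hypothetical, and the strength value is passed through unchecked because this sketch makes no assumption about which strength values are accepted.

```python
import re
from typing import Optional
from xml.sax.saxutils import quoteattr

# Durations such as "1.5s" or "300ms": a non-negative number followed by ms or s.
_DURATION = re.compile(r"^(\d+(?:\.\d+)?)(ms|s)$")

def duration_to_ms(value: str) -> float:
    """Convert a duration string such as "1.5s" or "300ms" to milliseconds."""
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"not a duration in seconds or milliseconds: {value!r}")
    number, unit = match.groups()
    return float(number) * (1000.0 if unit == "s" else 1.0)

def break_tag(time: Optional[str] = None, strength: Optional[str] = None) -> str:
    """Build a <break/> tag; with no arguments, the bare tag is returned."""
    attrs = []
    if time is not None:
        duration_to_ms(time)  # raises ValueError on a malformed duration
        attrs.append(f"time={quoteattr(time)}")
    if strength is not None:
        attrs.append(f"strength={quoteattr(strength)}")
    return "<break " + " ".join(attrs) + "/>" if attrs else "<break/>"

print(break_tag(time="300ms"))        # → <break time="300ms"/>
print(break_tag(strength="medium"))   # → <break strength="medium"/>
print(duration_to_ms("1.5s"))         # → 1500.0
```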
say-as
The say-as element is used to specify how text should be pronounced or interpreted by the TTS engine during speech synthesis. The use of the say-as element is optional.
The <say-as> tag has a required attribute interpret-as, which is the main indicator of how the text should be verbalized. Currently, this is the only attribute supported.
| Attributes | Description |
|---|---|
| interpret-as | Provides the main indication of how to verbalize the text |
| | Used in conjunction with some values of |
"interpret-as" values
You can check out the interpret-as values in detail below.
| Value | Description | Example |
|---|---|---|
| | Both | |
| | Both | |
| | The | |
| | The The optional attribute | |
| | The Daytime marker can also be included at the end of the text: | |
| | The | |
| | The | |
| | The Valid telephone numbers are considered up to fifteen (15) digit numbers without a plus ( | |
| | The | |
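When the tagged text comes from user data, characters such as & and < must be escaped so that the prompt remains well-formed markup. The following is a minimal sketch of a builder for the say-as element. The helper name `say_as` is hypothetical, and the value "telephone" used in the example is an assumption for illustration only; use the interpret-as values listed in the table above.

```python
from xml.sax.saxutils import escape, quoteattr

def say_as(text: str, interpret_as: str) -> str:
    """Wrap text in a say-as element, escaping XML special characters."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"

# "telephone" is an assumed value, used here only for illustration.
print(say_as("2101234567", "telephone"))
# → <say-as interpret-as="telephone">2101234567</say-as>
```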
prosody
The prosody element is used to specify the speaking rate and volume of the tagged text synthesized by the TTS engine. The use of the prosody element is optional.
The <prosody> tag currently supports two attributes, rate and volume, both of which are optional.
| Attributes | Description |
|---|---|
| rate | Controls the rate of the speech |
| volume | Controls the volume of the speech |
"rate" values
You can check out the rate values in detail below.
| Value | Description | Example |
|---|---|---|
| | The | |
| | A set of constant values that affect the speech rate. Valid values are: | |
"volume" values
You can check out the volume values in detail below.
| Value | Description | Example |
|---|---|---|
| | The | |
| | A set of constant values that control the speech volume. Valid values are: | |
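Since both prosody attributes are optional, a builder only needs to emit the ones that were actually given. The following is a minimal sketch of that idea. The helper name `prosody` is hypothetical, and the values "slow" and "loud" come from the W3C SSML specification; whether the TTS engine accepts them is an assumption, not something this sketch confirms.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

def prosody(text: str, rate: Optional[str] = None, volume: Optional[str] = None) -> str:
    """Wrap text in a prosody element, emitting only the attributes that were given."""
    attrs = ""
    if rate is not None:
        attrs += f" rate={quoteattr(rate)}"
    if volume is not None:
        attrs += f" volume={quoteattr(volume)}"
    return f"<prosody{attrs}>{escape(text)}</prosody>"

# "slow" and "loud" are W3C SSML values, assumed here for illustration.
print(prosody("Please listen carefully.", rate="slow"))
# → <prosody rate="slow">Please listen carefully.</prosody>
print(prosody("Attention.", rate="slow", volume="loud"))
# → <prosody rate="slow" volume="loud">Attention.</prosody>
```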