In addition to plain text, prompts in OCP miniApps® can contain Speech Synthesis Markup Language (SSML), which is interpreted by the Omilia TTS engine.
The SSML supported by the TTS engine is based on the W3C SSML specification; however, not all SSML elements and attributes are supported. This document defines the SSML elements and attributes that can be used.
Table of Supported SSML Elements
Here is a table containing the SSML elements that are currently supported:
Description of SSML elements
speak
The speak element is the root element of SSML text. No attributes of speak are currently supported.
The speak element is optional for the TTS engine, which adds it automatically when it is missing. Note that this differs from most other text-to-speech solutions, which usually require it.
Example
<speak>Hello world. How are you?</speak>
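If the same prompts are also sent to a text-to-speech solution that requires the root element, they can be normalized before submission. The following Python sketch is only an illustration: the function name ensure_speak is hypothetical, and it does not describe how the Omilia TTS engine adds the element internally.

```python
def ensure_speak(prompt: str) -> str:
    """Wrap a prompt in <speak>...</speak> unless it already has that root."""
    text = prompt.strip()
    # Plain prefix/suffix check. This assumes <speak> carries no attributes,
    # which matches the statement that none are currently supported.
    if text.startswith("<speak>") and text.endswith("</speak>"):
        return text
    return f"<speak>{text}</speak>"
```

A prompt that is already wrapped is returned unchanged, so the helper can be applied to every prompt without producing nested speak elements.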
break
The break element manually inserts pauses or breaks in the speech output of the TTS engine. The use of the break element is optional.
The <break> tag can take two attributes: time and strength. The time attribute specifies the duration of the pause in seconds or milliseconds, and the strength attribute specifies the relative strength of the pause. If no attributes are given, a medium strength pause (<break strength="medium"/>) will be assumed by default.
| Attributes | Description |
|---|---|
| time | The duration of the break in seconds or milliseconds (e.g. "1.5s" or "300ms") |
| strength | The relative strength of the pause. Valid values are: |
Example
<speak>Let me think... <break time="0.8s"/> Ok... <break strength="medium"/> I think, I am ready</speak>
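When prompts are assembled programmatically, the break rules above can be captured in a small builder. This is a minimal Python sketch under stated assumptions: the name break_tag and the regular expression are hypothetical, the time check only mirrors the "1.5s" and "300ms" formats shown above, and strength values are passed through without validation.

```python
import re
from typing import Optional

# Durations such as "1.5s" or "300ms", as described for the time attribute.
_TIME_RE = re.compile(r"\d+(\.\d+)?(ms|s)")


def break_tag(time: Optional[str] = None, strength: Optional[str] = None) -> str:
    """Build a <break/> tag; with no arguments, emit the documented default."""
    if time is not None and not _TIME_RE.fullmatch(time):
        raise ValueError(f"time must look like '1.5s' or '300ms', got {time!r}")
    attrs = []
    if time is not None:
        attrs.append(f'time="{time}"')
    if strength is not None:
        # Assumption: the caller supplies a strength value the engine accepts.
        attrs.append(f'strength="{strength}"')
    if not attrs:
        # No attributes given: spell out the medium-strength default.
        attrs.append('strength="medium"')
    return f"<break {' '.join(attrs)}/>"
```

Calling break_tag() with no arguments writes the default explicitly, which keeps the generated prompt readable without changing how it is spoken.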
say-as
The say-as element specifies how text should be interpreted and pronounced by the TTS engine. The use of the say-as element is optional.
The <say-as> tag has a required attribute interpret-as, which is the main indicator of how the text should be verbalized. Currently, this is the only attribute supported.
| Attributes | Description |
|---|---|
| interpret-as | Provides the main indication of how to verbalize the text |
| | Used in conjunction with some values of |
"interpret-as" values
You can check out the interpret-as values in detail below.
| Value | Description | Example |
|---|---|---|
| | Both | |
| | Both | |
| | The | |
| | The The optional attribute | |
| | The Daytime marker can also be included at the end of the text: | |
| | The | |
| | The | |
| | The Valid telephone numbers are considered up to fifteen(15) digit numbers without a plus( | |
| | The | |
phoneme
Use the <phoneme> tag to provide a phonetic pronunciation for specific words or phrases. This is useful for words with ambiguous pronunciation, proper names, or acronyms where standard pronunciation rules might fail.
The <phoneme> tag supports two attributes:
| Attributes | Description |
|---|---|
| alphabet | Specifies the phonetic alphabet to use. Supported values are ipa and x-sampa. |
| ph | Specifies the phonetic string that represents the pronunciation. |
IPA examples
The following examples demonstrate how to use the International Phonetic Alphabet (IPA):
This is <phoneme alphabet="ipa" ph="ˌmænɪˈtoʊbə">manitoba</phoneme>
This is <phoneme alphabet="ipa" ph="ˈpɑːpjəlɚ">popular</phoneme>
This is <phoneme alphabet="ipa" ph="ˈbʌbəl">bubble</phoneme>
This is <phoneme alphabet="ipa" ph="ˈkɹaʊn">crown</phoneme>
This is <phoneme alphabet="ipa" ph="ˈɡɹeɪvliː">gravely</phoneme>
This is <phoneme alphabet="ipa" ph="ˈmæpəŋ">mapping</phoneme>
This is <phoneme alphabet="ipa" ph="ˈliːʒɚ">leisure</phoneme>
This is <phoneme alphabet="ipa" ph="juːˈniːk">unique</phoneme>
This is <phoneme alphabet="ipa" ph="ˈtʃɔɪs">choice</phoneme>
This is <phoneme alphabet="ipa" ph="ˈvɪʒən">vision</phoneme>
This is <phoneme alphabet="ipa" ph="həˈloʊ">hello</phoneme>
This is <phoneme alphabet="ipa" ph="ˈbʌtər">butter</phoneme>
X-SAMPA examples
The following examples demonstrate how to use the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA):
This is <phoneme alphabet="x-sampa" ph='m@"hA:g@%ni:'>mahogany</phoneme>
This is <phoneme alphabet="x-sampa" ph='"pApj@l@r'>popular</phoneme>
This is <phoneme alphabet="x-sampa" ph='bVb@l"'>bubble</phoneme>
This is <phoneme alphabet="x-sampa" ph='kr\\aUn"'>crown</phoneme>
This is <phoneme alphabet="x-sampa" ph='gr\\eIvli:"'>gravely</phoneme>
This is <phoneme alphabet="x-sampa" ph='m{p@N"'>mapping</phoneme>
This is <phoneme alphabet="x-sampa" ph='li:Z3r'>leisure</phoneme>
This is <phoneme alphabet="x-sampa" ph='ju:"ni:k'>unique</phoneme>
This is <phoneme alphabet="x-sampa" ph='tSOIs'>choice</phoneme>
This is <phoneme alphabet="x-sampa" ph='vIZ@n"'>vision</phoneme>
This is <phoneme alphabet="x-sampa" ph='h@"loU'>hello</phoneme>
This is <phoneme alphabet="x-sampa" ph='"bVt@'>butter</phoneme>
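X-SAMPA strings can contain a double quote, which is why the X-SAMPA examples above wrap ph in single quotes. When phoneme tags are generated by code, the quoting can be delegated to the standard library. The following Python sketch is an illustration only: the function name phoneme_tag is hypothetical, and the two accepted alphabet values are taken from the examples above.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_tag(text: str, ph: str, alphabet: str = "ipa") -> str:
    """Build a <phoneme> tag with a safely quoted ph attribute."""
    if alphabet not in ("ipa", "x-sampa"):
        raise ValueError(f"unsupported alphabet: {alphabet!r}")
    # quoteattr switches to single quotes when the value contains a
    # double quote, as X-SAMPA strings often do.
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )
```

The helper does not check that the phonetic string is valid in the chosen alphabet; it only guarantees that the resulting attribute is well-formed.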
prosody
The prosody element specifies the speaking rate and volume of the tagged text in the speech output of the TTS engine. The use of the prosody element is optional.
The <prosody> tag currently supports two attributes, rate and volume, which control the rate and the volume of the speech. Both attributes are optional.
| Attributes | Description |
|---|---|
| rate | Controls the rate of the speech |
| volume | Controls the volume of the speech |
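Because both attributes are optional, a generated prosody tag should include only the attributes that were actually requested. The following Python sketch shows one way to do that. The function name prosody_tag is hypothetical, and the rate and volume strings are passed through unchecked, so the caller is assumed to supply values that the engine accepts.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def prosody_tag(text: str, rate: Optional[str] = None,
                volume: Optional[str] = None) -> str:
    """Wrap plain text in <prosody>, emitting only the attributes given."""
    attrs = ""
    if rate is not None:
        attrs += f" rate={quoteattr(rate)}"
    if volume is not None:
        attrs += f" volume={quoteattr(volume)}"
    # escape() protects characters such as & and < in plain text, so this
    # helper is meant for text without nested SSML tags.
    return f"<prosody{attrs}>{escape(text)}</prosody>"
```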
"rate" values
You can check out the rate values in detail below.
| Value | Description | Example |
|---|---|---|
| | The | |
| | A set of constant values that affect the speech rate. Valid values are: | |
"volume" values
You can check out the volume values in detail below.
| Value | Description | Example |
|---|---|---|
| | The | |
| | A set of constant values that control the speech volume. Valid values are: | |