Testing Studio User Guide
Logging in
To enter Testing Studio, follow the steps below:
Navigate to https://ocp.ai/ and click Console, which is a unified entry point for managing all OCP® services.
Select the regional OCP Console URL that you are using.
Enter your email or OCP® username and click Sign In.
Enter your password and click Sign In.
If the credentials are correct, you are forwarded to the OCP Console® landing page.
If you have entered the correct credentials but are still unable to log in, it's possible that your account has been suspended. For further details and steps to resolve this issue, please refer to the Account suspension section.
Click Testing Studio on the sidebar on the left.
Managing Projects
The Projects page shows a list of all created projects. A project is a remote repository where test cases reside.
At your first login to Testing Studio, the Projects page is empty and looks as follows:
Create a Project
To create a new project:
Click + Add Project.
2. Fill in the project information in the dialog box.
By default, the project is stored in Cloud storage. With Cloud storage, Test Suites can be uploaded directly to the cloud. With GitLab, Test Suites are stored in the GitLab repository.
Name: The name of your project.
Group: The group the project belongs to.
Orchestrator Application: The application created in Orchestrator you will use for your project.
The list of available Orchestrator Applications depends on the selected group, so select the group first.
If you want to store the project in a Git provider (GitLab or GitHub), switch the toggle accordingly and fill in the following information:
Git provider: Select GitLab or GitHub as your Git provider.
Git URL: The URL of the remote repository with tests. Omilia currently uses GitLab.
Git Token: An access token to GitLab.
For further instructions on how to generate an access token, refer to the official GitLab documentation.
Cloud storage | GitLab storage |
---|---|
3. Click Create. Your created projects are listed accordingly.
To find a project in a long list, type the project name in the Search field and press <Enter>.
Delete a Project
To delete a project, follow the procedure below:
Open the Projects page.
Select a project to delete and click the three dots icon.
3. Once clicked, confirm the deletion as requested below.
The project is deleted from the projects list.
Project Overview
To open a project, just click on it. An opened project looks as shown below:
When you open a project, you can navigate through the following tabs:
Test Suites (only for Cloud storage)
General Tab
The General tab displays the general information about a selected project, such as its name, the remote repository URL and the access token. This is the information you entered when creating the project.
You can modify the general project information at any time. Click Save to save your changes.
Cloud storage | GitLab storage |
---|---|
Test Suites
Test Suites are collections of test cases grouped for test execution purposes. In Testing Studio, you can upload Test Suites from your computer to the Cloud as a .zip file, either by dragging and dropping the file onto the drop zone or by clicking the Upload button. Once uploaded, Test Suites are listed under the tab.
To edit a Test Suite, click the Download button, edit the Test Suite, and upload it again.
If you are using macOS, compress the Test Suite to a .zip file from the terminal before uploading it back to Testing Studio; otherwise, the file may be rejected.
To compress from the terminal, use the following command:
zip -r {output_file} {folder}
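Where the zip command is not available, an equivalent archive can be produced with Python's standard zipfile module. The sketch below is only an illustration: the function name zip_test_suite is hypothetical, and skipping .DS_Store and __MACOSX entries is an assumption about what might make Finder-created archives problematic, since this guide does not state the reason for rejection.

```python
import zipfile
from pathlib import Path


def zip_test_suite(folder: str, output_file: str) -> list[str]:
    """Recursively add every file under `folder` to `output_file`.

    Entry names keep the folder name as a prefix, similar to
    `zip -r {output_file} {folder}`. Returns the entry names written.
    """
    root = Path(folder).resolve()
    out = Path(output_file).resolve()
    added = []
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Assumption: macOS metadata is unwanted in the upload.
            if path.name == ".DS_Store" or "__MACOSX" in path.parts:
                continue
            # Never add the archive to itself if it lives inside the folder.
            if path == out or not path.is_file():
                continue
            arcname = path.relative_to(root.parent).as_posix()
            archive.write(path, arcname)
            added.append(arcname)
    return added
```

For example, zip_test_suite("my_suite", "my_suite.zip") writes my_suite.zip next to the folder and returns the list of archived paths, which can be reviewed before uploading.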
Run History Tab
The Run History tab displays the current and historical test runs of the project.
Run ID: Test case unique identifier
Status: Test case run status. The following statuses are possible:
Started: The test case is currently running
Pending: The test case is pending. This might happen if there are no free workers for the test execution.
Revoked: The test case has been cancelled
Success: The test execution was successful
Failure: The test case failed to run
Run Type: The type of the test case: Classic for regular test cases, or Augmented for test cases that use an LLM.
Created: The test case creation timestamp (starting date), in the format dd/MM/yyyy - HH:mm, e.g. 21/07/2020 - 17:32
Finished: The test case completion timestamp (ending date), in the format dd/MM/yyyy - HH:mm, e.g. 21/07/2020 - 17:39
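If you copy Created and Finished values out of the Run History table, for example to track how long runs take, they can be parsed with Python's datetime module. This is a minimal sketch that assumes the values follow the dd/MM/yyyy - HH:mm pattern given above; the function names are hypothetical and not part of Testing Studio.

```python
from datetime import datetime, timedelta

# Assumed strptime equivalent of the documented dd/MM/yyyy - HH:mm pattern.
RUN_TIMESTAMP_FORMAT = "%d/%m/%Y - %H:%M"


def parse_run_timestamp(text: str) -> datetime:
    """Parse a Created or Finished value copied from the Run History tab."""
    return datetime.strptime(text.strip(), RUN_TIMESTAMP_FORMAT)


def run_duration(created: str, finished: str) -> timedelta:
    """Return the time elapsed between the Created and Finished timestamps."""
    return parse_run_timestamp(finished) - parse_run_timestamp(created)
```

For instance, run_duration("21/07/2020 - 17:32", "21/07/2020 - 17:39") yields a seven-minute timedelta.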
Run Test Cases
To run test cases, follow the guidelines below:
Open a project and click Run Test.
2. Select the test cases to run by filtering them based on the following attributes.
Group: A group the test cases belong to. Group basically serves as a tag.
Utterance: Utterances included in a test case.
Description: Test case description.
Voice E2E Testing: When activated, this feature enables the tests to be executed by simulating a voice call rather than a chat. Please note that running voice tests can be a lengthy process, as the duration of each call test matches the actual call duration. To find out a call duration, go to Insights → Monitor → select a call and navigate to the Dialog Review section. By clicking the Play button, the dialog gets played out. Every voice test operates by essentially replicating each dialog in a similar manner.
Filtering supports the % symbol as a filter wildcard. For example:
balance%: matches all test cases starting with the word balance
%card%: matches all test cases containing the word card
%description: matches all test cases ending with the word description
To run the whole project (that is, the entire list of test cases), leave all the filter fields blank.
Filtering is case insensitive. The filter value desc% will match Desc, desc and DESC.
When multiple filters are provided, they are combined as a boolean AND expression: only the test cases that meet all the filter criteria are selected.
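The filtering rules above can be approximated locally, for example to preview which test cases a filter would select before starting a run. The sketch below is not the Testing Studio implementation: it assumes that % matches any sequence of characters, that a value without % must match exactly, and that test cases are plain dictionaries with group, utterance and description keys. All names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a %-wildcard filter into a case-insensitive regex."""
    pieces = (re.escape(piece) for piece in pattern.split("%"))
    return re.compile(".*".join(pieces), re.IGNORECASE | re.DOTALL)


def matches(pattern: str, value: str) -> bool:
    """Return True if value satisfies the filter; a blank filter matches all."""
    if not pattern:
        return True
    return wildcard_to_regex(pattern).fullmatch(value) is not None


def select_test_cases(test_cases, **filters):
    """Keep the test cases that satisfy every provided filter (boolean AND)."""
    return [
        case
        for case in test_cases
        if all(matches(p, case.get(field, "")) for field, p in filters.items())
    ]
```

Calling select_test_cases(cases, group="bill%", utterance="%balance%") keeps only the cases whose group starts with "bill" and whose utterance contains "balance", while select_test_cases(cases) with no filters returns every case.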
Run Augmented Test Cases
The Run Augmented Test Cases feature expands the number of test case utterances by adding variations of them, so that existing test cases cover more ways of phrasing the same request. The variations are produced by an LLM, which takes the original utterance, creates different versions of it, and returns the new utterances to the dialog. For instance, if your original utterance is "What is my balance?", the LLM can formulate versions such as:
"Could you please tell me the amount in my account?"
"Could you please provide my current account balance?"
"What amount do I have in my account?"
"What's the current status of my account?"
To run the augmented test cases, proceed as follows:
1. Open a project and click the Run Augmented Test Cases button.
Fill in the form, which gives the LLM context to make the generated utterances more precise, and click the Run button when finished.
The attributes below are the same as in the Run Test feature and are also optional:
Group: A group the test cases belong to. Group basically serves as a tag.
Utterance: Utterances the test case contains.
Description: Test case description.
Read more about the Group, Utterance and Description attributes and how to filter by them in the Run Test Cases section.
Apart from the regular Run Test attributes, the specific Run Augmented Test Cases attributes are the following:
Location: Select the location from the dropdown list.
Age: Define the age threshold from the dropdown list.
Adjust Formality: Choose how formal the utterances should be.
Custom Instructions: Add extra information that the LLM should take into account when creating new utterances.
Once created, the augmented test case appears in the Run History board with the corresponding run type.
Test Execution Description
Testing Studio asynchronously runs the test cases located in the /tests folder of the master branch of your Git repository.
Make sure that the /tests folder is located under the root folder of your Git repository or of the uploaded ZIP file.
Test cases are executed during a run as follows:
All test cases in a test suite are loaded and set to run.
All test cases, both with and without a golden dialog, start executing:
The test cases with golden dialogs are asserted and validated normally.
Test cases without golden dialogs are not asserted and defined as unavailable.
When loading test cases, Testing Studio does not distinguish between scenarios created by users inside scenarios.yml, individual test cases or test cases in test_cases.yml. If a generated test case is missing a golden dialog file (no matter if moved accidentally or deleted), it is treated as unavailable.
If an unavailable test case execution fails because of the other side (DiaManT closed the dialog unexpectedly), the test case is considered failed.
3. After execution has finished:
Unavailable test cases are stored in the output/Unavailable folder, placed under the test suite directory.
Failed test cases are stored in the output/Failed folder, placed under the test suite directory.
The dialog that executed the most steps is the one written to the file, and the corresponding validation error message is reported.
Report files contain successful, failed as well as unavailable test cases. All of them can be downloaded as Artifacts.
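The execution rules above can be summarized as a small decision function. This is only a sketch for reasoning about results, not Testing Studio code: the Execution fields are hypothetical, and treating an unexpectedly closed dialog as a failure for test cases that do have a golden dialog is an assumption, because the text above states this rule only for unavailable test cases.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASSED = "pass"
    FAILED = "fail"
    UNAVAILABLE = "unavailable"


@dataclass
class Execution:
    has_golden_dialog: bool
    assertions_ok: bool = True  # only meaningful when a golden dialog exists
    closed_unexpectedly: bool = False  # e.g. the other side closed the dialog


def classify(execution: Execution) -> Outcome:
    """Map one test case execution to its reported status."""
    if execution.closed_unexpectedly:
        return Outcome.FAILED
    if not execution.has_golden_dialog:
        # Nothing to assert against, so the case is reported as unavailable.
        return Outcome.UNAVAILABLE
    return Outcome.PASSED if execution.assertions_ok else Outcome.FAILED


def output_folder(outcome: Outcome) -> Optional[str]:
    """Return the folder under the test suite directory, if any."""
    folders = {
        Outcome.FAILED: "output/Failed",
        Outcome.UNAVAILABLE: "output/Unavailable",
    }
    return folders.get(outcome)
```

Under these assumptions, a test case without a golden dialog lands in output/Unavailable unless its dialog was closed unexpectedly, in which case it lands in output/Failed.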
Download Test Run Artifacts
The test run details created after the run has been completed are called Artifacts. Testing Studio allows for downloading the artifacts to check the outcome of the run in detail.
To download artifacts of a selected test run, follow the steps below:
Open a project.
Click on a test run to open the run page.
3. Click Download Artifacts. The download starts.
Artifacts are downloaded as a .zip file containing a results folder with two CSV files:
test_suites_report.csv: results report for each test suite. For example:
suite name, total, passed, failed, unavailable, status
Test suite 1, 10, 10, 0, 0, pass
Test suite 2, 20, 0, 10, 10, fail
....
total, 30, 10, 10, 10, fail
test_cases_report.csv: results report for each test case. For example:
test suite, test case, status, dialog id, golden dialog id, message
suite 1, dialog 1, pass, dialogid1, 1234,
suite 1, dialog 2, unavailable, dialogid2, Unavailable,
suite 2, dialog 3, fail, dialogid3, 123, error message
...
suite x, dialog x, fail, dialogidx, 12345, error message
Output folder: created after each test run for every test suite. The output folder contains candidate golden dialog JSON files for failed test cases and for unavailable ones (which means they have no golden dialogs):
test_suite
└── output
    ├── Failed
    └── Unavailable
Before every new run, the output folder gets deleted. Make sure you handle generated data before starting a new test case execution.
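Once the artifacts are unzipped, the per-test-case report can be summarized with a short script. The sketch below assumes the column names shown in the test_cases_report.csv example above (test suite, test case, status, message) and comma-separated values with optional spaces after each comma; the real file layout may differ, and the function names are hypothetical.

```python
import csv
import io
from collections import Counter


def read_report(report_text: str) -> list[dict]:
    """Parse report text into rows with whitespace-trimmed keys and values."""
    reader = csv.DictReader(io.StringIO(report_text), skipinitialspace=True)
    return [
        {key.strip(): (value or "").strip() for key, value in row.items() if key}
        for row in reader
    ]


def count_statuses(rows: list[dict]) -> Counter:
    """Count test cases per status (pass, fail, unavailable)."""
    return Counter(row["status"].lower() for row in rows if row.get("status"))


def failed_cases(rows: list[dict]) -> list[tuple]:
    """Return (test suite, test case, message) for every failed test case."""
    return [
        (row["test suite"], row["test case"], row.get("message", ""))
        for row in rows
        if row.get("status", "").lower() == "fail"
    ]
```

Reading the file with open(path, newline="").read() and passing the text to read_report gives rows that count_statuses and failed_cases can summarize before the next run deletes the output folder.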
Generation History Tab
Generate Golden Dialog
A Golden Dialog is a dialog from the application, generated as a JSON file that contains all of its utterances and can be used for testing purposes. Once you have a generated Golden Dialog, you can use it to generate test cases. After running a particular test, you compare the Golden Dialog with the original dialog to identify the differences.
To generate a Golden Dialog, proceed as follows:
1. Select the dialog you want to generate a Golden Dialog for, and copy its ID. For example, you can find it in the Insights tab.
2. Paste it into the Generate Golden Dialog field, and click the +Generate Golden button to generate and download the Golden Dialog’s JSON file.
Generate Test Cases
The Generation History displays the current and historical test case generations of your project. It allows you to download your generated test cases and add them to your test suites.
To generate test cases, follow the steps below:
Open the Generation History tab and click the +Generate button.
2. Set the necessary parameters by filling in the fields.
Group: Group which serves as a tag. It is assigned to all the test cases being generated.
Simulation Data: Simulation data which is included in the golden dialog. It will be automatically inserted in the generated test cases. Available values are:
Ani
Dnis
Overwrite existing test case file: By default, the existing test cases are not replaced by new ones. To replace the existing test cases with the newly generated ones, switch the toggle on.
3. Click Generate. The test case generation starts. The generated test cases are displayed in the Generation History tab.
Generation ID: Test case unique identifier
Status: Test case generation status. Available options:
Success: The test case generation was successful
Running: The test case generation is currently running
Pending: The test case generation is pending
Revoked: The test case has been cancelled
Failed: The test case generation has failed to run
Created: The creation timestamp (starting date), in the format dd/MM/yyyy - HH:mm, e.g. 21/07/2020 - 17:32
Finished: The completion timestamp (ending date), in the format dd/MM/yyyy - HH:mm, e.g. 21/07/2020 - 17:39
4. To download generated test cases and add them to your repository, select a test case by clicking on it and click the Download Test Cases button. The test cases are downloaded locally as a ZIP file.
Insights Tab
The Insights tab offers an intuitive representation of the test results of your project.
By default, the results of the 10 latest runs are shown. You can change this option by selecting the 20 or 30 latest runs from the dropdown list:
Hover over a graph to see the percentage information:
By default, the following test cases are included in the graph results:
Passed: Successfully executed test cases
Failed: Test cases that failed to run
Unavailable: Test cases generated by Testing Studio and having no golden dialogs
Pending and Revoked test cases are not included in these statistics.
The graph gives you a visual idea of the test execution trend.
The goal to strive for is to have all test runs green (successfully passed) and none grey or red.
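The percentages shown on hover can be reproduced from raw counts. The sketch below assumes that each percentage is the share of a status among the Passed, Failed and Unavailable test cases only, with Pending and Revoked left out as described above; how the graph actually rounds values is not documented here, and the function name is hypothetical.

```python
INCLUDED_STATUSES = ("Passed", "Failed", "Unavailable")


def status_percentages(counts: dict) -> dict:
    """Return the percentage of each included status, rounded to one decimal."""
    included = {status: counts.get(status, 0) for status in INCLUDED_STATUSES}
    total = sum(included.values())
    if total == 0:
        # No countable test cases: report zero for every status.
        return {status: 0.0 for status in included}
    return {
        status: round(100 * number / total, 1)
        for status, number in included.items()
    }
```

With 6 passed, 3 failed, 1 unavailable and 5 pending test cases, the pending ones are ignored and the result is 60.0, 30.0 and 10.0 percent respectively.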
You can exclude test cases with a given status from the results by clicking that status, as shown below:
As you see, the Passed test cases are not included.
Under the graph you can see the test run list. To get detailed information about a test run, click on it and the corresponding Test Run page opens.
Test Run Details
The page header contains detailed information about the selected test run.
Run number: Test run ID number
Created at: Test run creation timestamp
Test Duration: Test run duration
Finished at: Test run completion timestamp
Status: Test run status
Download Artifacts: This button downloads the Artifacts. For details, see the Download Test Run Artifacts section.
The graph in the header shows brief statistics on the executed test cases:
Passed: The percentage of successfully executed test cases
Failed: The percentage of test cases that failed to run
Unavailable: The percentage of test cases having no golden dialogs
Tasks Tab
The Tasks tab displays the list of tasks performed during a test run.
Name: The names of the tasks performed during a test run. Each task has a detailed log, which is available in the Logs tab. The following tasks are usually executed:
Clone repository
Load configuration
Load test suites
Run test suites
Aggregate results
Status: Status of a task
Created: Task creation timestamp
Finished: Task completion timestamp
Results Tab
The Results tab displays the information about all test cases (Succeeded, Failed, Unavailable) for each test suite.
The same information is contained in the output folder of artifacts.
Click on a selected test suite to unfold the information about the test cases. Then click on a test case to get the test result information as shown below:
The test result may contain the following information:
Property | Description |
---|---|
Dialog ID | The dialog ID number |
Golden Dialog ID | The golden dialog ID number |
Assertion Errors | Errors that occurred while asserting the expected outcome against the returned outcome. The following error information is available: |
General Errors | Errors that occurred due to system failures, e.g. the dialog ended unexpectedly. The following error information is available: |
Warnings | May contain warnings, e.g. a delay warning due to high system load |
To view detailed information about the dialog used in the selected test case, click the Insights button.
You are redirected to the Dialog Review in Monitor, as shown in the image below. For more information, see the Monitor User Guide.
Logs Tab
Tests are run outside Testing Studio by distributed workers. The Logs tab allows you to see detailed logs of each task executed during a test run.
The tasks' names are marked in green. The task logs are written in white.
The information in the Logs tab becomes available right after the test run starts and is updated in real time.
Explore
The Explore feature allows you to simulate a conversation between your dialog application and an LLM. This lets you explore your application’s boundaries by identifying areas that may not be supported yet, as well as possible issues, so that you can prevent problems before the dialog application is used in real conversations.
Start Exploration
To create an exploration session, follow the steps below:
Click the New Exploration button in the upper right.
Fill in the form and hit the Explore button when finished:
Select Domain: choose the domain from the dropdown list. For example, pick Banking (mandatory).
Adjust Formality: select the formality level from the dropdown list. The formality level could be Informal, Normal or Formal (mandatory).
Key-Value: add the Key-Value pair to provide some additional information (optional).
Scenarios: define the number of scenarios the LLM will run to check the dialog application.
After you click the Explore button, it might take some time for the LLM and the dialog application to communicate.
Once finished, you are redirected to the Exploration Results chart.
Exploration Results
The Exploration Results provide a summary of the exploration sessions, offering detailed information about the conversation between the LLM and your application in a chart view. It allows you to assess the comprehensiveness of the dialog application and identify any potential failure reasons.
Dialogs List
The Dialogs List comprises the conversations that have taken place between your dialog application and the LLM.
In this section you can perform the following actions:
Click the View Conversation Details button to review the actual conversational steps between your dialog application and the LLM, see the conversation status (Success or Failure), and read the detailed failure description along with some suggestions.
Download the conversation by hitting the Download Conversation button.
Areas of Improvement
The Areas of Improvement chart delivers a report on areas where enhancements can be made to enrich your conversational experience. In the future, the user will be able to deploy an application that supports the areas of improvement that the tool has identified.
The Deploy App button is currently disabled.
Log out from Testing Studio
To log out from Testing Studio, click the User icon and select Log out: