Our Technology

The Vocavio software analyses non-verbal cues in speech communication, such as tone of voice. These cues are then used to accurately assess the effectiveness of communication, as well as coordination dynamics, during training and operations.

Performance metrics include:

  • Communication performance
  • Effort levels
  • Speaker Ratio
  • Turn taking
  • Engagement levels
  • Workload flags
  • CRM event flags
  • Volatility levels (in the speech lab)

The performance data can then be visualised online or exported as files for use with any of the available tools:

  1. Matlab Tool – a secure offline tool that is both portable and flexible. It integrates straight into the Matlab environment.
  2. Webtool – a secure online webtool for fast processing and visualisation of speech dialog.
  3. Enterprise – the secure webtool application and Vocavio application interface can be deployed onto your own secured local network.

How it works

Speech analysis (non-verbal cues from speech dialog)

The speech analysis was developed in a research lab, based on Japanese telephone conversations.

A methodological breakthrough made it possible to measure speech prosody, that is, the pitch, rhythm and tone of speech. The algorithms were then developed so that the speech analysis is language-agnostic: it does not matter which language is being analysed.

The analysis centres on ‘prosodic accommodation’*, which builds on the speech prosody described above. Prosodic accommodation is measured by comparing one speaker’s speech prosody scores with those of another speaker. When there is a higher level of engagement, affinity and balance, the speakers’ prosody converges. When there is a lack of engagement (for example through lack of understanding, stress, fatigue, workload, poor concentration, attitudinal shifts or changes in the context of communication), the speakers’ prosody diverges.

*This aspect of speech science was first uncovered in the mid-1970s as part of research into CAT (Communication Accommodation Theory).
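
As a rough illustration of the converge/diverge idea, the sketch below tracks the gap between two speakers' prosody scores over time and reports its trend. It is not Vocavio's algorithm: the function name `convergence_slope`, the per-window scores and the use of a simple least-squares slope are all assumptions made for the example.

```python
from statistics import mean


def convergence_slope(scores_a, scores_b):
    """Least-squares slope of the gap |a - b| across successive windows.

    A negative slope means the two prosody tracks are moving closer
    together (converging); a positive slope means they are drifting apart.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length series with at least 2 windows")
    gaps = [abs(a - b) for a, b in zip(scores_a, scores_b)]
    xs = range(len(gaps))
    x_bar, g_bar = mean(xs), mean(gaps)
    numerator = sum((x - x_bar) * (g - g_bar) for x, g in zip(xs, gaps))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical per-window prosody scores (arbitrary units) for two speakers.
speaker_a = [0.20, 0.35, 0.45, 0.55, 0.60]
speaker_b = [0.80, 0.75, 0.70, 0.65, 0.62]

slope = convergence_slope(speaker_a, speaker_b)
print("converging" if slope < 0 else "diverging or flat")
```

In this made-up data the gap between the speakers shrinks in every window, so the slope is negative and the pair would be read as converging.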

Effort levels

Effort levels are measured between two people by examining their pitch and energy. The values depend on how much these signals converge or diverge over the course of an interaction.

If a person is making more of an effort in their dialog (higher pitch and energy), their accommodation value increases, resulting in a higher effort level. On the other hand, if a person is less engaged, for example not showing the same urgency or desire to work in a team (lower pitch and energy), their accommodation value decreases, leading to a lower effort level.

Speaker Ratio

The speaker ratio is calculated by quantifying each speaker’s amount of speech during the dialog and relating it to the total length of the dialog.

Time in which a speaker is not speaking is either listening or thinking time. The ‘optimal ratio’ depends on multiple factors, such as the seniority of the team, their experience level and the training task being undertaken.
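
The calculation can be sketched as follows, assuming each speaker's speech has already been segmented into (start, end) times in seconds. The function name `speaker_ratio` and the sample timings are hypothetical and only illustrate the arithmetic.

```python
def speaker_ratio(segments, dialog_length):
    """Fraction of the dialog during which one speaker is talking.

    segments: list of (start, end) times in seconds for that speaker,
              assumed not to overlap one another.
    dialog_length: total dialog duration in seconds.
    """
    if dialog_length <= 0:
        raise ValueError("dialog_length must be positive")
    spoken = sum(end - start for start, end in segments)
    return spoken / dialog_length


# Hypothetical 60-second dialog.
speaker_a = [(0.0, 12.0), (30.0, 42.0)]   # 24 s of speech
speaker_b = [(14.0, 20.0), (45.0, 51.0)]  # 12 s of speech

ratio_a = speaker_ratio(speaker_a, 60.0)  # 24 / 60 = 0.4
ratio_b = speaker_ratio(speaker_b, 60.0)  # 12 / 60 = 0.2
```

Here speaker A talks for 40% of the dialog and speaker B for 20%. Because these sample segments do not overlap, the remaining 40% is time in which neither is speaking.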


Turn taking

The overlap is calculated as the percentage of time in a given dialog during which the two speakers are talking at the same time.

Low levels of overlap can indicate that the two speakers are well established in taking turns to speak, resulting in better communication overall. High levels of overlap suggest that there may be a lot of talking and not enough listening, highlighting that communication could be lacking overall.
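
A minimal sketch of the overlap percentage, assuming each speaker's speech is available as (start, end) segments in seconds. The function name `overlap_percentage` and the sample timings are hypothetical.

```python
def overlap_percentage(segments_a, segments_b, dialog_length):
    """Percentage of the dialog in which both speakers talk at once.

    Each segment list holds (start, end) times in seconds; segments
    within one speaker's list are assumed not to overlap each other.
    """
    if dialog_length <= 0:
        raise ValueError("dialog_length must be positive")
    overlap = 0.0
    for a_start, a_end in segments_a:
        for b_start, b_end in segments_b:
            # Length of the time shared by the two segments, if any.
            shared = min(a_end, b_end) - max(a_start, b_start)
            if shared > 0:
                overlap += shared
    return 100.0 * overlap / dialog_length


# Hypothetical 40-second dialog.
speaker_a = [(0.0, 10.0), (20.0, 30.0)]
speaker_b = [(8.0, 18.0), (29.0, 40.0)]

print(overlap_percentage(speaker_a, speaker_b, 40.0))  # 7.5
```

The two speakers talk over each other for 2 seconds (8 to 10) and 1 second (29 to 30), so 3 of the 40 seconds, or 7.5%, is overlap.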