How To Scan Audio Files

You can use Hydra Scanners to extract data from audio files or to categorize the files based on their content. When Hydra receives an audio file, it automatically converts the file's audio to text. The extracted text is shown in the trainer UI.

If you are using this feature with our Box integration, you can add the following parameters to your Box Skill invocation URL to configure how speech-to-text conversion works.


Note that if you provide incorrect values (for example, 2 for the audio_channels parameter when the file actually contains one audio channel), Hydra will render an empty output in the trainer UI.
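
One way to avoid mismatched values is to read the channel count and sample rate from the file before configuring the parameters. The sketch below is only an illustration and assumes the source is an uncompressed WAV file (which would typically be submitted as LINEAR16); it uses Python's standard wave module, and the function name describe_wav is hypothetical rather than part of Hydra.

```python
import wave


def describe_wav(path):
    """Read the properties of a WAV file that the audio_* parameters must match."""
    with wave.open(path, "rb") as wav:
        return {
            "audio_channels": wav.getnchannels(),      # 1 = mono, 2 = stereo
            "audio_sample_rate": wav.getframerate(),   # in hertz
            "bits_per_sample": wav.getsampwidth() * 8,  # LINEAR16 expects 16
        }
```

Calling describe_wav on a mono 16 kHz recording would report audio_channels as 1 and audio_sample_rate as 16000, which are the values to pass instead of the defaults. Other formats such as MP3 or FLAC need a different tool to inspect.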

| Parameter | Description | Default |
|---|---|---|
| audio_encoding | The name of the audio codec. Supported values: MP3, FLAC, LINEAR16, AMR, AMR_WB. See here for more details. | MP3 |
| audio_channels | The number of audio channels in the input files. | 2 |
| audio_sample_rate | The audio sample rate in hertz. | 44100 |
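
As a rough illustration of how these parameters might be attached to an invocation URL, the sketch below assumes they are passed as ordinary query-string parameters. The base URL is a hypothetical placeholder and with_audio_params is an illustrative helper, not a Hydra or Box API; use the invocation URL from your own Box Skill setup.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical placeholder; substitute your real Box Skill invocation URL.
BASE_URL = "https://example.invalid/box-skill"


def with_audio_params(base_url, encoding="MP3", channels=2, sample_rate=44100):
    """Return base_url with the three audio parameters set in its query string.

    The defaults mirror the documented defaults (MP3, 2 channels, 44100 Hz).
    Any query parameters already present on base_url are preserved.
    """
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query.update(
        {
            "audio_encoding": encoding,
            "audio_channels": str(channels),
            "audio_sample_rate": str(sample_rate),
        }
    )
    return urlunsplit(parts._replace(query=urlencode(query)))


# Example: a mono FLAC recording sampled at 16 kHz.
print(with_audio_params(BASE_URL, encoding="FLAC", channels=1, sample_rate=16000))
```

Setting all three values explicitly, even when they match the defaults, makes it easier to spot a mismatch if the trainer UI shows empty output.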

Supported Audio Encoding Parameter Values

| Param Value | Name | Lossless | Usage Notes |
|---|---|---|---|
| MP3 | MPEG Audio Layer III | No | |
| FLAC | Free Lossless Audio Codec | Yes | 16-bit or 24-bit required for streams |
| LINEAR16 | Linear PCM | Yes | 16-bit linear pulse-code modulation (PCM) encoding |
| MULAW | μ-law | No | 8-bit PCM encoding |
| AMR | Adaptive Multi-Rate Narrowband | No | The sample rate must be 8000 Hz |
| AMR_WB | Adaptive Multi-Rate Wideband | No | The sample rate must be 16000 Hz |
| OGG_OPUS | Opus encoded audio frames in an Ogg container | No | Sample rate must be one of 8000 Hz, 12000 Hz, 16000 Hz, 24000 Hz, or 48000 Hz |
| SPEEX_WITH_HEADER_BYTE | Speex wideband | No | The sample rate must be 16000 Hz |
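
The sample-rate constraints in the usage notes can be checked before a file is submitted. The following is a minimal sketch that encodes only the constraints listed above; the names REQUIRED_SAMPLE_RATES and check_sample_rate are hypothetical, and encodings without a stated constraint are simply accepted.

```python
# Sample rates (Hz) required by each encoding, taken from the usage notes above.
REQUIRED_SAMPLE_RATES = {
    "AMR": {8000},
    "AMR_WB": {16000},
    "OGG_OPUS": {8000, 12000, 16000, 24000, 48000},
    "SPEEX_WITH_HEADER_BYTE": {16000},
}


def check_sample_rate(encoding, sample_rate):
    """Return None if the combination is allowed, otherwise an error message."""
    allowed = REQUIRED_SAMPLE_RATES.get(encoding)
    if allowed is None or sample_rate in allowed:
        return None  # no documented constraint, or the rate is permitted
    return (
        f"{encoding} requires a sample rate of {sorted(allowed)} Hz, "
        f"got {sample_rate} Hz"
    )
```

For instance, check_sample_rate("AMR", 44100) returns an error message because AMR requires 8000 Hz, while check_sample_rate("MP3", 44100) returns None.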