# Transcription Features
## Introduction

Accurate transcription is essential for downstream analytics, making speech-to-text (STT) one of the most valuable capabilities in modern contact center solutions. Yet call center audio presents unique challenges – such as background noise, overlapping conversations, and telephony limitations – that affect quality.

NICE ElevateAI addresses these challenges with decades of research and billions of real-world contact center interactions. Our platform delivers high-accuracy conversational transcripts through two purpose-built models:

- **CX model** – optimized for contact center environments
- **Echo model** – next-generation transcription for global, multilingual use cases

Transcript formats available:

- Get Phrase-by-Phrase Transcript
- Get Punctuated Transcript

This flexibility ensures seamless integration into analytics, compliance, and customer experience (CX) workflows.

## Transcription Models

### What's New

- **November 2024** → We introduced Echo, our next-generation transcription model. Built for accuracy at scale, Echo delivers up to 40% higher accuracy than our original CX model, ensuring superior transcription quality across a wide range of use cases.
- **July 2025** → We expanded Echo with the launch of Echo real-time transcription, enabling enterprises to capture conversations instantly with the same industry-leading accuracy.

Use our new Fast Start program to try Echo today – no contract, no credit card required. Explore both real-time and post-call transcription with 10 free interactions per day.

### Transcription Models at a Glance

Choose the ElevateAI transcription model that best suits your workflow:

| Model | Best For | Key Strengths |
| --- | --- | --- |
| CX model | Contact center environments | Optimized for multi-speaker conversations; <10% WER on call center audio; real-time sentiment & compliance insights |
| Echo model | Global, multilingual transcription | 40% accuracy improvement over benchmarks; 50+ languages & dialects with auto-detection; scales for high-volume workflows |

- **When to use CX** → customer service conversations, compliance/QA, and real-time insights monitoring
- **When to use Echo** → multilingual transcription, global media (interviews, podcasts, videos), and enterprise workflows at scale

> **Pro tip:** Many enterprises combine both models – using Echo for fast, accurate real-time transcription across global operations, and CX for complex contact center interactions enriched with Enlighten AI and CX AI insights.

### Selecting the Echo Model

To use the Echo transcription model, add the body parameter `model` to your POST declare request and set its value to `echo` when declaring an interaction.

> **Note:** Currently, the Echo model does not perform redaction. Future releases will continue to move Echo toward feature parity with the CX transcription model; see the latest ElevateAI release notes.

### Switching Between Echo and CX Models

Our original, purpose-built CX model can be used by omitting the `model` parameter or setting it to `cx`.

## PII Redaction

Personally identifiable information (PII) – such as Social Security numbers, credit card numbers, and CVV/CVC numbers – is automatically redacted before a transcript is stored or returned.

> **Important:** PII redaction is available only with the CX transcription model. The Echo model does not currently support PII redaction, and any media processed with Echo will not be redacted.

When retrieving either a Get Phrase-by-Phrase Transcript or a Get Punctuated Transcript, details on each redacted segment – including timing, type, and confidence score – are also returned.

> **Remember:** All source data is deleted promptly upon processing. Please refer to our data retention & security guidelines for additional details.

Sample transcription output:

```json
{
  "redactionSegments": [
    {
      "startTimeOffset": 1467720,
      "endTimeOffset": 1468030,
      "result": "cvv",
      "score": 0.98745
    }
  ]
}
```

## Speaker Labels

ElevateAI automatically associates each phrase in a transcript with a participant (up to two speakers) in both the Get Phrase-by-Phrase Transcript and Get Punctuated Transcript formats.

### Speaker Diarization

With speaker diarization, ElevateAI distinguishes between participants in a conversation – delivering cleaner transcripts and enabling more reliable analytics.

- **CX model** → supports automatic speaker diarization for both mono (single-channel) and stereo (dual-channel) audio inputs
- **Echo model** → recently expanded to support automatic diarization for stereo (dual-channel) audio, with additional capabilities in progress to bring feature parity with the CX model

The system distinguishes between speakers – labeling them as `participantOne` and `participantTwo` – and tags each phrase accordingly. Key benefits of speaker diarization include:

- **Improved transcript clarity** → each speaker's statements are clearly attributed, making transcripts easier to read and review
- **Simplified post-processing** → accurate labels enable downstream automation for analytics, summarization, and role assignment
- **Enhanced compliance & QA** → clear speaker separation helps meet regulatory requirements and streamlines quality assurance workflows

> **Note:** Diarization currently supports two speakers (`participantOne`, `participantTwo`) and is automatically applied when using supported models and audio formats.

### Channel Labels

For dual-channel audio files – e.g., a phone recording with the agent and customer on separate channels – ElevateAI automatically associates each phrase with the correct channel. Using stereo recordings does not affect processing speed or fees, making it a flexible option for contact center workflows.

View pricing information. Learn more about audio, chat, and transcript processing times.

Need more help? Contact the ElevateAI support team.
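The model selection described above – sending `model: "echo"` in the declare body, or omitting it to fall back to CX – can be sketched as follows. Only the `model` parameter and its `echo`/`cx` values come from this document; the `"type"` field, the endpoint, and any header names are illustrative assumptions, so check the ElevateAI API reference for the exact request shape.

```python
# Minimal sketch of building the body for a POST declare request.
# Assumption: the body is JSON and may carry fields besides "model";
# the "type" field below is hypothetical and used only for illustration.

def build_declare_body(model=None):
    """Build the JSON body for declaring an interaction.

    Omitting `model` (or passing "cx") selects the original CX model;
    passing "echo" selects the next-generation Echo model.
    """
    body = {
        "type": "audio",  # hypothetical field, not confirmed by the doc
    }
    if model is not None:
        body["model"] = model  # "echo" or "cx"
    return body

# Echo: explicitly request the next-generation model.
echo_body = build_declare_body("echo")

# CX: simply omit the parameter (the default behavior).
cx_body = build_declare_body()
```

The resulting dictionary would then be sent as the JSON body of the declare call (for example with `requests.post(url, json=echo_body, ...)`, where the URL and authentication headers come from the ElevateAI API reference).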
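Downstream, the speaker labels and redaction segments described above lend themselves to simple post-processing. This sketch groups phrases by participant label and filters redaction segments by confidence score; the field names follow the sample output and labels in this document (`participantOne`, `participantTwo`, `startTimeOffset`, `score`), but the exact response schema – in particular the `"participant"` and `"phrase"` keys – is an assumption.

```python
# Sketch of post-processing a phrase-by-phrase transcript response.
# Assumption: each phrase object carries "participant" and "phrase" keys;
# redaction segments match the sample output shown earlier.

from collections import defaultdict


def group_by_speaker(phrases):
    """Collect phrase texts per diarized participant label."""
    by_speaker = defaultdict(list)
    for phrase in phrases:
        by_speaker[phrase["participant"]].append(phrase["phrase"])
    return dict(by_speaker)


def high_confidence_redactions(segments, threshold=0.9):
    """Keep only redaction segments at or above the confidence threshold."""
    return [s for s in segments if s["score"] >= threshold]


# Hypothetical response fragments for illustration.
phrases = [
    {"participant": "participantOne", "phrase": "Thanks for calling."},
    {"participant": "participantTwo", "phrase": "I need to update my card."},
    {"participant": "participantOne", "phrase": "Sure, I can help with that."},
]
segments = [
    {"startTimeOffset": 1467720, "endTimeOffset": 1468030,
     "result": "cvv", "score": 0.98745},
    {"startTimeOffset": 1500000, "endTimeOffset": 1500400,
     "result": "ccn", "score": 0.42},
]

speakers = group_by_speaker(phrases)
confident = high_confidence_redactions(segments)
```

Grouping by participant is what makes the compliance and QA benefits listed above practical: each speaker's statements can be reviewed, summarized, or routed independently.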