Real-Time API Documentation


Overview

The ElevateAI Real-Time API provides clients with low-latency, highly accurate, real-time transcription over the secure WebSocket (WSS) protocol.

The client application opens one WebSocket connection with the Real-Time API for each channel in a session:
- One (1) WebSocket for a mono (single-channel) audio stream.
- Two (2) WebSockets for a stereo (dual-channel) audio stream.

The client application and the Real-Time API communicate by passing messages over the bidirectional WebSocket(s). At a high level, it works like this:
- The client application sends the audio in real time across the WebSocket(s).
- The Real-Time API processes the audio as soon as it detects a slight pause in speech.
- The Real-Time API then sends the resulting transcription back to the client application as a response message on each connected WebSocket.

The client application should have a listener waiting to receive these response messages and consume them as needed.

Audio data is sent in binary form with PCM, G.711 µ-law (g711u), or G.711 A-law (g711a) encoding. All other WebSocket communication uses text requests and responses in JSON format.

Connecting to the ElevateAI Real-Time API

The WebSocket connection is established with a request to a wss:// URI. The connection details are passed as route and query string parameters in the URI, as described below. These include the details the API needs to interpret the binary audio data correctly. For authentication, the client's ElevateAI API token is passed in a request header, which is encrypted in transit over WSS.

The elements that make up the ElevateAI Real-Time API WebSocket URI are listed below.

Note: The session identifier is critical for two-channel audio handling. The system depends on this parameter matching across the two WebSockets that make up the two channels of the interaction. It must also be unique to the session.

Base URI:
wss://api.elevateai.com

Full template:
wss://api.elevateai.com/v1/{interactionType}/{languageTag}/{vertical}?session_identifier=<string>&channels=<int>&channel_index=<int>&participant_role=<string>&participant_name=<string>&codec=<string>&sample_rate=<int>&bit_depth=<int>&bit_rate=<int>

Example (agent channel):
wss://api.elevateai.com/v1/audio/en/default?session_identifier=fe5f75b8-fa06-428b-bxy2-7f0a0b6bf169&channels=2&channel_index=0&participant_role=agent&codec=pcm&sample_rate=16000

Example (customer channel):
wss://api.elevateai.com/v1/audio/en/default?session_identifier=fe5f75b8-fa06-428b-bxy2-7f0a0b6bf169&channels=2&channel_index=1&participant_role=customer&codec=pcm&sample_rate=16000

Connection Parameters

Headers
- X-API-Token
  - Path: attach as a header.
  - Valid inputs: any valid, activated ElevateAI API token.
  - Required.

Route Parameters
All route parameters are of type string.
- Interaction type
  - Path: v1/{interactionType}
  - Valid inputs: audio.
  - Required.
- Language tag
  - Path: v1/audio/{languageTag}
  - Valid inputs: any supported BCP 47 language identifier without localization, e.g., en.
  - Required.
- Vertical
  - Path: v1/audio/en/{vertical}
  - Valid inputs: default.
  - Required.

Query String Parameters
- Session identifier
  - Path: ?session_identifier=<string>
  - Correlates one or more client connections related to the same interaction. All connections for a single interaction must have a matching session identifier.
  - Valid inputs: any unique string (a GUID is recommended).
  - Required.
- Channels
  - Path: &channels=<int>
  - The expected number of channels for the session.
  - Valid inputs: 1, 2.
  - Required.
- Channel index
  - Path: &channel_index=<int>
  - Indicates the channel associated with the client connection.
  - Valid inputs: 0, 1.
  - Required.
- Participant role
  - Path: &participant_role=<string>
  - The role label of the participant on the channel. This label identifies the participant's role during generative AI processing. Examples: agent, customer, salesperson, client.
  - Valid inputs: a descriptive string.
  - Required.
- Participant name
  - Path: &participant_name=<string>
  - The name of the participant on the channel. Reserved for future use.
  - Valid inputs: string.
  - Optional.
- Codec
  - Path: &codec=<string>
  - The codec of the audio being transmitted. The value is case-insensitive.
  - Valid inputs: pcm, g711u, g711a.
  - Required.
- Sample rate
  - Path: &sample_rate=<int>
  - The sample rate, in Hz, of the audio being transmitted.
  - Valid inputs: maximum value 48000.
  - Required.
- Bit depth
  - Path: &bit_depth=<int>
  - The bit depth, in bits, of the audio being transmitted.
  - Valid inputs: 8, 16, 24, or 32.
  - Optional.
- Bit rate
  - Path: &bit_rate=<int>
  - The bit rate, in bits per second (bps), of the audio being transmitted.
  - Valid inputs: any positive integer.
  - Optional.

Communication

The client application and the ElevateAI Real-Time API communicate through messages. The client application sends request messages, and the Real-Time API sends response messages. The Real-Time API processes client requests asynchronously and sends each response along the WebSocket as soon as it is ready. As a result, a response is not necessarily a reply to the most recent request.

Sequence Diagram

The following sequence outlines the order and flow of communication between the client application and the ElevateAI Real-Time API. Note: the diagram is illustrative and does not cover all possible scenarios.

1. Client application -> ElevateAI RT API: Request: establish WebSocket (channels 0, 1)
2. ElevateAI RT API -> client application: Response: channelConnected (channels 0, 1)
3. ElevateAI RT API -> client application: Response: sessionStarted
4. Client application -> ElevateAI RT API: binary audio
5. Client application -> ElevateAI RT API: Request: send metadata (optional)
6. Client application -> ElevateAI RT API: binary audio
7. Loop, every 30 seconds: ElevateAI RT API -> client application: Response: autosummary and sentiment
8. Client application -> ElevateAI RT API: binary audio
9. Client application -> ElevateAI RT API: Request: sessionEnd
10. ElevateAI RT API -> client application: Response: sessionEnding
11. ElevateAI RT API -> client application: Response: sessionEnded

Sequence details (sequence #, format, description):
- 1 (connection request): The initial request to establish a WebSocket connection with the Real-Time API. This is done once per channel, for either one or two channels. Adjust the query parameters in the URI to describe each channel.
- 2 (text/JSON): After each WebSocket connection request, the Real-Time API responds that the connection has been established. This response is sent to all channels connected at that time.
- 3 (text/JSON): Once all channels for the session (one or two) are established, the Real-Time API sends a sessionStarted response. This response includes the assigned interactionIdentifier.
- 4, 6, 8 (binary): The client sends raw audio data to the Real-Time API over the WebSocket in binary form. Data transmission continues for the duration of the session.
- 5 (text/JSON): If desired, metadata can be submitted at any point after the session is established and before the sessionEnd request is sent. Metadata is retained when the session is migrated to the post-call API environment after real-time processing completes. It can be sent on any session channel.
- 7 (text/JSON): Approximately every 30 seconds throughout the call, the Real-Time API provides an autosummary of the interaction up to that point. It also generates a sentiment score across the interaction (negative, neutral, or positive), with a brief description of why that score was assigned.
- 9 (text/JSON): When the client is ready to terminate the transcription session, it submits the sessionEnd request. This can be sent on any session channel.
- 10 (text/JSON): In response to the sessionEnd request, the Real-Time API replies with a sessionEnding response.
- 11 (text/JSON): Once the final autosummary is complete, the Real-Time API sends a sessionEnded response. This response includes the transcript and autosummary of the complete session.

Request Messages

Sending Audio

Send the audio data as binary messages across the WebSocket. The encoding and bit rate of the binary data must match the values given in the query parameters of the URI used to connect to the Real-Time API. Typical blocks of audio are 20 to 250 ms in duration.

All other request messages are text messages in JSON format.

Metadata

Send this message as a text message in the format below once the session has been connected and initialized. The metadata is stored with the session. After the session is finalized, it is ingested into the post-call system for use in ElevateAI Explore. Only the first metadata request received by the Real-Time API is processed; additional metadata requests are ignored. This message applies at the session level and only needs to be sent on one channel.

{
  "type": "metadata",
  "content": {
    "category": "call centers",
    "direction": "inbound",
    "recorded": "2023-08-03T15:27:09Z",
    "audio": {
      "extension": "3",
      "dnis": "dnis123"
    },
    "agent": {
      "id": "716",
      "name": "Stephen Jones",
      "supervisor": {
        "id": "49",
        "name": "Sabrina Jones"
      },
      "group": {
        "id": "93",
        "name": "Jones Group"
      }
    },
    "site": {
      "id": "7741",
      "name": "Atlanta"
    },
    "customer": {
      "id": "844135",
      "name": "Xander",
      "city": "Los Angeles",
      "state": "California"
    },
    "custom": {
      "datetimes": {
        "datetime1": "2023-08-03T15:27:09Z"
      },
      "texts": {
        "text1": "California callers"
      },
      "integers": {
        "integer1": 84416
      },
      "decimals": {
        "decimal1": 344.15
      }
    }
  }
}

Keep Alive

Send this message in the format below during periods with no audio, to prevent a timeout or disconnect. The ElevateAI Real-Time API times out when no audio data has been transmitted on a WebSocket for 1 minute. When sent often enough, this message also keeps network components along the route from timing out due to inactivity. It can be sent as often as the client needs. This message applies at the channel level, so send it on each idle channel.

{
  "type": "keepAlive"
}

Session End

Send this message as a text message in the format below to start the session-end procedure. ElevateAI responds with a sessionEnding response indicating that the session is being finalized. ElevateAI then does the following:
- Finishes transcribing any audio sent so far.
- Gathers the final transcript and summary.
- Sends the client a sessionEnded response message.

This message applies at the session level and only needs to be sent on one channel.

{
  "type": "sessionEnd"
}

Response Messages

Response messages are how the ElevateAI Real-Time API sends information to the client application. The client application should have a listener on each WebSocket with a message handler for the response types below.

Channel Connected

This message is sent as soon as the parameters are validated and the connection is established. It is sent on all connected channels of a session.

{
  "type": "channelConnected",
  "content": {
    "channelIndex": "0",
    "connected": "2025-04-23T06:42:45.3914884Z"
  }
}

Session Started

The sessionStarted message is sent to the client immediately after the client connects all expected channels to the ElevateAI Real-Time API. All channels for a session must be connected before this message is sent. It is sent on all channels of a session.

{
  "type": "sessionStarted",
  "content": {
    "interactionIdentifier": "e7b61ef8-4f92-4fc4-a75d-d1b6309f2d83",
    "started": "2025-04-23T06:42:45.3914884Z"
  }
}

Transcript Results

These result messages are sent as text JSON messages over the WebSocket as soon as the ElevateAI Real-Time API has processed the results. They are sent on all channels of a session.

{
  "type": "punctuatedTranscript",
  "content": {
    "sentenceSegments": [
      {
        "participant": "participantOne",
        "startTimeOffset": 3397,
        "endTimeOffset": 8487,
        "phrase": "Thank you for calling the customer service help line. How may I be of service?",
        "score": 1
      }
    ],
    "redactionSegments": []
  }
}

Generative AI Autosummary

Generative AI autosummary results are returned every 30 seconds once transcription has begun. This response message is sent on all channels of a session.

{
  "type": "summary",
  "content": {
    "summary": "The autosummary of the call to this point appears here.",
    "sentimentScore": "positive/neutral/negative",
    "sentimentReason": "Brief reason for the assigned sentiment score."
  }
}

Session Ending

The sessionEnding message is sent in response to the sessionEnd request from the client application. It indicates that audio streaming and transcription are terminating for the session. It also indicates that the ElevateAI Real-Time API is starting the autosummary of the final transcript.

Session Ended

The sessionEnded message is the final message sent before the ElevateAI Real-Time API closes the WebSocket connection. It is sent in response to the sessionEnd request and contains the full final transcript and autosummary. It is sent on all channels of a session.

{
  "type": "sessionEnded",
  "content": {
    "interactionIdentifier": "e7b61ef8-4f92-4fc4-a75d-d1b6309f2d83",
    "started": "2025-04-23T06:42:45.3914884Z",
    "ended": "2025-04-23T06:42:45.3914884Z",
    "durationInMs": 19832,
    "punctuatedTranscript": {
      "sentenceSegments": [
        {
          "participant": "participantOne",
          "startTimeOffset": 3397,
          "endTimeOffset": 8487,
          "phrase": "Thank you for calling the customer service help line. How may I be of service?",
          "score": 1
        },
        {
          "participant": "participantTwo",
          "startTimeOffset": 9538,
          "endTimeOffset": 14487,
          "phrase": "Hello, I would like to change the features of my account.",
          "score": 1
        },
        {
          "participant": "participantOne",
          "startTimeOffset": 15203,
          "endTimeOffset": 19784,
          "phrase": "I would be happy to review those options with you.",
          "score": 1
        }
      ],
      "redactionSegments": []
    },
    "summary": "The final autosummary of the entire call appears here."
  }
}
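The connection and message flow described on this page can be exercised with a few pure helper functions. The following is a minimal Python sketch, not an official SDK, and all function names in it are hypothetical. It does four things:
- Builds the per-channel URIs.
- Serializes request messages.
- Sizes PCM audio blocks.
- Dispatches parsed response messages.

It assumes snake_case query parameter names and camelCase message "type" values; verify both against the live reference. The network layer is deliberately left out. Open each URI with the WebSocket client of your choice, pass the API token in the X-API-Token header, send audio as binary frames, and send these JSON strings as text frames.

```python
"""Hypothetical helpers for the ElevateAI Real-Time API (sketch only).

Assumptions (not confirmed by the reference): query parameters use
snake_case names, and message "type" values use camelCase spellings.
"""
import json
import uuid
from urllib.parse import urlencode

BASE_URI = "wss://api.elevateai.com/v1"

# Assumed spellings of the outgoing text requests.
KEEP_ALIVE = json.dumps({"type": "keepAlive"})   # channel level
SESSION_END = json.dumps({"type": "sessionEnd"})  # session level, one channel


def build_channel_uri(session_identifier, channels, channel_index,
                      participant_role, codec="pcm", sample_rate=16000,
                      language_tag="en", vertical="default", **optional):
    """Build the WSS URI for one channel of a session.

    Extra keyword arguments (e.g. participant_name, bit_depth, bit_rate)
    are appended as optional query parameters when not None.
    """
    if channels not in (1, 2):
        raise ValueError("channels must be 1 or 2")
    if not 0 <= channel_index < channels:
        raise ValueError("channel_index must be 0 (mono) or 0/1 (stereo)")
    if codec.lower() not in ("pcm", "g711u", "g711a"):
        raise ValueError("codec must be pcm, g711u or g711a")
    if not 0 < sample_rate <= 48000:
        raise ValueError("sample_rate must be between 1 and 48000")
    params = {
        "session_identifier": session_identifier,
        "channels": channels,
        "channel_index": channel_index,
        "participant_role": participant_role,
        "codec": codec,
        "sample_rate": sample_rate,
    }
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"{BASE_URI}/audio/{language_tag}/{vertical}?{urlencode(params)}"


def metadata_message(content):
    """Wrap a metadata dict in the request envelope (only the first is used)."""
    return json.dumps({"type": "metadata", "content": content})


def pcm_chunk_bytes(sample_rate, bit_depth, duration_ms):
    """Bytes in one mono PCM block of the given duration (typically 20-250 ms)."""
    return sample_rate * (bit_depth // 8) * duration_ms // 1000


def handle_response(raw):
    """Parse one text response and dispatch on its type, ignoring case."""
    message = json.loads(raw)
    kind = message.get("type", "").lower()
    content = message.get("content", {})
    if kind == "sessionstarted":
        return ("started", content["interactionIdentifier"])
    if kind == "punctuatedtranscript":
        return ("transcript", [s["phrase"] for s in content["sentenceSegments"]])
    if kind == "summary":
        return ("summary", content["summary"], content["sentimentScore"])
    if kind == "sessionended":
        return ("ended", content["summary"])
    # channelConnected, sessionEnding and any other types pass through.
    return (kind, content)


if __name__ == "__main__":
    # Both channels of a stereo session share one unique session identifier.
    session = str(uuid.uuid4())
    print(build_channel_uri(session, 2, 0, "agent"))
    print(build_channel_uri(session, 2, 1, "customer"))
    print(pcm_chunk_bytes(16000, 16, 20), "bytes per 20 ms block")
```

Dispatching on a lowercased type keeps the handler tolerant of casing differences in the message types. A client loop would send KEEP_ALIVE on any channel that has carried no audio for a while, well within the 1-minute inactivity timeout. It would also send SESSION_END on a single channel when the call is over.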