Asterisk Integration — Developer Guide

Comprehensive technical documentation for the Asterisk PBX voice AI integration in the InComIT Solution customer support system. This guide covers architecture, protocols, services, configuration, and deployment.

Last updated: January 2025  |  Platform: .NET 10, Blazor Server  |  Branch: with_incom_cloud

1. Architecture Overview

The system integrates Asterisk PBX with a Blazor Server application to provide AI-powered voice customer support. When a customer calls the Asterisk PBX, the call is routed to our application which handles real-time speech processing.

High-Level Architecture

Asterisk Integration Architecture Diagram

The diagram above shows all components and how they connect. Two parallel communication paths link the Asterisk PBX to the Blazor application: ARI (WebSocket + REST) for call control, and AudioSocket (raw TCP) for bidirectional audio streaming.

Two Communication Channels

Channel | Protocol | Purpose | Port
ARI (Asterisk REST Interface) | WebSocket + REST HTTP | Call control — answer, bridge, hangup, events | 8088
AudioSocket | Raw TCP | Bidirectional audio streaming — raw PCM voice data | 9092
💡 Why Two Channels? ARI handles call signaling (who called, when to answer, when to hang up), while AudioSocket carries the actual voice audio. Think of ARI as the phone's screen and AudioSocket as its speaker and microphone.

2. Technology Stack

Component | Technology | Version
Runtime | .NET | 10.0
UI Framework | Blazor Server (Interactive SSR) | —
AI Engine | Microsoft Semantic Kernel + OpenAI | 1.67.1
Speech-to-Text | Google Cloud Speech V2 | —
Text-to-Speech | Google Cloud TextToSpeech V1 | —
Database | MySQL/MariaDB via MySqlConnector | 2.5.0
PBX | Asterisk PBX | 18+ (AudioSocket support)
Audio Protocol | AudioSocket (Asterisk-specific) | —
Call Control | ARI (Asterisk REST Interface) | —

3. File Map

All files related to the Asterisk integration:

SpeechToTextWithGoogle/
├── Models/
│   ├── AsteriskSettings.cs                NEW       Configuration POCO
│   ├── CallSession.cs                     NEW       Session model + CallState enum
│   ├── AriModels.cs                       NEW       ARI event JSON models
│   └── ConversationModels.cs              NEW       Conversation storage entities
├── Services/
│   ├── CallSessionManager.cs              NEW       Singleton: thread-safe session tracking
│   ├── SpeechPipelineService.cs           NEW       Singleton: STT → AI → TTS orchestrator
│   ├── AsteriskDbQueryPlugin.cs           NEW       Semantic Kernel DB plugin
│   ├── AudioSocketListener.cs             NEW       BackgroundService: TCP server
│   ├── AsteriskAriService.cs              NEW       BackgroundService: ARI WebSocket client
│   └── ConversationStorageService.cs      NEW       Singleton: saves conversations with AI classification
├── Controllers/
│   └── AsteriskController.cs              NEW       REST API (7 endpoints)
├── Components/Pages/
│   ├── AsteriskTestCall.razor             NEW       Test call simulator UI
│   ├── AsteriskTestCall.razor.cs          NEW       Test call code-behind
│   ├── Guide.razor                        NEW       Developer documentation page
│   ├── InComDBVoiceChat.razor             MODIFIED  Added call monitor panel
│   └── InComDBVoiceChat.razor.cs          MODIFIED  Added monitor logic
├── Data/
│   └── conversation_tables_migration.sql  NEW       DB migration (4 tables, seed data)
├── wwwroot/
│   └── asterisk-integration-architecture.jpg  NEW   Architecture diagram
├── Program.cs                             MODIFIED  DI registrations
└── appsettings.json                       MODIFIED  Asterisk config section

4. AudioSocket Protocol

AudioSocket is a lightweight TCP-based protocol created specifically for Asterisk. It is NOT a physical socket — it's a simple binary framing protocol for streaming raw audio over a standard TCP connection.

Frame Format

Every AudioSocket frame follows this structure:

┌──────────────┬────────────────────────┬──────────────────────┐
│  Type (1B)   │  Payload Length (3B)   │  Payload (N bytes)   │
│  0x00/10/01  │  Big-endian uint24     │  Raw data            │
└──────────────┴────────────────────────┴──────────────────────┘
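
Concretely, a 320-byte audio frame starts with the header bytes 0x10 0x00 0x01 0x40, since 320 = 0x000140. A minimal C# sketch of the header layout:

```csharp
// Header for a 320-byte audio frame: type byte 0x10, then the payload
// length (320 = 0x000140) encoded as a big-endian unsigned 24-bit integer.
int length = 320;
byte[] header =
{
    0x10,                          // frame type: audio
    (byte)((length >> 16) & 0xFF), // 0x00
    (byte)((length >> 8) & 0xFF),  // 0x01
    (byte)(length & 0xFF),         // 0x40
};
```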

Frame Types

Type Byte | Name | Direction | Payload
0x00 | UUID | Asterisk → App | 36-byte ASCII UUID identifying the call channel
0x10 | Audio | Bidirectional | Raw PCM audio: signed linear 16-bit, 8 kHz mono (slin16)
0x01 | Hangup | Asterisk → App | Empty (length = 0). Call has ended.
0xFF | Error | Either | Error description (rarely used)

Audio Format

Property | Value
Encoding | Signed Linear PCM (slin16)
Sample Rate | 8,000 Hz (8 kHz)
Bit Depth | 16-bit (2 bytes per sample)
Channels | Mono (1 channel)
Byte Rate | 16,000 bytes/second
Frame Size | Typically 320 bytes (20 ms of audio)
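
The byte rate and frame size follow directly from the other rows; a quick arithmetic check:

```csharp
// Derive the byte rate and 20 ms frame size from the format above.
const int sampleRateHz = 8000; // 8 kHz
const int bytesPerSample = 2;  // 16-bit samples
const int channels = 1;        // mono

int byteRate = sampleRateHz * bytesPerSample * channels; // 16,000 bytes/second
int frameBytes = byteRate * 20 / 1000;                   // 320 bytes per 20 ms frame
```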

Reading Frames in C#

// Read the 4-byte header
var header = new byte[4];
await ReadExactAsync(stream, header, 0, 4, ct);

byte frameType = header[0];
int payloadLength = (header[1] << 16) | (header[2] << 8) | header[3];

// Read the payload
var payload = new byte[payloadLength];
if (payloadLength > 0)
    await ReadExactAsync(stream, payload, 0, payloadLength, ct);

switch (frameType)
{
    case 0x00: // UUID — identify the call
        var uuid = Encoding.ASCII.GetString(payload).Trim('\0');
        break;
    case 0x10: // Audio — raw PCM data
        ProcessAudioData(payload);
        break;
    case 0x01: // Hangup — call ended
        break;
}
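
The ReadExactAsync helper is referenced above but not shown in this excerpt. A minimal sketch of what it must do (on .NET 7+, the built-in Stream.ReadExactlyAsync provides the same behavior):

```csharp
// Keep reading until exactly `count` bytes arrive; a single ReadAsync on a
// NetworkStream may return fewer bytes than requested.
static async Task ReadExactAsync(Stream stream, byte[] buffer, int offset, int count, CancellationToken ct)
{
    int total = 0;
    while (total < count)
    {
        int read = await stream.ReadAsync(buffer.AsMemory(offset + total, count - total), ct);
        if (read == 0)
            throw new EndOfStreamException("Peer closed the connection mid-frame");
        total += read;
    }
}
```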

Writing Frames in C#

// Build an audio response frame
var header = new byte[4];
header[0] = 0x10; // TypeAudio
header[1] = (byte)((audioData.Length >> 16) & 0xFF);
header[2] = (byte)((audioData.Length >> 8) & 0xFF);
header[3] = (byte)(audioData.Length & 0xFF);

await stream.WriteAsync(header, ct);
await stream.WriteAsync(audioData, ct);
await stream.FlushAsync(ct);

Silence Detection

// Check if a PCM frame is silent (all samples within ±200)
private static bool IsSilentFrame(byte[] frame)
{
    const short silenceThreshold = 200;
    for (int i = 0; i + 1 < frame.Length; i += 2)
    {
        var sample = (short)(frame[i] | (frame[i + 1] << 8));
        if (Math.Abs(sample) > silenceThreshold) return false;
    }
    return true;
}
📊 Buffer Thresholds (AudioSocketListener.cs)
  • AudioBufferThreshold = 48000 bytes = ~3 seconds of audio (8 kHz × 2 bytes/sample × 3 s)
  • SilenceFrameThreshold = 50 frames = ~1 second of silence (50 × 20 ms at 320-byte frames)
  • Audio is processed when EITHER the buffer threshold is reached OR silence is detected after receiving audio

5. ARI (Asterisk REST Interface)

ARI provides two interfaces for call control:

5.1 WebSocket โ€” Real-time Events

The AsteriskAriService connects to the ARI WebSocket to receive real-time call events.

Connection URL

ws://ASTERISK_HOST:8088/ari/events?api_key=USERNAME:PASSWORD&app=STASIS_APP_NAME
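
The same URL can be assembled from the AsteriskSettings values shown later in this guide (a sketch of the construction, not the service's exact code):

```csharp
// Assemble the ARI event WebSocket URL from configuration values.
string host = "localhost";
int ariPort = 8088;
string user = "asterisk", pass = "asterisk";
string app = "incomdb-voice-ai";
bool useTls = false; // wss:// when ARI is served over TLS

string scheme = useTls ? "wss" : "ws";
string url = $"{scheme}://{host}:{ariPort}/ari/events" +
             $"?api_key={Uri.EscapeDataString($"{user}:{pass}")}&app={app}";
```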

Events We Handle

Event | When | Action
StasisStart | New call enters our Stasis app | Create session → Answer → Create bridge → Create AudioSocket channel
StasisEnd | Call leaves our Stasis app | End session → Cleanup pipeline
ChannelDestroyed | Channel is destroyed | End session → Cleanup pipeline
ChannelHangupRequest | Caller hangs up | Delete bridge → End session → Hang up channel
ChannelStateChange | Channel state changes | Log only
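
For reference, here is a trimmed StasisStart payload and the fields the handler needs from it. The JSON shape follows the public ARI event schema; the sample values and parsing code are illustrative, not the project's AriModels.cs:

```csharp
using System.Text.Json;

// A trimmed StasisStart event as it arrives over the WebSocket (sample values).
string json = """
{
  "type": "StasisStart",
  "application": "incomdb-voice-ai",
  "channel": {
    "id": "1736899200.42",
    "caller": { "name": "", "number": "01711234567" }
  }
}
""";

using var doc = JsonDocument.Parse(json);
var root = doc.RootElement;
string eventType = root.GetProperty("type").GetString()!;
string channelId = root.GetProperty("channel").GetProperty("id").GetString()!;
string callerNumber = root.GetProperty("channel")
                          .GetProperty("caller").GetProperty("number").GetString()!;
```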

5.2 REST API โ€” Call Control

ARI also provides a REST API (same host:port) for actively controlling calls. Authentication is HTTP Basic with the ARI username/password.

REST Calls We Make

Method | Endpoint | Purpose
POST | /ari/channels/{id}/answer | Answer an incoming call
POST | /ari/bridges?type=mixing | Create a mixing bridge
POST | /ari/bridges/{id}/addChannel | Add a channel to a bridge
POST | /ari/channels/externalMedia | Create an AudioSocket external media channel
DELETE | /ari/channels/{id} | Hang up a channel
DELETE | /ari/bridges/{id} | Delete a bridge

ExternalMedia Channel Creation (AudioSocket)

POST /ari/channels/externalMedia
    ?app=incomdb-voice-ai
    &external_host=127.0.0.1:9092
    &format=slin16
    &encapsulation=audiosocket
    &transport=tcp
    &connection_type=client

This tells Asterisk: "Create a channel that connects via TCP to our AudioSocket server at port 9092, sending/receiving audio in slin16 format using the AudioSocket protocol."
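
These REST calls authenticate with HTTP Basic auth. A sketch of the pieces involved, using the default credentials from this guide (the request itself is not sent here):

```csharp
// Basic-auth token + externalMedia URL. An HttpClient would then send:
//   http.DefaultRequestHeaders.Authorization =
//       new AuthenticationHeaderValue("Basic", basicToken);
//   await http.PostAsync(url, null);
string basicToken = Convert.ToBase64String(
    System.Text.Encoding.ASCII.GetBytes("asterisk:asterisk"));

string url = "http://localhost:8088/ari/channels/externalMedia"
    + "?app=incomdb-voice-ai"
    + "&external_host=127.0.0.1:9092"
    + "&format=slin16"
    + "&encapsulation=audiosocket"
    + "&transport=tcp"
    + "&connection_type=client";
```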

6. Speech Pipeline

The speech pipeline is the core processing engine. It takes raw PCM audio from the caller, converts it to text, processes through AI, and returns spoken audio.

Full Pipeline (ProcessAudioAsync)

Raw PCM Audio (8 kHz slin16)
  ↓
Google Cloud Speech V2 (SpeechToTextAsync)
  ↓  transcript text
Semantic Kernel + OpenAI (ProcessTextAsync)
  ↓  AI response text [may invoke DB plugin via function calling]
Google Cloud TTS (TextToSpeechAsync)
  ↓
MP3 Audio Bytes

Per-Session Isolation

Each active call gets its own:

  • Semantic Kernel instance — separate AI context
  • ChatHistory — conversation memory isolated per call
  • DB Plugin instance — database query tool for function calling

These are stored in ConcurrentDictionary<string, Kernel> and ConcurrentDictionary<string, ChatHistory>, keyed by session ID.
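
A stand-alone sketch of that pattern, with placeholder string values standing in for the real Kernel and ChatHistory objects:

```csharp
using System.Collections.Concurrent;

// One isolated AI context per call, keyed by session ID.
var aiContexts = new ConcurrentDictionary<string, string>();

// InitializeSession: each call gets its own context
aiContexts["session-a"] = "system prompt for caller 01711111111";
aiContexts["session-b"] = "system prompt for caller 01722222222";

// CleanupSession: drop the context when the call ends
aiContexts.TryRemove("session-a", out _);
```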

AI Function Calling

The AI uses ToolCallBehavior.AutoInvokeKernelFunctions to automatically call database query functions when needed. Two functions are registered:

Function | Description | Use Case
look_up_customer_info | Query customer data from MySQL | "আমার বিল কত?" ("How much is my bill?") → generates SQL → queries DB → returns results
submit_complaint | Insert a support ticket | "আমার ইন্টারনেট কাজ করছে না" ("My internet isn't working") → inserts into ticket table
โš ๏ธ Audio Format Mismatch Google TTS returns MP3 audio, but AudioSocket expects slin16 PCM. Currently, the MP3 is sent directly through AudioSocket, which works for the browser test page but will NOT work with a real Asterisk server. You will need to add MP3โ†’PCM transcoding (e.g., using NAudio or FFmpeg) for production Asterisk deployment.

7. Session Management

CallSession Model

public class CallSession
{
    public string SessionId { get; set; }     // Unique ID (GUID)
    public string ChannelId { get; set; }     // Asterisk channel ID
    public string CallerNumber { get; set; }  // Phone number
    public string BridgeId { get; set; }      // Asterisk bridge ID
    public CallState State { get; set; }      // Current state
    public DateTime StartTime { get; set; }
    public DateTime? EndTime { get; set; }
    public List<CallConversationTurn> Turns { get; set; }
    public string? LastTranscript { get; set; }
    public string? LastAiResponse { get; set; }
}

Call State Machine

Ringing → Answered → Streaming ⇆ Processing → Speaking → Streaming
                                                            ↓
                                                          Ended
State | Meaning
Ringing | Call created, not yet answered
Answered | Call answered by ARI, pipeline initializing
Streaming | Listening to caller — receiving audio frames
Processing | Audio sent to STT+AI — waiting for response
Speaking | Playing TTS response audio back to caller
Ended | Call terminated
Error | An error occurred during the call
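
The transitions above, written out as a lookup table. This is illustrative only: CallSessionManager sets CallState directly and does not enforce transitions, and any state may also move to Ended or Error when the call drops or something fails:

```csharp
using System.Collections.Generic;

// Allowed next-states per state, per the diagram above (sketch, not project code).
var allowed = new Dictionary<string, string[]>
{
    ["Ringing"]    = new[] { "Answered", "Ended", "Error" },
    ["Answered"]   = new[] { "Streaming", "Ended", "Error" },
    ["Streaming"]  = new[] { "Processing", "Ended", "Error" },
    ["Processing"] = new[] { "Speaking", "Streaming", "Ended", "Error" },
    ["Speaking"]   = new[] { "Streaming", "Ended", "Error" },
};
```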

Thread Safety

CallSessionManager uses ConcurrentDictionary for all session storage. Two dictionaries are maintained:

  • _sessions — maps Session ID → CallSession
  • _channelToSession — maps Asterisk Channel ID → Session ID

The OnSessionsChanged event is fired on every state change, allowing the Blazor UI to update in real-time.

8. AudioSocketListener Service

Type: BackgroundService (hosted service)

File: Services/AudioSocketListener.cs

Port: Configured via AsteriskSettings.AudioSocketPort (default: 9092)

What It Does

  1. Starts a TCP listener on the configured port
  2. Accepts incoming connections from Asterisk (one per call)
  3. Reads the UUID frame to identify which session this connection belongs to
  4. Sends a greeting audio frame back immediately
  5. Reads audio frames, accumulates PCM data in a buffer
  6. When silence is detected OR the buffer threshold is reached → sends to SpeechPipelineService.ProcessAudioAsync()
  7. Sends the response audio frame back to Asterisk
  8. On hangup frame → cleans up the session

Connection Lifecycle

1. Asterisk opens TCP connection to our port 9092
2. Asterisk sends: [0x00][UUID] — identifies the call
3. We send: [0x10][greeting MP3] — greeting audio
4. Loop:
   a. Asterisk sends: [0x10][PCM audio frames]
   b. We accumulate until silence or threshold
   c. We process: PCM → STT → AI → TTS
   d. We send: [0x10][response MP3]
5. Asterisk sends: [0x01] — hangup
6. We clean up and close the connection

Concurrency

Each TCP connection is handled in a separate Task.Run(), so multiple concurrent calls are supported. The ConcurrentDictionary in CallSessionManager ensures thread-safe access to session data.

9. AsteriskAriService

Type: BackgroundService (hosted service)

File: Services/AsteriskAriService.cs

Also registered as: Singleton (for IsConnected property access)

What It Does

  1. Connects to the Asterisk ARI WebSocket
  2. Listens for call events (StasisStart, StasisEnd, etc.)
  3. On new call: creates session → answers → creates bridge → creates AudioSocket channel
  4. On hangup: cleans up bridge, session, and pipeline
  5. Auto-reconnects on connection failure (5-second delay)

StasisStart Handler โ€” The Main Entry Point

private async Task HandleStasisStart(AriEvent ariEvent, CancellationToken ct)
{
    // 1. Create a session
    var session = _sessionManager.CreateSession(channelId, callerNumber);

    // 2. Initialize AI pipeline
    _pipeline.InitializeSession(session.SessionId, callerNumber);

    // 3. Answer the call
    await AnswerChannelAsync(channelId, ct);

    // 4. Create a mixing bridge
    var bridgeId = await CreateBridgeAsync(session.SessionId, ct);

    // 5. Add caller to bridge
    await AddChannelToBridgeAsync(bridgeId, channelId, ct);

    // 6. Create AudioSocket channel (connects to our TCP server)
    var audioSocketChannelId = await CreateAudioSocketChannelAsync(channelId, ct);

    // 7. Add AudioSocket channel to bridge
    await AddChannelToBridgeAsync(bridgeId, audioSocketChannelId, ct);

    // Now audio flows: Caller ↔ Bridge ↔ AudioSocket ↔ Our TCP Server
}
💡 Bridge Concept An Asterisk bridge connects multiple channels together so audio flows between them. We create a mixing bridge with two channels:
  1. The caller's SIP/PSTN channel
  2. An AudioSocket externalMedia channel pointed at our TCP server
This means the caller's voice goes through the bridge to our AudioSocket listener, and our responses go back through the bridge to the caller.

10. SpeechPipelineService

Type: Singleton

File: Services/SpeechPipelineService.cs

Public Methods

Method | Input | Output | Description
InitializeSession() | sessionId, callerNumber | void | Creates Kernel + ChatHistory for a new call
SpeechToTextAsync() | byte[] pcmAudio | string transcript | PCM → Google Cloud Speech V2 → text
ProcessTextAsync() | sessionId, userText | string aiResponse | Text → Semantic Kernel + OpenAI → AI response
TextToSpeechAsync() | text, languageCode | byte[] mp3Audio | Text → Google Cloud TTS → MP3 bytes
ProcessAudioAsync() | sessionId, byte[] pcmAudio | byte[] mp3Audio | Full pipeline: STT → AI → TTS
GetGreetingAudioAsync() | sessionId | byte[] mp3Audio | Generates the welcome greeting audio
CleanupSession() | sessionId | void | Removes Kernel + ChatHistory from memory

System Prompt

The AI is configured as an InComIT customer care representative. The system prompt:

  • Sets the AI identity as InComIT service assistant
  • Enforces Bengali language responses only
  • Includes the caller's phone number for automatic lookup
  • Instructs to use look_up_customer_info and submit_complaint tools
  • Limits response length to 2-3 sentences (for phone TTS)

11. CallSessionManager

Type: Singleton

File: Services/CallSessionManager.cs

Key Methods

Method | Description
CreateSession(channelId, callerNumber) | Creates and stores a new CallSession
GetSession(sessionId) | Retrieves a session by ID
GetSessionByChannel(channelId) | Looks up a session by Asterisk channel ID
UpdateSessionState(sessionId, state) | Changes state, fires OnSessionsChanged
AddConversationTurn(sessionId, user, ai) | Adds a conversation turn
EndSession(sessionId) | Marks the session ended, removes the channel mapping
GetActiveSessions() | Returns non-ended sessions (for UI)
ActiveCallCount | Count of active calls (property)

Events

// Subscribe to session changes in Blazor components:
SessionManager.OnSessionsChanged += () => InvokeAsync(StateHasChanged);

12. AsteriskDbQueryPlugin

File: Services/AsteriskDbQueryPlugin.cs

A Semantic Kernel plugin that allows the AI to query the IncomDB MySQL database during phone calls. It works through AI function calling — when the AI determines it needs customer data, it automatically invokes these functions.

How SQL Generation Works

  1. AI calls look_up_customer_info("check bill for 01711234567")
  2. Plugin sends the question + database schema to a separate OpenAI call
  3. OpenAI generates a SQL query (e.g., SELECT * FROM reign_invoice WHERE...)
  4. Plugin validates the SQL (safety check — no DROP, ALTER, etc.)
  5. Plugin executes the SQL against MySQL
  6. Plugin formats results and returns to the AI
  7. AI uses the results to answer the caller in Bengali
🔴 Security Note The SQL generation relies on AI-generated queries with basic safety checks. For production:
  • Use parameterized queries or a query builder
  • Create a read-only MySQL user for the plugin
  • Whitelist specific tables and columns
  • Add query auditing/logging
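
A minimal sketch of the kind of deny-list check step 4 describes. This is an assumed shape, not the project's actual validation code, and production should still prefer parameterized queries and a read-only user:

```csharp
using System;
using System.Linq;

// Reject anything other than SELECT (lookups) or INSERT (ticket creation),
// and anything containing an obviously destructive keyword.
static bool IsQuerySafe(string sql)
{
    string s = sql.TrimStart().ToUpperInvariant();
    if (!s.StartsWith("SELECT") && !s.StartsWith("INSERT")) return false;

    string[] forbidden = { "DROP", "ALTER", "TRUNCATE", "DELETE", "UPDATE", "GRANT" };
    return !forbidden.Any(k => s.Contains(k));
}
```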

13. REST API Endpoints

Base URL: /api/asterisk

File: Controllers/AsteriskController.cs

Method | Endpoint | Description | Body
POST | /session/start | Manually start a call session | { "phoneNumber": "017...", "channelId": "optional" }
POST | /session/end | End a session | { "sessionId": "..." }
POST | /session/chat | Send text to the AI (testing without audio) | { "sessionId": "...", "text": "..." }
GET | /sessions/active | List all active call sessions | —
GET | /sessions | List all sessions (including ended) | —
GET | /session/{id} | Get detailed session info with conversation | —
GET | /status | ARI connection status + call counts | —

Example: curl Testing

# Start a session
curl -X POST https://incom.zam.asia/api/asterisk/session/start \
  -H "Content-Type: application/json" \
  -d '{"phoneNumber":"01711234567"}'

# Chat with the AI ("আমার বিল কত?" = "How much is my bill?")
curl -X POST https://incom.zam.asia/api/asterisk/session/chat \
  -H "Content-Type: application/json" \
  -d '{"sessionId":"SESSION_ID_HERE","text":"আমার বিল কত?"}'

# Check status
curl https://incom.zam.asia/api/asterisk/status

14. Test Call Page

Route: /test-call

Files: AsteriskTestCall.razor + AsteriskTestCall.razor.cs

The test page simulates a phone call entirely in the browser, allowing you to test the full voice AI pipeline without Asterisk hardware.

Two Modes

Mode | How It Works | Pipeline
🌐 Browser STT | Uses the Web Speech API in the browser for speech recognition | Browser STT → ProcessTextAsync → Google TTS → Browser Audio
🔌 AudioSocket | Captures raw PCM from the mic, packages it into AudioSocket frames, sends via WebSocket | Mic PCM → AudioSocket WS → Server STT → AI → TTS → AudioSocket WS → Speaker
✅ AudioSocket Mode is the True Test AudioSocket mode exercises the exact same server-side code path that real Asterisk calls use. The only difference is the transport (WebSocket instead of raw TCP).

Test Page Features

  • 📞 Call controls — start/end call with phone number and language selection
  • 🎤 Voice input — speak naturally using your microphone
  • ⌨️ Text input — type messages as an alternative to voice
  • 💬 Conversation panel — chat bubbles with timestamps
  • 🔊 Audio playback — hear AI responses with a replay button
  • 📊 AudioSocket stats — frames/bytes sent/received, connection status
  • 📜 Event log — real-time log of all operations
  • ⏱️ Call timer — live duration counter
  • 🎛️ Voice visualizer — animated bars showing listen/speak state

15. Call Monitor Panel

The main voice chat page (/ or /calls) includes a slide-out call monitor panel that shows all active Asterisk calls in real-time.

Features

  • ARI connection status indicator (green/red dot)
  • Active call count
  • Per-session details: caller number, state, duration, last transcript
  • Auto-refresh via 2-second timer when monitor is open
  • Real-time updates via OnSessionsChanged event

16. Installing Asterisk

Ubuntu/Debian

# Install Asterisk (Ubuntu 22.04+)
sudo apt update
sudo apt install -y asterisk

# Or build from source (for AudioSocket support — recommended)
cd /usr/src
sudo wget https://downloads.asterisk.org/pub/telephony/asterisk/asterisk-20-current.tar.gz
sudo tar xzf asterisk-20-current.tar.gz
cd asterisk-20.*/
sudo contrib/scripts/install_prereq install
sudo ./configure --with-jansson-bundled
sudo make menuselect  # Enable res_audiosocket and chan_audiosocket
sudo make -j$(nproc)
sudo make install
sudo make samples
sudo make config
โš ๏ธ AudioSocket Module Required AudioSocket support requires the res_audiosocket and app_audiosocket modules. These are available in Asterisk 16+ but may need to be explicitly enabled during compilation (make menuselect).

Verify Installation

# Check Asterisk is running
sudo systemctl status asterisk

# Connect to Asterisk CLI
sudo asterisk -rvvv

# Verify AudioSocket module is loaded
CLI> module show like audiosocket
# Should show: res_audiosocket.so and app_audiosocket.so

17. Asterisk Configuration Files

17.1 ARI Configuration — /etc/asterisk/ari.conf

[general]
enabled = yes
pretty = yes
allowed_origins = *

[asterisk]
type = user
read_only = no
password = asterisk

17.2 HTTP Server — /etc/asterisk/http.conf

[general]
enabled = yes
bindaddr = 0.0.0.0
bindport = 8088

; For TLS (production):
; tlsenable = yes
; tlsbindaddr = 0.0.0.0:8089
; tlscertfile = /etc/asterisk/keys/asterisk.pem
; tlsprivatekey = /etc/asterisk/keys/asterisk.key

17.3 PJSIP Configuration — /etc/asterisk/pjsip.conf

; Transport for SIP
[transport-udp]
type = transport
protocol = udp
bind = 0.0.0.0:5060

; Example SIP trunk (adjust for your provider)
[trunk-provider]
type = endpoint
context = from-trunk
disallow = all
allow = ulaw
allow = alaw
direct_media = no

17.4 Modules — /etc/asterisk/modules.conf

[modules]
autoload = yes

; Ensure AudioSocket modules are loaded
load = res_audiosocket.so
load = app_audiosocket.so
load = res_ari.so
load = res_ari_channels.so
load = res_ari_bridges.so
load = res_stasis.so
load = res_http_websocket.so

18. Asterisk Dialplan

/etc/asterisk/extensions.conf

[from-trunk]
; Route incoming calls to our Stasis application
; When a call comes in, it enters the "incomdb-voice-ai" Stasis app
; which triggers a StasisStart event to our AsteriskAriService

exten => _X.,1,NoOp(Incoming call from ${CALLERID(num)} to ${EXTEN})
 same => n,Answer()
 same => n,Stasis(incomdb-voice-ai)
 same => n,Hangup()

; Alternative: Direct AudioSocket (without ARI bridge)
; This sends audio directly to our AudioSocket server
; exten => _X.,1,Answer()
; same => n,AudioSocket(127.0.0.1:9092,${CHANNEL(uniqueid)})
; same => n,Hangup()

[default]
exten => _X.,1,NoOp(Default context - routing to Stasis)
 same => n,Goto(from-trunk,${EXTEN},1)
💡 Two Approaches
  • Stasis + ARI (recommended): The call enters Stasis(incomdb-voice-ai), our AriService receives the event, creates a bridge, and connects an AudioSocket channel. This gives us full call control.
  • Direct AudioSocket: The AudioSocket() dialplan application connects directly to our TCP server. Simpler, but with less control.

Reload After Changes

# In Asterisk CLI:
CLI> dialplan reload
CLI> module reload res_ari.so
CLI> ari show status

19. Application Configuration

appsettings.json โ€” Asterisk Section

{
  "Asterisk": {
    "Host": "localhost",        // Asterisk server hostname/IP
    "AriPort": 8088,            // ARI HTTP/WebSocket port
    "AriUsername": "asterisk",   // ARI username (from ari.conf)
    "AriPassword": "asterisk",   // ARI password (from ari.conf)
    "StasisApp": "incomdb-voice-ai",  // Stasis application name
    "AudioSocketPort": 9092,    // Port for our AudioSocket TCP server
    "UseTls": false             // Use wss:// and https:// for ARI
  }
}
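
These keys bind to a plain options class. The shape below is inferred from the JSON above; the real POCO is Models/AsteriskSettings.cs and may differ in detail:

```csharp
// Inferred configuration POCO, bound in Program.cs via
// builder.Services.Configure<AsteriskSettings>(builder.Configuration.GetSection("Asterisk"));
public class AsteriskSettings
{
    public string Host { get; set; } = "localhost";
    public int AriPort { get; set; } = 8088;
    public string AriUsername { get; set; } = "asterisk";
    public string AriPassword { get; set; } = "asterisk";
    public string StasisApp { get; set; } = "incomdb-voice-ai";
    public int AudioSocketPort { get; set; } = 9092;
    public bool UseTls { get; set; } = false;
}
```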

Other Required Configuration

{
  "GoogleCloud": {
    "ProjectId": "your-project-id",
    "CredentialsPath": "credentials/service-account.json"
  },
  "OpenAI": {
    "ApiKey": "sk-...",
    "Model": "gpt-4o",
    "MaxTokens": 8192,
    "Temperature": 0.7,
    "TimeoutSeconds": 180
  },
  "ConnectionStrings": {
    "IncomDatabase": "Server=...;Database=incomdb;..."
  }
}

Environment Variables

Google Cloud credentials can also be set via environment variable:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials.json"

20. Dependency Injection Registration

All Asterisk services are registered in Program.cs:

// === Asterisk Integration Services ===

// Configuration
builder.Services.Configure<AsteriskSettings>(
    builder.Configuration.GetSection("Asterisk"));

// Session tracking — singleton for cross-service access
builder.Services.AddSingleton<CallSessionManager>();

// Speech pipeline — singleton (manages per-session state internally)
builder.Services.AddSingleton<SpeechPipelineService>();

// ARI service — singleton + background service
builder.Services.AddSingleton<AsteriskAriService>();
builder.Services.AddHostedService(sp => sp.GetRequiredService<AsteriskAriService>());

// AudioSocket TCP listener — background service
builder.Services.AddHostedService<AudioSocketListener>();
💡 Why is AsteriskAriService registered twice? AddSingleton makes it injectable (e.g., in AsteriskController to check IsConnected). AddHostedService using the service provider ensures the same instance runs as a background service.

Service Lifecycle

Service | Lifetime | Starts When
CallSessionManager | Singleton | First injection
SpeechPipelineService | Singleton | First injection
AsteriskAriService | Singleton + HostedService | App startup (auto)
AudioSocketListener | HostedService | App startup (auto)

21. Complete Call Flow

Here's what happens step-by-step when a real phone call comes in:

Phase 1: Call Setup (ARI)

1. Customer dials the ISP phone number
2. SIP trunk delivers call to Asterisk
3. Dialplan routes to Stasis(incomdb-voice-ai)
4. ARI WebSocket sends StasisStart event to AsteriskAriService
5. AsteriskAriService.HandleStasisStart():
   a. Creates CallSession via SessionManager
   b. Initializes SpeechPipeline (Kernel + ChatHistory + DB Plugin)
   c. Answers the channel via ARI REST
   d. Creates a mixing bridge via ARI REST
   e. Adds caller channel to bridge
   f. Creates AudioSocket externalMedia channel → Asterisk connects to our TCP:9092
   g. Adds AudioSocket channel to bridge
   h. Audio now flows: Caller ↔ Bridge ↔ AudioSocket ↔ Our TCP Server

Phase 2: AudioSocket Connection

6. Asterisk opens TCP connection to AudioSocketListener on port 9092
7. Asterisk sends UUID frame [0x00][channelId]
8. AudioSocketListener looks up session by channelId
9. AudioSocketListener requests greeting audio from Pipeline
10. Greeting MP3 sent back as audio frame [0x10][mp3Data]
11. Customer hears: "আসসালামু আলাইকুম! InComIT Solution এ আপনাকে স্বাগতম..." ("Assalamu alaikum! Welcome to InComIT Solution...")

Phase 3: Conversation Loop

12. Customer speaks → Asterisk sends PCM audio frames [0x10][pcmData]
13. AudioSocketListener buffers PCM data
14. Silence detected (50 frames of silence OR 48 KB buffer reached)
15. Pipeline.ProcessAudioAsync(sessionId, pcmData):
    a. SpeechToTextAsync → Google Cloud Speech V2 → "আমার বিল কত?" ("How much is my bill?")
    b. ProcessTextAsync → Semantic Kernel + OpenAI:
       - AI determines it needs customer data
       - AI calls look_up_customer_info("check bill for 01711234567")
       - Plugin generates SQL, queries MySQL, returns results
       - AI formulates response: "আপনার বিলের পরিমাণ ৫০০ টাকা..." ("Your bill amount is 500 taka...")
    c. TextToSpeechAsync → Google Cloud TTS → MP3 bytes
16. Response MP3 sent back as audio frame [0x10][mp3Data]
17. Customer hears the AI response
18. Repeat from step 12

Phase 4: Call End and Conversation Storage

19. Customer hangs up
20. Asterisk sends Hangup frame [0x01] to AudioSocket
21. ARI sends ChannelHangupRequest event
22. AsteriskAriService deletes bridge, hangs up channels
23. FinalizeAndCleanupSessionAsync() is called:
    a. Inserts conversation record into conversations table
    b. Inserts all message turns into conversation_messages table
    c. AI classifies the conversation:
       - Assigns category (Billing, Technical, General, etc.)
       - Assigns sub-category (Bill Inquiry, No Internet, etc.)
       - Generates Bengali + English summary
       - Determines resolution status
       - Analyzes customer sentiment
    d. Updates conversation record with classification
    e. Cleans up in-memory state (Kernel, ChatHistory)
24. SessionManager.EndSession() → marks the session as Ended
25. TCP connection closes
💡 Data is only saved when the conversation ends During the call, conversation turns are accumulated in memory only (via CallSessionManager.AddConversationTurn). No database writes happen until the call terminates. FinalizeAndCleanupSessionAsync is called at 7 end-of-call points: AudioSocket hangup, StasisEnd, ChannelDestroyed, HangupRequest, API EndSession, browser EndCall, and component DisposeAsync.

22. Troubleshooting

ARI Connection Fails

# Check Asterisk is running and HTTP is enabled
sudo asterisk -rvvv
CLI> http show status
CLI> ari show status

# Test ARI from command line
curl -u asterisk:asterisk http://ASTERISK_HOST:8088/ari/asterisk/info

# Check firewall
sudo ufw allow 8088/tcp

AudioSocket Connection Fails

# Check our TCP server is listening
netstat -tlnp | grep 9092

# Check Asterisk can reach our server
# From Asterisk host:
telnet BLAZOR_APP_HOST 9092

# Check AudioSocket module is loaded in Asterisk
CLI> module show like audiosocket

# Check firewall
sudo ufw allow 9092/tcp

No Audio / Empty Transcripts

  • Verify Google Cloud credentials are valid and have Speech/TTS APIs enabled
  • Check the audio format — Asterisk should send slin16 (8 kHz, 16-bit, mono)
  • Check the AudioBufferThreshold — if too high, short phrases may not trigger processing
  • Check the SilenceFrameThreshold — if too low, speech may be cut off
  • Look at server logs for STT errors

AI Not Calling Database Functions

  • Verify Data/incomdb_schema.txt exists and contains the database schema
  • Verify IncomDatabase connection string is set in appsettings.json
  • Check that ToolCallBehavior.AutoInvokeKernelFunctions is set in ProcessTextAsync
  • Check OpenAI model supports function calling (gpt-4o, gpt-4-turbo, etc.)

Useful Log Filters

# In appsettings.json, enable detailed logging:
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "SpeechToTextWithGoogle.Services": "Debug"
    }
  }
}

# Key log messages to watch for:
"AudioSocket listener started on port {Port}"
"AudioSocket connection accepted from {Remote}"
"AudioSocket UUID received: {Uuid}"
"Processing {Bytes} bytes of audio for session {SessionId}"
"STT result: {Text}"
"AI response for session {SessionId}: {Response}"
"TTS generated {Bytes} bytes of audio"
"Connected to ARI WebSocket"
"New call: Channel={ChannelId}, Caller={CallerNumber}"

23. Known Issues & TODOs

Critical for Production

Issue | Impact | Solution
🔴 MP3→PCM transcoding missing | Google TTS returns MP3, but AudioSocket expects slin16 PCM. Audio won't play on real Asterisk. | Add transcoding in AudioSocketListener.SendAudioFrameAsync() using NAudio or FFmpeg: decode MP3 to PCM, resample to 8 kHz, convert to 16-bit signed linear.
🔴 SQL injection risk in DB Plugin | AI-generated SQL is executed directly. Malicious prompts could cause data issues. | Use parameterized queries, a read-only DB user, query auditing, and table whitelisting.
🟡 No authentication on REST API | Anyone can start/end sessions via /api/asterisk/* | Add API key authentication or JWT tokens.

Improvements

| Enhancement | Description |
|---|---|
| Streaming STT | Use Google Cloud Speech streaming recognition instead of batch. Would reduce latency significantly — no need to buffer 3 seconds of audio. |
| Barge-in support | Allow the caller to interrupt the AI while it's speaking. Would require cancelling TTS playback when new audio is detected. |
| Call recording | Save call audio and transcripts for quality monitoring. Could use Asterisk MixMonitor or capture at the AudioSocket level. |
| Multi-language detection | Auto-detect the caller's language from the first few seconds of speech. Google STT supports language hints. |
| Call transfer | Transfer to a human agent when the AI can't handle the request. Use ARI to redirect the channel to a different extension. |
| Health checks | Add a /health endpoint checking the ARI connection, AudioSocket port, Google Cloud credentials, and OpenAI API key. |
| Metrics/dashboard | Track call volume, average duration, STT accuracy, AI response time, TTS latency, and customer satisfaction. |

MP3โ†’PCM Transcoding Example (TODO)

// Using NAudio (add NuGet package: NAudio)
// Note: MediaFoundationResampler is Windows-only; on Linux, use FFmpeg or
// NAudio's WdlResamplingSampleProvider instead.
private byte[] ConvertMp3ToPcm8kHz(byte[] mp3Data)
{
    using var mp3Stream = new MemoryStream(mp3Data);
    using var mp3Reader = new Mp3FileReader(mp3Stream);
    using var resampler = new MediaFoundationResampler(mp3Reader,
        new WaveFormat(8000, 16, 1)); // 8kHz, 16-bit, mono

    using var outputStream = new MemoryStream();
    var buffer = new byte[4096];
    int bytesRead;
    while ((bytesRead = resampler.Read(buffer, 0, buffer.Length)) > 0)
    {
        outputStream.Write(buffer, 0, bytesRead);
    }
    return outputStream.ToArray();
}
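A simpler alternative, where voice quality allows, is to skip MP3 entirely: Google Cloud TTS can return LINEAR16 output at a requested sample rate. Note that LINEAR16 responses are WAV files, so the header must be stripped before framing. A sketch (voice, language, and text values are illustrative):

```csharp
using System.Linq;
using Google.Cloud.TextToSpeech.V1;

// Sketch: request 8 kHz LINEAR16 directly so no MP3 decoding is needed.
var client = TextToSpeechClient.Create();
var response = client.SynthesizeSpeech(new SynthesizeSpeechRequest
{
    Input = new SynthesisInput { Text = "Welcome to InComIT support." },
    Voice = new VoiceSelectionParams { LanguageCode = "bn-IN", SsmlGender = SsmlVoiceGender.Female },
    AudioConfig = new AudioConfig
    {
        AudioEncoding = AudioEncoding.Linear16, // raw 16-bit signed PCM in a WAV container
        SampleRateHertz = 8000                  // matches AudioSocket's slin rate
    }
});

// LINEAR16 responses include a WAV header (44 bytes for plain PCM);
// strip it to get bare samples for AudioSocket framing.
byte[] wav = response.AudioContent.ToByteArray();
byte[] pcm = wav.Skip(44).ToArray();
```

This removes the transcoding TODO entirely at the cost of slightly larger TTS responses.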

24. How the API Connects with the System

This is the most important section for understanding how Asterisk talks to the Blazor application. There are two parallel communication paths, plus a supplementary REST API for monitoring and testing.

Path 1: Call Control (ARI WebSocket + REST)

When the application starts, AsteriskAriService (a BackgroundService) automatically opens a persistent WebSocket connection to the Asterisk server:

ws://ASTERISK_HOST:8088/ari/events?api_key=asterisk:asterisk&app=incomdb-voice-ai

This is configured in appsettings.json under the "Asterisk" section. When a phone call arrives at Asterisk, the dialplan routes it to the Stasis app incomdb-voice-ai, which sends a StasisStart event through this WebSocket. The HandleStasisStart() method then:

  1. Creates a CallSession via CallSessionManager
  2. Calls InitializeSessionAsync which looks up the caller phone number in reign_users and sets up the AI (Kernel + ChatHistory + DB plugin)
  3. Answers the call via ARI REST: POST /ari/channels/{id}/answer
  4. Creates a mixing bridge: POST /ari/bridges?type=mixing
  5. Adds the caller channel to the bridge
  6. Creates an AudioSocket external media channel: POST /ari/channels/externalMedia pointing to 127.0.0.1:9092
  7. Adds the AudioSocket channel to the bridge

All REST calls go to http://ASTERISK_HOST:8088/ari/... with HTTP Basic auth.
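Steps 3 through 7 can also be reproduced by hand with curl, which is useful when debugging the ARI side. This sketch assumes the default asterisk:asterisk credentials; the host, channel/bridge IDs, and UUID are placeholders:

```shell
# Answer a channel
curl -s -X POST -u asterisk:asterisk \
  "http://ASTERISK_HOST:8088/ari/channels/CHANNEL_ID/answer"

# Create a mixing bridge
curl -s -X POST -u asterisk:asterisk \
  "http://ASTERISK_HOST:8088/ari/bridges?type=mixing"

# Create an AudioSocket external media channel pointing at the app
curl -s -X POST -u asterisk:asterisk \
  "http://ASTERISK_HOST:8088/ari/channels/externalMedia?app=incomdb-voice-ai&external_host=127.0.0.1:9092&encapsulation=audiosocket&transport=tcp&format=slin&data=SESSION_UUID"

# Add a channel to the bridge
curl -s -X POST -u asterisk:asterisk \
  "http://ASTERISK_HOST:8088/ari/bridges/BRIDGE_ID/addChannel?channel=CHANNEL_ID"
```

If these commands succeed by hand but the service fails, the problem is in the application's HTTP client or credentials rather than Asterisk.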

Path 2: Audio Streaming (AudioSocket TCP)

AudioSocketListener (another BackgroundService) runs a TCP server on port 9092. After step 6 above, Asterisk opens a TCP connection to this port. The flow is:

  1. Asterisk sends a UUID frame [0x01] identifying the call channel
  2. The app sends back a greeting audio frame [0x10] — the customer hears the welcome message
  3. Loop: Asterisk sends PCM audio frames [0x10] of the caller speaking. The app buffers them, detects silence, runs STT → AI → TTS, and sends the response audio back as [0x10] frames
  4. When the caller hangs up, Asterisk sends a terminate frame [0x00]. The app calls FinalizeAndCleanupSessionAsync, which saves the conversation to the database
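On the wire, every AudioSocket frame is a 3-byte header (one type byte plus a 2-byte big-endian payload length) followed by the payload. A minimal sketch of building an outbound audio frame; the helper name is hypothetical:

```csharp
// Hypothetical helper: wraps a PCM chunk in an AudioSocket audio frame (type 0x10).
private static byte[] BuildAudioFrame(byte[] pcm)
{
    var frame = new byte[3 + pcm.Length];
    frame[0] = 0x10;                      // audio (slin) frame type
    frame[1] = (byte)(pcm.Length >> 8);   // payload length, big-endian high byte
    frame[2] = (byte)(pcm.Length & 0xFF); // payload length, low byte
    Buffer.BlockCopy(pcm, 0, frame, 3, pcm.Length);
    return frame;
}
```

Reading inbound frames is the mirror image: read one type byte, read two length bytes, then read exactly that many payload bytes before parsing the next frame.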

The REST API (AsteriskController) — Monitoring and Testing

The AsteriskController at /api/asterisk/ is a supplementary layer. It is NOT the main entry point for real Asterisk calls. Its purpose is:

| Endpoint | Purpose | When to use |
|---|---|---|
| POST /session/start | Manually create a session for testing | Testing without real Asterisk; bypasses the PBX and simulates a call. |
| POST /session/chat | Send text to the AI without audio | Testing the AI pipeline (STT bypassed; text goes directly to Semantic Kernel). |
| POST /session/end | Manually end a session | Cleanup after manual testing, or force-ending a stuck session. |
| GET /sessions/active | List active calls | Operations dashboard; see what calls are in progress. |
| GET /session/{id} | Get session details with full conversation | Debugging a specific call; view all turns and timestamps. |
| GET /status | ARI connection status + call counts | Health monitoring; check if ARI is connected. |
โš ๏ธ For real phone calls, the REST API is NOT used Real calls flow entirely through the automatic path: AsteriskAriService (WebSocket events) and AudioSocketListener (TCP audio). The REST API is only for manual testing, monitoring, and operations. You do not need to call any REST endpoint to handle real phone calls.

Connection Requirements

For this to work with a real Asterisk server:

| Requirement | Details | Config location |
|---|---|---|
| Asterisk ARI enabled | Port 8088 open, HTTP and WebSocket enabled | /etc/asterisk/ari.conf and http.conf |
| AudioSocket modules loaded | res_audiosocket.so and app_audiosocket.so | /etc/asterisk/modules.conf |
| Blazor app reachable from Asterisk | Port 9092 TCP must be open from Asterisk to the Blazor app | Firewall rules |
| Asterisk reachable from Blazor app | Port 8088 must be open from the Blazor app to Asterisk | Firewall rules |
| Host configuration | Set Asterisk.Host to the Asterisk server IP | appsettings.json |
๐Ÿ”ด Important: external_host is hardcoded In AsteriskAriService.CreateAudioSocketChannelAsync(), the external_host parameter is set to 127.0.0.1:9092. This means Asterisk and the Blazor app must run on the same machine. If they are on different machines, update line 306 to use the Blazor app's actual IP address that Asterisk can reach.

Connection Sequence Diagram

App Startup:
  AsteriskAriService โ”€โ”€WebSocketโ”€โ”€โ–ถ Asterisk:8088  (persistent connection)
  AudioSocketListener listens on TCP :9092         (waiting for connections)

Incoming Call:
  Phone โ”€โ”€SIPโ”€โ”€โ–ถ Asterisk
  Asterisk โ”€โ”€StasisStartโ”€โ”€โ–ถ AsteriskAriService      (via WebSocket)
  AsteriskAriService โ”€โ”€RESTโ”€โ”€โ–ถ Asterisk:8088         (answer, bridge, externalMedia)
  Asterisk โ”€โ”€TCPโ”€โ”€โ–ถ AudioSocketListener:9092         (audio starts flowing)

Conversation:
  Caller audio โ”€โ”€TCPโ”€โ”€โ–ถ AudioSocketListener
  AudioSocketListener โ”€โ”€โ–ถ Google STT โ”€โ”€โ–ถ OpenAI โ”€โ”€โ–ถ Google TTS
  Response audio โ”€โ”€TCPโ”€โ”€โ–ถ Asterisk โ”€โ”€โ–ถ Caller

Hangup:
  Asterisk โ”€โ”€Hangup frameโ”€โ”€โ–ถ AudioSocketListener    (TCP)
  Asterisk โ”€โ”€HangupRequestโ”€โ”€โ–ถ AsteriskAriService     (WebSocket)
  FinalizeAndCleanupSessionAsync() saves conversation to DB

25. Conversation Storage

Every completed conversation is automatically saved to the database with AI-powered classification. This happens in FinalizeAndCleanupSessionAsync when the call ends.

Database Tables

| Table | Purpose |
|---|---|
| conversation_categories | 6 top-level categories: Billing, Technical, Account, Package, Complaint, General |
| conversation_sub_categories | 22 sub-categories under the main categories (e.g., Bill Inquiry, No Internet, Speed Issue) |
| conversations | One record per completed call. Stores session ID, caller info, category, summaries, sentiment, resolution status. |
| conversation_messages | Individual conversation turns (user speech and AI response, with timestamps). |

AI Classification Pipeline

When a call ends, ConversationStorageService.SaveConversationAsync() runs:

  1. Insert a new record in conversations with status "active"
  2. Insert all message turns into conversation_messages
  3. Send the full conversation to OpenAI for classification. The AI returns:
    • Category name (e.g., "Billing")
    • Sub-category name (e.g., "Bill Inquiry")
    • Bengali summary (for local staff)
    • English summary (for management/reports)
    • Resolution status: resolved, unresolved, escalated, or info_provided
    • Customer sentiment: positive, neutral, negative, or frustrated
  4. Update the conversation record with the classification data and mark as "completed"
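The classification result from step 3 maps naturally onto a small JSON payload along these lines (field names are illustrative; the real contract lives in ConversationClassification):

```json
{
  "category": "Billing",
  "sub_category": "Bill Inquiry",
  "summary_bn": "গ্রাহক সর্বশেষ বিল সম্পর্কে জানতে চেয়েছেন।",
  "summary_en": "Customer asked about their latest bill.",
  "resolution_status": "info_provided",
  "sentiment": "neutral"
}
```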

Key Files

| File | Role |
|---|---|
| Services/ConversationStorageService.cs | Orchestrates the save pipeline: insert, classify, update |
| Models/ConversationModels.cs | Entity classes: ConversationCategory, ConversationSubCategory, ConversationRecord, ConversationMessage, ConversationClassification |
| Data/conversation_tables_migration.sql | CREATE TABLE statements + seed data (6 categories, 22 sub-categories) + 4 views |

Database Views

Four convenience views are created by the migration:

| View | Shows |
|---|---|
| v_conversations_with_categories | Conversations joined with category/sub-category names |
| v_conversation_full | Full conversation detail, including all message turns |
| v_conversation_stats | Statistics: total count, avg turns, avg duration, per-category breakdown |
| v_recent_conversations | Last 50 conversations for a quick dashboard view |
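As a usage example, a dashboard or ops script can pull straight from the views:

```sql
-- Latest calls for the dashboard
SELECT * FROM v_recent_conversations;

-- Per-category volume and duration statistics
SELECT * FROM v_conversation_stats;
```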


๐Ÿ“ž InComIT Solution โ€” Voice AI Customer Support System
Built with .NET 10, Blazor Server, Semantic Kernel, Google Cloud, & Asterisk PBX

For questions, contact the development team or open an issue on GitHub.