Twilio Phone Calls with Node.js: AI Voice Agent Part 1 (2026 Tutorial)

I am building an AI that makes real phone calls. Automated sales calls. Appointment reminders. Voice agents that can hold a conversation. Part 1 of the series sets up the phone side: Twilio, a Node.js server, an outbound call, and a real-time audio stream we can hand off to an AI in the next video.
This is the foundation. Part 2 connects ElevenLabs so the AI can actually talk back. For now, we focus on the plumbing.
What This Video Covers
- Create a Twilio account (or use an existing one)
- Buy a phone number
- Set up a Node.js + Express server
- Make the first outbound call (your phone will ring)
- Stream the call audio into your server in real time
Step 1: Create a Twilio Account
Sign up at twilio.com. New accounts get free trial credit (a few dollars), enough to test outbound calls. You will need to verify your own phone number on the trial; outbound calls during trial mode can only go to verified numbers. Upgrade to a paid account once you are ready to call any number.
From the Twilio Console, grab your Account SID and Auth Token and save them in a .env file. The third value, your Twilio phone number, gets filled in after Step 2:
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+1xxxxxxxxxx
Step 2: Buy a Phone Number
In the Twilio Console go to Phone Numbers > Buy a Number. Pick a country, pick a number with voice capabilities, click Buy. Paid accounts can buy as many as you need. Free trial accounts get one number.
Save that number to your .env as TWILIO_PHONE_NUMBER.
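A missing or blank .env entry tends to show up later as a confusing authentication or routing error. Here is a minimal sketch of a startup check, assuming the variable names used in this tutorial (PUBLIC_URL is added in Step 4). missingEnv is a hypothetical helper, not part of dotenv or the Twilio SDK.

```javascript
// Hypothetical helper: lists required variables that are unset or blank.
const REQUIRED_ENV = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "TWILIO_PHONE_NUMBER",
  "PUBLIC_URL", // added in Step 4
];

function missingEnv(env, required = REQUIRED_ENV) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

// Demo with a plain object standing in for process.env
const missing = missingEnv({
  TWILIO_ACCOUNT_SID: "ACxxxxxxxx",
  TWILIO_AUTH_TOKEN: "",
});
console.log("Missing:", missing.join(", "));
```

In server.js this could run right after require("dotenv").config(), exiting early when the returned list is not empty.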
Step 3: Set Up the Node.js Server
mkdir twilio-voice-agent
cd twilio-voice-agent
npm init -y
npm install express twilio ws dotenv
Create server.js:
require("dotenv").config();
const express = require("express");
const twilio = require("twilio");
const { WebSocketServer } = require("ws");
const http = require("http");

const app = express();
app.use(express.urlencoded({ extended: true }));
const client = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

// 1. Endpoint to make an outbound call
app.post("/call", async (req, res) => {
  try {
    const call = await client.calls.create({
      url: `${process.env.PUBLIC_URL}/twiml`, // Twilio requests this URL when the call is answered
      to: req.body.to,
      from: process.env.TWILIO_PHONE_NUMBER
    });
    res.json({ sid: call.sid });
  } catch (err) {
    // Report Twilio errors (bad number, bad credentials) instead of leaving the request unanswered
    res.status(500).json({ error: err.message });
  }
});

// 2. TwiML endpoint Twilio hits when the call is answered
app.post("/twiml", (req, res) => {
  res.type("text/xml");
  res.send(`
    <Response>
      <Connect>
        <Stream url="wss://${req.hostname}/audio" />
      </Connect>
    </Response>
  `);
});

const server = http.createServer(app);

// 3. WebSocket server to receive the live audio stream
const wss = new WebSocketServer({ server, path: "/audio" });
wss.on("connection", (ws) => {
  console.log("Twilio stream connected");
  ws.on("message", (msg) => {
    const data = JSON.parse(msg.toString());
    if (data.event === "media") {
      // data.media.payload contains base64-encoded mu-law audio at 8 kHz.
      // This is where Part 2 will plug in ElevenLabs.
    }
  });
});

server.listen(3000, () => console.log("Listening on 3000"));
Step 4: Expose the Server to Twilio (ngrok)
Twilio needs a public URL to hit when calls happen. Use ngrok for development:
brew install ngrok
ngrok http 3000
Copy the https://xxxxx.ngrok-free.app URL that ngrok prints and add it to your .env:
PUBLIC_URL=https://xxxxx.ngrok-free.app
Restart the Node server so it picks up the new env var.
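The /twiml handler in server.js builds the stream URL from req.hostname, which depends on the tunnel forwarding the original Host header. As an alternative sketch, the WebSocket URL can be derived from PUBLIC_URL itself. streamUrlFrom is a hypothetical helper name, and the /audio default matches the WebSocket path used in server.js.

```javascript
// Hypothetical helper: turn an http(s) base URL into the matching ws(s) stream URL.
function streamUrlFrom(publicUrl, path = "/audio") {
  const base = new URL(publicUrl); // throws if PUBLIC_URL is malformed
  const scheme = base.protocol === "https:" ? "wss:" : "ws:";
  return `${scheme}//${base.host}${path}`;
}

console.log(streamUrlFrom("https://xxxxx.ngrok-free.app"));
console.log(streamUrlFrom("http://localhost:3000"));
```

Failing fast on a malformed PUBLIC_URL is deliberate here: a bad stream URL otherwise only shows up as a call that connects and then drops.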
Step 5: Make Your First Outbound Call
From another terminal, trigger the call:
curl -X POST http://localhost:3000/call \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode "to=+1xxxxxxxxxx"
Use your verified phone number as the to value. The --data-urlencode flag matters: in a form-encoded body a bare + decodes to a space, so the leading + of the number has to be sent as %2B. Your phone will ring. When you answer, you will hear silence (we have not added any audio yet), but the Twilio stream connects and your Node server logs "Twilio stream connected". That confirms the pipeline works end to end.
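Twilio expects phone numbers in E.164 format (a + followed by the country code and number, 15 digits at most). Here is a small sketch of a check that could run before client.calls.create. normalizeTo is a hypothetical helper, and the set of separators it strips is an assumption about what callers might send.

```javascript
// E.164: "+", a non-zero leading digit, then up to 14 more digits.
const E164 = /^\+[1-9]\d{1,14}$/;

// Hypothetical helper: strips common separators, returns the number or null.
function normalizeTo(raw) {
  const cleaned = String(raw ?? "").replace(/[\s()-]/g, "");
  return E164.test(cleaned) ? cleaned : null;
}

console.log(normalizeTo("+1 (555) 010-0199"));
console.log(normalizeTo("5550100199"));
```

In the /call handler, a null result could be answered with a 400 response instead of being passed on to Twilio.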
What the Audio Stream Contains
Each message from Twilio over the WebSocket is JSON. The interesting one has event: "media" and a base64-encoded payload of mu-law audio at 8kHz. That is what we will hand to ElevenLabs in Part 2 so the AI can listen and respond.
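To make the message shape concrete, here is a sketch of routing stream messages by event and decoding a media payload into 16-bit PCM. The decoder is the standard G.711 mu-law expansion. The event names and the start.streamSid field reflect my understanding of Twilio's Media Streams messages, and handleStreamMessage plus the state object are hypothetical.

```javascript
// Decode one G.711 mu-law byte into a signed 16-bit linear PCM sample.
function muLawToPcm16(byte) {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a base64 mu-law payload into an Int16Array of PCM samples.
function decodePayload(base64Payload) {
  const bytes = Buffer.from(base64Payload, "base64");
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) pcm[i] = muLawToPcm16(bytes[i]);
  return pcm;
}

// Hypothetical router: updates a small state object for each raw message.
function handleStreamMessage(raw, state) {
  const data = JSON.parse(raw.toString());
  switch (data.event) {
    case "start":
      state.streamSid = data.start.streamSid;
      break;
    case "media":
      state.samples += decodePayload(data.media.payload).length;
      break;
    case "stop":
      state.stopped = true;
      break;
    default:
      break; // "connected" and any other event are ignored in this sketch
  }
  return state;
}

// Demo with hand-built messages standing in for a live call
const state = { streamSid: null, samples: 0, stopped: false };
const payload = Buffer.from([0xff, 0x00, 0x80]).toString("base64");
handleStreamMessage(JSON.stringify({ event: "start", start: { streamSid: "MZxxxxxxxx" } }), state);
handleStreamMessage(JSON.stringify({ event: "media", media: { payload } }), state);
handleStreamMessage(JSON.stringify({ event: "stop" }), state);
console.log(state);
```

At 8 kHz with one byte per sample, each second of call audio is 8000 bytes before base64 encoding, so counting samples is a cheap way to confirm that audio is actually arriving.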
What to Watch Out For
- Trial account limits. Outbound calls only to verified numbers. Upgrade to call anyone.
- ngrok URLs change. Free ngrok gives you a new URL each session unless you set up a fixed subdomain. Update PUBLIC_URL and restart your server every time the URL changes.
- Twilio charges per minute. Set a budget alert in the console so a runaway loop does not drain your balance.
- TwiML is XML, so indentation matters less than tag correctness. The <Connect> and <Stream> tags above are the minimum needed for bidirectional audio.
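The runaway-loop risk can also be reduced in code. Here is a rough sketch of an in-memory cap on outbound calls per time window. createCallLimiter and its limits are hypothetical, and this complements a budget alert in the console rather than replacing it.

```javascript
// Hypothetical guard: allows at most maxCalls within any windowMs span.
// The clock is injectable so the behavior can be checked deterministically.
function createCallLimiter(maxCalls, windowMs, now = Date.now) {
  let timestamps = [];
  return function allowCall() {
    const t = now();
    // Drop calls that have aged out of the window
    timestamps = timestamps.filter((ts) => t - ts < windowMs);
    if (timestamps.length >= maxCalls) return false;
    timestamps.push(t);
    return true;
  };
}

// Demo with a fake clock: two calls per minute
let fakeTime = 0;
const allowCall = createCallLimiter(2, 60000, () => fakeTime);
const results = [allowCall(), allowCall(), allowCall()];
fakeTime = 60000; // one minute later the window has cleared
results.push(allowCall());
console.log(results);
```

In the /call handler, a false result could be answered with a 429 status before any request reaches Twilio. The counter resets when the process restarts, so it only guards against loops inside a single running server.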
What Is Next
Part 2 takes the audio stream from this server and hands it to ElevenLabs Conversational AI. The AI listens, generates a spoken response, and streams the audio back through the same WebSocket so the person on the call hears it. That is when the real magic happens.
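As a preview of the return path, here is a sketch of the JSON message a server sends over the same WebSocket to play audio to the caller. The shape (event, streamSid, media.payload) follows my understanding of Twilio's bidirectional stream format, which expects base64-encoded mu-law at 8 kHz. buildMediaMessage is a hypothetical helper and the streamSid value is a placeholder.

```javascript
// Hypothetical helper: wrap raw mu-law bytes in an outbound "media" message.
function buildMediaMessage(streamSid, muLawAudio) {
  return JSON.stringify({
    event: "media",
    streamSid, // taken from the "start" message of the same stream
    media: { payload: Buffer.from(muLawAudio).toString("base64") },
  });
}

// Demo: four bytes of mu-law silence (0xff decodes to a zero sample)
const message = buildMediaMessage("MZxxxxxxxx", Buffer.from([0xff, 0xff, 0xff, 0xff]));
console.log(message);
```

Inside the connection handler this string would go out with ws.send(message), which is why the server needs to remember the streamSid it received when the stream started.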
Part 2 Is Here
ElevenLabs + Twilio: Create an AI That Responds in Real-Time (Part 2)
Subscribe
Subscribe to AyyazTech if you want to follow the full series. Voice agents are one of the most underrated AI use cases right now.