Automatic Voice→Text Context: How Your MCP Tools Know Which Phone Call They're Serving

You've built a voice AI receptionist. A caller books an appointment, and your voice agent delegates the actual scheduling work to a text agent with MCP tools. But here's the question: how does your booking system know which phone call triggered that booking?

On Universal API, the answer is: automatically. No extra code, no manual ID passing, no configuration.

The Problem: Context Loss in Multi-Agent Systems

Multi-agent delegation is powerful — your voice agent handles the conversation while specialized text agents handle complex tool operations. But it creates a context gap:

📞 Caller: "I'd like to book a haircut for Saturday"
  → Voice Agent processes speech
    → Text Agent checks availability + creates booking
      → MCP Server writes to database
        → ❓ Which phone call was this for?

Without context propagation, your booking record just shows it was created by some text agent conversation. You can't trace it back to the specific phone call. For audit trails, billing reconciliation, or "interaction history" views, this is a dealbreaker.

The Solution: Automatic Parent Lineage

Universal API solves this at the platform level. When a voice agent delegates to a text agent (via the call_uapi_agent tool), the platform automatically:

Captures the voice session's conversationId and agentId
Injects them as parentConversationId and parentAgentId on the text agent's conversation record
Propagates the full lineage to every MCP tool call the text agent makes

Your MCP server receives the complete chain:

javascript

userContext.sessionContext = {
  conversationId: "text-conv-xyz",           // the text agent conversation
  agentId: "booking-assistant-456",          // text agent ID
  userId: "user-789",                        // the end-user
  parentConversationId: "voice-session-abc", // ← THE PHONE CALL
  parentAgentId: "receptionist-123",         // ← THE VOICE AGENT
}

Zero configuration required. This works out of the box for any voice agent that delegates to a text agent.

How to Use It (MCP Server)

In your MCP server, simply read parentConversationId from the session context:

javascript

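// Imports this example relies on: the MCP TypeScript SDK and zod
const { McpServer } = require("@modelcontextprotocol/sdk/server/mcp.js");
const { z } = require("zod");
// `db` stands in for your own database client; it is not defined here.

function createMcpServer(userContext) {
  const server = new McpServer({ name: "scheduling", version: "1.0.0" });
  // sessionContext is null when the server is called by an external client
  const session = userContext.sessionContext || {};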

  server.registerTool("create_booking", {
    description: "Book an appointment",
    inputSchema: {
      customerName: z.string(),
      date: z.string(),
      time: z.string(),
      service: z.string(),
    }
  }, async ({ customerName, date, time, service }) => {
    const booking = await db.bookings.insert({
      customerName,
      date,
      time,
      service,
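      // 🎯 Automatic lineage: no manual plumbing needed
      conversationId: session.conversationId,
      // Parent conversation: the voice session when a voice agent delegated
      voiceCallId: session.parentConversationId,
      // parentAgentId is also set for text → text delegation, so this flag
      // separates delegated calls from direct ones, not voice from text
      bookedVia: session.parentAgentId ? 'delegated' : 'direct',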
    });

    return {
      content: [{ type: "text", text: `Booked: ${date} at ${time}` }]
    };
  });

  return server;
}
module.exports = { createMcpServer };

Now every booking in your database carries the voice session ID. You can:

Display the booking alongside the phone call transcript
Build "call → actions taken" timelines
Audit which voice calls generated which business outcomes
Track conversion rates per voice agent version
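
As a rough illustration of the "call → actions taken" idea, the sketch below groups booking rows by the voiceCallId field that create_booking writes. It assumes the rows have already been loaded as plain dicts; the sample rows and the group_by_call helper are made up for this example and are not part of the platform.

```python
from collections import defaultdict


def group_by_call(bookings):
    """Map each voice call ID to the bookings it produced.

    Rows without a voiceCallId (direct chats, external MCP clients)
    are collected under the key None.
    """
    by_call = defaultdict(list)
    for row in bookings:
        by_call[row.get("voiceCallId")].append(row)
    return dict(by_call)


# Hypothetical rows, shaped like the records create_booking inserts
bookings = [
    {"customerName": "Ana", "service": "Cleaning", "voiceCallId": "voice-session-abc"},
    {"customerName": "Ana", "service": "Whitening", "voiceCallId": "voice-session-abc"},
    {"customerName": "Ben", "service": "Checkup", "voiceCallId": None},
]

for call_id, rows in group_by_call(bookings).items():
    label = call_id or "(no parent call)"
    print(label, "->", [r["service"] for r in rows])
```

Keeping the None bucket visible, rather than dropping those rows, makes it easy to see how much work arrives without a parent call.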

The Voice Agent: Nothing Special Needed

Your voice agent just uses the standard call_uapi_agent pattern. No context passing, no special headers:

python

from strands.experimental.bidi.agent import BidiAgent
from strands.experimental.bidi.models.nova_sonic import BidiNovaSonicModel
from strands import tool
import os, json, urllib.request

BOOKING_AGENT_ID = "your-booking-agent-uuid"

@tool
def delegate_to_booking_agent(task: str) -> str:
    """Delegate a booking task to the text-based booking assistant.
    
    Args:
        task: Description of what to book or check
    """
    bearer_token = os.environ.get("UNIVERSALAPI_BEARER_TOKEN", "")
    url = f"https://stream.api.universalapi.co/agent/{BOOKING_AGENT_ID}/chat"

    req = urllib.request.Request(
        url,
        data=json.dumps({"prompt": task}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        raw = resp.read().decode("utf-8")

    # Strip platform metadata lines, keeping only the agent's reply text
    lines = [line for line in raw.strip().split("\n")
             if not line.startswith(("__META__", "__METRICS__", "__COMPLETE__",
                                     "__TOOL__", "__ERROR__"))]
    return "\n".join(lines).strip() or "Done."


def create_bidi_agent():
    model = BidiNovaSonicModel(
        region="us-east-1",
        model_id="amazon.nova-sonic-v1:0",
        provider_config={
            "audio": {"input_sample_rate": 16000, "output_sample_rate": 24000, "voice": "tiffany"}
        }
    )

    return BidiAgent(
        model=model,
        system_prompt="""You are a friendly AI receptionist for Bright Smile Dental.
When a caller wants to book, check availability, or modify an appointment,
use delegate_to_booking_agent. Keep your voice responses to 1-3 sentences.""",
        tools=[delegate_to_booking_agent]
    )

That's it. When delegate_to_booking_agent makes the HTTP call to the streaming API, the platform intercepts it and injects the voice session's X-UAPI-Session-Context header automatically. The text agent receives it, stores parentConversationId, and passes it to downstream MCP tools.
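
The metadata filtering inside delegate_to_booking_agent can be pulled out into a pure function so it is easy to unit test. This is a minimal sketch: the marker prefixes are the ones used in the tool above, while the strip_stream_metadata name and the sample stream (including the exact shape of the metadata lines) are assumptions made for illustration.

```python
META_PREFIXES = ("__META__", "__METRICS__", "__COMPLETE__", "__TOOL__", "__ERROR__")


def strip_stream_metadata(raw: str) -> str:
    """Drop lines that start with a metadata marker; fall back to "Done."."""
    lines = [line for line in raw.strip().splitlines()
             if not line.startswith(META_PREFIXES)]
    return "\n".join(lines).strip() or "Done."


# Hypothetical response body; the real metadata payloads may look different
sample = (
    '__META__{"conversationId": "text-conv-xyz"}\n'
    "Your cleaning is booked for Friday at 2 PM.\n"
    "__COMPLETE__"
)
print(strip_stream_metadata(sample))
```

One consequence worth knowing: because __ERROR__ lines are filtered too, a delegation that fails with no other output comes back as "Done.". You may prefer to detect those lines and return an error message to the voice agent instead.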

Building an Interaction Timeline

With parent lineage stored, you can build rich "call history" views. Query all text agent conversations spawned from a voice session:

python

import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource('dynamodb').Table('AgentConversationTable')

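def get_call_details(voice_session_id: str):
    """Get all delegated work from a voice call."""
    scan_kwargs = {
        "FilterExpression": Attr('parentConversationId').eq(voice_session_id)
    }
    items = []
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response['Items'])
        # A scan returns at most 1 MB per page, so follow LastEvaluatedKey
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            return items
        scan_kwargs['ExclusiveStartKey'] = last_key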

Each item shows the delegated text agent's conversation: what tools were called, what the outcome was, how long it took. Combine this with the voice transcript for a complete picture:
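
A scan reads the whole table and filters afterwards, which gets slow and costly as conversations accumulate. If the table has a global secondary index whose partition key is parentConversationId, a Query is the usual alternative. The sketch below assumes such an index exists; the index name is hypothetical, and nothing in this article confirms that the platform's table is set up this way.

```python
def get_call_details_via_index(table, voice_session_id,
                               index_name="parentConversationId-index"):
    """Query (rather than scan) for conversations delegated from one voice call.

    `table` is a boto3 DynamoDB Table resource. `index_name` is a
    hypothetical GSI keyed on parentConversationId.
    """
    kwargs = {
        "IndexName": index_name,
        "KeyConditionExpression": "parentConversationId = :pid",
        "ExpressionAttributeValues": {":pid": voice_session_id},
    }
    items = []
    while True:
        page = table.query(**kwargs)
        items.extend(page.get("Items", []))
        # Keep paging until DynamoDB stops returning a continuation key
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key
```

Passing the table in as an argument keeps the function free of AWS setup, so it can be exercised in tests with a small stand-in object that implements query().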

📞 Voice Call: May 12, 2:15 PM — 3 min 42 sec
│
├── Caller: "I need to reschedule my Thursday appointment"
├── Agent: "Let me check your current booking..."
│   └── 🔧 Delegated: booking-assistant (1.8s)
│       ├── Tool: get_booking → Found: Thursday 10:00 AM, Cleaning
│       └── Result: "You have a cleaning at 10 AM Thursday"
│
├── Agent: "You have a cleaning at 10 AM Thursday. When works better?"
├── Caller: "How about Friday afternoon?"
├── Agent: "Let me check Friday afternoon..."
│   └── 🔧 Delegated: booking-assistant (2.1s)
│       ├── Tool: check_availability → 2:00 PM, 3:30 PM available
│       ├── Tool: reschedule_booking → Confirmed Friday 2:00 PM
│       └── Result: "Rescheduled to Friday 2:00 PM"
│
├── Agent: "Done! I've moved you to Friday at 2 PM."
└── Caller: "Perfect, thanks!"
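
One way to assemble a timeline like the one above is to merge the voice transcript turns with the delegated conversation records and sort them by time. The sketch below assumes both carry a sortable timestamp; the field names (timestamp, speaker, text, agentId, result) and the sample values, including the year, are hypothetical and will differ from the real record layout.

```python
def build_timeline(transcript_turns, delegations):
    """Merge voice turns and delegated conversations into one ordered list."""
    events = [
        (t["timestamp"], f'{t["speaker"]}: "{t["text"]}"')
        for t in transcript_turns
    ] + [
        (d["timestamp"], f'  Delegated: {d["agentId"]} -> {d["result"]}')
        for d in delegations
    ]
    # ISO-8601 strings in one format sort chronologically as plain text
    return [line for _, line in sorted(events, key=lambda e: e[0])]


turns = [
    {"timestamp": "2025-05-12T14:15:02", "speaker": "Caller",
     "text": "I need to reschedule my Thursday appointment"},
    {"timestamp": "2025-05-12T14:15:05", "speaker": "Agent",
     "text": "Let me check your current booking..."},
    {"timestamp": "2025-05-12T14:15:09", "speaker": "Agent",
     "text": "You have a cleaning at 10 AM Thursday. When works better?"},
]
delegations = [
    {"timestamp": "2025-05-12T14:15:06", "agentId": "booking-assistant",
     "result": "You have a cleaning at 10 AM Thursday"},
]

for line in build_timeline(turns, delegations):
    print(line)
```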

Under the Hood

Here's what the platform does behind the scenes:

The bidi runtime installs a session context when a WebSocket connects:

python

install_session_context({
    'conversationId': session_id,  # voice session UUID
    'agentId': agent_id,
    'userId': user_id,
    'sessionType': 'bidi',
})

An httpx monkey-patch injects the X-UAPI-Session-Context header on any outbound HTTP request to UAPI domains (and only UAPI domains; third-party calls never receive it).
The text agent runtime parses the inbound header, stores parentConversationId and parentAgentId on the conversation record, then installs its own session context that includes the parent fields.
The MCP runtime extracts the header and makes it available as userContext.sessionContext.

The entire propagation chain is transparent to your code. You just read the fields you need.
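
To make the chain concrete, here is a simplified model of two of the steps described above: attaching the header only for UAPI hosts, and turning the caller's context into parent fields on the receiving side. It is a sketch, not the platform's code. The JSON encoding of the header, the universalapi.co domain check (taken from the streaming URL used earlier), and both function names are assumptions.

```python
import json
from urllib.parse import urlparse

UAPI_DOMAIN = "universalapi.co"  # assumption, based on the streaming URL above
HEADER = "X-UAPI-Session-Context"


def outbound_headers(url, headers, session_context):
    """Copy headers, adding the context header only for UAPI hosts."""
    host = urlparse(url).hostname or ""
    result = dict(headers)
    if host == UAPI_DOMAIN or host.endswith("." + UAPI_DOMAIN):
        result[HEADER] = json.dumps(session_context)  # encoding is an assumption
    return result


def child_context(inbound_header, conversation_id, agent_id, user_id):
    """Build the receiving agent's context, recording the caller as its parent."""
    ctx = {"conversationId": conversation_id, "agentId": agent_id, "userId": user_id}
    if inbound_header:
        parent = json.loads(inbound_header)
        ctx["parentConversationId"] = parent.get("conversationId")
        ctx["parentAgentId"] = parent.get("agentId")
    return ctx


voice_ctx = {"conversationId": "voice-session-abc", "agentId": "receptionist-123",
             "userId": "user-789", "sessionType": "bidi"}
sent = outbound_headers(
    "https://stream.api.universalapi.co/agent/booking-assistant-456/chat", {}, voice_ctx)
print(child_context(sent.get(HEADER), "text-conv-xyz", "booking-assistant-456", "user-789"))
print(HEADER in outbound_headers("https://example.com/hook", {}, voice_ctx))
```

Matching on the exact domain or a dot-prefixed suffix, rather than a plain substring, is what keeps a lookalike host such as universalapi.co.example.com from receiving the header.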

When is `parentConversationId` Present?

Scenario	`parentConversationId`
Voice agent → text agent → MCP	✅ Voice session UUID
Text agent → text agent (any agent-to-agent)	✅ Calling agent's conversation
Direct agent chat (no delegation)	❌ Not set
MCP called from external client (Cline, etc.)	❌ `sessionContext` is null

Always handle the case where it's not present:

javascript

const { parentConversationId } = userContext.sessionContext || {};
// parentConversationId may be undefined — handle gracefully

Key Takeaways

It's automatic — No configuration, no extra parameters, no headers to manage
It's platform-level — Works for any voice→text delegation, not just specific agents
It's secure — Only injected on UAPI domain requests; never leaks to third parties
It's non-sensitive — Only UUIDs, no secrets or PII
It enables real business value — Audit trails, interaction timelines, conversion tracking

Get Started

Docs: Voice-to-Text Delegation Guide
Session Context Reference: MCP Session Context
Voice Agents: Voice Agent Guide
Multi-Agent Patterns: Agents as Tools

Already using voice agents on Universal API? Your context propagation is live today — just read sessionContext.parentConversationId in your MCP tools and you're done.
