
Automatic Voice→Text Context: How Your MCP Tools Know Which Phone Call They're Serving

You've built a voice AI receptionist. A caller books an appointment, and your voice agent delegates the actual scheduling work to a text agent with MCP tools. But here's the question: how does your booking system know which phone call triggered that booking?

On Universal API, the answer is: automatically. No extra code, no manual ID passing, no configuration.

The Problem: Context Loss in Multi-Agent Systems

Multi-agent delegation is powerful — your voice agent handles the conversation while specialized text agents handle complex tool operations. But it creates a context gap:

📞 Caller: "I'd like to book a haircut for Saturday"
  → Voice Agent processes speech
    → Text Agent checks availability + creates booking
      → MCP Server writes to database
        → ❓ Which phone call was this for?

Without context propagation, your booking record just shows it was created by some text agent conversation. You can't trace it back to the specific phone call. For audit trails, billing reconciliation, or "interaction history" views, this is a dealbreaker.

The Solution: Automatic Parent Lineage

Universal API solves this at the platform level. When a voice agent delegates to a text agent (via the call_uapi_agent tool), the platform automatically:

  1. Captures the voice session's conversationId and agentId
  2. Injects them as parentConversationId and parentAgentId on the text agent's conversation record
  3. Propagates the full lineage to every MCP tool call the text agent makes

Your MCP server receives the complete chain:

javascript
userContext.sessionContext = {
  conversationId: "text-conv-xyz",           // the text agent conversation
  agentId: "booking-assistant-456",          // text agent ID
  userId: "user-789",                        // the end-user
  parentConversationId: "voice-session-abc", // ← THE PHONE CALL
  parentAgentId: "receptionist-123",         // ← THE VOICE AGENT
}

Zero configuration required. This works out of the box for any voice agent that delegates to a text agent.

How to Use It (MCP Server)

In your MCP server, simply read parentConversationId from the session context:

javascript
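const { McpServer } = require("@modelcontextprotocol/sdk/server/mcp.js");
const { z } = require("zod");
// `db` below stands for your own database client, initialized elsewhere

function createMcpServer(userContext) {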
  const server = new McpServer({ name: "scheduling", version: "1.0.0" });
  const session = userContext.sessionContext || {};

  server.registerTool("create_booking", {
    description: "Book an appointment",
    inputSchema: {
      customerName: z.string(),
      date: z.string(),
      time: z.string(),
      service: z.string(),
    }
  }, async ({ customerName, date, time, service }) => {
    const booking = await db.bookings.insert({
      customerName,
      date,
      time,
      service,
      // 🎯 Automatic lineage — no manual plumbing needed
      conversationId: session.conversationId,
      voiceCallId: session.parentConversationId,
      bookedVia: session.parentAgentId ? 'voice-delegation' : 'direct',
    });

    return {
      content: [{ type: "text", text: `Booked: ${date} at ${time}` }]
    };
  });

  return server;
}
module.exports = { createMcpServer };

Now every booking created through a voice delegation carries the voice session ID (for direct bookings, voiceCallId is simply left unset). You can:

  • Display the booking alongside the phone call transcript
  • Build "call → actions taken" timelines
  • Audit which voice calls generated which business outcomes
  • Track conversion rates per voice agent version
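
As an illustration of the audit use case, bookings can be grouped by the call that produced them. This is a minimal sketch: the row fields mirror the create_booking insert shown earlier, while the helper name `bookings_by_call` and the sample rows are hypothetical.

```python
from collections import defaultdict


def bookings_by_call(bookings: list) -> dict:
    """Group booking rows by the voice call that triggered them."""
    grouped = defaultdict(list)
    for row in bookings:
        call_id = row.get("voiceCallId")
        if call_id:  # direct bookings carry no parent call and are skipped
            grouped[call_id].append(row)
    return dict(grouped)


# Sample rows (invented data) shaped like the create_booking insert
rows = [
    {"customerName": "Ana", "service": "haircut",
     "voiceCallId": "voice-session-abc", "bookedVia": "voice-delegation"},
    {"customerName": "Ben", "service": "shave",
     "voiceCallId": None, "bookedVia": "direct"},
    {"customerName": "Ana", "service": "color",
     "voiceCallId": "voice-session-abc", "bookedVia": "voice-delegation"},
]
per_call = bookings_by_call(rows)
```

Each key in `per_call` is a voice session ID, so the result can be joined directly against the call transcripts.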

The Voice Agent: Nothing Special Needed

Your voice agent just uses the standard call_uapi_agent pattern. No context passing, no special headers:

python
from strands.experimental.bidi.agent import BidiAgent
from strands.experimental.bidi.models.nova_sonic import BidiNovaSonicModel
from strands import tool
import os, json, urllib.request

BOOKING_AGENT_ID = "your-booking-agent-uuid"

@tool
def delegate_to_booking_agent(task: str) -> str:
    """Delegate a booking task to the text-based booking assistant.
    
    Args:
        task: Description of what to book or check
    """
    bearer_token = os.environ.get("UNIVERSALAPI_BEARER_TOKEN", "")
    url = f"https://stream.api.universalapi.co/agent/{BOOKING_AGENT_ID}/chat"

    req = urllib.request.Request(
        url,
        data=json.dumps({"prompt": task}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        raw = resp.read().decode("utf-8")

    # Drop the stream's metadata marker lines and keep only the agent's reply text
    lines = [line for line in raw.strip().split("\n")
             if not line.startswith(("__META__", "__METRICS__", "__COMPLETE__",
                                     "__TOOL__", "__ERROR__"))]
    return "\n".join(lines).strip() or "Done."


def create_bidi_agent():
    model = BidiNovaSonicModel(
        region="us-east-1",
        model_id="amazon.nova-sonic-v1:0",
        provider_config={
            "audio": {"input_sample_rate": 16000, "output_sample_rate": 24000, "voice": "tiffany"}
        }
    )

    return BidiAgent(
        model=model,
        system_prompt="""You are a friendly AI receptionist for Bright Smile Dental.
When a caller wants to book, check availability, or modify an appointment,
use delegate_to_booking_agent. Keep your voice responses to 1-3 sentences.""",
        tools=[delegate_to_booking_agent]
    )

That's it. When delegate_to_booking_agent makes the HTTP call to the streaming API, the platform intercepts it and injects the voice session's X-UAPI-Session-Context header automatically. The text agent receives it, stores parentConversationId, and passes it to downstream MCP tools.

Building an Interaction Timeline

With parent lineage stored, you can build rich "call history" views. Query all text agent conversations spawned from a voice session:

python
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource('dynamodb').Table('AgentConversationTable')

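def get_call_details(voice_session_id: str):
    """Get all delegated work from a voice call."""
    # A scan reads the table in pages and applies the filter to each page,
    # so follow LastEvaluatedKey until every page has been read.
    scan_kwargs = {
        'FilterExpression': Attr('parentConversationId').eq(voice_session_id)
    }
    items = []
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response['Items'])
        if 'LastEvaluatedKey' not in response:
            return items
        scan_kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']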

Each item shows the delegated text agent's conversation: what tools were called, what the outcome was, how long it took. Combine this with the voice transcript for a complete picture:

📞 Voice Call: May 12, 2:15 PM — 3 min 42 sec

├── Caller: "I need to reschedule my Thursday appointment"
├── Agent: "Let me check your current booking..."
│   └── 🔧 Delegated: booking-assistant (1.8s)
│       ├── Tool: get_booking → Found: Thursday 10:00 AM, Cleaning
│       └── Result: "You have a cleaning at 10 AM Thursday"

├── Agent: "You have a cleaning at 10 AM Thursday. When works better?"
├── Caller: "How about Friday afternoon?"
├── Agent: "Let me check Friday afternoon..."
│   └── 🔧 Delegated: booking-assistant (2.1s)
│       ├── Tool: check_availability → 2:00 PM, 3:30 PM available
│       ├── Tool: reschedule_booking → Confirmed Friday 2:00 PM
│       └── Result: "Rescheduled to Friday 2:00 PM"

├── Agent: "Done! I've moved you to Friday at 2 PM."
└── Caller: "Perfect, thanks!"
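
A view like this can be assembled by merging transcript turns with the delegated conversations in timestamp order. The following is a minimal sketch, not the platform's schema: it assumes each record carries an ISO 8601 `timestamp`, that transcript turns have `speaker` and `text`, and that delegated items expose `agentId` and `durationMs`. All of these field names, and the sample year, are hypothetical.

```python
from datetime import datetime


def build_timeline(turns: list, delegations: list) -> list:
    """Interleave transcript turns and delegated conversations by time."""
    events = [
        (datetime.fromisoformat(t["timestamp"]), f'{t["speaker"]}: "{t["text"]}"')
        for t in turns
    ]
    events += [
        (datetime.fromisoformat(d["timestamp"]),
         f'  Delegated: {d["agentId"]} ({d["durationMs"] / 1000:.1f}s)')
        for d in delegations
    ]
    events.sort(key=lambda event: event[0])  # stable sort by timestamp only
    return [line for _, line in events]


# Sample data (invented) loosely following the call shown above
turns = [
    {"timestamp": "2025-05-12T14:15:05", "speaker": "Caller",
     "text": "I need to reschedule my Thursday appointment"},
    {"timestamp": "2025-05-12T14:15:09", "speaker": "Agent",
     "text": "Let me check your current booking..."},
]
delegations = [
    {"timestamp": "2025-05-12T14:15:10", "agentId": "booking-assistant",
     "durationMs": 1800},
]
timeline = build_timeline(turns, delegations)
```

Sorting on the timestamp alone keeps the merge independent of how the two sources were fetched; the delegated items would come from a query such as get_call_details.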

Under the Hood

Here's what the platform does behind the scenes:

  1. Bidi runtime installs session context when a WebSocket connects:

    python
    install_session_context({
        'conversationId': session_id,  # voice session UUID
        'agentId': agent_id,
        'userId': user_id,
        'sessionType': 'bidi',
    })

  2. httpx monkey-patch injects X-UAPI-Session-Context header on any outbound HTTP request to UAPI domains (and only UAPI domains; third-party calls never receive this).

  3. Text agent runtime parses the inbound header, stores parentConversationId and parentAgentId on the conversation record, then installs its own session context that includes the parent fields.

  4. MCP runtime extracts the header and makes it available as userContext.sessionContext.

The entire propagation chain is transparent to your code. You just read the fields you need.
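
The steps above can be modeled in a few lines of plain Python. This is an illustrative sketch only, not the platform's implementation: the JSON encoding of the header value, the helper names (`is_uapi_host`, `outbound_headers`, `child_context`), and the `universalapi.co` domain suffix (inferred from the streaming URL used earlier) are all assumptions.

```python
import json
from urllib.parse import urlparse

HEADER = "X-UAPI-Session-Context"
UAPI_SUFFIX = "universalapi.co"  # assumption: inferred from the streaming URL


def is_uapi_host(url: str) -> bool:
    """True only for the UAPI domain itself or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == UAPI_SUFFIX or host.endswith("." + UAPI_SUFFIX)


def outbound_headers(url: str, session: dict, headers: dict) -> dict:
    """Step 2: attach the session context to UAPI-bound requests only."""
    result = dict(headers)
    if is_uapi_host(url):
        result[HEADER] = json.dumps(session)  # assumption: JSON-encoded value
    return result


def child_context(inbound: dict, conversation_id: str, agent_id: str) -> dict:
    """Step 3: the caller's IDs become the parent fields of the new context."""
    caller = json.loads(inbound.get(HEADER, "{}"))
    ctx = {"conversationId": conversation_id, "agentId": agent_id,
           "userId": caller.get("userId")}
    if "conversationId" in caller:  # no header means no parent fields
        ctx["parentConversationId"] = caller["conversationId"]
        ctx["parentAgentId"] = caller.get("agentId")
    return ctx


voice = {"conversationId": "voice-session-abc", "agentId": "receptionist-123",
         "userId": "user-789", "sessionType": "bidi"}
sent = outbound_headers("https://stream.api.universalapi.co/agent/x/chat", voice, {})
text_ctx = child_context(sent, "text-conv-xyz", "booking-assistant-456")
```

Matching on the exact domain or a dot-prefixed suffix, rather than a bare substring, is what keeps a lookalike host from receiving the header in this sketch.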

When is parentConversationId Present?

| Scenario | parentConversationId |
|---|---|
| Voice agent → text agent → MCP | ✅ Voice session UUID |
| Text agent → text agent (any agent-to-agent) | ✅ Calling agent's conversation |
| Direct agent chat (no delegation) | ❌ Not set |
| MCP called from external client (Cline, etc.) | sessionContext is null |

Always handle the case where it's not present:

javascript
const { parentConversationId } = userContext.sessionContext || {};
// parentConversationId may be undefined — handle gracefully

Key Takeaways

  1. It's automatic — No configuration, no extra parameters, no headers to manage
  2. It's platform-level — Works for any voice→text delegation, not just specific agents
  3. It's secure — Only injected on UAPI domain requests; never leaks to third parties
  4. It's non-sensitive — Only UUIDs, no secrets or PII
  5. It enables real business value — Audit trails, interaction timelines, conversion tracking

Get Started


Already using voice agents on Universal API? Your context propagation is live today — just read sessionContext.parentConversationId in your MCP tools and you're done.

Universal API — The agentic entry point to the universe of APIs