# Technical Deepdive

### Our AI Agent Training Process: Real Thought Leaders’ Brilliance → Multimodal AI Agents

Each AI Shark Tank judge is a memory-capable, voice-synced, multimodal AI agent built to mirror the style, judgment, and quirks of the real person it is modeled on.

Our Process:

* **Text Corpus Modeling**: We collect a judge's long-form writing, X feed, and podcast or YouTube interview transcripts, then use that corpus to model the judge's investment philosophy and conversational tone.
* **Behavioral Scaffolding**: Each agent is scaffolded with layered prompts, conditional logic, and memory buffers to reflect nuanced judgment patterns—like optimism bias, tech stack preferences, or common red flags.
* **Voice Cloning + Latency Optimization**: We use low-sample speech cloning pipelines (1–2 min input) to generate emotionally expressive voices. These are integrated with in-engine triggers to sync with physical animations in Unreal Engine.
* **Streaming Feedback Loop**: During live pitches, agents transcribe, analyze, and respond with voice, expressions, and camera-aware logic. This creates agents that don’t just reply—they press, riff, and react with believable coherence.
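
The behavioral scaffolding described above can be illustrated with a minimal sketch. It is not the production implementation: the `JudgeAgent` class, its fields, and the keyword-based red-flag check are all hypothetical names and simplifications chosen for this example. It only shows how a persona layer, conditional logic, and a bounded memory buffer might be combined into one prompt.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JudgeAgent:
    """Hypothetical scaffold: persona layer + conditional rules + rolling memory."""
    name: str
    persona: str                 # tone and philosophy distilled from the text corpus
    red_flags: Dict[str, str]    # keyword -> follow-up the judge tends to press on
    memory: deque = field(default_factory=lambda: deque(maxlen=5))

    def observe(self, utterance: str) -> None:
        # Memory buffer: keep only the most recent founder statements.
        self.memory.append(utterance)

    def triggered_flags(self) -> List[str]:
        # Conditional logic: fire a follow-up when a red-flag keyword is in memory.
        text = " ".join(self.memory).lower()
        return [probe for kw, probe in self.red_flags.items() if kw in text]

    def build_prompt(self) -> str:
        # Layered prompt: persona first, then recent context, then triggered probes.
        layers = [f"You are {self.name}. {self.persona}"]
        layers.append("Recent pitch context:\n" + "\n".join(self.memory))
        flags = self.triggered_flags()
        if flags:
            layers.append("Press the founder on:\n" + "\n".join(flags))
        return "\n\n".join(layers)
```

In this sketch the assembled prompt would be passed to whichever language model backs the agent. A real system would likely replace the keyword match with a semantic check.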

### 40+ Minutes in a Single Take: Scene Orchestration in Unreal Engine

We didn’t want episodic media that required intensive post-production. We wanted programmable media, built to scale like software.

We built the entire experience in Unreal Engine and developed a deterministic orchestration system to render an entire 40+ minute episode in one take.

<figure><img src="https://pbs.twimg.com/media/GsII0CFWoAAsuR1?format=jpg&#x26;name=medium" alt=""><figcaption>Our blueprint in Unreal Engine to render one episode in one take</figcaption></figure>

Key System Components:

* **Character Blueprints**: Each judge and founder is represented as a logic tree that controls audio playback, lip sync, facial expressions, eye tracking, and idle fallback behavior.
* **Camera Switching System**: Cinematography is logic-based. A centralized controller switches camera views in real time based on who’s speaking, what emotion is triggered, or whether we’ve reached a “verdict moment.”
* **Global State Manager**: Scene progression (intro → pitch → interrogation → predictions → verdict) is managed through a centralized state machine. No live director required.
* **Zero-Post Pipeline**: Once logic is triggered, the entire episode is rendered as a real-time pass. No post-production, no frame edits. Output is immediately deployable to stream.
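
To make the state manager and camera controller concrete, here is a minimal sketch in plain Python rather than Unreal Blueprints. The phase names follow the progression listed above. Everything else (`EpisodeState`, `pick_camera`, and the camera labels) is a hypothetical stand-in, not the actual in-engine logic.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    INTRO = 0
    PITCH = 1
    INTERROGATION = 2
    PREDICTIONS = 3
    VERDICT = 4


class EpisodeState:
    """Centralized state machine: phases advance in a fixed order."""

    def __init__(self) -> None:
        self.phase = Phase.INTRO

    def advance(self) -> Phase:
        # Move to the next phase; VERDICT is terminal.
        if self.phase is not Phase.VERDICT:
            self.phase = Phase(self.phase.value + 1)
        return self.phase


def pick_camera(phase: Phase, speaker: Optional[str],
                emotion: Optional[str] = None) -> str:
    # Deterministic rules: the same inputs always select the same shot.
    if phase is Phase.VERDICT:
        return "wide_verdict"          # the "verdict moment" overrides everything
    if speaker and emotion:
        return f"closeup_{speaker}"    # an emotional beat gets a close-up
    if speaker:
        return f"medium_{speaker}"     # default shot of whoever is speaking
    return "wide_stage"                # nobody speaking: fall back to a wide shot
```

Because both pieces are pure functions of the current state, replaying the same sequence of triggers yields the same cut list, which is the property a single-take render depends on.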

### Google Meet Format: AI Agents and Human Interactions Live

In addition to our cinematic Unreal Engine pipeline, we developed a real-time live show format using Google Meet. Each AI judge is powered by the same agent framework — complete with persistent memory, judgment logic, and structured questioning scaffolds.

Their visual identities are rendered using Ready Player Me avatars, connected to a real-time lip sync and animation system. These avatars are streamed into Google Meet via virtual webcams and microphones, making each judge appear as a distinct participant joining from their own account.

We capture the founder’s pitch in real time via low-latency speech-to-text transcription, allowing each AI judge to “listen” and semantically parse what’s being said with minimal delay. Using that input, each agent generates follow-up questions and verdicts, delivers them in its cloned voice via high-fidelity TTS, and syncs the audio to its avatar’s facial animations for natural, expressive delivery.

A dedicated AI MC orchestrates the full experience — managing timing, triggering interactions, and maintaining a structured pitch flow — all without human intervention. This allows for a fluid, engaging, and immersive pitch environment that blends synthetic reasoning with real human spontaneity.
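
The MC's role can be sketched as a small turn-taking loop. This is an illustrative assumption, not the live system: the `MC` class, the `hear` and `run_round` methods, and the callable signatures are hypothetical, and the speech-to-text, agent, and TTS stages are stood in for by plain Python callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MC:
    """Hypothetical orchestrator: collects transcript chunks, then gives each judge a turn."""
    judges: Dict[str, Callable[[str], str]]   # judge name -> fn(transcript) -> spoken line
    speak: Callable[[str, str], None]         # stand-in for TTS + avatar animation
    transcript: List[str] = field(default_factory=list)

    def hear(self, chunk: str) -> None:
        # Called for each chunk emitted by the speech-to-text stage.
        self.transcript.append(chunk)

    def run_round(self) -> List[Tuple[str, str]]:
        # Give every judge one turn, in a fixed order, over the full transcript so far.
        context = " ".join(self.transcript)
        turns = []
        for name, ask in self.judges.items():
            line = ask(context)
            self.speak(name, line)
            turns.append((name, line))
        return turns
```

A caller would feed `hear` from the transcription stream and invoke `run_round` whenever the pitch flow reaches a questioning or verdict step.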

### Onchain Execution: CDP Wallet Integration

As of Season 0, every judge signs and publishes an encrypted verdict onchain using Coinbase’s CDP wallet infrastructure.

Check out their verdicts for EP1: <https://x.com/aisharktank/status/1923180864638177422>

This allows:

* Cryptographic commitment to verdicts before audience prediction
* Fully transparent reveal post-vote
* Trust-minimized settlement of prediction markets

It also sets the stage for more advanced onchain behaviors—like funding grants, credentialing founders, and executing onchain decisions directly from the show.
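
The commit-then-reveal pattern behind these properties can be sketched as follows. This is a generic salted-hash commitment written with the Python standard library. It does not use the CDP SDK, and it is an assumption about the general technique rather than a description of how the show encrypts or signs verdicts. The function names `commit` and `verify` are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def commit(verdict: str) -> Tuple[str, bytes]:
    """Return (commitment, salt). Only the commitment is published before the vote."""
    salt = secrets.token_bytes(32)   # random salt so the verdict cannot be guessed by hashing candidates
    digest = hashlib.sha256(salt + verdict.encode("utf-8")).hexdigest()
    return digest, salt


def verify(commitment: str, verdict: str, salt: bytes) -> bool:
    """After the vote, anyone can recompute the hash from the revealed verdict and salt."""
    expected = hashlib.sha256(salt + verdict.encode("utf-8")).hexdigest()
    return hmac.compare_digest(expected, commitment)
```

In this sketch the judge would publish the commitment before audience predictions open, then reveal the verdict and salt afterwards so that anyone can call `verify` and confirm the verdict was not changed.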


