Technical Deepdive

Our AI Agent Training Process: Real Thought Leaders’ Brilliance → Multimodal AI Agents

Each AI Shark Tank judge is a memory-capable, voice-synced, multimodal AI agent built to mirror the style, judgment, and quirks of the real humans behind them.

Our Process:

  • Text Corpus Modeling: We scraped each judge's long-form writing, X feed, and podcast or YouTube interview transcripts to model their specific investment philosophy and conversational tone.

  • Behavioral Scaffolding: Each agent is scaffolded with layered prompts, conditional logic, and memory buffers to reflect nuanced judgment patterns, such as optimism bias, tech stack preferences, or common red flags (see the scaffolding sketch after this list).

  • Voice Cloning + Latency Optimization: We use low-sample speech cloning pipelines (1–2 min input) to generate emotionally expressive voices. These are integrated with in-engine triggers to sync with physical animations in Unreal Engine.

  • Streaming Feedback Loop: During live pitches, agents transcribe, analyze, and respond with voice, expressions, and camera-aware logic. The result is agents that don’t just reply: they press, riff, and react with believable coherence. A compressed version of this loop is sketched below.
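To make the scaffolding concrete, here is a minimal sketch of how a judge persona can be assembled from layered prompts plus a bounded memory buffer. The class and field names (JudgePersona, biases, red_flags) are illustrative assumptions, not our production schema:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class JudgePersona:
    """Illustrative persona scaffold; field names are hypothetical."""
    name: str
    philosophy: str                                   # distilled from the text corpus
    biases: list = field(default_factory=list)        # e.g. optimism bias
    red_flags: list = field(default_factory=list)     # patterns that trigger pushback
    memory: deque = field(default_factory=lambda: deque(maxlen=20))  # rolling pitch context

    def system_prompt(self) -> str:
        # Layered prompt: identity -> judgment rules -> recent memory.
        layers = [
            f"You are {self.name}, an investor on AI Shark Tank.",
            f"Investment philosophy: {self.philosophy}",
            "Known tendencies: " + "; ".join(self.biases),
            "Push back hard when you hear: " + "; ".join(self.red_flags),
            "Recent pitch context:\n" + "\n".join(self.memory),
        ]
        return "\n\n".join(layers)

judge = JudgePersona(
    name="Example Judge",
    philosophy="Back technical founders; distrust vanity metrics.",
    biases=["optimism about developer tools"],
    red_flags=["no live users", "hand-wavy moat"],
)
judge.memory.append("Founder: We reached $10k MRR in two months.")
print(judge.system_prompt())
```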
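The streaming feedback loop then sits on top of that scaffold. A compressed, stand-in version (transcript_chunks and analyze are placeholders for our STT stream and LLM judgment pass):

```python
import asyncio

async def transcript_chunks():
    # Stand-in for a low-latency speech-to-text stream.
    for line in ["We built an AI agent...", "Our TAM is $50B.", "We have no users yet."]:
        await asyncio.sleep(0.5)  # simulate streaming latency
        yield line

def analyze(chunk: str) -> dict | None:
    # Stand-in for the agent's judgment pass; the real version calls the LLM.
    if "no users" in chunk:
        return {"emotion": "skeptical", "line": "Hold on. Why should we believe the demand is real?"}
    return None

async def feedback_loop():
    async for chunk in transcript_chunks():
        reaction = analyze(chunk)
        if reaction:
            # In production this event fans out to TTS, facial animation,
            # and the camera controller simultaneously.
            print(f"[{reaction['emotion']}] {reaction['line']}")

asyncio.run(feedback_loop())
```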

40+ Minutes in a Single Take: Scene Orchestration in Unreal Engine

We didn’t want episodic media that required intensive post-production. We wanted programmable media, built to scale like software.

We built the entire experience in Unreal Engine and developed a deterministic orchestration system to render an entire 40+ minute episode in one take.

(Our blueprint in Unreal Engine to render one episode in one take)

Key System Components:

  • Character Blueprints: Each judge and founder is represented as a logic tree that controls audio playback, lip sync, facial expressions, eye tracking, and idle fallback behavior.

  • Camera Switching System: Cinematography is logic-based. A centralized controller switches camera views in real time based on who’s speaking, what emotion is triggered, or whether we’ve reached a “verdict moment.” A minimal version of this priority logic is sketched after this list.

  • Global State Manager: Scene progression (intro → pitch → interrogation → predictions → verdict) is managed through a centralized state machine. No live director required (see the state machine sketch after this list).

  • Zero-Post Pipeline: Once logic is triggered, the entire episode is rendered as a real-time pass. No post-production, no frame edits. Output is immediately deployable to stream.
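The real camera controller lives in Unreal Blueprints; purely as an illustration, the same priority logic looks like this in Python, where verdict moments outrank emotional spikes, which outrank simple speaker-following (camera and field names are hypothetical):

```python
def pick_camera(state: dict) -> str:
    """Illustrative priority logic for the camera controller.

    `state` mirrors what the Unreal controller reads each tick:
    current scene phase, active speaker, and any triggered emotion.
    """
    if state.get("phase") == "verdict":
        return "cam_verdict_closeup"             # locked shot for verdict moments
    if state.get("emotion") in {"shock", "laughter"}:
        return "cam_reaction_wide"               # catch the whole panel reacting
    if state.get("speaker"):
        return f"cam_single_{state['speaker']}"  # follow whoever is talking
    return "cam_master_wide"                     # idle fallback

print(pick_camera({"phase": "interrogation", "speaker": "judge_2"}))
# -> cam_single_judge_2
```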
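Scene progression reduces to an equally small deterministic state machine. A toy version, assuming each phase signals its own completion:

```python
PHASES = ["intro", "pitch", "interrogation", "predictions", "verdict"]

class ShowStateManager:
    def __init__(self):
        self.index = 0

    @property
    def phase(self) -> str:
        return PHASES[self.index]

    def advance(self) -> str:
        # Deterministic progression; no live director needed.
        if self.index < len(PHASES) - 1:
            self.index += 1
        return self.phase

show = ShowStateManager()
while show.phase != "verdict":
    print("entering phase:", show.advance())
```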

Google Meet Format: AI Agents and Humans Interacting Live

In addition to our cinematic Unreal Engine pipeline, we developed a real-time live show format using Google Meet. Each AI judge is powered by the same agent framework — complete with persistent memory, judgment logic, and structured questioning scaffolds.

Their visual identities are rendered using Ready Player Me avatars, connected to a real-time lip sync and animation system. These avatars are streamed into Google Meet via virtual webcams and microphones, making each judge appear as a distinct participant joining from their own account.
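For readers curious about the virtual-webcam technique: the general pattern is to push rendered avatar frames into a virtual camera device that Google Meet treats like any physical webcam. A minimal sketch using the open-source pyvirtualcam library (our production pipeline differs in the details; render_avatar_frame stands in for the Ready Player Me render and lip-sync output):

```python
# Requires: pip install pyvirtualcam numpy (plus an OS virtual camera driver,
# e.g. the OBS virtual camera).
import numpy as np
import pyvirtualcam

def render_avatar_frame(t: float) -> np.ndarray:
    # Stand-in for the avatar render + lip-sync output.
    frame = np.zeros((720, 1280, 3), dtype=np.uint8)
    frame[:, :, 1] = int(127 + 127 * np.sin(t))  # placeholder animation
    return frame

with pyvirtualcam.Camera(width=1280, height=720, fps=30) as cam:
    t = 0.0
    while True:
        cam.send(render_avatar_frame(t))  # one judge = one virtual camera
        cam.sleep_until_next_frame()      # lock output to 30 fps
        t += 1 / 30
```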

We capture the founder’s pitch in real time via low-latency speech-to-text transcription, allowing each AI judge to “listen” and semantically parse what’s being said almost instantly. Using that input, each agent generates follow-up questions and verdicts with cloned voices via high-fidelity TTS, perfectly synced to their avatar’s facial animations for natural, expressive delivery.
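The sync step can be illustrated roughly as mapping TTS phoneme timings onto avatar mouth shapes (visemes). This assumes the TTS engine returns per-phoneme timestamps, which many high-fidelity TTS APIs expose in some form; the phoneme symbols and viseme map below are simplified:

```python
# Map TTS phoneme timings to avatar mouth shapes (visemes).
# The phoneme set and timing format are simplified assumptions.
VISEME_MAP = {"AA": "mouth_open", "M": "lips_closed", "F": "lip_bite"}

def viseme_track(phonemes: list[tuple[str, float, float]]) -> list[tuple[float, str]]:
    """phonemes: (symbol, start_sec, end_sec) -> animation keyframes (time, viseme)."""
    track = []
    for symbol, start, end in phonemes:
        track.append((start, VISEME_MAP.get(symbol, "rest")))
        track.append((end, "rest"))
    return track

# e.g. the word "ham": H-AA-M
print(viseme_track([("H", 0.0, 0.05), ("AA", 0.05, 0.20), ("M", 0.20, 0.30)]))
```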

A dedicated AI MC orchestrates the full experience — managing timing, triggering interactions, and maintaining a structured pitch flow — all without human intervention. This allows for a fluid, engaging, and immersive pitch environment that blends synthetic reasoning with real human spontaneity.
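At its core, the MC's job reduces to turn-taking under time budgets. A toy scheduler, assuming fixed question windows per judge (the names and timings are placeholders):

```python
import asyncio

async def judge_turn(name: str, budget_s: float):
    # Stand-in for: trigger the judge's question, then wait for the
    # exchange to finish or for the time budget to expire.
    try:
        await asyncio.wait_for(asyncio.sleep(budget_s * 0.5), timeout=budget_s)
        print(f"{name}: turn completed within budget")
    except asyncio.TimeoutError:
        print(f"{name}: MC cuts in, moving on")

async def mc_run(judges: list[str]):
    print("MC: founder, you have 3 minutes. Go.")
    await asyncio.sleep(0.1)                 # stand-in for the pitch window
    for judge in judges:                     # structured interrogation round
        await judge_turn(judge, budget_s=0.2)
    print("MC: verdicts are locked. Revealing now.")

asyncio.run(mc_run(["judge_1", "judge_2", "judge_3"]))
```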

Onchain Execution: CDP Wallet Integration

As of Season 0, every judge now signs and publishes their encrypted verdict onchain—using Coinbase’s CDP wallet infrastructure.

Check out their verdicts for EP1: https://x.com/aisharktank/status/1923180864638177422

This allows:

  • Cryptographic commitment to verdicts before the audience predicts (see the commit-reveal sketch below this list)

  • Fully transparent reveal post-vote

  • Trust-minimized settlement of prediction markets
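The commitment scheme itself is simple: hash the verdict together with a random salt, publish only the hash before the audience votes, then reveal both afterwards so anyone can verify nothing changed. A minimal sketch of the commit and verify steps (the actual onchain publishing via CDP wallets is omitted):

```python
import hashlib
import secrets

def commit(verdict: str) -> tuple[str, str]:
    # Hash verdict + random salt; only the hash goes onchain pre-vote.
    salt = secrets.token_hex(16)
    digest = hashlib.sha256(f"{verdict}|{salt}".encode()).hexdigest()
    return digest, salt

def verify(verdict: str, salt: str, digest: str) -> bool:
    # Post-vote reveal: anyone can recompute and check the commitment.
    return hashlib.sha256(f"{verdict}|{salt}".encode()).hexdigest() == digest

onchain_hash, salt = commit("INVEST")        # published before the audience predicts
print(verify("INVEST", salt, onchain_hash))  # True: verdict provably unchanged
print(verify("PASS", salt, onchain_hash))    # False: tampering is detectable
```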

It also sets the stage for more advanced onchain behaviors—like funding grants, credentialing founders, and executing onchain decisions directly from the show.
