AI Agent

This document describes how to use the AI Agent in SightLab, an interactive, intelligent AI agent that can be connected to various large language models such as ChatGPT, Claude, Gemini, offline Ollama models, and more. You can customize the agent's personality, use speech recognition, and leverage high-quality text-to-speech models.
Note: For compatibility you may need to use the latest config files (delete any existing ones in the configs folder). Download the latest here. If you have added custom configs, you can keep those, but you may need to add FACE_VIEW_OFFSET = 0.2 to the options.
Key Features
- Interact and converse with custom AI Large Language Models in real-time VR or XR simulations.
- Choose from OpenAI, Anthropic, and Gemini models (including vision capabilities), as well as hundreds of offline models via Ollama (such as DeepSeek, Gemma, Llama, Mistral, and more).
- Modify avatar appearance, animations, environment, and more. Works with most avatar libraries (Avaturn, ReadyPlayerMe, Mixamo, Rocketbox, Reallusion, etc.).
- Customize the agent's personality, contextual awareness, emotional state, interactions, and more. Save your creations as custom agents.
- Use speech recognition to converse using your voice or text-based input.
- Choose from high-quality voices from Edge-TTS, Piper, OpenAI TTS, ElevenLabs (requires API), or pyttsx3.
- Train the agent as it adapts using conversation history and interactions.
- Works with all features of SightLab, including data collection, visualizations, and transcript saving.
- Easily add to any SightLab script.
- Interactive Events (new as of SightLab 2.5.10): Avatars can now trigger custom events to be even more interactive, such as responding with appropriate facial expressions, using animations, or interacting in the scene based on the conversation context.
- Support for over 40 languages (as of SightLab 2.7.4 there is an optional language dropdown). If you choose a language other than English, all text-to-speech models will automatically adjust to use that language except for Piper and pyttsx3, although Piper will now switch to Edge-TTS when set to a language other than English.
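The language-fallback behavior described above can be summarized in a small sketch. This is an illustrative helper only; the function and constant names are hypothetical, not SightLab's actual API.

```python
# Hypothetical sketch of the TTS language-fallback rule described above.

def resolve_tts_model(model: str, language: str) -> str:
    """Return the TTS model that will actually be used for a given language."""
    if language == "English":
        return model
    if model == "PIPER":
        # Piper is English-only, so SightLab falls back to Edge-TTS
        return "EDGE_TTS"
    if model == "PYTTSX3":
        # pyttsx3 does not switch languages automatically
        return model
    # Edge-TTS, OpenAI TTS, and ElevenLabs adjust to the language themselves
    return model

print(resolve_tts_model("PIPER", "Spanish"))  # → EDGE_TTS
```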
Instructions
1. Installation
- Ensure you have the required libraries installed using the Vizard Package Manager. These include:
- openai (for OpenAI GPT agents)
- anthropic (for the Anthropic Claude agent)
- elevenlabs (for ElevenLabs text-to-speech)
- google (for Gemini)
- google-generativeai (for Gemini)
- ollama (for offline models via Ollama)
- SpeechRecognition
- sounddevice (pyaudio for older versions of SightLab)
- faster_whisper
- numpy
- pyttsx3 (for Microsoft's offline text-to-speech engine)
- edge-tts (for the Edge-TTS voices; samples here)
- piper-tts==1.3.0 (important: do not update to 1.4.0 for now) for the offline Piper voice library (Piper voice library samples here)

Note: Piper has the least latency (often around 1 second or less, though it can vary between 1 and 4 seconds), whereas the other models have higher quality but may take 3-5 seconds from when you speak to when they speak back.
- Piper can start generating audio as soon as the first phonemes are processed, often outputting speech within 100–300 ms of receiving text.
- For ElevenLabs, you may need to install ffmpeg. Download ffmpeg here. Go to the "Release builds" section and download ffmpeg-release-full.7z. Extract this folder and copy the address of where the "bin" folder exists. Paste this address into ffmpeg_path.txt in the keys folder.
- Mpv Player: For ElevenLabs, install mpv and add it to the environment variable path:
- Unzip the mpv folder and move it to C:\Program Files\
- In Windows search or the Start menu, type powershell and run it as administrator (right click)
- Type this command:
setx /M PATH "$($env:PATH);C:\Program Files\mpv-x86_64-20250812-git-211c9cb"
- Restart Vizard
- For offline models, install Ollama from here. After installation, open a command line prompt (type cmd into Windows Search) and type ollama run followed by the name of the LLM you wish to pull (e.g., ollama run gemma). For a list of models, see this page. It may take a little longer the first time you run Ollama, as the model may need some time to load (subsequent calls should be quicker).
Note: Requires an active internet connection if not running offline models.
2. API Keys
- Obtain API keys from OpenAI, Anthropic, and ElevenLabs (if using ElevenLabs TTS) (see below).
- New Method (as of SightLab 2.3.7)
- In Windows search, type "cmd" and enter:
setx OPENAI_API_KEY your-api-key
setx GEMINI_API_KEY your-api-key
setx ELEVENLABS_API_KEY your-api-key
setx ANTHROPIC_API_KEY your-api-key
- Restart Vizard
- With this method you don't need to keep the keys in a folder in your project, and your API keys can be accessed from any folder.
- For offline models via Ollama no key is needed.
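The two key-lookup methods above (environment variable first, key file as the older fallback) can be sketched as follows. This is an illustrative helper, not SightLab's actual lookup code; the function name and fallback path are hypothetical.

```python
import os
from typing import Optional

def get_api_key(env_var: str, fallback_file: Optional[str] = None) -> Optional[str]:
    """Look up an API key from the environment (the setx method above),
    optionally falling back to a key file in the project (the older method)."""
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    if fallback_file and os.path.exists(fallback_file):
        with open(fallback_file) as f:
            return f.read().strip()
    return None  # offline Ollama models need no key at all

# Example (hypothetical fallback filename):
openai_key = get_api_key("OPENAI_API_KEY", "keys/openai_key.txt")
```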
3. Running the Script
Run AI_Agent.py to start the AI Agent, or AI_Agent_GUI.py to run it with the GUI. multi_agent_interaction.py will run a sample multi-agent interaction.
4. Interaction
- Hold the 'c' key or RH grip button to speak; release to let the agent respond.
- If USE_SPEECH_RECOGNITION is False, press 'c' to type a question.
- To stop the conversation, type "q" and click "OK" in the text chat.
- If HOLD_KEY_TO_SPEAK is False, you only need to speak; when there is a pause of over 0.8 seconds, the agent will respond, then wait for you to speak again. Requires wearing headphones.
- Press 'h' to send a screenshot as a prompt and ask questions about what the AI agent is seeing (in the sub-window).
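The silence-based turn taking used when HOLD_KEY_TO_SPEAK is False can be sketched as a simple timing check. The function name is hypothetical; SightLab's actual voice-activity detection may differ.

```python
# Illustrative sketch of silence-detection turn taking: the agent responds
# once the user has been quiet for longer than a threshold.

SILENCE_THRESHOLD = 0.8  # seconds of quiet before the agent responds

def should_respond(last_voice_time: float, now: float, user_is_speaking: bool) -> bool:
    """True when the user has stopped talking for longer than the threshold."""
    if user_is_speaking:
        return False
    return (now - last_voice_time) > SILENCE_THRESHOLD
```

Headphones are needed in this mode because the agent's own audio would otherwise be picked up by the microphone and reset the silence timer.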
5. Configuration
- Open Config_Global.py (or AI_Agent_Config.py in the configs folder for older versions) and configure the options. Here are a few you may want to modify:
- AI_MODEL: Choose between 'CHAT_GPT', 'CLAUDE', and Gemini.
- OPENAI_MODEL: Specify the OpenAI model name (e.g., "gpt-4o"). List of models
- ANTHROPIC_MODEL: Specify the Anthropic model name (e.g., "claude-3-5-sonnet-20240620"). List of models
- GEMINI_MODEL: Specify the Gemini model
- OFFLINE_MODEL: Specify an offline model from the list of models supported via Ollama (deepseek-r1, gemma3, llama3.3, etc.)
- MAX_TOKENS: Adjust token usage per exchange (e.g., GPT-4 has 8192 tokens).
- USE_SPEECH_RECOGNITION: Toggle speech recognition vs. text-based interaction.
- SPEECH_MODEL: Choose between OpenAI TTS, ElevenLabs, Edge-TTS, Piper, or pyttsx3.
- HOLD_KEY_TO_SPEAK: Set to True to hold the 'c' key or RH grip to speak; otherwise waits for silence.
- USE_GUI: Enable GUI for selecting environments and options.
- ELEVEN_LABS_VOICE: Choose an ElevenLabs voice. See samples on the ElevenLabs website
- OPEN_AI_VOICE: Choose an OpenAI voice. Voice examples openai
- USE_PASSTHROUGH: Enables Augmented Reality mode
- DEFAULT_LANGUAGE: Choose from one of 41 languages
- Additional options include avatar properties, environment models, GUI settings, and more (see the Config_Global.py script).
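As a rough mental model of how a few of these options relate, the sketch below shows AI_MODEL selecting which model-name setting is active. The dispatch function is hypothetical; it is not SightLab's actual code, and only the setting names come from the options above.

```python
# Example values taken from the configuration options above.
AI_MODEL = 'CHAT_GPT'
OPENAI_MODEL = "gpt-4o"
ANTHROPIC_MODEL = "claude-3-5-sonnet-20240620"
GEMINI_MODEL = "gemini-1.5-flash-latest"

def active_model_name(ai_model: str) -> str:
    """Hypothetical dispatch: map the AI_MODEL choice to its model-name setting."""
    return {
        'CHAT_GPT': OPENAI_MODEL,
        'CLAUDE': ANTHROPIC_MODEL,
        'GEMINI': GEMINI_MODEL,
    }[ai_model]

print(active_model_name(AI_MODEL))  # → gpt-4o
```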
AI Agent Event System
As of SightLab 2.5.10 the AI agent now supports event-driven interactions, allowing the AI to trigger custom actions during conversations for more expressive and interactive experiences.
The event system allows the AI to execute custom actions (like facial expressions, animations, or any other callback) by including special "event" keywords in its responses. These events are automatically detected, executed, and removed from the text shown to the user.

How It Works
- Event Detection: The AI includes event: <event_name> on a line in its response
- Event Execution: The system detects this line and triggers the corresponding handler
- Text Cleaning: The event line is removed before displaying text to the user
- Action: The custom action (e.g., a facial expression, processing a screenshot, etc.) is performed
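The detect-execute-clean cycle above can be sketched in a few lines. This is a minimal illustration of the mechanism, not SightLab's actual implementation.

```python
EVENT_KEYWORD = "event:"  # matches the EVENT_KEYWORD config setting

def process_response(text, handlers):
    """Return (cleaned_text, fired_events): fire handlers for any
    'event: <name>' lines and strip those lines before display."""
    kept, fired = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith(EVENT_KEYWORD):
            name = stripped[len(EVENT_KEYWORD):].strip()
            fired.append(name)
            if name in handlers:
                handlers[name]()  # e.g. apply the smile morph
        else:
            kept.append(line)
    return "\n".join(kept), fired

clean, events = process_response("event: smile\nGreat news!", {"smile": lambda: None})
# clean == "Great news!", events == ["smile"]
```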
Configuration
Global Settings (Config_Global.py)
# Event System Settings
USE_EVENT_SYSTEM = True # Enable/disable event system
EVENT_KEYWORD = "event:" # Keyword that triggers events
# Morph Target Indices for Facial Expressions
SMILE_MORPH_ID = 3 # Avatar-specific morph index for smile
SAD_MORPH_ID = 2 # Avatar-specific morph index for sad
EXPRESSION_MORPH_AMOUNT = 0.7 # Intensity (0.0 to 1.0)
EXPRESSION_DURATION = 1.2 # Duration in seconds
Avatar-Specific Settings (configs/RocketBoxMale.py, etc.)
Each avatar config can override the default morph indices:
# Event System - Facial Expression Morph Targets
SMILE_MORPH_ID = 3 # RocketBox smile morph
SAD_MORPH_ID = 2 # RocketBox sad morph
EXPRESSION_MORPH_AMOUNT = 0.7
EXPRESSION_DURATION = 1.2
Built-in Events
Facial Expressions
- smile: Makes avatar smile (uses SMILE_MORPH_ID)
- sad: Makes avatar look sad (uses SAD_MORPH_ID)
- neutral: Returns avatar to neutral expression (resets both morphs)
Vision
- Capture and process a screenshot of the scene: When asked things such as "What do you see?", the agent can capture and process a screenshot to gain an understanding of its surroundings.
Placeholder Events (for future implementation)
- wave: Wave animation placeholder
- nod: Nod head placeholder
Finding Morph Target Indices for Your Avatar
Different avatars have different morph target indices. To find the correct indices:
- Load your avatar in Inspector
- Click on the avatar name in the scene graph and view the morph IDs in the right-side Properties pane under "Morphs"
- Update your avatar config file with the correct indices
Creating Custom Events
You can easily add your own custom events:
Step 1: Create the Event Handler Function
In AI_Agent_Avatar.py, after the existing event handlers:
def event_my_custom_action():
    """Description of what this event does"""
    try:
        # Your custom code here
        # Examples:
        # - Trigger animations: avatar.state(MY_ANIMATION)
        # - Move objects: object.setPosition([x, y, z])
        # - Play sounds: viz.playSound('sound.wav')
        # - Change lighting: viz.clearcolor(viz.RED)
        print("Custom action executed!")
    except Exception as e:
        print(f"Error in custom event: {e}")
Step 2: Register the Event
In the register_default_events() function:
def register_default_events():
    """Register all built-in event handlers"""
    if USE_EVENT_SYSTEM:
        EVENT_REGISTRY.register("smile", event_smile)
        EVENT_REGISTRY.register("sad", event_sad)
        EVENT_REGISTRY.register("neutral", event_neutral)
        EVENT_REGISTRY.register("my_custom_action", event_my_custom_action)  # Add this
        # ... rest of the events
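As a rough mental model of the EVENT_REGISTRY object used above (the actual class in AI_Agent_Avatar.py may have more features), a minimal registry could look like this:

```python
# Minimal sketch of a name-to-handler event registry.

class EventRegistry:
    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        """Associate an event name with a zero-argument callable."""
        self._handlers[name] = handler

    def trigger(self, name):
        """Run the handler for an event; return False if none is registered."""
        handler = self._handlers.get(name)
        if handler is None:
            print(f"No handler registered for event '{name}'")
            return False
        handler()
        return True

registry = EventRegistry()
registry.register("my_custom_action", lambda: print("Custom action executed!"))
registry.trigger("my_custom_action")  # prints "Custom action executed!"
```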
Step 3: Update AI Prompt
Add your custom event to the prompt file so the AI knows about it:
Available events you can trigger:
- smile: Makes you smile
- sad: Makes you look sad
- event: screenshot - Take and analyze a screenshot of what you're seeing
- my_custom_action: Description of what it does
Example Prompts
Choose prompts/Event_System_Demo.txt for a comprehensive example prompt that teaches the AI how to use events effectively.

Usage Examples
Example 1: Simple Emotional Response
User: "I just won the lottery!"
AI Response (raw):
event: smile
That's incredible! Congratulations! You must be so excited!
What happens:
- Avatar smiles (smile morph applied)
- User sees: "That's incredible! Congratulations! You must be so excited!"
Modifying Environment and Avatars
Refer to this page for instructions on obtaining assets. Place new assets in resources/environments or resources/avatar/full_body.
To add a new avatar, first place your avatar in the Resources/avatars folder (or reference your path in the avatar config file). Next, navigate to the configs folder and make a copy of one of the configs. Rename it to the name you want to use for your avatar. Modify the config file to reference the path to your avatar, the voices you want to use, which animations are the idle and talking animations (you will need to open the avatar in Inspector to see the available animations), and anything else you want to change.
To easily change the position of the avatar in a script, place the avatarStandin model in your environment in Inspector. See this page for more details.
Adding to Existing Scripts
- Copy the "configs", "keys", and "prompts" folders, as well as
AI_Agent_Avatar.py. - Import with:
from configs.AI_Agent_Config import * import AI_Agent_Avatar - Add avatar
avatar = AI_Agent_Avatar.avatar sightlab.addSceneObject('avatar', avatar, avatar=True) - Add these lines to enable passthrough augmented reality
if USE_PASSTHROUGH: import openxr xr = openxr.getClient() if sightlab.getConfig() in ["Meta Quest Pro", "Meta Quest 3"]: passthrough = xr.getPassthroughFB() elif sightlab.getConfig() == "Varjo": passthrough = xr.getPassthroughVarjo() viz.clearcolor(viz.BLACK, 0.0) if passthrough: passthrough.setEnabled(True)
Multi Agent Interactions

See the script multi_agent_interaction.py to see how multiple agents can interact and communicate with each other. You can modify the individual agents by calling the AIAgent class and setting parameters such as config_path, name, and prompt_path.
Conversation Modes
The conversation loop can operate in two modes:
- Scripted Dialogue: Pre-written lines that agents speak in sequence
- AI-Generated: Dynamic conversation based on a leading question
User Interaction
You can interrupt and speak to either agent during their conversation:
- Press 'c' to speak to Agent1 (left avatar)
- Press 'v' to speak to Agent2 (right avatar)
- Press 'r' to restart the conversation loop
When you interrupt, the agent will turn toward you and stop any current speech. The conversation loop pauses while you interact.
Note that for the multi-agent interactions, the AI model being used, as well as the voices, are defined in the respective config file for that agent (see the configs folder).
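The scripted-dialogue mode described above can be illustrated with a simple round-robin loop. The function, agent names, and lines below are hypothetical, not SightLab's actual conversation loop.

```python
# Illustrative sketch of scripted-dialogue mode: agents take turns
# speaking pre-written lines in sequence.

def scripted_dialogue(agents, lines):
    """Return a transcript of (speaker, line) pairs, alternating agents."""
    transcript = []
    for i, line in enumerate(lines):
        speaker = agents[i % len(agents)]
        transcript.append((speaker, line))
    return transcript

transcript = scripted_dialogue(
    ["Agent1", "Agent2"],
    ["Hi Tom!", "Hi Susan, how was your weekend?"],
)
# → [("Agent1", "Hi Tom!"), ("Agent2", "Hi Susan, how was your weekend?")]
```

In AI-generated mode, each agent's next line would instead come from its LLM, seeded by the leading question and the transcript so far.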
Example
from AI_Agents import AIAgent

agents = [
    AIAgent(config_path='configs/x_Multi_RocketBoxFemale.py', name='Agent1', prompt_path="prompts/Susan_Home Office.txt"),
    AIAgent(config_path='configs/x_Multi_RocketBoxMale.py', name='Agent2', prompt_path="prompts/Tom_Home_Office.txt"),
]
Note: Make sure to download the latest update to have the ability to interact with both agents.
Obtaining API Keys
To use certain features of the AI Agent, you'll need to obtain API keys from the following services:
- OpenAI (for ChatGPT and Open AI Text to Speech):
- Visit the OpenAI website: https://platform.openai.com
- Sign up or log in.
- Navigate to the API section (API Keys).
- Click "Create a new secret key" and copy the key.
- In Windows search, type "cmd" and enter:
setx OPENAI_API_KEY your-api-key
setx GEMINI_API_KEY your-api-key
setx ELEVENLABS_API_KEY your-api-key
setx ANTHROPIC_API_KEY your-api-key
- Restart Vizard
- Set a usage limit on your account if needed: OpenAI Usage.
- Eleven Labs (for ElevenLabs Text-to-Speech):
- Log in to your ElevenLabs account: https://elevenlabs.io
- Go to your profile, locate the "API Key" field, and copy it (note: make sure to enable unrestricted or toggle on access to voices, etc.) or go here https://elevenlabs.io/app/developers/api-keys
- Paste the key into a file named elevenlabs_key.txt in your root SightLab folder.
- Anthropic API:
- Visit Anthropic Console to sign up or log in.
- Complete account setup and verify your email.
- Once verified, navigate to the API section to generate an API key.
- Gemini and Gemini Ultra:
- Visit: https://aistudio.google.com/app/apikey
- Install the google Python library using Tools > Package Manager
- Install generativeai with install -q -U google-generativeai in the Package Manager command line.
- Refer to Google AI Python Quickstart for setup details.
Avatar Configuration Options
These are available in the configs folder per avatar
- TALK_ANIMATION: Index for the avatar's talking animation.
- IDLE_ANIMATION: Index for the avatar's idle or default pose.
- AVATAR_POSITION: Starting position as [x, y, z].
- AVATAR_EULER: Initial rotation as [yaw, pitch, roll].
- NECK_BONE, HEAD_BONE, SPINE_BONE: Names of bones used for follow viewpoint.
- TURN_NECK: Boolean flag if the neck needs to be turned.
- NECK_TWIST_VALUES: Values for neck twisting motion.
- USE_MOUTH_MORPH: Boolean to enable/disable mouth morphing during speech.
- MOUTH_OPEN_ID: ID for mouth opening morph target.
- MOUTH_OPEN_AMOUNT: Range for mouth opening (0 to 1).
- BLINKING: Toggle blinking animations.
- BLINK_ID: ID for blinking morph target.
- DISABLE_LIGHTING_AVATAR: Disable lighting if avatar appears over-lit.
- ATTACH_FACE_LIGHT: Attach light source to avatar's face.
- FACE_LIGHT_BONE: Bone name to attach the face light if ATTACH_FACE_LIGHT is true.
- MORPH_DURATION_ADJUSTMENT: Adjust duration of mouth movements.
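To illustrate how MOUTH_OPEN_ID and MOUTH_OPEN_AMOUNT work together during speech, here is a hypothetical sketch of mapping speech loudness into the mouth-open morph range. The function, the example morph index, and the per-frame usage are assumptions for illustration, not SightLab's actual lip-sync code.

```python
MOUTH_OPEN_ID = 0        # avatar-specific morph index (example value)
MOUTH_OPEN_AMOUNT = 0.8  # maximum mouth opening (0 to 1)

def mouth_morph_value(amplitude: float) -> float:
    """Scale a normalized audio amplitude (0-1) into the morph range,
    clamping out-of-range input first."""
    return max(0.0, min(amplitude, 1.0)) * MOUTH_OPEN_AMOUNT

# In a script, the avatar's morph target MOUTH_OPEN_ID would be set to
# this value each frame while the agent speaks.
```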
Publishing as an Executable
To publish an application with the AI Agent (or just to publish the standard included template), add this code to the top of the script (before importing SightLab):
import viz
# Filter out problematic packages that cause issues in published EXE
viz.res.addPublishFilter('*google_generativeai*')
viz.res.addPublishFilter('*-nspkg.pth')
viz.res.addPublishFilter('*.pth')
# Add publish directories - ensures correct package versions are bundled
publish_directories = [
'data',
viz.res.getVizardPath() + 'bin/lib/site-packages/sightlab_utils/',
viz.res.getVizardPath() + 'bin/lib/site-packages/deepdiff',
viz.res.getVizardPath() + 'bin/lib/site-packages/numpy',
viz.res.getVizardPath() + 'bin/lib/site-packages/pandas',
viz.res.getVizardPath() + 'bin/lib/site-packages/pydantic',
viz.res.getVizardPath() + 'bin/lib/site-packages/pydantic_core',
viz.res.getVizardPath() + 'bin/lib/site-packages/openai',
viz.res.getVizardPath() + 'bin/lib/site-packages/httpx',
viz.res.getVizardPath() + 'bin/lib/site-packages/httpcore',
viz.res.getVizardPath() + 'bin/lib/site-packages/anyio',
viz.res.getVizardPath() + 'bin/lib/site-packages/sniffio',
viz.res.getVizardPath() + 'bin/lib/site-packages/annotated_types',
viz.res.getVizardPath() + 'bin/lib/site-packages/certifi',
viz.res.getVizardPath() + 'bin/lib/site-packages/speech_recognition',
viz.res.getVizardPath() + 'bin/lib/site-packages/jiter',
]
for directory in publish_directories:
    viz.res.addPublishDirectory(directory)
import sightlab_utils.sightlab as sl
from sightlab_utils.settings import *
Additional Information
- Prompts: For configuring the agent, use "I am..." for OpenAI and "You are..." for Anthropic, without quotes.
- ElevenLabs Documentation: Refer to the ElevenLabs GitHub.
- Connecting Assistants: You can connect Assistants via the OpenAI API (custom GPTs not supported).
Issues and Troubleshooting
- Microphone/Headset Conflicts: Errors may occur if microphone settings differ between the VR headset and output device.
- Character Limit in ElevenLabs: Free tier limits characters to 10,000 (paid accounts get more).
- ffplay Error: May require ffmpeg to be installed in the environment path: Download ffmpeg.
- The avatar is not responding or speaking: Sometimes it may take some time to process the text-to-speech. If it seems like it is not responding at all, try going to Script > Stop to stop any previous scripts, or close and re-open Vizard.
Tips
- Environment Awareness: To give your agent an understanding of the environment, press 'h' to take a screenshot that is sent to the agent, or simply ask it "What do you see?", "What are we looking at?", etc.
- Event Trigger for Speech Button: Modify vizconnect to add an event for the speak-button hold. Open settings.py in sightlab_utils/vizconnect_configs, and modify the mappings for triggerDown and triggerUp, or create new ones if needed. More Info on Vizconnect Events
- If getting an "out of quota" error with Gemini, try using a model with more quota, such as gemini-1.5-flash-latest, or enable billing for much higher limits.
There is also a version of this that runs as an education-based tool, where you can select objects in a scene and get information and labels on that item (such as paintings in an art gallery). See this page for that version.







