AI Agent
This document describes how to use the AI Agent in SightLab, an interactive, intelligent agent that can connect to various large language models such as GPT-4 and Claude Opus. You can customize the agent's personality, use speech recognition, and leverage high-quality text-to-speech models.
Key Features
- Interact and converse with custom AI Large Language Models in real-time VR or XR simulations.
- Choose from OpenAI models (including GPT-4o and custom GPTs), Anthropic models (like Claude 3 Opus or 3.5 Sonnet), and Gemini models (like Gemini 2.0 Flash), including vision capabilities with Gemini, as well as hundreds of offline models via Ollama (such as DeepSeek, Gemma, Llama, Mistral, and more).
- Modify avatar appearance, animations, environment, and more. Works with most avatar libraries (Avaturn, ReadyPlayerMe, Mixamo, Rocketbox, Reallusion, etc.).
- Customize the agent's personality, contextual awareness, emotional state, interactions, and more. Save your creations as custom agents.
- Use speech recognition to converse using your voice or text-based input.
- Choose from high-quality voices from OpenAI TTS or ElevenLabs (requires an API key).
- Train the agent as it adapts using conversation history and interactions.
- Works with all features of SightLab, including data collection, visualizations, and transcript saving.
- Easily add to any SightLab script.
Instructions
1. Installation
- Ensure you have the required libraries installed using the Vizard Package Manager. These include:
  - `openai` (for OpenAI GPT agents)
  - `anthropic` (for Anthropic Claude agents)
  - `elevenlabs` (for ElevenLabs text-to-speech)
  - `google` and `google-generativeai` (for Gemini)
  - `ollama` (for offline models via Ollama)
  - `SpeechRecognition`
  - `sounddevice` (`pyaudio` for older versions of SightLab)
  - `python-vlc`
  - `numpy`
  - `pyttsx3` (for Microsoft's offline text-to-speech engine)
- Install VLC Player (for OpenAI TTS). Minimum version required: 3.0.20. Download VLC 3.0.20 for Windows (64-bit).
- For ElevenLabs, you may need to install `ffmpeg`. Download ffmpeg here: go to the "Release builds" section and download ffmpeg-release-full.7z. Extract the archive, copy the address of the folder containing the "bin" folder, and paste this address into `ffmpeg_path.txt` in the keys folder.
- mpv Player: for ElevenLabs, install `mpv` and add it to the environment variable path: Download mpv.
- For offline models, install Ollama from here. After installation, open a command prompt (type cmd into Windows Search) and type `ollama run` followed by the name of the LLM you wish to pull (e.g., `ollama run gemma:7b`). For a list of models see this page. A minimal Python check for a pulled model is shown after the note below.
Note: Requires an active internet connection if not running offline models.
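To confirm an offline model is ready before launching SightLab, a minimal sketch like the following can be used (this assumes the `ollama` package is installed and the model has already been pulled as described above; it is not part of the SightLab scripts):

```python
import ollama

# Ask the locally pulled model a trivial question; if this prints a reply,
# Ollama is installed correctly and the model is available offline.
response = ollama.chat(
    model='gemma:7b',
    messages=[{'role': 'user', 'content': 'Say hello in one sentence.'}],
)
print(response['message']['content'])
```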
2. API Keys
- Obtain API keys from OpenAI, Anthropic, and ElevenLabs (if using ElevenLabs TTS) (see below).
- Create a folder named `keys` in your SightLab root directory and place these text files inside:
  - `key.txt`: Contains your OpenAI API key.
  - `elevenlabs_key.txt`: Contains your ElevenLabs API key (if using ElevenLabs).
  - `ffmpeg_path.txt`: Contains the path to the `ffmpeg` bin folder.
  - If using the Anthropic model, create `anthropic_key.txt` with your Anthropic API key.
  - For Gemini, place a file named `gemini_key.txt`.
- New Method (as of SightLab 2.3.7):
  - In Windows Search, type "cmd" and enter:
    - `setx OPENAI_API_KEY "your-api-key"`
    - `setx GEMINI_API_KEY "your-api-key"`
    - `setx ELEVENLABS_API_KEY "your-api-key"`
    - `setx ANTHROPIC_API_KEY "your-api-key"`
  - Restart Vizard.
  - With this method you don't need to keep the keys in a folder in your project, and your API keys can be accessed from any folder. A sketch of how a script can resolve a key under either method follows this list.
- For offline models via Ollama, no key is needed.
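As a rough illustration of how the two methods relate (a hypothetical helper, not SightLab's actual key loader), a script can prefer the environment variable and fall back to the `keys` folder:

```python
import os

def load_openai_key():
    # New method: environment variable set via `setx OPENAI_API_KEY ...`
    key = os.environ.get('OPENAI_API_KEY')
    if key:
        return key
    # Older method: plain-text key file in the keys folder
    with open(os.path.join('keys', 'key.txt')) as f:
        return f.read().strip()
```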
3. Running the Script
Run `AI_Agent.py` to start the AI Agent, or `AI_Agent_GUI.py` to run with the GUI. `multi_agent_interaction.py` runs a sample multi-agent interaction.
4. Interaction
- Hold the 'c' key or the right-hand grip button to speak; release to let the agent respond (a simplified sketch of this hold-to-talk pattern follows this list).
- If USE_SPEECH_RECOGNITION is `False`, press 'c' to type your question instead.
- To stop the conversation, type "q" and click "OK" in the text chat, or say "exit."
- If using Gemini with vision, press 'h' to send a screenshot as a prompt and ask questions about what the AI agent is seeing (in the sub-window).
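The hold-to-talk pattern can be illustrated with Vizard's key-event callbacks (a simplified sketch, not the actual handlers in `AI_Agent.py`; `start_listening` and `stop_listening` are hypothetical placeholders):

```python
import viz
import vizact

def start_listening():
    # Hypothetical placeholder: begin capturing microphone audio here.
    print('Listening...')

def stop_listening():
    # Hypothetical placeholder: stop capture and send the utterance to the LLM.
    print('Processing response...')

# Hold-to-talk: key down starts capture, key up triggers the agent's reply.
vizact.onkeydown('c', start_listening)
vizact.onkeyup('c', stop_listening)

viz.go()
```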
5. Configuration
- Open `Config_Global.py` (or `AI_Agent_Config.py` in the `configs` folder for older versions) and configure the options. Here are a few you may want to modify (an example excerpt follows this list):
  - AI_MODEL: Choose between `'CHAT_GPT'`, `'CLAUDE'`, and Gemini.
  - OPENAI_MODEL: Specify the OpenAI model name (e.g., `"gpt-4o"`). List of models
  - ANTHROPIC_MODEL: Specify the Anthropic model name (e.g., `"claude-3-5-sonnet-20240620"`). List of models
  - GEMINI_MODEL: Specify the Gemini model name.
  - OFFLINE_MODEL: Specify an offline model from the list of models supported via Ollama (deepseek-r1, gemma3, llama3.3, etc.).
  - MAX_TOKENS: Adjust token usage per exchange (e.g., GPT-4 has 8192 tokens).
  - USE_SPEECH_RECOGNITION: Toggle speech recognition vs. text-based interaction.
  - SPEECH_MODEL: Choose between OpenAI TTS, ElevenLabs, or pyttsx3.
  - USE_GUI: Enable the GUI for selecting environments and options.
  - ELEVEN_LABS_VOICE: Choose an ElevenLabs voice. See samples on the ElevenLabs website.
  - OPEN_AI_VOICE: Choose an OpenAI voice. Voice examples: OpenAI
  - Additional options include setting avatar properties, environment models, and GUI settings.
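For orientation, a hypothetical excerpt of such a config might look like this (option names follow the list above; the values are only examples, and the defaults in your SightLab install may differ):

```python
# Hypothetical Config_Global.py excerpt -- values are illustrative only.
AI_MODEL = 'CHAT_GPT'                           # or 'CLAUDE' / Gemini
OPENAI_MODEL = "gpt-4o"
ANTHROPIC_MODEL = "claude-3-5-sonnet-20240620"
OFFLINE_MODEL = "gemma3"
MAX_TOKENS = 1024                               # per-exchange token budget
USE_SPEECH_RECOGNITION = True                   # False = type your questions
USE_GUI = False
```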
Modifying Environment and Avatars
Refer to this page for instructions on obtaining assets. Place new assets in `resources/environments` or `resources/avatar/full_body`.
Adding to Existing Scripts
- Copy the "configs", "keys", and "prompts" folders, as well as
AI_Agent_Avatar.py
. - Import with:
from configs.AI_Agent_Config import * import AI_Agent_Avatar
- Add avatar
avatar = AI_Agent_Avatar.avatar sightlab.addSceneObject('avatar', avatar, avatar=True)
- Add these lines to enable passthrough augmented reality
if USE_PASSTHROUGH: import openxr xr = openxr.getClient() if sightlab.getConfig() in ["Meta Quest Pro", "Meta Quest 3"]: passthrough = xr.getPassthroughFB() elif sightlab.getConfig() == "Varjo": passthrough = xr.getPassthroughVarjo() viz.clearcolor(viz.BLACK, 0.0) if passthrough: passthrough.setEnabled(True)
Multi Agent Interactions
See the script `multi_agent_interaction.py` for an example of how multiple agents can interact and communicate with each other. You can modify the individual agents by creating instances of the AIAgent class and setting parameters such as config_path, name, and prompt_path.
Example

```python
AIAgent(config_path='configs/x_Multi_RocketBoxFemale.py', name='Agent1', prompt_path="prompts/Test.txt")
```
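Building on that constructor, a two-agent setup might look like the following (the second config file and both prompt files are hypothetical names for illustration):

```python
# Hypothetical multi-agent setup; file names are placeholders.
agents = [
    AIAgent(config_path='configs/x_Multi_RocketBoxFemale.py',
            name='Agent1', prompt_path='prompts/Test.txt'),
    AIAgent(config_path='configs/x_Multi_RocketBoxMale.py',
            name='Agent2', prompt_path='prompts/Test2.txt'),
]
```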
Obtaining API Keys
To use certain features of the AI Agent, you'll need to obtain API keys from the following services:
- OpenAI (for ChatGPT and OpenAI Text-to-Speech):
  - Visit the OpenAI website: https://platform.openai.com
  - Sign up or log in.
  - Navigate to the API section (API Keys).
  - Click "Create a new secret key" and copy the key.
  - Paste the copied key into a text file named `key.txt` and place it in the `keys` folder (see the API Keys section above).
  - Set a usage limit on your account if needed: OpenAI Usage.
- ElevenLabs (for ElevenLabs Text-to-Speech):
  - Log in to your ElevenLabs account: https://elevenlabs.io
  - Go to your profile, locate the "API Key" field, and copy it.
  - Paste the key into a file named `elevenlabs_key.txt` in the same folder.
- Anthropic API:
  - Visit the Anthropic Console to sign up or log in.
  - Complete account setup and verify your email.
  - Once verified, navigate to the API section to generate an API key.
- Gemini and Gemini Ultra:
  - Visit: https://aistudio.google.com/app/apikey
  - Install the `google` Python library using Tools > Package Manager.
  - Install generativeai with `install -q -U google-generativeai` in the Package Manager command line.
  - Refer to Google AI Python Quickstart for setup details. A quick key check is shown after this list.
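Once a Gemini key is set up, a quick standalone check (independent of SightLab; the model name is just an example) can verify it works:

```python
import google.generativeai as genai

# Configure with your key (or read it from the GEMINI_API_KEY variable).
genai.configure(api_key='your-api-key')
model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content('Say hello in one sentence.')
print(response.text)
```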
Avatar Configuration Options
These are available in the `configs` folder per avatar (an example sketch follows this list):
- TALK_ANIMATION: Index for the avatar's talking animation.
- IDLE_ANIMATION: Index for the avatar's idle or default pose.
- AVATAR_POSITION: Starting position as `[x, y, z]`.
- AVATAR_EULER: Initial rotation as `[yaw, pitch, roll]`.
- NECK_BONE, HEAD_BONE, SPINE_BONE: Names of bones used to follow the viewpoint.
- TURN_NECK: Boolean flag if the neck needs to be turned.
- NECK_TWIST_VALUES: Values for the neck twisting motion.
- USE_MOUTH_MORPH: Boolean to enable/disable mouth morphing during speech.
- MOUTH_OPEN_ID: ID for the mouth-opening morph target.
- MOUTH_OPEN_AMOUNT: Range for mouth opening (0 to 1).
- BLINKING: Toggle blinking animations.
- BLINK_ID: ID for the blinking morph target.
- DISABLE_LIGHTING_AVATAR: Disable lighting if the avatar appears over-lit.
- ATTACH_FACE_LIGHT: Attach a light source to the avatar's face.
- FACE_LIGHT_BONE: Bone name to attach the face light to if `ATTACH_FACE_LIGHT` is true.
- MORPH_DURATION_ADJUSTMENT: Adjust the duration of mouth movements.
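Put together, a per-avatar config might resemble the following sketch (all values, indices, and the bone name are hypothetical and depend entirely on the specific avatar rig):

```python
# Hypothetical per-avatar config; every value here depends on the rig.
TALK_ANIMATION = 2
IDLE_ANIMATION = 1
AVATAR_POSITION = [0, 0, 1.5]      # [x, y, z]
AVATAR_EULER = [180, 0, 0]         # [yaw, pitch, roll]
NECK_BONE = 'Bip01 Neck'           # placeholder bone name
USE_MOUTH_MORPH = True
MOUTH_OPEN_ID = 0
MOUTH_OPEN_AMOUNT = 0.6            # within the 0-to-1 range noted above
BLINKING = True
BLINK_ID = 1
```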
Additional Information
- Prompts: For configuring the agent, use "I am..." for OpenAI and "You are..." for Anthropic, without the quotes (e.g., I am a friendly museum guide who answers questions about the exhibits).
- ElevenLabs Documentation: Refer to the ElevenLabs GitHub.
- Connecting Assistants: You can connect Assistants via the OpenAI API (custom GPTs are not supported).
Issues and Troubleshooting
- Microphone/Headset Conflicts: Errors may occur if microphone settings differ between the VR headset and the output device.
- Character Limit in ElevenLabs: The free tier limits characters to 10,000 (paid accounts get more).
- ffplay Error: May require `ffmpeg` to be installed in the environment path: Download ffmpeg.
- The avatar is not responding or speaking: Sometimes it can take a while to process the text-to-speech. If it seems like it is not responding at all, try going to Script > Stop to stop any previous scripts, or close and re-open Vizard.
Tips
- Environment Awareness: To give your agent an understanding of the environment, use the Gemini model with built-in vision, or take a screenshot (`/` key in SightLab) and use ChatGPT online to generate a description. Include this description in the prompt.
- Event Trigger for Speech Button: Modify `vizconnect` to add an event for the speak-button hold. Open `settings.py` in `sightlab_utils/vizconnect_configs` and modify the mappings for `triggerDown` and `triggerUp`, or create new ones if needed. More Info on Vizconnect Events
- If you get an "out of quota" error with Gemini, try a model with more quota, such as `gemini-1.5-flash-latest`, or enable billing for much higher limits.
There is also a version of this that runs as an education-focused tool, where you can select objects in a scene and get information and labels for each item (such as paintings in an art gallery). See this page for that version.
Planned Updates
- Enhanced scene interactions (e.g., object manipulation and event triggering)
- Code optimization and streamlining