Educational Interaction Tool - AI
This document describes how to use the Educational Interaction Tool in SightLab. The tool can be connected to an interactive, intelligent AI agent powered by various large language models, such as GPT-4 and Claude Opus. You can customize the agent's personality, use speech recognition, and leverage high-quality text-to-speech models. You can also record your own annotations that work with these features, connecting them to a virtual avatar or using voice-over alone.
Tagged objects can display 3D text annotations and trigger audio explanations. With AI integration, users can ask follow-up questions. Any 3D object in the scene can automatically be tagged for interactions and conversational information.
Location: ExampleScripts > Education_Application_AI
Key Features
- Interact and converse with custom AI Large Language Models in real-time VR or XR simulations.
- Choose from OpenAI models (including GPT-4 and custom GPTs) and Anthropic models (e.g., Claude 3 Opus).
- Customize the agent's personality, contextual awareness, emotional state, interactions, and more. Save your creations as custom agents.
- Use speech recognition for voice or text-based interaction.
- Select high-quality voices from OpenAI TTS or Eleven Labs (requires an API key).
- Train the agent to adapt using conversation history and interactions.
- Works seamlessly with all SightLab features, including data collection, visualizations, and transcript saving.
- Automatically tag objects in scenes to prompt questions and present information.
Instructions
1. Installation
- Install the required libraries using the Vizard Package Manager (a quick import check is sketched below). These include:
  - openai (for OpenAI GPT agents)
  - anthropic (for the Anthropic Claude agent)
  - elevenlabs (for Eleven Labs text-to-speech)
  - SpeechRecognition
  - sounddevice (pyaudio for older versions)
  - python-vlc
  - numpy
- Install VLC Player (version 3.0.20 or higher).
- Note: An active internet connection is required.
- If using Vizard 8 or higher, copy the contents of the "updated speech recognition files" into C:\Program Files\WorldViz\Vizard8\bin\lib\site-packages\speech_recognition, overwriting __init__.py and audio.py.
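To verify the installation, a minimal import check such as the following can be run from Vizard. The package list mirrors the one above; note that SpeechRecognition imports as speech_recognition and python-vlc imports as vlc.

```python
# Sanity check that the required packages are importable after
# installing them with the Vizard Package Manager.
import importlib

required = ["openai", "anthropic", "elevenlabs",
            "speech_recognition", "sounddevice", "vlc", "numpy"]

for name in required:
    try:
        importlib.import_module(name)
        print("OK:", name)
    except ImportError as error:
        print("MISSING:", name, "-", error)
```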
2. API Keys
- Create a "keys" folder in your SightLab root directory and add the following text files (a sketch of reading them follows this list):
  - key.txt: OpenAI API key.
  - elevenlabs_key.txt: Eleven Labs API key.
  - ffmpeg_path.txt: Path to ffmpeg's bin folder (if needed).
  - anthropic_key.txt: Anthropic API key for using Anthropic models.
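The exact loading code lives in the SightLab scripts, but conceptually each key is read as plain text. A minimal sketch, assuming the script runs from the SightLab root; the read_key helper is illustrative, not part of SightLab:

```python
# Illustrative sketch of reading the plain-text key files; the actual
# SightLab loading code may differ.
from pathlib import Path

KEYS_DIR = Path("keys")  # assumes the working directory is the SightLab root

def read_key(filename):
    """Return the stripped contents of a key file, or None if it is absent."""
    path = KEYS_DIR / filename
    return path.read_text().strip() if path.exists() else None

openai_key = read_key("key.txt")
anthropic_key = read_key("anthropic_key.txt")
elevenlabs_key = read_key("elevenlabs_key.txt")
```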
3. Configuration
- Open AI_Agent_Config_Education.py (in the configs folder) to configure options (an illustrative excerpt follows this list):
  - AI_MODEL: 'CHAT_GPT' or 'CLAUDE'.
  - OPENAI_MODEL: OpenAI model name (e.g., "gpt-4").
  - ANTHROPIC_MODEL: Anthropic model name (e.g., "claude-3-opus-20240229").
  - MAX_TOKENS: Maximum number of tokens per exchange.
  - USE_SPEECH_RECOGNITION: Toggle for voice interaction.
  - SPEECH_MODEL: Choose OpenAI TTS or Eleven Labs.
  - USE_GUI: Enable or disable the GUI for environment selection.
- Adjust avatar properties, environment settings, and more as needed.
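For orientation, the options above might look like this inside AI_Agent_Config_Education.py. The values shown are illustrative, not the shipped defaults, and the SPEECH_MODEL value in particular is an assumed placeholder:

```python
# Illustrative excerpt of AI_Agent_Config_Education.py; the shipped file
# defines additional avatar and environment options.
AI_MODEL = 'CHAT_GPT'                       # or 'CLAUDE'
OPENAI_MODEL = "gpt-4"                      # OpenAI model name
ANTHROPIC_MODEL = "claude-3-opus-20240229"  # Anthropic model name
MAX_TOKENS = 300                            # tokens per exchange (example value)
USE_SPEECH_RECOGNITION = True               # voice rather than typed interaction
SPEECH_MODEL = 'OPENAI_TTS'                 # placeholder; or an Eleven Labs voice
USE_GUI = True                              # GUI for environment selection
```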
4. Running the Script
- Run AI_Agent_Education.py to start.
5. Interaction
- Press and hold the 'c' key or the right-hand grip button to speak. Release to stop and let the AI respond (the capture step is sketched after this list).
- Use the mouse or the right-hand trigger to select objects and prompt information.
- 3D text will appear based on head position or eye gaze.
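Under the hood, the speak-and-respond cycle amounts to capturing microphone audio and transcribing it. A minimal sketch using the SpeechRecognition package, assuming a PyAudio-backed microphone (the bundled script may capture audio differently, e.g., via sounddevice, and adds VR button handling, TTS playback, and conversation history on top of this):

```python
# One round of speech capture and transcription with SpeechRecognition;
# recognize_google is the package's free web recognizer.
import speech_recognition as sr

recognizer = sr.Recognizer()

with sr.Microphone() as source:           # default input device
    recognizer.adjust_for_ambient_noise(source)
    print("Speak now...")
    audio = recognizer.listen(source)     # returns once speech trails off

try:
    text = recognizer.recognize_google(audio)
    print("You said:", text)
    # ...pass `text` to the selected LLM and speak the reply...
except sr.UnknownValueError:
    print("Could not understand the audio")
```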
Modifying Environment and Avatars
- Place environment models in resources/environments or update the path in the configuration file (see the sketch below).
- Use the SightLab VR GUI to select which objects in the scene will be interactive.
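In plain Vizard terms, pointing the tool at a different environment is just loading another model file. SightLab's GUI and config normally handle this, but a manual equivalent looks like the following; the file name is an example, not a bundled asset:

```python
# Minimal Vizard sketch of loading an environment model directly;
# in SightLab this is normally handled by the GUI and config file.
import viz

viz.go()
env = viz.addChild('resources/environments/classroom.osgb')  # example path
```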
Obtaining API Keys
- OpenAI:
  - Visit OpenAI.
  - Sign up or log in and navigate to the API section.
  - Generate a new secret key and save it in key.txt (a quick validation sketch follows this list).
- Eleven Labs:
  - Log in at Eleven Labs.
  - Copy your API key into elevenlabs_key.txt.
- Anthropic:
  - Go to Anthropic, sign up, and verify your account.
  - Save the API key in anthropic_key.txt.
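Once the key files exist, a cheap way to confirm a key is accepted is a small authenticated API call. A sketch using the openai package's standard client (listing models consumes no tokens):

```python
# Quick check that the OpenAI key in keys/key.txt is accepted;
# listing models is an inexpensive authenticated call.
from pathlib import Path
from openai import OpenAI

client = OpenAI(api_key=Path("keys/key.txt").read_text().strip())
print([model.id for model in client.models.list()][:5])  # first few model IDs
```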
Additional Information
- Prompts for GPT models should be enclosed in quotation marks (e.g., "I am..."); Anthropic prompts do not require quotes. The two providers' call shapes are compared in the sketch after this list.
- Refer to the ElevenLabs GitHub for more information.
- See Connecting Assistants for integrating assistants through OpenAI.
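The quoting note above concerns how the personality prompt is entered for each provider. Schematically, one exchange with each looks like this (clients built from the key files described earlier; the system prompt and messages are illustrative):

```python
# Schematic comparison of a single exchange with each provider.
from pathlib import Path
from openai import OpenAI
import anthropic

system_prompt = "I am a museum guide..."  # in the SightLab config, wrap the
                                          # OpenAI version in quotation marks

openai_client = OpenAI(api_key=Path("keys/key.txt").read_text().strip())
reply = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "system", "content": system_prompt},
              {"role": "user", "content": "What is this exhibit?"}],
)
print(reply.choices[0].message.content)

claude_client = anthropic.Anthropic(
    api_key=Path("keys/anthropic_key.txt").read_text().strip())
reply = claude_client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=300,
    system=system_prompt,                 # no surrounding quotes needed
    messages=[{"role": "user", "content": "What is this exhibit?"}],
)
print(reply.content[0].text)
```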
Issues and Troubleshooting
- Microphone Settings: Errors may occur if the microphone source conflicts between the VR headset and the system output.
- Character Limits: Eleven Labs' free tier limits output to 10,000 characters (paid plans offer higher limits).
- ffmpeg/mpv Errors: Ensure ffmpeg and mpv are installed and their paths are added to Vizard's environment path (a PATH fix is sketched below).
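If ffmpeg is installed but still not found, prepending its bin folder to the process PATH before playback usually resolves the error. A sketch using the optional ffmpeg_path.txt file from step 2:

```python
# Make ffmpeg visible to the current process by prepending the bin
# folder from keys/ffmpeg_path.txt to PATH (the file is optional).
import os
from pathlib import Path

ffmpeg_file = Path("keys/ffmpeg_path.txt")
if ffmpeg_file.exists():
    ffmpeg_bin = ffmpeg_file.read_text().strip()
    os.environ["PATH"] = ffmpeg_bin + os.pathsep + os.environ["PATH"]
```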
Tips
- Environment Awareness: Take screenshots in SightLab (using the '/' key), upload them to ChatGPT, and use the resulting descriptions in prompts.
- Custom Event Mapping: Modify vizconnect settings in settings.py under sightlab_utils/vizconnect_configs for speaking button events. See Vizconnect Events for more (a keyboard remap sketch follows this list).
- You can connect "Assistants" through the OpenAI API, but not custom GPTs.
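For keyboard events specifically, Vizard's vizact shortcuts can remap the speaking key without editing the vizconnect files; controller buttons still go through the vizconnect configs. A minimal sketch in which the handler names are placeholders for the script's real ones:

```python
# Remap the push-to-talk key from 'c' to 'v' with vizact;
# start_listening/stop_listening stand in for the real handlers.
import viz
import vizact

def start_listening():
    print("mic open")    # placeholder for the actual handler

def stop_listening():
    print("mic closed")  # placeholder for the actual handler

viz.go()
vizact.onkeydown('v', start_listening)
vizact.onkeyup('v', stop_listening)
```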