Educational Interaction Tool - AI
This document describes how to use the Educational Interaction Tool in SightLab. The tool can be connected to an interactive, intelligent AI agent powered by various large language models, such as GPT-4 and Claude Opus. You can customize the agent's personality, use speech recognition, and leverage high-quality text-to-speech models. You can also record your own annotations that work with these features, connecting them to a virtual avatar or using voice-over alone.
Tagged objects can display 3D text annotations and trigger audio explanations. With AI integration, users can ask follow-up questions. Any 3D object in the scene can automatically be tagged for interactions and conversational information.
Location: ExampleScripts > Education_Application_AI
Key Features
- Interact and converse with custom AI Large Language Models in real-time VR or XR simulations.
- Choose from OpenAI models (including GPT-4 and custom GPTs) and Anthropic models (e.g., Claude 3 Opus).
- Customize the agent's personality, contextual awareness, emotional state, interactions, and more. Save your creations as custom agents.
- Use speech recognition for voice or text-based interaction.
- Select high-quality voices from OpenAI TTS or Eleven Labs (requires an API key).
- Train the agent to adapt using conversation history and interactions.
- Works seamlessly with all SightLab features, including data collection, visualizations, and transcript saving.
- Automatically tag objects in scenes to prompt questions and present information.
Instructions
1. Installation
- Install the required libraries using the Vizard Package Manager (a quick import check is sketched below). These include:
  - openai (for OpenAI GPT agents)
  - anthropic (for the Anthropic Claude agent)
  - elevenlabs (for Eleven Labs text-to-speech)
  - SpeechRecognition
  - sounddevice (pyaudio for older versions)
  - python-vlc
  - numpy
- Install VLC Player (version 3.0.20 or higher).
- Note: An active internet connection is required.
- If using Vizard 8 or higher, copy the contents of the "updated speech recognition files" into C:\Program Files\WorldViz\Vizard8\bin\lib\site-packages\speech_recognition, overwriting __init__.py and audio.py.
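To verify the installation, a minimal import check such as the following can be run from Vizard. The package list mirrors the one above; note that SpeechRecognition imports as speech_recognition and python-vlc imports as vlc.

```python
# Sanity check that the required packages are importable after
# installing them with the Vizard Package Manager.
import importlib

required = ["openai", "anthropic", "elevenlabs",
            "speech_recognition", "sounddevice", "vlc", "numpy"]

for name in required:
    try:
        importlib.import_module(name)
        print("OK:", name)
    except ImportError as error:
        print("MISSING:", name, "-", error)
```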
2. API Keys
- Create a "keys" folder in your SightLab root directory and add the following text files (a sketch of reading them follows this list):
  - key.txt: OpenAI API key.
  - elevenlabs_key.txt: Eleven Labs API key.
  - ffmpeg_path.txt: Path to ffmpeg's bin folder (if needed).
  - anthropic_key.txt: Anthropic API key for using Anthropic models.
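The exact loading code lives in the SightLab scripts, but conceptually each key is read as plain text. A minimal sketch, assuming the script runs from the SightLab root; the read_key helper is illustrative, not part of SightLab:

```python
# Illustrative sketch of reading the plain-text key files; the actual
# SightLab loading code may differ.
from pathlib import Path

KEYS_DIR = Path("keys")  # assumes the working directory is the SightLab root

def read_key(filename):
    """Return the stripped contents of a key file, or None if it is absent."""
    path = KEYS_DIR / filename
    return path.read_text().strip() if path.exists() else None

openai_key = read_key("key.txt")
anthropic_key = read_key("anthropic_key.txt")
elevenlabs_key = read_key("elevenlabs_key.txt")
```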
3. Configuration
- Open AI_Agent_Config_Education.py (in the configs folder) to configure options (an illustrative excerpt follows this list):
  - AI_MODEL: 'CHAT_GPT' or 'CLAUDE'.
  - OPENAI_MODEL: OpenAI model name (e.g., "gpt-4").
  - ANTHROPIC_MODEL: Anthropic model name (e.g., "claude-3-opus-20240229").
  - MAX_TOKENS: Maximum number of tokens per exchange.
  - USE_SPEECH_RECOGNITION: Toggle for voice interaction.
  - SPEECH_MODEL: Choose OpenAI TTS or Eleven Labs.
  - USE_GUI: Enable or disable the GUI for environment selection.
- Adjust avatar properties, environment settings, and more as needed.
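For orientation, the options above might look like this inside AI_Agent_Config_Education.py. The values shown are illustrative, not the shipped defaults, and the SPEECH_MODEL value in particular is an assumed placeholder:

```python
# Illustrative excerpt of AI_Agent_Config_Education.py; the shipped file
# defines additional avatar and environment options.
AI_MODEL = 'CHAT_GPT'                       # or 'CLAUDE'
OPENAI_MODEL = "gpt-4"                      # OpenAI model name
ANTHROPIC_MODEL = "claude-3-opus-20240229"  # Anthropic model name
MAX_TOKENS = 300                            # tokens per exchange (example value)
USE_SPEECH_RECOGNITION = True               # voice rather than typed interaction
SPEECH_MODEL = 'OPENAI_TTS'                 # placeholder; or an Eleven Labs voice
USE_GUI = True                              # GUI for environment selection
```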
4. Running the Script
- Run AI_Agent_Education.py to start.
5. Interaction
- Press and hold the 'c' key or the right-hand grip button to speak. Release to stop and let the AI respond (the capture step is sketched after this list).
- Use the mouse or the right-hand trigger to select objects and prompt information.
- 3D text will appear based on head position or eye gaze.
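Under the hood, the speak-and-respond cycle amounts to capturing microphone audio and transcribing it. A minimal sketch using the SpeechRecognition package, assuming a PyAudio-backed microphone (the bundled script may capture audio differently, e.g., via sounddevice, and adds VR button handling, TTS playback, and conversation history on top of this):

```python
# One round of speech capture and transcription with SpeechRecognition;
# recognize_google is the package's free web recognizer.
import speech_recognition as sr

recognizer = sr.Recognizer()

with sr.Microphone() as source:           # default input device
    recognizer.adjust_for_ambient_noise(source)
    print("Speak now...")
    audio = recognizer.listen(source)     # returns once speech trails off

try:
    text = recognizer.recognize_google(audio)
    print("You said:", text)
    # ...pass `text` to the selected LLM and speak the reply...
except sr.UnknownValueError:
    print("Could not understand the audio")
```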
Modifying Environment and Avatars
- Place environment models in resources/environments or update the path in the configuration file (see the sketch below).
- Use the SightLab VR GUI to select which objects in the scene will be interactive.
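In plain Vizard terms, pointing the tool at a different environment is just loading another model file. SightLab's GUI and config normally handle this, but a manual equivalent looks like the following; the file name is an example, not a bundled asset:

```python
# Minimal Vizard sketch of loading an environment model directly;
# in SightLab this is normally handled by the GUI and config file.
import viz

viz.go()
env = viz.addChild('resources/environments/classroom.osgb')  # example path
```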
Obtaining API Keys
- OpenAI:
  - Visit OpenAI.
  - Sign up or log in and navigate to the API section.
  - Generate a new secret key and save it in key.txt (a quick validation sketch follows this list).
- Eleven Labs:
  - Log in at Eleven Labs.
  - Copy your API key into elevenlabs_key.txt.
- Anthropic:
  - Go to Anthropic, sign up, and verify your account.
  - Save the API key in anthropic_key.txt.
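Once the key files exist, a cheap way to confirm a key is accepted is a small authenticated API call. A sketch using the openai package's standard client (listing models consumes no tokens):

```python
# Quick check that the OpenAI key in keys/key.txt is accepted;
# listing models is an inexpensive authenticated call.
from pathlib import Path
from openai import OpenAI

client = OpenAI(api_key=Path("keys/key.txt").read_text().strip())
print([model.id for model in client.models.list()][:5])  # first few model IDs
```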
Additional Information
- Prompts for GPT models should be enclosed in quotation marks (e.g., "I am..."); Anthropic prompts do not require quotes. The two providers' call shapes are compared in the sketch after this list.
- Refer to the ElevenLabs GitHub for more information.
- See Connecting Assistants for integrating assistants through OpenAI.
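The quoting note above concerns how the personality prompt is entered for each provider. Schematically, one exchange with each looks like this (clients built from the key files described earlier; the system prompt and messages are illustrative):

```python
# Schematic comparison of a single exchange with each provider.
from pathlib import Path
from openai import OpenAI
import anthropic

system_prompt = "I am a museum guide..."  # in the SightLab config, wrap the
                                          # OpenAI version in quotation marks

openai_client = OpenAI(api_key=Path("keys/key.txt").read_text().strip())
reply = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "system", "content": system_prompt},
              {"role": "user", "content": "What is this exhibit?"}],
)
print(reply.choices[0].message.content)

claude_client = anthropic.Anthropic(
    api_key=Path("keys/anthropic_key.txt").read_text().strip())
reply = claude_client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=300,
    system=system_prompt,                 # no surrounding quotes needed
    messages=[{"role": "user", "content": "What is this exhibit?"}],
)
print(reply.content[0].text)
```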
Issues and Troubleshooting
- Microphone Settings: Errors may occur if the microphone source conflicts between the VR headset and the system output.
- Character Limits: Eleven Labs' free tier limits output to 10,000 characters (paid plans offer higher limits).
- ffmpeg/mpv Errors: Ensure ffmpeg and mpv are installed and their paths are added to Vizard's environment path (a PATH fix is sketched below).
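If ffmpeg is installed but still not found, prepending its bin folder to the process PATH before playback usually resolves the error. A sketch using the optional ffmpeg_path.txt file from step 2:

```python
# Make ffmpeg visible to the current process by prepending the bin
# folder from keys/ffmpeg_path.txt to PATH (the file is optional).
import os
from pathlib import Path

ffmpeg_file = Path("keys/ffmpeg_path.txt")
if ffmpeg_file.exists():
    ffmpeg_bin = ffmpeg_file.read_text().strip()
    os.environ["PATH"] = ffmpeg_bin + os.pathsep + os.environ["PATH"]
```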
Tips
- Environment Awareness: Take screenshots in SightLab (using the '/' key), upload them to ChatGPT, and use the resulting descriptions in prompts.
- Custom Event Mapping: Modify vizconnect settings in settings.py under sightlab_utils/vizconnect_configs for speaking button events. See Vizconnect Events for more (a keyboard remap sketch follows this list).
- You can connect "Assistants" through the OpenAI API, but not custom GPTs.
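For keyboard events specifically, Vizard's vizact shortcuts can remap the speaking key without editing the vizconnect files; controller buttons still go through the vizconnect configs. A minimal sketch in which the handler names are placeholders for the script's real ones:

```python
# Remap the push-to-talk key from 'c' to 'v' with vizact;
# start_listening/stop_listening stand in for the real handlers.
import viz
import vizact

def start_listening():
    print("mic open")    # placeholder for the actual handler

def stop_listening():
    print("mic closed")  # placeholder for the actual handler

viz.go()
vizact.onkeydown('v', start_listening)
vizact.onkeyup('v', stop_listening)
```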