AI Object Detection brings automatic object identification to VR eye tracking studies by combining real-time YOLOv8 object detection with SightLab gaze data collection. During a session, the script captures the Vizard render window, runs YOLO on each frame to identify objects in the scene, and creates invisible 3D collision volumes at the approximate positions of detected objects. These volumes are registered as SightLab sceneObjects with gaze=True, so dwell time, view count, and other gaze metrics are collected automatically per object — no manual scene object setup required.
This allows researchers to run VR eye tracking studies where participants look around a virtual environment and the system automatically records what they looked at, for how long, and how many times — all without having to pre-label every object in the scene.
Desktop testing: The script also works in desktop mode without a headset, which is useful for testing and validating detection settings before running a full VR session.
Architecture
```
Vizard renders 3D scene (e.g. homeOffice)
              |
              v
Render window (or desktop mirror in HMD mode)
              |
              v
WindowCapture (finds window by title from config)
              |
              v
grab_yolo_frame() -- Win32 PrintWindow -> numpy RGB array
              |
              v
YOLODetector (background thread, ultralytics YOLOv8)
              |
              v
DetectedObjectManager
  - Creates vizshape.addBox() collision volumes
  - Registers each as sightlab.addSceneObject(key, node, gaze=True)
  - Matches detections across frames to maintain persistent keys
  - Removes stale objects after OBJECT_PERSISTENCE_TIME
              |
              v
SightLab collects gaze/dwell data on each tracked object
```
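The "2D detection to 3D collision volume" step can be sketched as a simple pinhole-camera unprojection: take the detection's normalised screen centre, cast it DEFAULT_OBJECT_DEPTH metres into the view, and place the box there. The function name, field-of-view value, and the maths below are illustrative assumptions, not the script's actual code:

```python
# Illustrative sketch of placing a collision volume at a detection's screen
# position; the script's real mapping may differ.
import math

DEFAULT_OBJECT_DEPTH = 2.0  # metres in front of the view (see 3D Mapping settings)

def screen_to_world(u, v, fov_y_deg=60.0, aspect=16 / 9, depth=DEFAULT_OBJECT_DEPTH):
    """Map a normalised screen point (u, v in [0, 1], top-left origin)
    to a camera-space (x, y, z) position at the given depth."""
    half_h = depth * math.tan(math.radians(fov_y_deg) / 2)  # half frustum height at depth
    half_w = half_h * aspect                                # half frustum width at depth
    x = (u * 2 - 1) * half_w   # centre of screen -> x = 0
    y = (1 - v * 2) * half_h   # screen v grows downward; world y grows upward
    return (x, y, depth)

# A detection centred on the screen lands straight ahead at 2 m:
# screen_to_world(0.5, 0.5) -> (0.0, 0.0, 2.0)
```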
Requirements
Software
| Requirement | Notes |
|---|---|
| Vizard 8 | WorldViz Vizard with Python 3.x |
| SightLab | sightlab_utils must be on the Python path |
| ultralytics | YOLOv8 — pip install ultralytics |
| opencv-python | Image processing — pip install opencv-python |
| numpy | Array handling — pip install numpy |
| pywin32 | Window capture — pip install pywin32 |
Installing Dependencies in Vizard
Use Vizard's built-in Package Manager (Tools → Package Manager) or run pip directly from Vizard's Python:
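For example, from a command prompt (the install path below is an assumption; adjust it to your Vizard installation):

```shell
# Install the required packages into Vizard's bundled Python.
# The path below is an example; point it at your own Vizard install.
"C:\Program Files\WorldViz\Vizard8\bin\python.exe" -m pip install ultralytics opencv-python numpy pywin32
```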
Note: The first time ultralytics runs, it will download the YOLOv8 model file (~6 MB for yolov8n.pt). This requires an internet connection.
Files
| File | Purpose |
|---|---|
| AI_ObjectDetection_Config.py | All tunable settings (model, thresholds, visuals, capture, environment) |
| AI_ObjectDetection.py | Main script — run this in Vizard |
How to Run
1. Open AI_ObjectDetection_Config.py and verify the settings (model, environment, confidence, etc.). CAPTURE_WINDOW_TITLE must match the Vizard render window title (default: "AI_ObjectDetection"); set it to None to be prompted with a window picker at startup.
2. Open AI_ObjectDetection.py in Vizard and press F5 (or use the "Run WinViz on Current File" task).
3. The script loads the configured environment (default: homeOffice.osgb).
4. Press Spacebar to start the trial — YOLO detection begins automatically.
5. The participant looks around the VR scene; detected objects appear as semi-transparent boxes with labels.
6. Press Spacebar again to end the trial.
7. SightLab saves gaze data (dwell time, view count, etc.) per detected object to the data/ folder.
Runtime Keyboard Controls
| Key | Action |
|---|---|
| Space | Start / stop trial |
| d | Toggle debug bounding boxes on/off |
| i | Toggle YOLO overlays in HMD (keeps them on the desktop mirror for the researcher) |
YOLO Detection

| Setting | Default | Description |
|---|---|---|
| YOLO_MODEL | yolov8n.pt | Model size. Options: yolov8n.pt (nano, fastest), yolov8s.pt (small), yolov8m.pt (medium, most accurate) |
| YOLO_CONFIDENCE | 0.4 | Minimum confidence threshold (0.0–1.0). Lower = more detections but more false positives |
| DETECTION_INTERVAL | 0.5 | Seconds between YOLO inference runs. Lower = more responsive, higher = less CPU |
| YOLO_CLASSES | None | COCO class IDs to detect. None = all classes. Example: [56, 62, 63] for chair, tv, laptop |
| MAX_TRACKED_OBJECTS | 15 | Maximum simultaneous tracked objects |
3D Mapping
| Setting | Default | Description |
|---|---|---|
| DEFAULT_OBJECT_DEPTH | 2.0 | Distance (meters) in front of the view where collision volumes are placed |
| COLLISION_BOX_SIZE | [0.2, 0.2, 0.15] | Width, height, depth (meters) of each collision volume. Thicker depth = easier gaze intersection |
| OBJECT_PERSISTENCE_TIME | 5.0 | Seconds an object survives after YOLO stops detecting it. Must be > SightLab's dwell threshold (500 ms) or dwell data won't accumulate |
| MATCHING_DISTANCE_THRESHOLD | 0.5 | Max normalised screen-space distance to match a new detection to an existing tracked object of the same class. Higher = more forgiving when the user moves |
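The matching and persistence rules above can be sketched as follows (function names and dict shapes are assumptions for illustration, not the script's actual code):

```python
# Illustrative sketch: match a new detection to the nearest tracked object
# of the same class within MATCHING_DISTANCE_THRESHOLD, and prune objects
# not re-detected within OBJECT_PERSISTENCE_TIME.
import math

MATCHING_DISTANCE_THRESHOLD = 0.5   # normalised screen-space distance
OBJECT_PERSISTENCE_TIME = 5.0       # seconds

def match_detection(new_det, tracked):
    """Return the key of the closest same-class tracked object within
    the distance threshold, or None if nothing matches."""
    best_key, best_dist = None, MATCHING_DISTANCE_THRESHOLD
    for key, obj in tracked.items():
        if obj["cls"] != new_det["cls"]:
            continue  # only match within the same class
        dist = math.hypot(obj["cx"] - new_det["cx"], obj["cy"] - new_det["cy"])
        if dist < best_dist:
            best_key, best_dist = key, dist
    return best_key

def prune_stale(tracked, now):
    """Drop objects whose last detection is older than OBJECT_PERSISTENCE_TIME."""
    return {k: o for k, o in tracked.items()
            if now - o["last_seen"] <= OBJECT_PERSISTENCE_TIME}
```

A match keeps the object's persistent key (so SightLab's gaze metrics keep accumulating on the same sceneObject); an unmatched detection would get a new key and a new collision volume.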
Visualization
| Setting | Default | Description |
|---|---|---|
| SHOW_DEBUG_BOXES | True | Show green semi-transparent bounding boxes over detected objects |
| DEBUG_BOX_ALPHA | 0.25 | Opacity of debug boxes (0.0–1.0) |
| SHOW_LABELS | True | Show 3D text labels (class name + confidence) above each object |
| SHOW_OVERLAYS_IN_HMD | True | Whether overlays render in the HMD at startup. Toggle with the i key at runtime. When off, overlays still appear on the desktop mirror |
Gaze Tracking
| Setting | Default | Description |
|---|---|---|
| ENABLE_GAZE_TRACKING | True | Register detected objects as SightLab gaze targets |
| USE_GAZE_BASED_ID | True | Print console messages and show labels when gaze dwells on an object |
Window Capture
| Setting | Default | Description |
|---|---|---|
| CAPTURE_WINDOW_TITLE | "AI_ObjectDetection" | Window title to capture. Must match the Vizard window title. Set to None to be prompted with a window picker at startup |