AI Object Detection — Documentation

Overview
AI Object Detection brings automatic object identification to VR eye tracking studies by combining real-time YOLOv8 object detection with SightLab gaze data collection. During a session, the script captures the Vizard render window, runs YOLO on each frame to identify objects in the scene, and creates invisible 3D collision volumes at the approximate positions of detected objects. These volumes are registered as SightLab sceneObjects with gaze=True, so dwell time, view count, and other gaze metrics are collected automatically per object — no manual scene object setup required.
This allows researchers to run VR eye tracking studies where participants look around a virtual environment and the system automatically records what they looked at, for how long, and how many times — all without having to pre-label every object in the scene.
Desktop testing: The script also works in desktop mode without a headset, which is useful for testing and validating detection settings before running a full VR session.
Architecture
```text
Vizard renders 3D scene (e.g. homeOffice)
        |
        v
Render window (or desktop mirror in HMD mode)
        |
        v
WindowCapture (finds window by title from config)
        |
        v
grab_yolo_frame() -- Win32 PrintWindow -> numpy RGB array
        |
        v
YOLODetector (background thread, ultralytics YOLOv8)
        |
        v
DetectedObjectManager
  - Creates vizshape.addBox() collision volumes
  - Registers each as sightlab.addSceneObject(key, node, gaze=True)
  - Matches detections across frames to maintain persistent keys
  - Removes stale objects after OBJECT_PERSISTENCE_TIME
        |
        v
SightLab collects gaze/dwell data on each tracked object
```
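The `YOLODetector` stage above runs inference on a background thread so the render loop never stalls while waiting on the model. The script's actual implementation is not reproduced here; a minimal sketch of the pattern (all names hypothetical, with a stub standing in for the real YOLOv8 call) might look like this:

```python
import threading
import queue

class BackgroundDetector:
    """Latest-frame mailbox + result queue: the worker always processes the
    newest frame and silently drops frames that arrive while it is busy."""

    def __init__(self, run_inference):
        self.run_inference = run_inference   # stub for the real YOLO call
        self._frame = None
        self._lock = threading.Lock()
        self.results = queue.Queue()         # detections come back here
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def submit_frame(self, frame):
        with self._lock:
            self._frame = frame              # newer frames overwrite older ones

    def _worker(self):
        while not self._stop.is_set():
            with self._lock:
                frame, self._frame = self._frame, None
            if frame is None:
                self._stop.wait(0.01)        # idle briefly when no new frame
                continue
            self.results.put(self.run_inference(frame))

    def stop(self):
        self._stop.set()
        self._thread.join()
```

The one-slot mailbox is the important design choice: a queue of frames would let inference fall behind real time, whereas overwriting guarantees the detector always sees the most recent view.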
Requirements
Software
| Requirement | Notes |
|---|---|
| Vizard 8 | WorldViz Vizard with Python 3.x |
| SightLab | `sightlab_utils` must be on the Python path |
| ultralytics | YOLOv8 — `pip install ultralytics` |
| opencv-python | Image processing — `pip install opencv-python` |
| numpy | Array handling — `pip install numpy` |
| pywin32 | Window capture — `pip install pywin32` |
Installing Dependencies in Vizard
Use Vizard's built-in Package Manager (Tools → Package Manager) or run pip directly from Vizard's Python:
"C:\Program Files\WorldViz\Vizard8\bin\python.exe" -m pip install ultralytics opencv-python numpy pywin32
Note: The first time `ultralytics` runs, it downloads the YOLOv8 model file (~6 MB for `yolov8n.pt`). This requires an internet connection.
Files
| File | Purpose |
|---|---|
| `AI_ObjectDetection_Config.py` | All tunable settings (model, thresholds, visuals, capture, environment) |
| `AI_ObjectDetection.py` | Main script — run this in Vizard |
How to Run
- Open `AI_ObjectDetection_Config.py` and verify settings (model, environment, confidence, etc.). `CAPTURE_WINDOW_TITLE` must match the Vizard render window title (default: `"AI_ObjectDetection"`); set it to `None` to be prompted with a window picker at startup
- Open `AI_ObjectDetection.py` in Vizard and press F5 (or use the "Run WinViz on Current File" task)
- The script loads the configured environment (default: `homeOffice.osgb`)
- Press Spacebar to start the trial — YOLO detection begins automatically
- The participant looks around the VR scene; detected objects appear as semi-transparent boxes with labels
- Press Spacebar again to end the trial
- SightLab saves gaze data (dwell time, view count, etc.) per detected object to the `data/` folder
Runtime Keyboard Controls
| Key | Action |
|---|---|
| `Space` | Start / stop trial |
| `d` | Toggle debug bounding boxes on/off |
| `i` | Toggle YOLO overlays in HMD (keeps them on desktop mirror for researcher) |
| `o` | Toggle origin direction arrow |
| `r` | Reset viewpoint position |
| `p` | Toggle SightLab gaze point visibility |
Configuration Reference (AI_ObjectDetection_Config.py)
YOLO Detection
| Setting | Default | Description |
|---|---|---|
| `YOLO_MODEL` | `'yolov8n.pt'` | Model size. Options: `yolov8n.pt` (nano, fastest), `yolov8s.pt` (small), `yolov8m.pt` (medium, most accurate) |
| `YOLO_CONFIDENCE` | `0.4` | Minimum confidence threshold (0.0–1.0). Lower = more detections but more false positives |
| `DETECTION_INTERVAL` | `0.5` | Seconds between YOLO inference runs. Lower = more responsive; higher = less CPU |
| `YOLO_CLASSES` | `None` | COCO class IDs to detect. `None` = all classes. Example: `[56, 62, 63]` for chair, tv, laptop |
| `MAX_TRACKED_OBJECTS` | `15` | Maximum simultaneous tracked objects |
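As a concrete example, a config fragment tuned for an office scene might restrict detection to a few furniture and electronics classes with a stricter threshold. The values below are illustrative choices, not the shipped defaults:

```python
# Illustrative AI_ObjectDetection_Config.py fragment (example values only)
YOLO_MODEL = 'yolov8s.pt'      # small model: more accurate than nano, still fast
YOLO_CONFIDENCE = 0.5          # stricter than the 0.4 default -> fewer false positives
DETECTION_INTERVAL = 0.5       # run YOLO inference twice per second
YOLO_CLASSES = [56, 62, 63]    # chair, tv, laptop only
MAX_TRACKED_OBJECTS = 10       # cap simultaneous tracked objects
```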
3D Mapping
| Setting | Default | Description |
|---|---|---|
| `DEFAULT_OBJECT_DEPTH` | `2.0` | Distance (meters) in front of the view where collision volumes are placed |
| `COLLISION_BOX_SIZE` | `[0.2, 0.2, 0.15]` | Width, height, depth (meters) of each collision volume. Thicker depth = easier gaze intersection |
| `OBJECT_PERSISTENCE_TIME` | `5.0` | Seconds an object survives after YOLO stops detecting it. Must be greater than SightLab's dwell threshold (500 ms) or dwell data won't accumulate |
| `MATCHING_DISTANCE_THRESHOLD` | `0.5` | Max normalised screen-space distance to match a new detection to an existing tracked object of the same class. Higher = more forgiving when the user moves |
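To make the depth mapping concrete, here is a sketch of how a detection's normalised screen centre could be projected to a point `DEFAULT_OBJECT_DEPTH` metres along the view direction. This is an illustration only: the script's real projection is not reproduced here, the FOV values are assumptions, and the linear angle mapping is a small-angle approximation rather than a true perspective unprojection:

```python
import math

def screen_to_world(u, v, head_pos, head_yaw_deg,
                    depth=2.0, h_fov_deg=90.0, v_fov_deg=60.0):
    """Project a normalised screen point (u, v in 0..1, origin top-left)
    to a 3D point `depth` metres in front of the viewer (hypothetical)."""
    yaw_off = (u - 0.5) * h_fov_deg      # horizontal angle from view centre
    pitch_off = (0.5 - v) * v_fov_deg    # vertical angle (screen v grows downward)
    yaw = math.radians(head_yaw_deg + yaw_off)
    pitch = math.radians(pitch_off)
    # Vizard-style axes: +X right, +Y up, +Z forward
    x = head_pos[0] + depth * math.cos(pitch) * math.sin(yaw)
    y = head_pos[1] + depth * math.sin(pitch)
    z = head_pos[2] + depth * math.cos(pitch) * math.cos(yaw)
    return [x, y, z]
```

For a detection dead-centre on screen with the head at eye height and no rotation, this places the collision volume 2 m straight ahead, which is where the invisible box described above would be created.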
Visualization
| Setting | Default | Description |
|---|---|---|
| `SHOW_DEBUG_BOXES` | `True` | Show green semi-transparent bounding boxes over detected objects |
| `DEBUG_BOX_ALPHA` | `0.25` | Opacity of debug boxes (0.0–1.0) |
| `SHOW_LABELS` | `True` | Show 3D text labels (class name + confidence) above each object |
| `SHOW_OVERLAYS_IN_HMD` | `True` | Whether overlays render in the HMD at startup. Toggle with the `i` key at runtime. When off, overlays still appear on the desktop mirror |
Gaze Tracking
| Setting | Default | Description |
|---|---|---|
| `ENABLE_GAZE_TRACKING` | `True` | Register detected objects as SightLab gaze targets |
| `USE_GAZE_BASED_ID` | `True` | Print console messages and show labels when gaze dwells on an object |
Window Capture
| Setting | Default | Description |
|---|---|---|
| `CAPTURE_WINDOW_TITLE` | `"AI_ObjectDetection"` | Window title to capture. Must match the Vizard window title. Set to `None` to be prompted with a window picker at startup |
| `CAPTURE_FLIP` | `None` | Flip captured frame: `0` = vertical, `1` = horizontal, `-1` = both, `None` = no flip |
Other
| Setting | Default | Description |
|---|---|---|
| `SCREEN_RECORD` | `True` | Enable SightLab's built-in screen recording |
| `ENVIRONMENT_MODEL` | `'sightlab_resources/environments/homeOffice.osgb'` | 3D environment to load |
| `INSTRUCTION_MESSAGE` | (see config) | Text shown at trial start |
Common COCO Class IDs
For use with `YOLO_CLASSES`:
| ID | Class | ID | Class | ID | Class |
|---|---|---|---|---|---|
| 0 | person | 56 | chair | 66 | keyboard |
| 39 | bottle | 57 | couch | 67 | cell phone |
| 41 | cup | 58 | potted plant | 73 | book |
| 46 | banana | 59 | bed | 74 | clock |
| 47 | apple | 60 | dining table | 75 | vase |
| 49 | orange | 62 | tv/monitor | 76 | scissors |
| 51 | carrot | 63 | laptop | 77 | teddy bear |
| 55 | cake | 64 | mouse | | |
Full list: COCO dataset classes
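When building a `YOLO_CLASSES` list, a small lookup helper saves cross-referencing the table by hand. The IDs below follow the standard 80-class COCO ordering used by YOLOv8 (subset shown; the helper itself is illustrative, not part of the script):

```python
# COCO class name -> ID, standard YOLOv8 ordering (subset)
COCO_CLASSES = {
    'person': 0, 'bottle': 39, 'cup': 41, 'banana': 46, 'apple': 47,
    'orange': 49, 'carrot': 51, 'cake': 55, 'chair': 56, 'couch': 57,
    'potted plant': 58, 'bed': 59, 'dining table': 60, 'tv': 62,
    'laptop': 63, 'mouse': 64, 'keyboard': 66, 'cell phone': 67,
    'book': 73, 'clock': 74, 'vase': 75, 'scissors': 76, 'teddy bear': 77,
}

def class_ids(*names):
    """Return the COCO IDs for the given class names."""
    return [COCO_CLASSES[n] for n in names]

print(class_ids('chair', 'tv', 'laptop'))  # -> [56, 62, 63]
```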
How Dwell Time Collection Works
SightLab tracks gaze on registered scene objects automatically. For dwell data to accumulate on a YOLO-detected object:
- The object must persist with the same key across multiple frames (e.g. `yolo_chair_3` stays `yolo_chair_3`)
- The object must survive long enough for the user's gaze to exceed SightLab's dwell threshold (default 500 ms)
- The collision box must be thick enough for the gaze ray to intersect it

If objects are being removed and recreated too quickly (new keys each time), dwell time resets to zero. This is controlled by:
- `OBJECT_PERSISTENCE_TIME` — how long an object survives after YOLO stops detecting it (default: 5 s)
- `MATCHING_DISTANCE_THRESHOLD` — how aggressively detections are matched to existing tracked objects (default: 0.5)
- `COLLISION_BOX_SIZE` depth — thicker boxes are easier to hit with gaze rays (default: 0.15 m)
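The interplay of matching and expiry can be sketched in a few lines. This is a simplified, hypothetical version of the logic described above, not the script's actual code: a detection reuses an existing key when a same-class object is within the matching threshold in normalised screen space, otherwise a new key is minted, and objects unseen for longer than the persistence time are removed:

```python
MATCHING_DISTANCE_THRESHOLD = 0.5   # max normalised screen-space distance
OBJECT_PERSISTENCE_TIME = 5.0       # seconds before an unseen object expires

class Tracker:
    def __init__(self):
        self.objects = {}   # key -> (class_name, (u, v), last_seen)
        self.counter = 0

    def update(self, class_name, uv, now):
        """Match a detection to an existing object or create a new one."""
        best, best_d = None, MATCHING_DISTANCE_THRESHOLD
        for key, (cls, pos, _) in self.objects.items():
            if cls != class_name:
                continue
            d = ((uv[0] - pos[0]) ** 2 + (uv[1] - pos[1]) ** 2) ** 0.5
            if d < best_d:
                best, best_d = key, d
        if best is None:
            self.counter += 1
            best = f'yolo_{class_name}_{self.counter}'   # new persistent key
        self.objects[best] = (class_name, uv, now)       # refresh last_seen
        return best

    def prune(self, now):
        """Expire objects YOLO has not re-detected recently."""
        stale = [k for k, (_, _, t) in self.objects.items()
                 if now - t > OBJECT_PERSISTENCE_TIME]
        for k in stale:
            del self.objects[k]   # the SightLab scene object would go here too
```

A chair re-detected a short screen-space distance away keeps its key (so dwell keeps accumulating), while a detection of a different class or one too far away gets a fresh key, which is exactly the "recycling" failure mode the settings above guard against.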
Output Data
SightLab saves standard experiment data to the `data/` folder, including per-object:
- Dwell time — total time gaze rested on each detected object
- View count — number of times gaze entered each object
- Average dwell time — mean gaze duration per view
- First view time — when the user first looked at each object
- Gaze timeline — temporal sequence of gaze events
Each YOLO-detected object appears in the data with its key (e.g. `yolo_chair_3`, `yolo_laptop_7`).
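For post-processing the saved data, a key like `yolo_chair_3` splits cleanly into class name and instance number. The helper below is hypothetical and assumes keys follow the `yolo_<class>_<n>` pattern shown (the exact output file format is SightLab's):

```python
def parse_object_key(key):
    """Split a detected-object key like 'yolo_chair_3' into ('chair', 3).
    Multi-word class names such as 'cell phone' also parse, since spaces
    are not underscores."""
    prefix, rest = key.split('_', 1)      # drop the 'yolo' prefix
    name, index = rest.rsplit('_', 1)     # split off the trailing counter
    return name, int(index)

print(parse_object_key('yolo_chair_3'))   # -> ('chair', 3)
```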
Troubleshooting
| Issue | Solution |
|---|---|
| No detections appearing | Check that `CAPTURE_WINDOW_TITLE` in the config matches the Vizard window title. Try setting it to `None` to use the window picker |
| Detections flicker / constantly reset | Increase `OBJECT_PERSISTENCE_TIME` and `MATCHING_DISTANCE_THRESHOLD` |
| Dwell time only recorded on one object | Same as above — objects are being recycled before dwell accumulates |
| `DeleteDC failed` error | The script already calls `screen_capture.stop_capture()` to prevent this. If it still occurs, ensure only one capture source is active |
| `ultralytics not installed` warning | Install with `pip install ultralytics` using Vizard's Python |
| Vizard autocomplete spamming errors | The script uses `__import__()` for third-party packages to avoid this. If it persists, ensure no standard `import ultralytics` lines exist |
| Boxes appear but no gaze data | Verify `ENABLE_GAZE_TRACKING = True` and that the collision box depth is sufficient (≥ 0.1 m) |
| Low frame rate | Increase `DETECTION_INTERVAL`, use `yolov8n.pt` (nano model), or reduce `MAX_TRACKED_OBJECTS` |