
AI Data Analysis Tools


This document provides an overview, configuration, dependencies, and usage examples for the AI-driven data analysis scripts in this toolkit.


1. AI_Data_Analysis.py

Purpose:

  • Interactively query a single CSV file from your experiment’s data tree using natural language via either OpenAI or a local Ollama model.

Key Features:

  • Choice of AI backend at launch: CHAT_GPT or OFFLINE_OLLAMA.
  • Configurable offline model (default: gemma3).
  • Automatic context generation, including:

      • First 20 rows of the dataset (adjustable via MAX_PREVIEW_ROWS). If you are parsing many files, or trial data containing many timestamps, you may need to increase this number significantly.
      • Summary statistics (DataFrame.describe()).
      • Top 10 numeric correlation pairs (adjustable via MAX_CORR_PAIRS).

  • Optional custom study-context file to guide data summaries.
  • Text-to-speech (TTS) support when using cloud GPT (Nova voice).

Configuration Constants:

OFFLINE_MODEL       = "gemma3"        # default Ollama model name
PRODUCE_SUMMARY     = True            # toggle automatic summary on startup
STUDY_CONTEXT       = """"""          # inline fallback context (optional)
STUDY_CONTEXT_FILE  = "study_context.txt"  # external notes (set "" to disable)
MAX_PREVIEW_ROWS    = 20              # rows of raw data to include in context
MAX_CORR_PAIRS      = 10              # top correlation pairs
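The preview/statistics/correlation context described above can be sketched roughly as follows with pandas. Note that `build_context` and its exact output format are illustrative assumptions, not the script's actual internals:

```python
import pandas as pd

MAX_PREVIEW_ROWS = 20
MAX_CORR_PAIRS = 10

def build_context(df: pd.DataFrame) -> str:
    """Assemble preview rows, summary stats, and top correlation pairs."""
    parts = [
        "Preview:\n" + df.head(MAX_PREVIEW_ROWS).to_string(),
        "Summary statistics:\n" + df.describe().to_string(),
    ]
    numeric = df.select_dtypes("number")
    if numeric.shape[1] >= 2:
        pairs = numeric.corr().abs().stack()
        # Keep each column pair once (drops the diagonal and mirror pairs)
        pairs = pairs[[a < b for a, b in pairs.index]]
        top = pairs.sort_values(ascending=False).head(MAX_CORR_PAIRS)
        lines = [f"{a} ~ {b}: {v:.3f}" for (a, b), v in top.items()]
        parts.append("Top correlations:\n" + "\n".join(lines))
    return "\n\n".join(parts)
```

A context string like this would typically be prepended to the user's question before it is sent to the chosen model.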

Dependencies:

  • pandas, vizinput, openai (for cloud), vlc (for audio playback)
  • Environment variable OPENAI_API_KEY for cloud usage.

Usage:

  1. Place the script within your project.
  2. Run:

AI_Data_Analysis.py

  3. Select an AI backend when prompted.
  4. Choose a CSV file from the scanned directory.
  5. If PRODUCE_SUMMARY is set to True, the script automatically produces a summary, followed by the option to ask follow-up questions. Note: the summary may take some time to generate depending on the amount of data (around 15-20 seconds, if not longer).
  6. Enter natural-language questions at the prompt; type exit to quit.


2. AI_Data_Analysis_Entire_Folder.py

Purpose:

  • Sweep the entire experiment data tree for CSV files, build compact summaries per file, and allow natural-language interrogation across the full collection.

Key Features:

  • Scans Data/.
  • Summarizes each CSV with limited preview rows and correlation pairs.
  • AI backend options: CHAT_GPT (with TTS) or OFFLINE_OLLAMA.
  • Adjustable token-budget knobs:

      • MAX_PREVIEW_ROWS (default 5)
      • MAX_CORR_PAIRS (default 5)

Configuration Constants:

MAX_PREVIEW_ROWS = 5
MAX_CORR_PAIRS   = 5
OFFLINE_MODEL    = "gemma3"
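The tree sweep and per-file summarization might look roughly like the sketch below. The function name, summary format, and root-directory argument are illustrative assumptions:

```python
from pathlib import Path

import pandas as pd

MAX_PREVIEW_ROWS = 5

def summarize_tree(root: str) -> dict:
    """Build a compact summary string for every CSV found under root."""
    summaries = {}
    for csv_path in sorted(Path(root).rglob("*.csv")):
        df = pd.read_csv(csv_path)
        summaries[str(csv_path)] = (
            f"{csv_path.name}: {len(df)} rows, columns={list(df.columns)}\n"
            + df.head(MAX_PREVIEW_ROWS).to_string()
        )
    return summaries
```

Joining all of these summaries into one prompt is why the preview and correlation limits default lower here than in the single-file script.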

Dependencies:

  • Same as AI_Data_Analysis.py.

Usage:

  1. Place script in your experiment root.
  2. Run:

AI_Data_Analysis_Entire_Folder.py

  3. Choose an AI backend when prompted.
  4. Ask questions across all CSVs; type exit to quit.


3. Auto_AI_View_Detection.py

Purpose:

  • Analyze extracted video frames using OpenAI’s Vision API (GPT-4o-mini) to detect objects of interest and compute dwell times based on frame counts.

Key Features:

  • Encodes PNG frames as Base64 and sends to the Vision API.
  • Threshold-based dwell detection (FRAMES_FOR_DWELL, default 30 frames).
  • Aggregates object mention counts to approximate attention/dwell metrics.
  • Logs analysis results to timestamped text files in Analysis_Outputs/.
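Encoding a frame for a Vision-style chat message can be sketched as below. The message shape follows OpenAI's base64 data-URL convention for image inputs, but check the current API documentation for the exact field names; the helper itself is an illustrative assumption:

```python
import base64

def frame_to_image_part(png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes as a base64 data-URL image part for a chat message."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{b64}"},
    }
```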

Configuration Constants:

FRAMES_FOR_DWELL = 30
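The threshold-based aggregation might work roughly as follows, assuming each analyzed frame yields a list of detected object names and a fixed frame rate for the seconds conversion; the function and its output shape are illustrative assumptions:

```python
from collections import Counter

FRAMES_FOR_DWELL = 30

def dwell_counts(frame_detections, fps=30.0):
    """Count frames per object; report objects seen in at least
    FRAMES_FOR_DWELL frames along with an approximate dwell time."""
    counts = Counter(obj for frame in frame_detections for obj in frame)
    return {
        obj: {"frames": n, "approx_seconds": n / fps}
        for obj, n in counts.items()
        if n >= FRAMES_FOR_DWELL
    }
```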

Dependencies:

  • openai, pandas, Python standard libraries, and environment variable OPENAI_API_KEY.

Workflow:

  1. Record session: Use SightLab’s built-in screen recording to capture your VR session.
  2. Add video: Place the recorded .mp4/.avi file into the replay_videos/ folder.
  3. Extract frames: Run Convert Video to Images.py to generate individual PNG frames in extracted_frames/.
  4. Detect views: Execute Auto_AI_View_Detection.py, which analyzes those frames and saves output to Analysis_Outputs/analysis_output_<timestamp>.txt.
  5. Summarize interactions: Run object_interactions.py on the same frame set or analysis files to compute summary metrics of viewed objects.
  6. Follow up: Use Follow_Up_Questions.py to generate deeper AI-driven follow-up questions based on the saved analysis.

Usage:

  1. After extracting frames via the workflow above, run:

Auto_AI_View_Detection.py

  2. Enter the number of frames to analyze (0 to process all).
  3. Review the generated analysis in Analysis_Outputs/.


4. Convert Video to Images.py

Purpose:

  • Extract frames from session videos saved in replay_videos/ (.mp4/.avi) into individual PNG images.

Key Features:

  • GUI prompt to select a video file from replay_videos/ using vizinput.
  • Batch extraction of frames via OpenCV.
  • Saves frames sequentially as extracted_frames/frame_####.png.

Dependencies:

  • opencv-python (cv2), vizinput.

Usage:

  1. Place your .mp4/.avi files in replay_videos/.
  2. Run in Vizard:

"Convert Video to Images.py"

  3. Select a video when prompted.
  4. Frames are saved in extracted_frames/.


5. Follow_Up_Questions.py

Purpose:

  • Generate AI-driven follow-up questions or deeper analysis based on existing analysis output texts.

Key Features:

  • Lists and selects analysis files from Analysis_Outputs/.
  • Loads optional context_prompt.txt to seed the AI prompt.
  • Uses OpenAI Chat Completion with TTS playback (optional).
  • Writes follow-up answers to Analysis_Answers/ with timestamped filenames.

Configuration Constants:

SPEAK_RESPONSE = True
OPEN_AI_TTS_MODEL = "tts-1"
OPEN_AI_VOICE = "nova"
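The timestamped-output pattern used for Analysis_Answers/ can be sketched as below; the exact filename format and helper name are assumptions, not the script's actual code:

```python
import os
from datetime import datetime

def save_answer(text: str, out_dir: str = "Analysis_Answers") -> str:
    """Write AI-generated follow-up text to a timestamped file; return its path."""
    os.makedirs(out_dir, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = os.path.join(out_dir, f"follow_up_{stamp}.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```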

Dependencies:

  • openai, vlc, vizinput, and environment variable OPENAI_API_KEY.

Usage:

  1. Ensure Analysis_Outputs/ contains .txt files.
  2. (Optional) Create context_prompt.txt for custom AI context.
  3. Run:

Follow_Up_Questions.py

  4. Choose a file; AI-generated follow-ups appear and are saved in Analysis_Answers/.


6. object_interactions.py

Purpose:

  • Similar to Auto_AI_View_Detection.py, this script analyzes object interactions from frame sequences, detecting and counting object appearances via the Vision API.

Key Features:

  • Reads frames from extracted_frames/.
  • Uses a local key.txt for the OpenAI API key.
  • Logs dwell-based interaction counts to timestamped files in Analysis_Outputs/.

Configuration Constants:

FRAMES_FOR_DWELL = 30  # frames threshold
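Since this script reads the API key from a local key.txt while the usage steps below set the OPENAI_API_KEY environment variable, key resolution might look like the sketch below. The fallback order is an assumption:

```python
import os

def load_api_key(key_file: str = "key.txt") -> str:
    """Prefer a local key.txt; fall back to the OPENAI_API_KEY env var."""
    if os.path.exists(key_file):
        with open(key_file, "r", encoding="utf-8") as f:
            key = f.read().strip()
        if key:
            return key
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("No API key found in key.txt or OPENAI_API_KEY")
    return key
```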

Dependencies:

  • openai, pandas, and Python standard libraries.

Usage:

  1. After obtaining an OpenAI API key, open a Command Prompt (search for "cmd" in Windows), run setx OPENAI_API_KEY "your-api-key", then restart Vizard.
  2. Ensure extracted_frames/ contains frames.
  3. Run:

object_interactions.py

  4. Enter the number of frames to analyze.
  5. Check Analysis_Outputs/ for results.

