LLM-ASSISTED ONBOARDING VIA RETRIEVAL-AUGMENTED INTERACTIVE COMPUTATIONAL NOTEBOOKS

 

 

Berke Odacı
Computer Science & Engineering, MSc. Thesis 2025

 

Thesis Jury

Prof. Dr. Selim Balcısoy (Thesis Advisor), Prof. Dr. Tolga Kurtuluş Çapın,
Asst. Prof. Dr. Dilara Keküllüoğlu

 

 

Date & Time: June 30th, 2025 – 10:30 AM

Place: FENS 2019

Keywords: Large Language Models, Interactive Computational Notebooks, Onboarding Support, Code Comprehension, Visual Analytics

 

Abstract

 

Recent advancements in large language models (LLMs) have significantly improved their ability to understand programming workflows and generate functional code. While these models are widely used for code-related tasks such as generation and completion, they often fall short at explaining code and conveying project context, both of which are essential for working effectively with an existing project.

This challenge is particularly evident in Visual Analytics workflows, where interactive computational notebooks (e.g., Jupyter Notebooks) are commonly used to prototype and document complex visualizations, data transformations, and machine learning pipelines. These notebooks are accessed not only by developers but also by domain experts such as economists, analysts, or researchers who interact with the outputs, interpret the findings, or request changes. For both groups, onboarding into an unfamiliar project can be time-consuming and error-prone due to missing documentation, implicit logic, and the complexity of the code-output relationship.

 

To address this, we present a tool that supports the onboarding process by leveraging LLMs to analyze, explain, and edit interactive computational notebooks. The system parses the notebook into a directed graph of cells, generates natural language explanations for each cell, and stores them in a retrieval-augmented vector store. Users interact with the notebook through a web-based interface, where they can ask natural language questions, select specific cells for focused explanations, and even request code modifications, all with the ability to revert changes if needed.
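To make the described pipeline concrete, the following is a minimal sketch of the flow (parse cells, explain each cell, index explanations, retrieve by question). The explain_cell stub, the hashing-based embedding, the chain-shaped cell graph, and the notebook path are illustrative assumptions, not the thesis implementation, which uses an actual LLM, an embedding model, and a web-based interface.

```python
"""Minimal sketch of the notebook-explanation pipeline, under stated assumptions."""
import hashlib
import nbformat                     # library for reading .ipynb files
import numpy as np


def parse_cells(path):
    """Read a notebook and return its code cells plus a simple directed chain of edges."""
    nb = nbformat.read(path, as_version=4)
    cells = [c.source for c in nb.cells if c.cell_type == "code"]
    # Edges link each cell to the next, approximating execution order.
    edges = [(i, i + 1) for i in range(len(cells) - 1)]
    return cells, edges


def explain_cell(source):
    """Placeholder for an LLM call that produces a natural-language explanation."""
    first_line = source.splitlines()[0] if source.strip() else "(empty cell)"
    return f"This cell starts with: {first_line}"


def embed(text, dim=256):
    """Toy hashing-based bag-of-words vector; a real system would use an embedding model."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


class VectorStore:
    """In-memory store mapping per-cell explanations to vectors for retrieval."""

    def __init__(self):
        self.items = []             # list of (cell_index, explanation, vector)

    def add(self, cell_index, explanation):
        self.items.append((cell_index, explanation, embed(explanation)))

    def query(self, question, k=3):
        q = embed(question)
        scored = sorted(self.items, key=lambda item: -float(q @ item[2]))
        return [(idx, text) for idx, text, _ in scored[:k]]


if __name__ == "__main__":
    cells, edges = parse_cells("analysis.ipynb")   # hypothetical notebook path
    store = VectorStore()
    for i, src in enumerate(cells):
        store.add(i, explain_cell(src))
    # Retrieved explanations would be passed to the LLM as context when answering.
    print(store.query("Where is the data loaded?"))
```

In the full system, the retrieved explanations serve as context for the LLM when answering user questions or proposing cell edits, which is what allows answers to stay grounded in the specific notebook rather than in generic knowledge.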

We evaluate the tool with both software developers and domain experts through a mixed-methods study that combines task-based interactions with post-task surveys. Results show that the tool improves users’ understanding of unfamiliar notebooks, increases their confidence in continuing the project, and is highly valued as a future onboarding aid. The tool demonstrates the potential of LLMs to bridge the gap between code and interpretation in data-driven environments, supporting more efficient collaboration and knowledge transfer across roles.