Thesis defence: Xiu Li

Date: Tuesday 22 October 2024

Time: 13.00 – 16.00

Location: Lilla hörsalen, DSV, Borgarfjordsgatan 12, Kista

Welcome to a thesis defence at DSV! Xiu Li presents her thesis on how language technology can be used to create more intelligent textbooks.

On October 22, 2024, Xiu Li will present her PhD thesis at the Department of Computer and Systems Sciences (DSV), Stockholm University. The title of the thesis is “Exploring Natural Language Processing for Linking Digital Learning Materials – Towards Intelligent and Adaptive Learning Systems”.

PhD student: Xiu Li, DSV
Opponent: Asad Sayeed, University of Gothenburg
Main supervisor: Aron Henriksson, DSV
Supervisors: Jalal Nouri and Martin Duneld, DSV

The defence takes place at DSV in Kista, starting at 13:00.

Abstract

The digital transformation in education has created many opportunities, but it has also made the growing landscape of digital learning materials harder to navigate. The volume and diversity of learning resources make it difficult for both educators and learners to identify and use the most relevant resources for a specific learning context.

In light of this, there is a critical demand for systems that can effectively connect different learning materials to support teaching and learning activities. Natural language processing can provide some of the essential building blocks for such educational content recommendation systems. This thesis therefore explores the use of natural language processing techniques for automatically linking and recommending relevant learning resources in the form of textbook content, exercises and curriculum goals.

A key question is how to represent diverse learning materials effectively and, to that end, various language models are explored; the obtained representations are then used for measuring semantic textual similarity between learning materials. Learning materials can also be represented based on educational concepts, which is investigated in an ontology-based linking approach. To further enhance the representations and improve linking performance, different language models can be combined and augmented using external knowledge in the form of knowledge graphs and knowledge bases. Beyond approaches based on semantic textual similarity, prompting large language models is explored and a method based on retrieval-augmented generation (RAG) to improve linking performance is proposed.

The thesis presents a systematic empirical evaluation of natural language processing techniques for representing and linking digital learning content, spanning different types of learning materials, use cases, and subjects. The results demonstrate the feasibility of unsupervised approaches based on semantic textual similarity of representations derived from pre-trained language models, and that contextual embeddings outperform traditional text representation methods.

Furthermore, zero-shot prompting of large language models can outperform methods based on semantic textual similarity by leveraging RAG to exploit an external knowledge base in the form of a digital textbook. The proposed approaches for automatically linking digital learning materials have potential practical applications and pave the way for the development of intelligent and adaptive learning systems, including intelligent textbooks.
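
As a rough illustration of the similarity-based linking idea described in the abstract, the sketch below ranks textbook sections against an exercise by cosine similarity. It is not the method used in the thesis: the thesis works with representations from pre-trained language models, whereas this example swaps in plain word counts so that it runs without external libraries. The function names (`embed`, `cosine`, `link`) and the sample texts are hypothetical and invented for illustration only.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a language-model embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link(query, candidates, top_k=1):
    """Rank candidate texts by similarity to the query; return top_k (id, score) pairs."""
    q = embed(query)
    scored = [(cid, cosine(q, embed(text))) for cid, text in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical textbook sections and exercise, invented for illustration.
sections = {
    "photosynthesis": "Plants use light, water and carbon dioxide to produce glucose and oxygen.",
    "cell_division": "During mitosis a cell divides into two identical daughter cells.",
}
exercise = "Explain how plants produce oxygen from water and light."

best = link(exercise, sections)
print(best)
```

In a setting closer to the one the abstract describes, `embed` would presumably be replaced by a contextual embedding model, while the ranking step could stay the same.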

Theme

Research subject

AI and Data Science

Research groups

Learning Analytics and AI for Education Group

The Learning Analytics and AI for Education Group does research on how data-driven methods (learning analytics) can be used to understand and strengthen education. We also study the application of AI technology in educational contexts.

Natural Language Processing Research Group

The Natural Language Processing Research Group develops, applies and evaluates NLP methods, in particular involving large language models, across various domains. We focus on topics such as privacy, explainability, and domain adaptation.

Last updated: September 23, 2024

Source: Department of Computer and Systems Sciences, DSV