Stockholm University

Research project: Privacy-Preserving Techniques for Large Language Models

Recent breakthroughs in AI have been driven mainly by large language models. While these models can be very useful, they also threaten privacy: they can leak private information from their training data. This project aims to identify these risks and develop privacy-preserving techniques to mitigate them.

A knight in armour in a fantasy landscape.
Image: Thomas Vakili (generated with OpenAI's DALL·E 2).

Large language models (LLMs) have led to impressive breakthroughs in artificial intelligence (AI) and natural language processing (NLP). LLMs consist of enormous numbers of parameters that learn to process human language by training on vast amounts of text.

Multiple studies have shown that LLMs memorize information from their training data, and that this information can later leak. These privacy issues worsen as LLMs grow and consume more training data. The risks are especially dire in domains where data are sensitive, such as the clinical domain. At the same time, these are the domains where AI can have the most beneficial societal impact. This project aims to study the privacy threats that come with LLMs and to investigate privacy-preserving techniques for mitigating these risks. Doing so is crucial if LLMs are to be used in an ethical and legal manner.

This is Thomas Vakili’s PhD project. His main supervisor is Hercules Dalianis, and his co-supervisor is Aron Henriksson.

Project members

Project managers

Thomas Vakili

PhD student

Department of Computer and Systems Sciences

Hercules Dalianis

Professor

Department of Computer and Systems Sciences

Aron Henriksson

Associate professor

Department of Computer and Systems Sciences

Publications