Stockholm University

Caroline Arvidsson, PhD student

About me

My research focuses on the cognitive processes subserving conversational behavior. I combine psycholinguistics, neuroscience, and computational models to understand higher-level processing in conversational production, comprehension, and turn-taking.

Publications

A selection from Stockholm University publication database

  • Conversational production and comprehension: fMRI-evidence reminiscent of but deviant from the classical Broca–Wernicke model

    2024. Caroline Arvidsson (et al.). Cerebral Cortex 34 (3)

    Article

    A key question in research on the neurobiology of language is to which extent the language production and comprehension systems share neural infrastructure, but this question has not been addressed in the context of conversation. We utilized a public fMRI dataset where 24 participants engaged in unscripted conversations with a confederate outside the scanner, via an audio-video link. We provide evidence indicating that the two systems share neural infrastructure in the left-lateralized perisylvian language network, but diverge regarding the level of activation in regions within the network. Activity in the left inferior frontal gyrus was stronger in production compared to comprehension, while comprehension showed stronger recruitment of the left anterior middle temporal gyrus and superior temporal sulcus, compared to production. Although our results are reminiscent of the classical Broca–Wernicke model, the anterior (rather than posterior) temporal activation is a notable difference from that model. This is one of the findings that may be a consequence of the conversational setting, another being that conversational production activated what we interpret as higher-level socio-pragmatic processes. In conclusion, we present evidence for partial overlap and functional asymmetry of the neural infrastructure of production and comprehension, in the above-mentioned frontal vs temporal regions during conversation.

  • Why the GPT task of predicting the next word does not suffice to describe human language production: A conversational fMRI-study

    2023. Caroline Arvidsson, Johanna Sundström, Julia Uddén. Program Pdf of The 15th Annual Meeting of the Society for the Neurobiology of Language

    Conference

    Interest is surging around the “next-word-predictability” task that allowed large language models to reach their current capacity. It is sometimes claimed that prediction is enough to model language production. We set out to study predictability in an interactive setting. The current fMRI study used the information-theoretic measure of surprisal – the negative log-probability of a word occurring given the preceding linguistic context, estimated by a pre-trained language model (GPT-2). Surprisal has been shown to correlate with bottom-up processing located in the bilateral middle and superior temporal gyri (MTG/STG) during narrative comprehension (Willems et al., 2016). Still, surprisal has never been used to investigate conversational comprehension or any kind of language production. We hypothesized that previous results on surprisal in narrative comprehension would be replicated with conversational comprehension and that next-word-predictability would not encompass language production processes. We utilized a publicly available fMRI dataset in which participants (N=24) engaged in unscripted conversations (12 min/participant) via an audio-video link with a confederate outside the scanner. The conversational events Production, Comprehension, and Silence were modeled in a whole-brain analysis. Two parametric modulations of production and comprehension were added: (1) log-transformed context-independent word frequency (control regressor) and (2) surprisal. Production-surprisal and Comprehension-surprisal were respectively contrasted against the implicit baseline. These contrasts were compared with the contrasts Production and Comprehension vs implicit baseline. If surprisal merely indexed part of the activity in the latter, broader contrasts, this would provide a handle on production and comprehension processes beyond next-word-predictability.
For surprisal in conversational production, we observed statistically significant clusters in the left inferior frontal gyrus (LIFG), the medial frontal gyrus, and the motor cortex. Importantly, Production vs implicit baseline showed bilateral STG activation while STG was not parametrically modulated by surprisal. Moreover, the bilateral MTG/STG were the only clusters active for Comprehension vs implicit baseline and they were also modulated by surprisal. For comprehension, we thus replicated the previous narrative comprehension study (Willems et al., 2016), showing that unpredictable words activate the bilateral MTG/STG also in conversational settings. Next-word-predictability is thus so far a good model for conversational comprehension. For production, however, the next-word-predictability task helped to home in on what is sometimes considered core production machinery in LIFG. Several functional interpretations of the STG recruitment during production are possible (such as monitoring for speech errors), but the current results point in the direction of two important conclusions: (1) a functional division of the frontal and temporal cortices during production, where the frontal component is prediction-related, and (2) that language processing during production is more than prediction, at least at the word-level. We provide a functional handle on such extra-predictive processes.

  • Investigating Conversational Dynamics in Human-Robot Interaction with fMRI

    2023. Torubarova Ekaterina (et al.). Proceedings of the Annual Meeting of the Cognitive Science Society

    Conference

    We investigated how verbal communication with a robot differs from talking to a human in terms of brain activity by analysing an open-source fMRI dataset. We focused on modeling conversational dynamics rather than conversation as a whole, by analysing fine-grained events, in particular turn initiation. The results indicate that turn initiation in a conversation with a human involves higher activation in auditory and visual cortex than turn initiation with a robot. Conversely, listening to the robot showed higher engagement of auditory cortex than listening to a human. We suggest that verbal and non-verbal turn-taking cues provided by the human agent engage more cognitive processing for picking up the turn. On the other hand, listening to a robot agent requires more processing than listening to a human. Both findings suggest that the accurate simulation of appropriate turn-taking cues and behaviors will help robots to establish more natural conversation dynamics and that the use of brain imaging can provide valuable objective measurements for assessing user states in human-robot interaction.

  • When did you stop speaking to yourself? Age-related differences in adolescents’ world knowledge-based audience design

    2022. Caroline Arvidsson, David Pagmar, Julia Uddén. Royal Society Open Science 9 (11)

    Article

    The ability to adapt utterances to the world knowledge of one’s addressee is undeniably ubiquitous in human social cognition, but its development and association with other cognitive mechanisms during adolescence have not been studied. In an online production task, we measured the ability of children entering adolescence (ages 11–12, M = 11.8, N = 29, 17 girls) and adolescents (ages 15–16, M = 15.9, N = 29, 17 girls) to tailor referential expressions in accordance with the inferred world knowledge of their addressee—an ability we refer to as world knowledge-based audience design (AD). A post-test survey showed that both age groups held similar assumptions about the addressees’ knowledge of referents, but the younger age group did not consistently adapt their utterances in accordance with these assumptions during online production, resulting in a significantly improved AD behaviour across age groups. We also investigated the reliance of AD on executive functions (EF). Executive functioning (as reflected by performance on the Wisconsin card sorting task) increased significantly with age, but did not explain the age-related increase in AD performance. We thus provide evidence in support of an adolescent development of world knowledge-based AD over and above development of EF.

  • The Brain in Conversation: Mapping Turn-taking, Production and Comprehension with fMRI

    2022. Caroline Arvidsson (et al.).

    Conference

    INTRODUCTION: Conversation is the most ubiquitous form of language use. A hallmark of conversation is turn-taking, in which speakers rapidly alternate between speaker and listener roles without conscious effort, while simultaneously planning their upcoming turn. Since previous neurolinguistic studies have mainly investigated single or few linguistic processes in isolated environments that lack resemblance to real-world language use, the neurobiology of turn-taking, production, and comprehension during real-time conversation is currently under-explored. In this fMRI investigation, we asked whether turn initiations would activate areas outside the classical perisylvian core language network and whether we would observe differences in activation during conversational production vs. conversational comprehension. METHODS: We utilized a publicly available fMRI dataset in which participants (N = 23) engaged in unscripted conversations via an audio-video link with a confederate outside the scanner. Each conversation (24 per participant) lasted for one minute. Conversational events were defined from the participant’s perspective. These events included turn initiations, defined as a 600 ms time window whose offset coincided with the onset of the participant’s turn. The duration of turn initiations was based on the reported minimum latency of speech preparation. The other events investigated in this study were production (defined as participant speech), and comprehension (defined as confederate speech). RESULTS: Turn initiations were associated with frontal regions outside of the classical perisylvian core language network. One cluster (2796 voxels, significant with FWE-correction used throughout) was observed in the medial prefrontal cortex bilaterally, spanning from the dorsal portion to the most ventral anterior cingulate cortex. Activation during turn initiations was also observed in the left middle frontal gyrus.
Furthermore, both production and comprehension during conversation were associated with core language regions in the bilateral temporal lobes, but activation in the left inferior frontal gyrus (LIFG) was only present for production. Moreover, larger parts of the occipital cortex, and specifically the fusiform face area, were activated in comprehension than in production. DISCUSSION: We suggest that the observed frontal activation during turn initiations reflects sociopragmatic processes involved in intention processing and attentional control – processes that have not previously been localized outside the core perisylvian language network but have been hypothesized to play a crucial role in speech preparation during interaction. Furthermore, we interpret the fusiform face area activation during comprehension as an indication that listeners are aided by their interlocutor’s facial gestures specifically when comprehending speech input during real-time conversation. Finally, LIFG activation in conversational production but not comprehension may reflect the syntactic and semantic heuristics at play in conversational comprehension, minimizing the need for a full syntactic parse. The utilization of such heuristics may be a possible prerequisite for consistently meeting the expectations of timing in turn-taking.

  • The brain in conversation: Mapping the neural correlates of turn-taking, production, and comprehension using fMRI

    2022. Caroline Arvidsson.

    Conversation is the primary mode of language use. A key feature of conversation is turn-taking, during which interlocutors rapidly switch between speaker and listener roles without conscious effort. As previous neuroimaging studies have investigated language comprehension in isolated contexts, little is known regarding the neurocognitive bases of language use in reciprocal interaction. The present fMRI study investigates turn-taking, production, and comprehension processes, by utilizing existing conversational data between participants (N = 23) and a confederate outside the scanner. Turn initiations were associated with regions (the medial prefrontal cortex and the middle frontal gyrus) outside of the perisylvian core language network. Production and comprehension were both associated with core language regions in the temporal lobes, but activation in the left inferior frontal gyrus was mainly associated with production. Activation in the fusiform face area was linked to comprehension. The current findings suggest that (1) the coordination of speaker change is dependent on pragmatic processes that have been relatively overlooked in models of speech preparation, and (2) listeners are aided by their interlocutor's facial gestures when processing speech input during conversation. In addition, the results indicate that production and comprehension processes may differ (e.g., on the syntactic level), even in conversation.

  • Audience design and frame of reference in adolescents' reference production

    2021. Caroline Arvidsson, David Pagmar, Julia Uddén. Abstracts, 1519-1519

    Conference

    When participating in dialogue, speakers design their utterances to accommodate the individual needs of listeners (Bentz, et al., in prep). This feature is known as audience design (Clark & Murphy, 1982). Although audience design is central to conventional conversation, it is not known at which age speakers begin taking into account the world knowledge/frame of reference of their interlocutors. Indications from recent studies suggest that although preschool and first-grade children engage in basic forms of perspective taking (Nadig & Sedivy, 2002), they fail to adapt their utterances in accordance with listener-specific needs in reference production (Pagmar, et al., in prep). Adult participants do however adapt their utterances, and individual differences in the adult population were not dependent on cognitive control function (Bentz, et al., in prep). The dependence on cognitive control function, e.g. switching, may be hypothesized to be greater in children. The current study aims to test the referential production of two age groups; early and mid adolescents (11;0-12;11 and 15;0-16;11), with the purpose of tracing the development of the ability to use information regarding listener-perspective during on-line referential production, and test its relation to cognitive control. The paradigm builds further on the well-established Director’s task but does not require the participants to take the visual perspective of the listener. Instead, participants are presented with a set of pictures portraying referents well-known to them, e.g. popular cartoon characters, hosts of children’s tv-shows, etc. Knowledge of the referents is controlled through post-test surveys. Furthermore, they are asked to direct listeners of two distinct groups, small children and elders, into choosing the target referent.
Participants who take the frame of reference of addressees into consideration are expected to adopt different strategies when addressing the different groups, i.e., increase informativeness when denoting referents assumed to be unknown to the listener vs using less informative referential expressions (such as proper names) when denoting referents judged to be known to the listener. Cognitive control/executive function is assessed using the Wisconsin card sorting task. Results are discussed in terms of cognitive costs of switching strategies and the Gricean maxim of quantity.

  • Conversations between ages five and seven: Connections to executive functions and implicature comprehension

    David Pagmar (et al.).

    A language user must rely on several different abilities to carry out a conversation, e.g. the ability to acknowledge the conversational contributions of others, to respond appropriately, to stay on topic, etc. There are many aspects of the development of conversational conduct that are yet unknown. In this study, the longitudinal development of conversational conduct, as in acknowledging one’s interlocutor’s previous turn, was traced from age 5;0 to 7;2. We also investigated whether conversational conduct was predicted by core language skill, executive functions, and specific pragmatic abilities. Previous findings of productive morpho-syntactic accuracy were replicated, while findings concerning longitudinal receptive vocabulary were not. We also found connections between children’s conversational responses and executive functions, working memory, and the comprehension of conversational implicatures. The results suggest that conversational conduct is dependent both on inferring communicative intentions and on being able to keep track of others' contributions and how they relate to previous turns.

