A Treebank Approach to the Study of Spoken Slovenian

Code

Z6-4617 (B) - included in ARIS records

Head

PhD Kaja Dobrovoljc

Period

10/1/2022 - 5/31/2025

Science

Humanities (1)

Researcher status

Researcher (1)
Junior expert or technical associate (0)

Education

Doctoral degree (1)

Sex

Woman (1)

Status

Employed at RO and RRD (1)

No. of publications

100–999 (1)

Projects / Programmes source: ARIS

Research activity

Code	Science	Field	Subfield
6.05.00	Humanities	Linguistics

Code	Science	Field
6.02	Humanities	Languages and Literature

Keywords

spoken language, spoken grammar; syntactically annotated corpora, treebanks, dependency syntax, syntactic trees; corpus linguistics, corpus-driven research, comparing corpora, language variation

Evaluation (methodology)

Evaluation of bibliographic research performance indicators according to ARIS methodology

Citations for bibliographic records in COBIB.SI that are linked to records in citation databases

Organisations (1), Researchers (1)

0581 University of Ljubljana, Faculty of Arts

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	36491	PhD Kaja Dobrovoljc	Linguistics	Head	2022 - 2025	197

Abstract

Over the past three decades, the unitary approach to the study of language, whereby speech and writing are seen as two ends of the same continuum, has motivated an unprecedented increase in corpus linguistic research aimed at describing speech-specific syntactic phenomena that have been ignored or insufficiently addressed by traditional grammatical frameworks. However, this trend is significantly less pronounced in Slovenian linguistics, where research on the syntactic characteristics of spoken Slovenian is still scarce and has mostly focused on top-down investigations of individual syntactic phenomena based on qualitative analyses of relatively small amounts of data. To bridge this gap and establish the necessary empirical foundations for future grammatical descriptions of spoken Slovenian, this project will systematically investigate the potential of syntactically annotated corpora, i.e. treebanks, for linguistic research on spoken Slovenian by (1) establishing a coherent framework for the syntactic annotation of spoken Slovenian, (2) providing a high-quality treebank of spoken Slovenian, and (3) developing a methodology for its bottom-up, statistics-driven linguistic analysis, while (4) promoting the use of syntactically annotated corpora in linguistics in general. Specifically, we will significantly improve the current version of the Spoken Slovenian Treebank (Dobrovoljc and Nivre 2016), the only syntactically annotated corpus of spoken Slovenian to date, in terms of size, documentation, and annotation quality. In turn, the new treebank will be used to perform a pioneering bottom-up identification of speech-specific syntactic patterns in spoken Slovenian by means of a keyness analysis, resulting in a list of syntactic trees that occur with a statistically significantly higher frequency in speech than in writing.
We expect the in-depth analysis of this list to empirically confirm the known, prototypical, cognitively most salient speech-specific syntactic phenomena on the one hand, and to lead to the potential discovery of previously unidentified, statistically most salient patterns of spoken language use on the other. Thus, the project will result in several important contributions not only to Slovenian linguistics, by providing new resources, methods, and analyses for the study of spoken Slovenian, but also to the field of corpus linguistics in general, by providing new insights into the heretofore underexploited methodological potential of syntactically parsed corpora, both for spoken language studies and for studies on language variation in general.
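
The keyness analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistic the project uses; the example below assumes the two-term log-likelihood ratio (G2) that is common in corpus linguistics, with a cut-off of 3.84 (p < 0.05 at one degree of freedom). The function names, the pattern labels, and the toy frequencies are all hypothetical and are not taken from the project.

```python
import math
from collections import Counter


def log_likelihood(a, b, total_a, total_b):
    """Two-term G2 for an item seen a times in corpus A and b times in corpus B."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def key_patterns(spoken, written, threshold=3.84):
    """Return (pattern, G2) pairs overrepresented in speech, highest G2 first."""
    total_s = sum(spoken.values())
    total_w = sum(written.values())
    results = []
    for pattern, freq_s in spoken.items():
        freq_w = written.get(pattern, 0)
        # Skip patterns that are not relatively more frequent in speech.
        if freq_s / total_s <= freq_w / total_w:
            continue
        g2 = log_likelihood(freq_s, freq_w, total_s, total_w)
        if g2 >= threshold:
            results.append((pattern, g2))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical counts of syntactic tree patterns (labels are placeholders).
spoken = Counter({"discourse>parataxis": 30, "nsubj<root>obj": 50, "root>reparandum": 20})
written = Counter({"discourse>parataxis": 2, "nsubj<root>obj": 120, "noun>amod": 78})

for pattern, score in key_patterns(spoken, written):
    print(f"{pattern}\t{score:.2f}")
```

In this toy setting, patterns whose relative frequency is not higher in speech are filtered out before testing, so the resulting list contains only candidates for speech-specific patterns. In practice, each tested tree is one of many comparisons, so a stricter threshold or a correction for multiple testing would likely be needed.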
