Slovenian word-prevalence: an online mega-study of word knowledge
Code | Science | Field | Subfield
6.05.00 | Humanities | Linguistics |

Code | Science | Field
6.02 | Humanities | Languages and Literature
Keywords: word prevalence, vocabulary knowledge, crowdsourcing megastudy, lexicology, Slovenian, word-picture matching task, lexical decision task
Data for the last 5 years (citations for the last 10 years) as of October 15, 2025. Data for the A3 score calculation refer to the period 2020-2024.
Data for ARIS tenders (04.04.2019 – Programme tender)
Database | Linked records | Citations | Pure citations | Average pure citations
WoS | 23 | 130 | 117 | 5.09
Scopus | 42 | 271 | 247 | 5.88
Organisations (4), Researchers (9)
0618 Research Centre of the Slovenian Academy of Sciences and Arts
0312 University Medical Centre Ljubljana
no. | Code | Name and surname | Research area | Role | Period | No. of publications
1. | 55245 | Tina Pogorelčnik | Interdisciplinary research | Researcher | 2023 - 2025 | 27
2. | 58798 | Klara Trpkova Bergant | Interdisciplinary research | Researcher | 2024 - 2025 | 12
0588 University of Ljubljana, Faculty of Education
no. | Code | Name and surname | Research area | Role | Period | No. of publications
1. | 38535 | PhD Matic Pavlič | Linguistics | Researcher | 2023 - 2025 | 172
1540 University of Nova Gorica
no. | Code | Name and surname | Research area | Role | Period | No. of publications
1. | 31177 | PhD Artur Stepanov | Linguistics | Researcher | 2023 - 2025 | 123
Abstract
The goal of the project is to establish word-prevalence norms for Slovenian by means of a mega-study using lexical decision and word–picture matching tasks. The project will provide a crucial subjective psycholinguistic norm, word prevalence, defined as the percentage of the population that knows a given word. Responses for 10,000–20,000 Slovenian words will be collected from 4,000–8,000 native speakers through lexical decision and word–picture matching tasks and converted into standardized, freely available, and reliably evaluated norms of word prevalence.
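The definition of prevalence above reduces to a simple proportion. The sketch below assumes each word's responses have already been tallied. It shows two further steps as options commonly discussed in norming work, not as the project's chosen procedure: a classic correction for guessing on a word–picture matching item with k pictures, and a probit (z-score) transform of the proportion. All function names are hypothetical.

```python
from statistics import NormalDist


def prevalence(n_known: int, n_responses: int) -> float:
    """Percentage of respondents who indicated that they know the word."""
    if n_responses <= 0:
        raise ValueError("need at least one response")
    return 100.0 * n_known / n_responses


def guessing_corrected(p_correct: float, n_alternatives: int) -> float:
    """Classic correction for guessing on a k-alternative forced-choice item
    (assumed here for word-picture matching): (p - 1/k) / (1 - 1/k),
    floored at 0 so below-chance items are not reported as negative."""
    chance = 1.0 / n_alternatives
    return max(0.0, (p_correct - chance) / (1.0 - chance))


def probit(proportion: float, n_responses: int) -> float:
    """Inverse standard-normal transform of a proportion known.
    Proportions of exactly 0 or 1 are clipped to 1/(2n) and 1 - 1/(2n)
    so that the result stays finite."""
    eps = 1.0 / (2 * n_responses)
    p = min(max(proportion, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)


if __name__ == "__main__":
    print(prevalence(45, 50))           # 90.0
    print(guessing_corrected(0.8, 4))   # about 0.733
    print(probit(1.0, 50))              # about 2.326 (clipped to 0.99)
```

A probit scale spreads out differences near the ceiling, where many well-known words cluster, which is one reason it is often preferred over raw percentages for analysis.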
With word-prevalence norms, researchers can select word stimuli more carefully and in line with their research goals. First, by ranking words according to word prevalence (combined with word frequency), it is possible to delineate word difficulty ranges that can be used to select stimuli for psycholinguistic studies with factorial designs and for clinical diagnostic tests (such as receptive vocabulary tests). Word prevalence can also be used to predict differences in word processing efficiency. Second, word prevalence can serve as an estimate of word difficulty in vocabulary tests. It is also likely to interest researchers developing algorithms for assessing text difficulty. Third, word prevalence is useful for selecting vocabulary when preparing materials for teaching and learning a language as an L1 or L2. Finally, word frequency is currently one of the main criteria for selecting headwords in general (monolingual or bilingual) dictionaries; in the low-frequency ranges, supplementing this criterion with word prevalence will be extremely useful.
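As an illustration of delineating difficulty ranges for a factorial design, the following sketch sorts words into the four cells of a prevalence-by-frequency design. It assumes frequency is expressed on the Zipf scale and uses arbitrary cut-offs; the `Word` type, its field names, the thresholds, and the example items are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    form: str
    prevalence: float      # percentage of respondents who know the word (0-100)
    zipf_frequency: float  # assumed corpus frequency on the Zipf scale


def factorial_cells(words, prevalence_cut=90.0, frequency_cut=3.0):
    """Assign each word to one cell of a 2x2 (prevalence x frequency) design.
    The cut-off values are illustrative assumptions, not recommended thresholds."""
    cells = {
        ("high", "high"): [], ("high", "low"): [],
        ("low", "high"): [], ("low", "low"): [],
    }
    for w in words:
        prev_level = "high" if w.prevalence >= prevalence_cut else "low"
        freq_level = "high" if w.zipf_frequency >= frequency_cut else "low"
        cells[(prev_level, freq_level)].append(w.form)
    return cells


if __name__ == "__main__":
    # Made-up items and values, for illustration only.
    items = [
        Word("A", 98.0, 4.5),
        Word("B", 95.0, 2.1),
        Word("C", 60.0, 3.4),
        Word("D", 40.0, 1.2),
    ]
    for cell, forms in factorial_cells(items).items():
        print(cell, forms)
```

Stimuli for each condition could then be sampled from the corresponding cell, ideally matched on other variables such as word length.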
To achieve this goal, we will first prepare the experimental protocol for the mega-study: building the language datasets, such as a word list and a nonword list, and defining the socio-demographic metadata to be collected from respondents. The questionnaire will then be promoted to obtain responses from a large number of adult L1 speakers of Slovenian, and it will run for one year. The responses will be analyzed to answer questions such as how respondents' age, sex, place where they grew up, education, number of languages spoken, and occupation affect word prevalence. We will also determine which words are best known to Slovenian speakers and how corpus frequency, word length, and other variables correlate with word prevalence. Furthermore, we will develop a methodology for including word-prevalence data in dictionary compilation.
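In yes/no vocabulary tasks, a nonword list typically serves as a quality check: respondents who claim to know many nonwords are likely guessing. The sketch below assumes each respondent is stored as a dict with `metadata` (the socio-demographic fields) and `responses` (item mapped to known or not known). It excludes respondents above a false-alarm threshold and then computes per-word prevalence within each demographic group. The record layout, the function names, and the 0.2 threshold are illustrative assumptions, not the project's protocol.

```python
from collections import defaultdict


def false_alarm_rate(responses, nonwords):
    """Share of the nonwords seen by a respondent that they claimed to know."""
    judged = [known for item, known in responses.items() if item in nonwords]
    return sum(judged) / len(judged) if judged else 0.0


def prevalence_by_group(respondents, nonwords, group_key, max_false_alarms=0.2):
    """Per-word prevalence (%) within each demographic group, after excluding
    respondents whose false-alarm rate exceeds a (hypothetical) threshold."""
    # group -> word -> [number who knew it, number who saw it]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for r in respondents:
        if false_alarm_rate(r["responses"], nonwords) > max_false_alarms:
            continue  # likely guessing or careless responding
        group = r["metadata"][group_key]
        for item, known in r["responses"].items():
            if item in nonwords:
                continue  # nonwords are only used for screening
            cell = counts[group][item]
            cell[0] += int(known)
            cell[1] += 1
    return {
        g: {w: 100.0 * k / n for w, (k, n) in words.items()}
        for g, words in counts.items()
    }


if __name__ == "__main__":
    # Made-up respondents, for illustration only.
    nonwords = {"nonw1", "nonw2"}
    data = [
        {"metadata": {"age_group": "18-30"},
         "responses": {"w1": True, "w2": False, "nonw1": False, "nonw2": False}},
        {"metadata": {"age_group": "60+"},
         "responses": {"w1": True, "w2": True, "nonw1": True, "nonw2": True}},
    ]
    print(prevalence_by_group(data, nonwords, "age_group"))
```

In the usage example, the second respondent claims to know both nonwords and is therefore excluded, so only the "18-30" group appears in the output. The same grouped counts could feed comparisons across any metadata field, such as education or place of growing up.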
To address unforeseen challenges that may arise during the project, we have appointed an independent international observer and advisor with extensive experience gained in the recent Catalan word-prevalence research project.