Projects / Programmes source: ARIS

Slovenian word-prevalence: an online mega-study of word knowledge

Research activity

Code Science Field Subfield
6.05.00  Humanities  Linguistics   

Code Science Field
6.02  Humanities  Languages and Literature 
Keywords
word prevalence, vocabulary knowledge, crowdsourcing megastudy, lexicology, Slovenian, word-picture matching task, lexical decision task
Evaluation (methodology)
source: COBISS
Points: 5,084.2
A'': 927.36
A': 2,134.3
A1/2: 2,264.15
CI10: 281
CImax: 81
h10: 8
A1: 17.14
A3: 2.9
Data for the last 5 years (citations for the last 10 years) on October 15, 2025; Data for score A3 calculation refer to period 2020-2024
Data for ARIS tenders ( 04.04.2019 – Programme tender, archive )
Database  Linked records  Citations  Pure citations  Average pure citations
WoS  23  130  117  5.09 
Scopus  42  271  247  5.88 
Organisations (4), Researchers (9)
0618  Research Centre of the Slovenian Academy of Sciences and Arts
No.  Code  Name and surname  Research area  Role  Period  No. of publications
1.  53502  PhD Dejan Gabrovšek  Linguistics  Researcher  2023 - 2024  82 
2.  21708  PhD Nataša Gliha Komac  Linguistics  Researcher  2023 - 2025  418 
3.  37555  PhD Janoš Ježovnik  Linguistics  Researcher  2023 - 2025  126 
4.  30798  PhD Andrej Perdih  Linguistics  Head  2023 - 2025  208 
5.  54765  Miha Sušnik  Linguistics  Researcher  2025  23 
0312  University Medical Centre Ljubljana
No.  Code  Name and surname  Research area  Role  Period  No. of publications
1.  55245  Tina Pogorelčnik  Interdisciplinary research  Researcher  2023 - 2025  27 
2.  58798  Klara Trpkova Bergant  Interdisciplinary research  Researcher  2024 - 2025  12 
0588  University of Ljubljana, Faculty of Education
No.  Code  Name and surname  Research area  Role  Period  No. of publications
1.  38535  PhD Matic Pavlič  Linguistics  Researcher  2023 - 2025  172 
1540  University of Nova Gorica
No.  Code  Name and surname  Research area  Role  Period  No. of publications
1.  31177  PhD Artur Stepanov  Linguistics  Researcher  2023 - 2025  123 
Abstract
The goal of the project is to determine word-prevalence data for Slovenian by means of a mega-study using lexical decision and word–picture matching tasks. The project will provide a crucial subjective psycholinguistic norm, namely prevalence, defined as the percentage of the population that knows a word. Ratings of 10,000-20,000 Slovenian words will be collected from 4,000-8,000 native speakers through these two tasks and converted into standardized, freely available, and reliably evaluated norms of word prevalence.

Word prevalence allows researchers to select word stimuli more carefully and in line with their research aims. First, by ranking words according to word prevalence (combined with word frequency), it is possible to delineate word difficulty ranges that can be used in selecting stimuli for psycholinguistic studies with factorial designs as well as for clinical use of diagnostic tests (such as receptive vocabulary tests). Word prevalence can also be used to predict differences in word processing efficiency. Second, it can serve as an estimate of the difficulty of words in vocabulary tests, and it is likely to be of interest to researchers developing algorithms for assessing the difficulty of texts. Third, it is useful in selecting vocabulary when preparing materials for teaching and learning a language as an L1 or L2. Finally, one of the main criteria for selecting headwords in general (monolingual or bilingual) dictionaries is currently word frequency. In the low-frequency ranges, it will be extremely useful to supplement this criterion with word prevalence.

To achieve the goal, we will first prepare the experimental protocol for the mega-study. This involves building language datasets, such as a word list and a nonword list, and defining the socio-demographic metadata to be collected from the respondents. Then, the questionnaire will be promoted to obtain responses from a large number of adult L1 speakers of Slovenian. The questionnaire will run for one year. The responses will then be analyzed to determine how respondents' age, sex, place of growing up, education, number of languages spoken, and occupation affect word prevalence. We will also obtain information about which words are better known by Slovenian speakers and how corpus frequency, word length, and other variables correlate with word prevalence. Furthermore, a methodology will be developed for including word-prevalence data in dictionary compilation. To tackle unforeseen challenges that may arise during the project, we have appointed an international independent observer and advisor with extensive experience gained in the recent Catalan word-prevalence research project.
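
As a rough illustration of the definition used in the abstract (prevalence as the percentage of people who know a word), the following is a minimal sketch in Python, not the project's actual analysis pipeline. The function name `word_prevalence`, the response format, the sample items, and the `max_false_alarm_rate` threshold are all hypothetical. Screening respondents by their "yes" rate on nonwords is shown only as one plausible use of the nonword list and is an assumption; the project may use a different scoring model.

```python
from collections import defaultdict


def word_prevalence(responses, nonwords=frozenset(), max_false_alarm_rate=1.0):
    """Return {word: percentage of retained respondents who said they know it}.

    responses: iterable of (respondent_id, item, said_yes) tuples.
    nonwords: items that are not real words (used only to screen respondents).
    max_false_alarm_rate: hypothetical threshold; respondents whose share of
        "yes" answers to nonwords exceeds it are excluded.
    """
    responses = list(responses)

    # Step 1: per-respondent false-alarm rate on nonwords.
    nonword_yes = defaultdict(int)
    nonword_seen = defaultdict(int)
    for respondent, item, said_yes in responses:
        if item in nonwords:
            nonword_seen[respondent] += 1
            nonword_yes[respondent] += int(said_yes)
    excluded = {
        r for r in nonword_seen
        if nonword_yes[r] / nonword_seen[r] > max_false_alarm_rate
    }

    # Step 2: share of "yes" answers per real word among retained respondents.
    known = defaultdict(int)
    seen = defaultdict(int)
    for respondent, item, said_yes in responses:
        if item in nonwords or respondent in excluded:
            continue
        seen[item] += 1
        known[item] += int(said_yes)
    return {word: 100.0 * known[word] / seen[word] for word in seen}


# Illustrative data only: "plork" stands in for an invented nonword.
sample = [
    ("r1", "miza", True), ("r1", "brv", True), ("r1", "plork", False),
    ("r2", "miza", True), ("r2", "brv", False), ("r2", "plork", False),
    ("r3", "miza", True), ("r3", "brv", True), ("r3", "plork", True),
]
prevalence = word_prevalence(sample, nonwords={"plork"}, max_false_alarm_rate=0.5)
```

In this toy data, respondent `r3` accepts the nonword and is dropped under the chosen threshold, so prevalence is computed from the remaining two respondents. A real study would likely also weight or model responses by the socio-demographic variables mentioned above.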