Slovenian word-prevalence: an online mega-study of word knowledge

Code

J6-50199 (A) - included in ARIS records

Head

PhD Andrej Perdih

Period

10/1/2023 - 9/30/2026

Science

Humanities (7)
Interdisciplinary research (2)

Researcher status

Researcher (9)
Junior expert or technical associate (0)

Education

Doctoral degree (6)
Other (3)

Sex

Woman (3)
Man (6)

Status

Employed at RO and RRD (9)

No. of publications

10–99 (4)
100–999 (5)

Projects / Programmes source: ARIS

Research activity

Code	Science	Field	Subfield
6.05.00	Humanities	Linguistics

Code	Science	Field
6.02	Humanities	Languages and Literature

Keywords

word prevalence, vocabulary knowledge, crowdsourcing megastudy, lexicology, Slovenian, word-picture matching task, lexical decision task

Evaluation (methodology)

Evaluation of bibliographic research performance indicators according to ARIS methodology

Points

5,084.2

A''

927.36

2,134.3

A1/2

2,264.15

CI10

281

CImax

h10

17.14

2.9

Data for the last 5 years (citations for the last 10 years) on October 15, 2025; Data for score A3 calculation refer to period 2020-2024

Data for ARIS tenders (04.04.2019 – Programme tender, archive)

Citations

Citations for bibliographic records in COBIB.SI that are linked to records in citation databases

Database	Linked records	Citations	Pure citations	Average pure citations
WoS	23	130	117	5.09
Scopus	42	271	247	5.88
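
The last column of the table appears to be the number of pure citations divided by the number of linked records. This reading is inferred from the figures themselves and is not stated in the ARIS methodology quoted here; the function name below is hypothetical. A minimal sketch that reproduces the two values under that assumption:

```python
# Values copied from the citation table above:
# database -> (linked records, citations, pure citations)
citation_table = {
    "WoS": (23, 130, 117),
    "Scopus": (42, 271, 247),
}


def average_pure_citations(linked_records: int, pure_citations: int) -> float:
    """Assumed definition: pure citations per linked record, rounded to 2 decimals."""
    return round(pure_citations / linked_records, 2)


averages = {
    database: average_pure_citations(linked, pure)
    for database, (linked, _citations, pure) in citation_table.items()
}
print(averages)
```

Both results agree with the published column, which supports the assumed definition but does not confirm it.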

Organisations (4), Researchers (9)

0618 Research Centre of the Slovenian Academy of Sciences and Arts

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	53502	PhD Dejan Gabrovšek	Linguistics	Researcher	2023 - 2024	82
2.	21708	PhD Nataša Gliha Komac	Linguistics	Researcher	2023 - 2025	418
3.	37555	PhD Janoš Ježovnik	Linguistics	Researcher	2023 - 2025	126
4.	30798	PhD Andrej Perdih	Linguistics	Head	2023 - 2025	208
5.	54765	Miha Sušnik	Linguistics	Researcher	2025	23

0312 University Medical Centre Ljubljana

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	55245	Tina Pogorelčnik	Interdisciplinary research	Researcher	2023 - 2025	27
2.	58798	Klara Trpkova Bergant	Interdisciplinary research	Researcher	2024 - 2025	12

0588 University of Ljubljana, Faculty of Education

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	38535	PhD Matic Pavlič	Linguistics	Researcher	2023 - 2025	172

1540 University of Nova Gorica

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	31177	PhD Artur Stepanov	Linguistics	Researcher	2023 - 2025	123

Abstract

The goal of the project is to determine word-prevalence data for Slovenian by means of a mega-study based on lexical decision and word–picture matching tasks. The project will provide a crucial subjective psycholinguistic norm, namely prevalence, defined as the percentage of the population that knows a word. Ratings of 10,000–20,000 Slovenian words will be collected from 4,000–8,000 native speakers through these tasks and converted into standardized, freely available, and reliably evaluated norms of word prevalence.
Word prevalence allows researchers to select word stimuli more carefully and in line with their research aims. First, by ranking words according to word prevalence (combined with word frequency), it is possible to delineate word difficulty ranges that can be used in selecting stimuli for psycholinguistic studies with factorial designs as well as for clinical use of diagnostic tests (such as receptive vocabulary tests). Word prevalence can also be used to predict differences in word processing efficiency. Second, word prevalence can be used as an estimate of the difficulty of words in vocabulary tests. In addition, it is likely to be of interest to researchers developing algorithms for assessing the difficulty of texts. Third, word prevalence is useful in selecting vocabulary when preparing materials for teaching and learning a language as an L1 or L2. Finally, one of the main criteria for selecting headwords in general (monolingual or bilingual) dictionaries is currently word frequency. In the low-frequency ranges, it will be extremely useful to supplement this criterion with word prevalence.
To achieve this goal, we will first prepare the experimental protocol for the mega-study, which involves building language datasets, such as a word list and a nonword list, and defining the socio-demographic metadata to be collected from the respondents. Then, the questionnaire will be promoted to obtain responses from a large number of adult L1 speakers of Slovenian. The questionnaire will run for one year. The responses will then be analyzed to determine how respondents' age, sex, the place where they grew up, education, number of languages spoken, and occupation affect word prevalence. We will also obtain information about which words are better known by Slovenian speakers and how corpus frequency, word length, and other variables correlate with word prevalence. Furthermore, a methodology will be developed for including word-prevalence data in dictionary compilation. To tackle unforeseen challenges that may arise during the project, we have appointed an independent international observer and advisor with extensive experience gained in the recent Catalan word-prevalence research project.
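
To make the definition of prevalence concrete, the following is a minimal sketch of how raw lexical decision responses could be turned into a percent-known score per word. It is not the project's actual analysis pipeline, which the abstract does not describe. The function name, the response format, the 0.3 false-alarm threshold for excluding respondents, and the invented nonword "fromka" are all assumptions made for illustration.

```python
from collections import defaultdict


def word_prevalence(responses, max_false_alarm_rate=0.3):
    """Return {word: percentage of retained respondents who said they know it}.

    responses: iterable of (participant_id, item, is_real_word, answered_yes).
    Respondents who accept too many nonwords are dropped (assumed criterion).
    """
    responses = list(responses)

    # Per-respondent false-alarm rate: share of nonwords answered "yes".
    nonword_seen = defaultdict(int)
    nonword_yes = defaultdict(int)
    for pid, _item, is_real_word, answered_yes in responses:
        if not is_real_word:
            nonword_seen[pid] += 1
            nonword_yes[pid] += int(answered_yes)
    excluded = {
        pid
        for pid in nonword_seen
        if nonword_yes[pid] / nonword_seen[pid] > max_false_alarm_rate
    }

    # Percent known per real word, over the retained respondents only.
    shown = defaultdict(int)
    known = defaultdict(int)
    for pid, item, is_real_word, answered_yes in responses:
        if is_real_word and pid not in excluded:
            shown[item] += 1
            known[item] += int(answered_yes)
    return {word: 100.0 * known[word] / shown[word] for word in shown}


# Toy data: "miza" and "brlog" are real words, "fromka" is an invented nonword.
responses = [
    ("p1", "miza", True, True), ("p1", "brlog", True, True), ("p1", "fromka", False, False),
    ("p2", "miza", True, True), ("p2", "brlog", True, False), ("p2", "fromka", False, False),
    ("p3", "miza", True, True), ("p3", "brlog", True, True), ("p3", "fromka", False, True),
]
prevalence = word_prevalence(responses)
print(prevalence)
```

In the toy data, respondent p3 accepts the nonword and is dropped, so each word's score is computed over p1 and p2 only. A real study would likely use a statistical model rather than raw percentages, and would break the scores down by the socio-demographic variables listed above.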
