The development of the academic register of a language is an important indicator of its vitality. The paper presents the construction of a contemporary academic language resource for Slovene and provides a framework for further research based on it. The KAS prototype corpus consists of about 50,000 scientific texts, with over one billion tokens, harvested from the Open Science portal of Slovenia. Its compilation involved the collection, filtering, cleaning and linguistic annotation of the texts. Research on the KAS corpus is expected to yield results in text classification, in the development of terminological tools and databases, and in the description of contemporary academic Slovene.
COBISS.SI-ID: 62530146
In the paper, the authors focus on the academic use of the passive voice, a verbal form that has often been declared inappropriate for Slovene. The analysis is limited to the passive participle and to six syntactic patterns defined in the Slovene grammar, and is performed on two corpora: the KAS corpus of academic texts and the Kres corpus, the general corpus of Slovene. The results confirm that the passive voice is used more in academic texts than in Kres (primarily in the present tense), but also show that passive voice patterns occur with a relatively smaller set of lexical items than in Kres.
COBISS.SI-ID: 34453853
In this paper we present the development of a terminology extraction module for Slovene, which was built within the Sketch Engine corpus management system and motivated by the KAS research project on resources and tools for analysing academic Slovene. We describe the formalism used for defining the grammaticality of terms and the calculation of the score of individual terms, give an overview of the term grammar defined for Slovene, and evaluate it on the KAS corpus of academic Slovene.
COBISS.SI-ID: 62994018