Lemmatisation is one of the basic language technology components. In this paper we present a supervised machine learning method that learns lemmatisation models from morphological lexica. We show its advantages over previously developed methods.
COBISS.SI-ID: 21593383
Morphosyntactic tagging is one of the basic language technology components. In this paper we introduce a method that increases the accuracy of morphosyntactic tagging by combining the outputs of multiple taggers.
COBISS.SI-ID: 22416423
The paper introduces the first version of the jos100k corpus, which is linguistically annotated at the morphosyntactic level only.
COBISS.SI-ID: 21930023
Developing semantic lexica is a time-consuming and expensive task. The paper develops a method that uses open-source language resources, such as Wikipedia, to automatically extend wordnets.
COBISS.SI-ID: 40118626
The paper discusses the process and results of manual semantic annotation of the jos100k corpus.
COBISS.SI-ID: 42066018