J2-9180 — Final report
1.
Machine learning of lemmatisers

Lemmatisation is one of the basic language technology components. In this paper we present a supervised machine learning method that learns lemmatisation models from morphological lexica. We show its advantages over previously developed methods.

COBISS.SI-ID: 21593383
2.
Morphosyntactic tagging of Slovene with a meta-tagger

Morphosyntactic tagging is one of the basic language technology components. In this paper we introduce a method that increases the accuracy of morphosyntactic tagging by combining the outputs of multiple taggers.

COBISS.SI-ID: 22416423
3.
Morphosyntactically tagged corpus jos100k

The paper introduces the first version of the jos100k corpus, which is linguistically annotated only at the morphosyntactic level.

COBISS.SI-ID: 21930023
4.
Automating the creation of the Slovene semantic lexicon

Developing semantic lexica is a time-consuming and expensive task. The paper develops a method in which open-source language resources, such as Wikipedia, can be used to automatically extend wordnets.

COBISS.SI-ID: 40118626
5.
Semantic annotation of Slovene

The paper discusses the process and results of manual semantic annotation of the jos100k corpus.

COBISS.SI-ID: 42066018