Embeddings-based techniques for Media Monitoring Applications (EMMA)

Code

L2-50070 (B) - included in ARIS records

Head

PhD Nada Lavrač

Period

10/1/2023 - 9/30/2026

Science

Engineering sciences and technologies (7)
Humanities (8)
Other (1)

Researcher status

Researcher (15)
Junior expert or technical associate (1)

Education

Doctoral degree (7)
Doctorate/foreign document (1)
Other (8)

Sex

Woman (4)
Man (12)

Status

Employed at RO and RRD (16)

No. of publications

0 (1)
10–99 (8)
100–999 (7)

Projects / Programmes source: ARIS

Embeddings-based techniques for Media Monitoring Applications (EMMA)

Research activity

Code	Science	Field	Subfield
2.07.00	Engineering sciences and technologies	Computer science and informatics

Code	Science	Field
1.02	Natural Sciences	Computer and information sciences

Keywords

machine learning, text mining, natural language processing, deep neural networks, document representation, language models, embeddings, media monitoring

Evaluation (methodology)

Evaluation of bibliographic research performance indicators according to ARIS methodology

Points

8,196.8

A''

1,813.79

2,803.29

A1/2

3,713.57

CI10

8,375

CImax

2,153

h10

27.16

6.37

Data for the last 5 years (citations for the last 10 years) on October 15, 2025; Data for score A3 calculation refer to period 2020-2024

Data for ARIS tenders (04.04.2019 – Programme tender, archive)

Citations

Citations for bibliographic records in COBIB.SI that are linked to records in citation databases

Database	Linked records	Citations	Pure citations	Average pure citations
WoS	274	6,424	5,972	21.8
Scopus	398	10,266	9,341	23.47

Organisations (2), Researchers (16)

0106 Jožef Stefan Institute

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	59671	Jaya Caporusso	Linguistics	Young researcher	2025	16
2.	58623	Nikola Ivačič	Linguistics	Researcher	2024 - 2025	0
3.	57800	Boshko Koloski	Computer science and informatics	Young researcher	2025	66
4.	55962	Taja Kuzman	Linguistics	Researcher	2023 - 2025	113
5.	08949	PhD Nada Lavrač	Computer science and informatics	Head	2023 - 2025	893
6.	36871	PhD Nikola Ljubešić	Linguistics	Researcher	2023 - 2025	470
7.	50070	PhD Matej Martinc	Linguistics	Researcher	2024 - 2025	97
8.	29539	PhD Vid Podpečan	Computer science and informatics	Researcher	2023 - 2025	114
9.	31844	PhD Senja Pollak	Linguistics	Researcher	2023 - 2025	338
10.	56524	Marko Pranjić	Linguistics	Researcher	2023 - 2025	28
11.	53851	Matthew RJ Purver, Ph.D.	Linguistics	Researcher	2023 - 2025	126
12.	56348	Peter Rupnik		Technical associate	2023 - 2025	93

1539 University of Ljubljana, Faculty of Computer and Information Science

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	55754	Matej Klemen	Computer science and informatics	Young researcher	2023	20
2.	15295	PhD Marko Robnik Šikonja	Computer science and informatics	Researcher	2023 - 2025	473
3.	50769	PhD Tadej Škvorc	Computer science and informatics	Researcher	2023 - 2025	18
4.	56007	Aleš Žagar	Computer science and informatics	Researcher	2023 - 2025	35

Abstract

In machine learning, the analysis of big data remains a major challenge. The term big data refers to data characterised by large volume, velocity, veracity, and variety. The proposed project tackles the language variety and velocity (dynamics) of media content, which we address with advanced text representation methods (embeddings) and deep learning. The growing volume of media content spans a spectrum from traditional high-quality news to less reliable social media content. Media monitoring and analysis need to be performed in real time: grouping articles by their content, adding several categories of meta-information, summarising several news sources, performing analyses, and reporting. Clipping agencies, such as the Slovenian agency Kliping d.o.o., which will co-finance this industrial project, therefore face a challenging problem, as many analytical tasks have to be performed manually, especially in less-resourced languages where many tools either do not exist or do not return results of sufficient quality. Kliping monitors over 70,000 traditional articles and over 1 million social media posts per day, resulting in more than 1,500 daily reports for their respective target users. Its coverage spans the Slovenian and Western Balkans media space and thus includes text in six languages (Slovenian, Croatian, Bosnian, Serbian, Macedonian and Albanian) and two alphabets (Latin and Cyrillic). Recent machine learning techniques for advanced Natural Language Processing, which are based on text embeddings and large pretrained language models, enable the development of advanced text analysis tools, for example for categorising texts by topic or sentiment and for summarising text from multiple sources.
However, even the best of these tools have to be adapted and improved to cope with the specific user needs, the complexity of news category hierarchies, metadata structures used in the news industry, and coverage of multiple languages. To this end, this project aims to develop advanced multilingual news and social media content analysis tools to help automate text analysis processes while increasing society’s ability to understand the rapid flow of information surrounding us.
