Projects / Programmes source: ARIS

Embeddings-based techniques for Media Monitoring Applications (EMMA)

Research activity

Code Science Field Subfield
2.07.00  Engineering sciences and technologies  Computer science and informatics   

Code Science Field
1.02  Natural Sciences  Computer and information sciences 
Keywords
machine learning, text mining, natural language processing, deep neural networks, document representation, language models, embeddings, media monitoring
Evaluation (methodology)
source: COBISS
Points: 8,196.8
A'': 1,813.79
A': 2,803.29
A1/2: 3,713.57
CI10: 8,375
CImax: 2,153
h10: 36
A1: 27.16
A3: 6.37
Data for the last 5 years (citations for the last 10 years) on October 15, 2025; Data for score A3 calculation refer to period 2020-2024
Data for ARIS tenders (04.04.2019 – Programme tender, archive)
Database Linked records Citations Pure citations Average pure citations
WoS  274  6,424  5,972  21.8 
Scopus  398  10,266  9,341  23.47 
Organisations (2), Researchers (16)
0106  Jožef Stefan Institute
no. Code Name and surname Research area Role Period No. of publications
1.  59671  Jaya Caporusso  Linguistics  Young researcher  2025  16 
2.  58623  Nikola Ivačič  Linguistics  Researcher  2024 - 2025 
3.  57800  Boshko Koloski  Computer science and informatics  Young researcher  2025  66 
4.  55962  Taja Kuzman  Linguistics  Researcher  2023 - 2025  113 
5.  08949  PhD Nada Lavrač  Computer science and informatics  Head  2023 - 2025  893 
6.  36871  PhD Nikola Ljubešić  Linguistics  Researcher  2023 - 2025  470 
7.  50070  PhD Matej Martinc  Linguistics  Researcher  2024 - 2025  97 
8.  29539  PhD Vid Podpečan  Computer science and informatics  Researcher  2023 - 2025  114 
9.  31844  PhD Senja Pollak  Linguistics  Researcher  2023 - 2025  338 
10.  56524  Marko Pranjić  Linguistics  Researcher  2023 - 2025  28 
11.  53851  Matthew RJ Purver, Ph.D.  Linguistics  Researcher  2023 - 2025  126 
12.  56348  Peter Rupnik    Technical associate  2023 - 2025  93 
1539  University of Ljubljana, Faculty of Computer and Information Science
no. Code Name and surname Research area Role Period No. of publications
1.  55754  Matej Klemen  Computer science and informatics  Young researcher  2023  20 
2.  15295  PhD Marko Robnik Šikonja  Computer science and informatics  Researcher  2023 - 2025  473 
3.  50769  PhD Tadej Škvorc  Computer science and informatics  Researcher  2023 - 2025  18 
4.  56007  Aleš Žagar  Computer science and informatics  Researcher  2023 - 2025  35 
Abstract
In machine learning, the analysis of big data remains a major challenge. The term big data refers to data characterised by large volume, velocity, veracity, and variety. The proposed project tackles the language variety and velocity (dynamics) of media content, which we address using advanced text representation methods (embeddings) and deep learning. The growing volume of media content spans a spectrum from traditional high-quality news to less reliable social media content. Media monitoring and analysis need to be performed in real time: grouping articles by content, adding several categories of meta-information, summarising several news sources, performing analyses, and reporting. Clipping agencies, such as the Slovenian agency Kliping d.o.o., which will co-finance this industrial project, therefore face a challenging problem, as many analytical tasks have to be performed manually, especially in less-resourced languages, where many tools do not exist or do not return results of sufficient quality. Kliping monitors over 70,000 traditional articles and over 1 million social media posts per day, producing more than 1,500 daily reports for its target users. It covers the Slovenian and Western Balkans media space and thus text in six languages (Slovenian, Croatian, Bosnian, Serbian, Macedonian and Albanian) and two alphabets (Latin and Cyrillic). Recent machine learning techniques for advanced natural language processing, based on text embeddings and large pretrained language models, enable the development of advanced text processing tools, such as categorisation of texts by topic or sentiment and summarisation of text from multiple sources.
However, even the best of these tools have to be adapted and improved to cope with specific user needs, the complexity of news category hierarchies, the metadata structures used in the news industry, and coverage of multiple languages. To this end, the project aims to develop advanced multilingual tools for analysing news and social media content, helping to automate text analysis processes while increasing society’s ability to understand the rapid flow of information surrounding us.