Projects / Programmes source: ARIS

Developing Advanced Data Mining and Subgroup Analysis Techniques for Bibliometric Research: The Biblium Python Package and Orange Add-on Orangebib

Research activity

Code Science Field Subfield
5.13.00  Social sciences  Information science and librarianship   

Code Science Field
5.08  Social Sciences  Media and communications 
Keywords
bibliometric analysis, data mining, machine learning, subgroup discovery, open-source software, Python, Orange
Evaluation (methodology)
source: COBISS
Points: 6,093.77
A'': 755.87
A': 2,465.21
A1/2: 2,680.46
CI10: 5,802
CImax: 1,851
h10: 31
A1: 20.17
A3: 1.2
Data for the last 5 years (citations for the last 10 years) on October 15, 2025; Data for score A3 calculation refer to period 2020-2024
Data for ARIS tenders (04.04.2019 – Programme tender, archive)
Database Linked records Citations Pure citations Average pure citations
WoS  108  3,451  3,306  30.61 
Scopus  142  5,056  4,803  33.82 
Organisations (1), Researchers (10)
0590  University of Ljubljana, Faculty of Public Administration
No. Code Name and surname Research area Role Period No. of publications
1.  18942  PhD Aleksander Aristovnik  Economics  Researcher  2023 - 2025  990 
2.  32753  PhD Nejc Brezovar  Administrative and organisational sciences  Researcher  2025  128 
3.  56256  Kaja Godec    Technical associate  2023 
4.  11373  PhD Dimitar Hristovski  Computer science and informatics  Researcher  2023 - 2025  154 
5.  60260  Maša Lemajić    Technical associate  2025 
6.  58555  Suzana Mišić    Technical associate  2023 - 2024 
7.  38162  PhD Dejan Ravšelj  Economics  Researcher  2023 - 2025  205 
8.  32754  PhD Dalibor Stanimirović  Systems and cybernetics  Researcher  2024 - 2025  214 
9.  28519  PhD Lan Umek  Administrative and organisational sciences  Head  2023 - 2025  239 
10.  54755  Petra Vujković  Economics  Young researcher  2023 - 2024  21 
Abstract
Bibliometric analysis has become increasingly important in recent years as a means of evaluating and analyzing the scientific literature. As the proportion of bibliometric documents in total scientific output increases dramatically, more advanced statistical methods are needed to improve bibliometric analysis, especially those related to data mining and subgroup analysis. Although subgroups occur naturally in bibliographic data (temporal dimension, geographic scope, topic, etc.), they have rarely been evaluated and analyzed. In this project, we will present concrete examples of data mining methods that could be integrated into bibliometrics, especially for prediction (classification and regression) and subgroup analysis.
To address this gap, we will be the first to implement two subgroup discovery approaches in bibliometrics. Both algorithms aim to discover subgroups of bibliographic documents that reflect significant relationships between two aspects, such as keywords and authors. The first algorithm combines a partitioning clustering approach with contingency table analysis and extracts subgroups of documents that reflect significant relationships between the analyzed aspects. The second algorithm combines a hierarchical clustering approach with statistical classification techniques (such as logistic regression, support vector machines, or neural networks) to extract subgroups that are similar with respect to one analyzed aspect and can be reliably separated from the rest of the documents by the second analyzed aspect.
As part of the project, we will implement basic and advanced bibliometric techniques in a Python package called Biblium. Biblium will be the most comprehensive Python package for bibliometric analysis, as it will integrate all the procedures from the R package Bibliometrix along with more sophisticated methods for analyzing bibliographic data, including data mining methods and subgroup analysis.
In addition, we will perform the bibliometric analysis itself and implement several state-of-the-art approaches and visualizations that currently exist in separate programs rather than under one umbrella. In the final phase of the project, we will integrate Biblium with the open-source data mining software Orange as its add-on Orangebib. This integration will combine bibliometric analysis with data mining methods in a user-friendly tool that does not require programming skills. Together with existing Orange add-ons (bioinformatics, advanced text mining, geomaps, etc.), Orangebib will let Orange users find new, creative ways to combine different aspects of bibliographic data and make an important contribution to the field of bibliometrics.
We plan to apply data mining and subgroup discovery techniques to several areas, including applications in the natural sciences (medicine, drug repurposing, genetics, etc.) and the social sciences (public administration, online learning, taxation, artificial intelligence, disruptive technologies in the public sector, etc.). We intend to publish several papers as results of the project, including software and methodology papers in leading journals of scientometrics and data mining, as well as applications of the developed and implemented tools in several journals of the natural and social sciences. We also plan to participate in several (inter)national conferences in the field of scientometrics, presenting Biblium and Orangebib. As a final deliverable, we plan to organize a free one-day online workshop where users will learn how to use Orangebib to easily perform advanced bibliometric analyses.
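The two subgroup discovery approaches described in the abstract can be illustrated with a minimal sketch. This is not Biblium code and does not reflect its API: the function names, the toy data, the choice of k-means as the partitioning method, Fisher's exact test as the contingency table analysis, and logistic regression with cross-validated accuracy as the separability check are all assumptions made for illustration, using scikit-learn and SciPy.

```python
import numpy as np
from scipy.stats import fisher_exact
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def contingency_subgroups(X_a, X_b, b_names, n_clusters=2, alpha=0.05, seed=0):
    """Approach 1 (hypothetical sketch): partition documents on aspect A, then
    test each cluster against each binary aspect-B feature in a 2x2 table."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_a)
    findings = []
    for c in range(n_clusters):
        in_c = labels == c
        for j, name in enumerate(b_names):
            has_b = X_b[:, j].astype(bool)
            table = [
                [int(np.sum(in_c & has_b)), int(np.sum(in_c & ~has_b))],
                [int(np.sum(~in_c & has_b)), int(np.sum(~in_c & ~has_b))],
            ]
            # One-sided test: is the feature over-represented inside the cluster?
            _, p_value = fisher_exact(table, alternative="greater")
            if p_value < alpha:  # assumption: no multiple-testing correction here
                findings.append((c, name, p_value))
    return labels, findings


def separable_subgroups(X_a, X_b, n_clusters=2, cv=3):
    """Approach 2 (hypothetical sketch): cluster hierarchically on aspect A, then
    measure how well a classifier separates each cluster from the rest using
    aspect B only (mean cross-validated accuracy)."""
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X_a)
    scores = {}
    for c in range(n_clusters):
        y = (labels == c).astype(int)  # one-vs-rest target for this cluster
        scores[c] = float(cross_val_score(LogisticRegression(), X_b, y, cv=cv).mean())
    return labels, scores


# Toy data (invented): 12 documents, 4 keyword indicators (aspect A),
# 2 author indicators (aspect B).
keywords = np.array([[1, 1, 0, 0]] * 6 + [[0, 0, 1, 1]] * 6)
authors = np.array([[1, 0]] * 6 + [[0, 1]] * 6)

km_labels, findings = contingency_subgroups(keywords, authors, ["author_A", "author_B"])
hc_labels, scores = separable_subgroups(keywords, authors)
```

In this toy setting each keyword cluster coincides with one author, so the first function reports one significant cluster-author pair per cluster and the second reports clusters that the author indicators separate from the rest. Real bibliographic data would need sparse document-term matrices, a correction for multiple comparisons, and a more careful choice of the number of clusters.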