Developing Advanced Data Mining and Subgroup Analysis Techniques for Bibliometric Research: The Biblium Python Package and Orange Add-on Orangebib

Code

J5-50183 (A) - included in ARIS records

Head

PhD Lan Umek

Period

10/1/2023 - 9/30/2026

Science

Engineering sciences and technologies (2)
Social sciences (5)
Other (3)

Researcher status

Researcher (7)
Junior expert or technical associate (3)

Education

Doctoral degree (6)
Other (4)

Sex

Woman (4)
Man (6)

Status

Employed at RO and RRD (8)
No data on employment in RO (2)

No. of publications

0 (3)
10–99 (1)
100–999 (6)

Projects / Programmes source: ARIS


Research activity

Code	Science	Field	Subfield
5.13.00	Social sciences	Information science and librarianship

Code	Science	Field
5.08	Social Sciences	Media and communications

Keywords

bibliometric analysis, data mining, machine learning, subgroup discovery, open-source software, Python, Orange

Evaluation (methodology)

Evaluation of bibliographic research performance indicators according to ARIS methodology

Indicator	Value
Points	6,093.77
A''	755.87
	2,465.21
A1/2	2,680.46
CI10	5,802
CImax	1,851
h10	20.17
	1.2

Data for the last 5 years (citations for the last 10 years) as of October 15, 2025; data for the A3 score calculation refer to the period 2020–2024.

Data for ARIS tenders (04.04.2019 – Programme tender, archive)

Citations

Citations for bibliographic records in COBIB.SI that are linked to records in citation databases

Database	Linked records	Citations	Pure citations	Average pure citations
WoS	108	3,451	3,306	30.61
Scopus	142	5,056	4,803	33.82
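The "Average pure citations" column appears to be the number of pure citations divided by the number of linked records; this relationship is inferred from the figures rather than stated on the page. The short check below reproduces the column from the table values (the dictionary layout is only an illustration).

```python
# Values copied from the citation table: (linked records, citations, pure citations).
citation_table = {
    "WoS": (108, 3451, 3306),
    "Scopus": (142, 5056, 4803),
}

averages = {}
for database, (linked, _citations, pure) in citation_table.items():
    # Assumed definition: average pure citations = pure citations / linked records.
    averages[database] = round(pure / linked, 2)
    print(f"{database}: {averages[database]}")
```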

Organisations (1), Researchers (10)

0590 University of Ljubljana, Faculty of Public Administration

No.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	18942	PhD Aleksander Aristovnik	Economics	Researcher	2023 - 2025	990
2.	32753	PhD Nejc Brezovar	Administrative and organisational sciences	Researcher	2025	128
3.	56256	Kaja Godec		Technical associate	2023	0
4.	11373	PhD Dimitar Hristovski	Computer science and informatics	Researcher	2023 - 2025	154
5.	60260	Maša Lemajić		Technical associate	2025	0
6.	58555	Suzana Mišić		Technical associate	2023 - 2024	0
7.	38162	PhD Dejan Ravšelj	Economics	Researcher	2023 - 2025	205
8.	32754	PhD Dalibor Stanimirović	Systems and cybernetics	Researcher	2024 - 2025	214
9.	28519	PhD Lan Umek	Administrative and organisational sciences	Head	2023 - 2025	239
10.	54755	Petra Vujković	Economics	Young researcher	2023 - 2024	21

Abstract

Bibliometric analysis has become increasingly important in recent years as a means of evaluating and analyzing the scientific literature. As the share of bibliometric studies in total scientific output grows rapidly, more advanced statistical methods, especially data mining and subgroup analysis, are needed to improve bibliometric analysis. Although subgroups occur naturally in bibliographic data (e.g., by time period, geographic scope, or topic), they have rarely been evaluated or analyzed. In this project, we will present concrete examples of data mining methods that could be integrated into bibliometrics, especially for prediction (classification and regression) and subgroup analysis. To address the gap in subgroup analysis, we will be the first to implement two subgroup discovery approaches in bibliometrics. Both algorithms aim to discover subgroups of bibliographic documents that reflect significant relationships between two aspects, such as keywords and authors. The first algorithm will combine a partitioning clustering approach with contingency table analysis to extract such subgroups. The second algorithm will combine a hierarchical clustering approach with statistical classification techniques (e.g., logistic regression, support vector machines, or neural networks) to extract subgroups that are similar with respect to one analyzed aspect and can be reliably separated from the rest of the documents by the second.

As part of the project, we will implement basic and advanced bibliometric techniques in a Python package called Biblium. Biblium will be the most comprehensive Python package for bibliometric analysis, as it will integrate all the procedures from the R package Bibliometrix along with more sophisticated methods for analyzing bibliographic data, including data mining methods and subgroup analysis.
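The abstract names the ingredients of the first subgroup discovery algorithm (partitioning clustering plus contingency table analysis) but not the concrete choices, so the following is only a minimal, standard-library sketch under stated assumptions. Documents are partitioned by keyword vectors with plain k-means, and each resulting cluster is cross-tabulated against each author in a 2x2 table scored with Pearson's chi-square test. The functions `kmeans`, `chi2_2x2` and `discover_subgroups` and the toy data are hypothetical; they do not reflect Biblium's actual API or method.

```python
"""Hypothetical sketch of partition-based subgroup discovery (not the Biblium API)."""
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper-tail p-value of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def discover_subgroups(keyword_vectors, author_sets, k, alpha=0.05):
    """Return (cluster, author, chi2, p) for authors over-represented in a keyword cluster."""
    labels = kmeans(keyword_vectors, k)
    results = []
    for cluster in range(k):
        in_cluster = [label == cluster for label in labels]
        for author in sorted(set().union(*author_sets)):
            has_author = [author in doc_authors for doc_authors in author_sets]
            pairs = list(zip(in_cluster, has_author))
            a = sum(i and h for i, h in pairs)          # in cluster, by author
            b = sum(i and not h for i, h in pairs)      # in cluster, not by author
            c = sum(h and not i for i, h in pairs)      # outside cluster, by author
            d = sum(not i and not h for i, h in pairs)  # outside cluster, not by author
            chi2, p = chi2_2x2(a, b, c, d)
            if p < alpha and a * d > b * c:  # keep positive associations only
                results.append((cluster, author, chi2, p))
    return results


# Toy data (hypothetical): binary vectors over [bibliometrics, citation, neural, network].
toy_vectors = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1],
]
toy_authors = [{"author_A", "author_C"}, {"author_A"}, {"author_A"}, {"author_A"},
               {"author_B"}, {"author_B", "author_C"}, {"author_B"}, {"author_B"}]

subgroups = discover_subgroups(toy_vectors, toy_authors, k=2)
for cluster, author, chi2, p in subgroups:
    print(f"cluster {cluster}: {author} (chi2={chi2:.2f}, p={p:.4f})")
```

With so few documents the chi-square approximation is rough, and no multiple-testing correction is applied across the cluster and author pairs; a fuller implementation would probably use Fisher's exact test and an adjusted `alpha`.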
In addition, we will perform the bibliometric analysis itself and implement several state-of-the-art approaches and visualizations that currently exist in different programs but are not available in a single tool. In the final phase of the project, we will integrate Biblium with the open-source data mining software Orange as the add-on Orangebib. This integration will combine bibliometric analysis with data mining methods in user-friendly software that requires no programming skills. Combined with existing Orange add-ons (bioinformatics, advanced text mining, geomaps, etc.), Orangebib will let Orange users find new, creative ways to combine different aspects of bibliographic data and make an important contribution to the field of bibliometrics. We plan to apply data mining and subgroup discovery techniques in several areas, including the natural sciences (medicine, drug repurposing, genetics, etc.) and the social sciences (public administration, online learning, taxation, artificial intelligence, disruptive technologies in the public sector, etc.). We intend to publish several papers as project results, including software and methodology papers in leading scientometrics and data mining journals, as well as papers applying the developed tools in natural and social science journals. We also plan to present Biblium and Orangebib at several national and international scientometrics conferences. As a final deliverable, we plan to organize a free one-day online workshop where users will learn how to use Orangebib to easily perform advanced bibliometric analyses.
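The second subgroup discovery algorithm from the abstract (hierarchical clustering plus statistical classification) can be sketched in the same spirit. Again, the specific choices are assumptions rather than the project's design: documents are clustered by keyword sets with average-linkage agglomerative clustering on Jaccard distance, every node of the resulting dendrogram is treated as a candidate subgroup, and a node is kept when a logistic regression on author indicators separates it from the remaining documents with leave-one-out accuracy of at least a hypothetical `threshold`. All names and data below are illustrative only.

```python
"""Hypothetical sketch of hierarchical subgroup discovery with a separability check."""
import math


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets (0.0 if both are empty)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def dendrogram_nodes(keyword_sets):
    """Average-linkage agglomerative clustering; returns every merged cluster in merge order."""
    def linkage(c1, c2):
        total = sum(jaccard_distance(keyword_sets[i], keyword_sets[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([i]) for i in range(len(keyword_sets))]
    nodes = []
    while len(clusters) > 1:
        pairs = [(p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))]
        p, q = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters[p] | clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)] + [merged]
        nodes.append(merged)
    return nodes


def train_logistic_regression(X, y, epochs=200, lr=0.5):
    """Logistic regression fitted by plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = max(-30.0, min(30.0, b + sum(wj * xj for wj, xj in zip(w, xi))))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # gradient of the log-loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def loo_accuracy(X, y):
    """Leave-one-out accuracy of the logistic regression above."""
    correct = 0
    for i in range(len(X)):
        w, b = train_logistic_regression(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
        correct += int((z >= 0) == (y[i] == 1))
    return correct / len(X)


def separable_subgroups(keyword_sets, author_sets, min_size=2, threshold=0.9):
    """Keyword-based dendrogram nodes that author features separate from the other documents."""
    authors = sorted(set().union(*author_sets))
    X = [[1.0 if a in doc_authors else 0.0 for a in authors] for doc_authors in author_sets]
    found = []
    for node in dendrogram_nodes(keyword_sets):
        if min_size <= len(node) < len(keyword_sets):  # skip the root (all documents)
            y = [1 if i in node else 0 for i in range(len(X))]
            accuracy = loo_accuracy(X, y)
            if accuracy >= threshold:
                found.append((sorted(node), accuracy))
    return found


# Toy data (hypothetical): two topics, each written by a different author.
toy_keywords = [
    {"bibliometrics", "citation"}, {"bibliometrics", "citation", "h-index"},
    {"bibliometrics", "scopus"}, {"bibliometrics", "citation", "wos"},
    {"neural", "network"}, {"neural", "network", "deep"},
    {"neural", "classifier"}, {"neural", "network", "svm"},
]
toy_authors = [{"author_A"}] * 4 + [{"author_B"}] * 4

subgroups = separable_subgroups(toy_keywords, toy_authors)
for members, accuracy in subgroups:
    print(f"documents {members}: leave-one-out accuracy {accuracy:.2f}")
```

Logistic regression stands in for any of the classifiers the abstract mentions; a support vector machine or neural network would plug into `loo_accuracy` the same way. The brute-force clustering is only suitable for toy-sized data.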

