Data compression paradigm based on omitting self-evident information

Code

J2-4458 (A) - included in ARIS records

Head

PhD Borut Žalik

Period

11/1/2022 - 10/31/2025

Science

Engineering sciences and technologies (13)

Researcher status

Researcher (12)
Junior expert or technical associate (1)

Education

Doctoral degree (10)
Other (3)

Sex

Woman (1)
Man (12)

Status

Employed at RO and RRD (12)
No data on employment in RO (1)

No. of publications

1–9 (1)
10–99 (4)
100–999 (8)

Projects / Programmes source: ARIS

Research activity

Code	Science	Field	Subfield
2.07.00	Engineering sciences and technologies	Computer science and informatics

Code	Science	Field
1.02	Natural Sciences	Computer and information sciences

Keywords

data compression, feature, optimization, restoration algorithms, universal platform

Evaluation (methodology)

Evaluation of bibliographic research performance indicators according to ARIS methodology

Points

6,179.85

A''

1,370.3

2,815.42

A1/2

4,385.77

CI10

3,553

CImax

224

h10

22.2

30.39

Data for the last 5 years (citations for the last 10 years) on October 15, 2025; data for the A3 score calculation refer to the period 2020–2024

Data for ARIS tenders (04.04.2019 – Programme tender, archive)

Citations

Citations for bibliographic records in COBIB.SI that are linked to records in citation databases

Database	Linked records	Citations	Pure citations	Average pure citations
WoS	213	2,927	2,563	12.03
Scopus	273	4,111	3,671	13.45
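The figures in the last column are consistent with the average being the number of pure citations divided by the number of linked records. This is an inference from the table values, not a definition given on the page:

```latex
\bar{c} = \frac{\text{pure citations}}{\text{linked records}}, \qquad
\bar{c}_{\mathrm{WoS}} = \frac{2563}{213} \approx 12.03, \qquad
\bar{c}_{\mathrm{Scopus}} = \frac{3671}{273} \approx 13.45
```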

Organisations (1), Researchers (13)

0796 University of Maribor, Faculty of Electrical Engineering and Computer Science

No.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	53590	PhD Jernej Cukjati	Computer science and informatics	Young researcher	2022 - 2023	6
2.	53755	Aljaž Jeromel	Computer science and informatics	Researcher	2022 - 2025	28
3.	37222	PhD Štefan Kohek	Computer science and informatics	Researcher	2022 - 2025	134
4.	16259	PhD Simon Kolmanič	Computer science and informatics	Researcher	2022 - 2025	211
5.	21318	PhD Bogdan Lipuš	Computer science and informatics	Researcher	2022 - 2025	58
6.	33709	PhD Niko Lukač	Computer science and informatics	Researcher	2022 - 2025	233
7.	29243	PhD Domen Mongus	Computer science and informatics	Researcher	2022 - 2025	297
8.	32690	Sašo Pečnik	Computer science and informatics	Researcher	2022 - 2025	24
9.	15671	PhD David Podgorelec	Computer science and informatics	Researcher	2022 - 2025	221
10.	08638	PhD Krista Rizman Žalik	Computer science and informatics	Researcher	2022 - 2025	192
11.	18726	PhD Damjan Strnad	Computer science and informatics	Researcher	2022 - 2025	262
12.	06671	PhD Borut Žalik	Computer science and informatics	Head	2022 - 2025	876
13.	31475	Denis Žganec	Computer science and informatics	Technical associate	2022 - 2025	19

Abstract

Data compression is one of the traditional disciplines of computer science, yet it has made no significant progress in recent decades. It has also failed to keep pace with new scientific trends, in which new devices collect ever-increasing amounts of highly heterogeneous data. These data are compressed using either domain-dependent or general-purpose methods. The latter are well-known lossless solutions from 30 years ago (e.g., RAR or ZIP). They achieve generality by processing the data stream at the byte level, ignoring potential higher-level relations in the data. Domain-dependent methods are lossy, near-lossless, or lossless. Lossy methods transform the data into frequency space, quantise it there, and encode the remaining values losslessly, whereby the lossless part is typically domain-dependent as well. Near-lossless and lossless methods work quite differently and are typically prediction-based. However, the prediction is made from a narrow spatial and/or temporal context, which limits its efficiency. Most methods are symmetric: decoding runs the same pipeline as encoding, only in reverse order. The disadvantage is that decoding has the same time complexity as encoding, so the decoder requires infrastructure similar to the encoder's. Finally, each type of data requires a specific solution that is not transferable to other types of data (e.g., audio compression is completely different from the compression of raster images).
In the COMPROMISE project, we aim to develop a new data compression methodology that will be largely domain-independent and asymmetric. By using a unified pipeline of procedures, the methodology will be suitable for lossy, near-lossless, and lossless compression. Domain independence will be achieved by forming feature repertoires in different domains and linking those repertoires to a unified domain-independent taxonomy.
In our case, a feature will be any piece of information with high discriminative or predictive value for human interpretation or machine processing (e.g., computer vision, classification) of a data stream. The obtained repertoire of features will be reduced through a domain-independent iterative optimisation process for as long as the remaining features still allow the restoration techniques to reconstruct the input data satisfactorily. The compression pipeline will be the same for lossy, lossless, and near-lossless compression, except that in the latter two cases the output will also include the residuals, obtained as the difference between the original and the restored data. Decompression will be much simpler: it will consist of decoding the features and residuals, restoring the data from the features, and, in lossless or near-lossless mode, applying the residuals. This will make the requirements for the decoder substantially lower than those for the encoder. The concept of domain-independent features also allows information about higher-level relations in the data to be preserved in the compressed form, which improves the reusability of the data at different semantic levels. To demonstrate the universality and domain independence of the methodology, we will use raster images, digital audio, biomedical signals, and sparse voxel grids in our study. These domains differ in both data dimensionality and dynamism, while addressing two human perceptual systems: vision and hearing. The proposed domain-independent methodology will be implemented on a unified platform, which will be used to demonstrate the efficiency and universality of the COMPROMISE methodology, to validate the key performance indicators, and to verify the scientific hypothesis.
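The asymmetric feature/restoration/residual pipeline described in the abstract can be illustrated with a deliberately simple toy sketch. It is not the COMPROMISE method: the abstract does not specify the features, the optimisation, or the restoration technique, so every choice here is an assumption. "Features" are hypothetically taken to be retained (position, value) knots of a one-dimensional integer signal, restoration is rounded piecewise-linear interpolation, and the iterative optimisation is a greedy loop that drops one knot at a time while the maximum reconstruction error stays within a tolerance. In lossless mode, the residuals between the original and the restored signal are appended. All function names are hypothetical, and a near-lossless mode (which would quantise the residuals) is left out for brevity.

```python
from typing import List, Optional, Tuple

# Hypothetical "feature": a retained (position, value) knot of a 1-D integer signal.
Knot = Tuple[int, int]


def restore(knots: List[Knot], n: int) -> List[int]:
    """Hypothetical restoration: rounded piecewise-linear interpolation.
    Assumes knots are sorted, start at position 0 and end at n - 1."""
    if len(knots) == 1:
        return [knots[0][1]] * n
    out = [0] * n
    for (i0, v0), (i1, v1) in zip(knots, knots[1:]):
        for i in range(i0, i1 + 1):
            t = (i - i0) / (i1 - i0)
            out[i] = round(v0 + t * (v1 - v0))  # identical rounding on both sides
    return out


def max_error(a: List[int], b: List[int]) -> int:
    """Largest absolute difference between two equally long signals."""
    return max(abs(x - y) for x, y in zip(a, b))


def reduce_features(samples: List[int], tol: int) -> List[Knot]:
    """Greedy iterative reduction (an assumed stand-in for the optimisation):
    start with every sample as a knot and keep dropping interior knots
    while the restored signal stays within tol of the original."""
    n = len(samples)
    knots: List[Knot] = list(enumerate(samples))
    changed = True
    while changed:
        changed = False
        for k in range(1, len(knots) - 1):
            trial = knots[:k] + knots[k + 1:]
            if max_error(samples, restore(trial, n)) <= tol:
                knots = trial
                changed = True
                break
    return knots


def compress(samples: List[int], tol: int,
             lossless: bool) -> Tuple[List[Knot], Optional[List[int]]]:
    """Encoder: expensive search over features, plus residuals in lossless mode."""
    knots = reduce_features(samples, tol)
    if not lossless:
        return knots, None
    restored = restore(knots, len(samples))
    return knots, [x - r for x, r in zip(samples, restored)]


def decompress(knots: List[Knot], n: int,
               residuals: Optional[List[int]] = None) -> List[int]:
    """Decoder: a single restoration pass, plus residuals if present."""
    restored = restore(knots, n)
    if residuals is None:
        return restored
    return [r + e for r, e in zip(restored, residuals)]
```

For example, `compress([0, 2, 4, 6, 8, 7, 6, 5], tol=0, lossless=False)` keeps only the knots (0, 0), (4, 8) and (7, 5), because that signal is exactly piecewise linear. The encoder calls `restore` many times inside its search, whereas the decoder calls it once, which mirrors the encoder/decoder asymmetry the abstract aims for.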
Using this methodology, we expect to achieve better lossless and near-lossless compression ratios than existing domain-dependent methods, which will lay the foundation for a new generation of data compression methods.
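The abstract characterises existing lossless and near-lossless methods as prediction-based, symmetric, and reliant on a narrow context. As a minimal illustrative sketch of such a baseline (not a method from the project), the code below uses the simplest possible predictor, the previous reconstructed sample of a one-dimensional integer signal, and quantises each prediction error with step 2δ + 1, similar to the residual quantisation in the near-lossless mode of JPEG-LS; δ = 0 gives lossless coding. The function names and the one-sample predictor are assumptions, and entropy coding of the emitted indices is omitted.

```python
from typing import List


def quantise_residual(e: int, delta: int) -> int:
    """Map a prediction error to an index so that the reconstruction
    error never exceeds delta (delta = 0 keeps the error exact)."""
    step = 2 * delta + 1
    if e >= 0:
        return (e + delta) // step
    return -((-e + delta) // step)


def dpcm_encode(samples: List[int], delta: int = 0) -> List[int]:
    """Predict each sample from the previous *reconstructed* sample
    (a one-sample context) and emit quantised residual indices."""
    step = 2 * delta + 1
    indices = []
    prev = 0  # encoder and decoder start from the same prediction
    for x in samples:
        q = quantise_residual(x - prev, delta)
        indices.append(q)
        prev += q * step  # track exactly what the decoder will reconstruct
    return indices


def dpcm_decode(indices: List[int], delta: int = 0) -> List[int]:
    """Repeat the encoder's prediction step in the same order."""
    step = 2 * delta + 1
    out = []
    prev = 0
    for q in indices:
        prev += q * step
        out.append(prev)
    return out
```

The decoder must repeat the encoder's prediction sample by sample, so both sides do the same amount of work; this is the symmetry the abstract refers to. With δ = 1, for instance, the signal [10, 12, 13, 13, 20, 18] is reconstructed as [9, 12, 12, 12, 21, 18], never more than 1 away from the original.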
