The authors of the article analysed the structure of imprecise Markov chains and studied their convergence by means of accessibility relations. They first identified the sets of states, called minimal permanent classes, which are the minimal sets capable of containing and preserving the whole probability mass of the chain. These classes generalise the essential classes known from the classical theory. They then defined a class of extremal imprecise invariant distributions and showed that these are uniquely determined by the values of the upper probability on minimal permanent classes. Moreover, they gave conditions for unique convergence to these extremal invariant distributions.
COBISS.SI-ID: 32156509
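The essential classes that the first entry refers to can be illustrated for an ordinary (precise) Markov chain. The sketch below is only the classical notion and not the article's construction for imprecise chains. The function name `essential_classes` and the row-stochastic list-of-lists input are assumptions made for illustration. It builds the accessibility relation by transitive closure and keeps the communicating classes that no probability mass can leave.

```python
def essential_classes(P):
    """Return the essential (closed communicating) classes of a classical chain.

    P is assumed to be a row-stochastic matrix given as a list of lists.
    """
    n = len(P)
    # reach[i][j] is True when state j is accessible from state i
    # in zero or more steps.
    reach = [[i == j or P[i][j] > 0 for j in range(n)] for i in range(n)]
    # Warshall's algorithm: transitive closure of the one-step relation.
    for m in range(n):
        for i in range(n):
            if reach[i][m]:
                for j in range(n):
                    if reach[m][j]:
                        reach[i][j] = True
    classes = []
    seen = set()
    for i in range(n):
        if i in seen:
            continue
        # Communicating class of i: states that reach i and are reached by i.
        cls = {j for j in range(n) if reach[i][j] and reach[j][i]}
        seen |= cls
        # The class is essential when nothing outside it is accessible from it.
        if all(t in cls for s in cls for t in range(n) if reach[s][t]):
            classes.append(sorted(cls))
    return classes


# States 0 and 1 alternate forever; state 2 eventually falls into {0, 1}.
print(essential_classes([[0.0, 1.0, 0.0],
                         [1.0, 0.0, 0.0],
                         [0.5, 0.0, 0.5]]))
```

In this example state 2 is transient, so the only class that contains and preserves the whole probability mass in the long run is {0, 1}.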
High-dimensional data arise naturally in many domains, and have regularly presented a great challenge for traditional data-mining techniques, both in terms of effectiveness and efficiency. Clustering becomes difficult due to the increasing sparsity of such data, as well as the increasing difficulty in distinguishing distances between data points. In this paper we take a novel perspective on the problem of clustering high-dimensional data. Instead of attempting to avoid the curse of dimensionality by observing a lower-dimensional feature subspace, we embrace dimensionality by taking advantage of some inherently high-dimensional phenomena. More specifically, we show that hubness, i.e., the tendency of high-dimensional data to contain points (hubs) that frequently occur in k-nearest neighbor lists of other points, can be successfully exploited in clustering. We validate our hypothesis by proposing several hubness-based clustering algorithms and testing them on high-dimensional data. Experimental results demonstrate good performance of our algorithms in multiple settings, particularly in the presence of large quantities of noise.
COBISS.SI-ID: 26713639
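The hubness notion in the clustering entry above can be made concrete by counting how often each point appears in the k-nearest-neighbour lists of the other points. The following is a minimal sketch of that count only, not one of the paper's clustering algorithms. The function name `k_occurrences`, the choice of Euclidean distance, and the tie-breaking by index are assumptions made for illustration.

```python
import math


def k_occurrences(points, k):
    """Count, for each point, how many k-NN lists of other points contain it."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Candidate neighbours of point i: every other point.
        others = [j for j in range(n) if j != i]
        # Sort by Euclidean distance; Python's stable sort breaks ties by index.
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            counts[j] += 1
    return counts


# Four points on a line; with k = 1 the point at 1.0 is the nearest
# neighbour of two other points, so it has the highest count.
print(k_occurrences([[0.0], [1.0], [3.0], [10.0]], 1))
```

Points with unusually high counts are the hubs. Each of the n points contributes exactly k neighbours, so the counts always sum to n times k, which makes a convenient sanity check.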
National Readership Surveys (NRS) are among the largest syndicated commercial surveys. These continuous, probability-based surveys aim to provide reliable readership estimates for national newspapers and magazines. Serving as a currency for buying and selling advertising space in print media, they are very important for the media and advertising industry. The Slovenian National Readership Survey has a two-stage probability sampling design, and data are collected by computer-assisted personal interviewing performed by 10 full-time interviewers and 15 students working part-time. With a response rate between 30 and 40%, the net sample is about 6,500 per year. From 2007 to 2010 the Slovenian NRS was supervised by the Centre for Social Informatics at the Faculty of Social Sciences. Each week a control questionnaire was mailed to 20% of respondents, asking them about the interviewer's visit, whether they were personally interviewed, and how they would assess the interviewer's behaviour. To check the survey answers, some questions from the survey were also included: how often they use the internet, which of the seven major Slovenian newspapers their household subscribes to, and their gender and age. About half responded within two weeks, while the others were called by phone, and those not reached after five attempts were sent another letter. Among the respondents who did not respond to the second letter, we randomly selected 10% to be visited by a field interviewer. In total, 90% of the control sample cooperated, and the final control results show that only 3 to 4% of respondents were not really interviewed or were interviewed by phone. Analysis by interviewer indicates that one of them had significantly more anomalies than the others. Based on the analysis of control variables, agreement with the survey database is very high (98%) for gender and age but lower for internet use and newspaper subscription.
Although the rates of improperly conducted interviews and of non-matching characteristics are not very high, more resources could be invested in fieldwork efforts to prevent interviewers from omitting cases.
COBISS.SI-ID: 31855709