When analyzing time to disease recurrence, we sometimes need to work with data where all the recurrences are recorded, but no information is available on possible deaths. This may occur when studying benign diseases whose patients are only seen at disease recurrences, or in poorly designed registries of benign diseases or medical device implantations that lack sufficient patient identifiers to ascertain vital status at a later date. When the average time to disease recurrence is long relative to the expected survival of the patients, statistical analysis of such data can be substantially biased. Under the assumption that the expected survival of an individual is not influenced by the disease itself, general population mortality tables may be used to remove this bias. We show why the intuitive solution of simply imputing each patient's expected survival time does not give unbiased estimates of the usual quantities of interest in survival analysis, and we further explain that cumulative incidence function analysis does not require additional assumptions on general population mortality. We provide an alternative framework that allows unbiased estimation and introduce two new approaches: an iterative imputation method and a mortality-adjusted at-risk function. Their properties are carefully studied, with the results supported by simulations and illustrated on a real-world example.
COBISS.SI-ID: 32255193
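The bias of plugging in the expected survival time can be illustrated with a toy two-point calculation (an illustrative sketch with made-up hazards, not the paper's method or data): because event probabilities are nonlinear in time, evaluating them at the expected death time E[D] is not the same as averaging over the distribution of D.

```python
import math

# Toy example (hypothetical numbers): recurrence time T ~ Exponential(rate 0.1),
# death time D equal to 5 or 15 years with probability 1/2 each, so E[D] = 10.
lam = 0.1

# True probability of recurrence before death: E[1 - exp(-lam * D)],
# averaging over the two possible death times.
p_true = 0.5 * (1 - math.exp(-lam * 5)) + 0.5 * (1 - math.exp(-lam * 15))

# "Imputed" version: plug in the single expected survival time E[D] = 10.
p_imputed = 1 - math.exp(-lam * 10)

print(round(p_true, 4))     # 0.5852
print(round(p_imputed, 4))  # 0.6321 -- imputation overstates the probability here
```

The gap between the two numbers is a Jensen-type effect: replacing the random death time by its mean discards the variability that the correct calculation averages over.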
When building classifiers, it is natural to require that the classifier correctly estimates the event probability (Constraint 1), that it has equal sensitivity and specificity (Constraint 2), or that it has equal positive and negative predictive values (Constraint 3). We prove that in the balanced case, where the proportions of events and non-events are equal, any classifier that satisfies one of these constraints will always satisfy all three. Such unbiased treatment of events and non-events is much more difficult to achieve in the case of rare events, i.e. when the proportion of events is (much) smaller than 0.5. Here, we prove that it is impossible to meet all three constraints unless the classifier achieves perfect predictions. Any non-perfect classifier can satisfy at most one constraint, and satisfying one constraint implies violating the other two in a specific direction. Our results have implications for classifiers optimized using g-means or the F-measure, which tend to satisfy Constraints 2 and 1, respectively. Our results are derived from basic probability theory and illustrated with simulations based on some frequently used classifiers.
COBISS.SI-ID: 33010393
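The balanced-case equivalence can be checked with simple confusion-matrix arithmetic. The counts below are hypothetical and only sketch the claim (they are not the paper's simulations): with 50% events, fixing equal sensitivity and specificity (Constraint 2) forces equal predictive values (Constraint 3) and a predicted event rate equal to the true rate (Constraint 1), while the same sensitivity and specificity in a rare-event sample break Constraint 3.

```python
# Balanced case: 1000 events and 1000 non-events; the classifier is taken
# to satisfy Constraint 2 (sensitivity = specificity = 0.8).
tp, fn = 800, 200                 # among the 1000 events
tn, fp = 800, 200                 # among the 1000 non-events

ppv = tp / (tp + fp)              # positive predictive value
npv = tn / (tn + fn)              # negative predictive value
pred_event_rate = (tp + fp) / 2000

print(ppv, npv, pred_event_rate)  # 0.8 0.8 0.5 -- all three constraints hold

# Rare-event case: 100 events, 1900 non-events, same sensitivity and
# specificity (Constraint 2 still holds), but Constraint 3 now fails.
tp_r, fn_r = 80, 20
tn_r, fp_r = 1520, 380

ppv_rare = tp_r / (tp_r + fp_r)   # ~0.174
npv_rare = tn_r / (tn_r + fn_r)   # ~0.987
```

The rare-event half of the sketch shows the direction of the violation: with few events, a classifier tuned to Constraint 2 has a much lower positive than negative predictive value.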
The relative survival field has seen substantial development in the last decade, resulting in many different and even opposing suggestions on how to approach the analysis. We carefully define and explain the differences between the various measures of survival (overall survival, crude mortality, net survival and relative survival ratio) and study them using colon and prostate cancer data extracted from the national population-based cancer registry of Slovenia, as well as simulated data. The colon and prostate cancer data clearly demonstrate that, when analysing population-based data, it is useful to split overall mortality into the crude probabilities of dying from cancer and from other causes. Complemented by net survival, this provides a complete picture of cancer survival in a given population. But when comparisons between populations defined, for example, by place or time are of interest, our simulated data demonstrate that net survival is the only measure that should be used. The choice of method should be made in two steps: first, determine the measure of interest, and second, choose among the methods that estimate that measure consistently.
COBISS.SI-ID: 33117913
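The distinction between net and crude measures can be sketched in a toy competing-risks model with constant hazards (an illustrative assumption with made-up rates, not the paper's registry data): two populations with identical cancer mortality but different other-cause mortality share the same net survival, while their crude probabilities of dying from cancer differ.

```python
import math

def net_survival(lc, t):
    # Net survival: survival in the hypothetical world where cancer (constant
    # hazard lc) is the only cause of death.
    return math.exp(-lc * t)

def crude_cancer_death(lc, lo, t):
    # Crude probability of dying from cancer by time t when an other-cause
    # hazard lo competes (cumulative incidence in a two-cause exponential model).
    total = lc + lo
    return lc / total * (1 - math.exp(-total * t))

# Hypothetical populations A and B: identical cancer hazard, but B has twice
# the other-cause mortality (e.g. an older population).
lc = 0.05
lo_A, lo_B = 0.02, 0.04
t = 10.0

print(net_survival(lc, t))              # identical for A and B
print(crude_cancer_death(lc, lo_A, t))  # higher: fewer competing deaths in A
print(crude_cancer_death(lc, lo_B, t))  # lower: other causes "get there first"
```

This is the pattern behind the abstract's recommendation: the crude probabilities describe what actually happens in each population, whereas net survival removes the other-cause component and is therefore the comparable quantity across places or periods.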