P2-0209 — Final report
1.
Characterizing the RNA targets and position-dependent splicing regulation by TDP-43

Based on extensive bioinformatics analyses of iCLIP data, we found that TDP-43 preferentially binds long clusters of UG-rich sequences and that MALAT1 and NEAT1 are the main targets in subjects with FTLD. We identified unusually long clusters of TDP-43 binding at deep intronic positions downstream of silenced exons. A substantial proportion of alternative mRNA isoforms regulated by TDP-43 encode proteins that regulate neuronal development or have been implicated in neurological diseases, highlighting the importance of TDP-43 for the regulation of splicing in the brain.

COBISS.SI-ID: 8278100
2.
Learning qualitative models from numerical data

The paper describes Pade, a new method for qualitative learning that estimates partial derivatives of the target function from training data and uses them to induce qualitative models of the target function. We formulated three methods for computing the derivatives, all based on linear regression over local neighbourhoods. The methods were empirically tested on artificial and real-world data. We also provide a case study that shows how the developed methods can be used in practice.
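
The idea of estimating partial derivatives by linear regression on a local neighbourhood can be illustrated with a minimal sketch. It is not one of the three methods from the paper: the function names, the choice of the k nearest neighbours as the neighbourhood, and the tolerance used to map a derivative to a qualitative sign are all assumptions made for illustration.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the neighbourhood is degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def local_partial_derivatives(points, values, query, k):
    """Estimate the gradient at `query` from a linear fit to its k nearest points.

    Assumption: the neighbourhood is the k nearest training points (Euclidean).
    """
    if k < len(query) + 1:
        raise ValueError("need at least d + 1 neighbours for a linear fit")
    nearest = sorted(range(len(points)),
                     key=lambda i: math.dist(points[i], query))[:k]
    # Design rows [1, x1 - q1, ..., xd - qd]: the slopes are the derivatives.
    rows = [[1.0] + [p - q for p, q in zip(points[i], query)] for i in nearest]
    ys = [values[i] for i in nearest]
    d = len(rows[0])
    # Normal equations (X^T X) b = X^T y of ordinary least squares.
    xtx = [[sum(r[u] * r[v] for r in rows) for v in range(d)] for u in range(d)]
    xty = [sum(r[u] * y for r, y in zip(rows, ys)) for u in range(d)]
    coefficients = solve_linear_system(xtx, xty)
    return coefficients[1:]  # drop the intercept


def qualitative_sign(derivative, tol=1e-6):
    """Map a numeric derivative to '+', '-' or '0' (tol is an assumed threshold)."""
    if derivative > tol:
        return "+"
    if derivative < -tol:
        return "-"
    return "0"


# Usage on artificial data: f(x, y) = 3x - 2y sampled on a 5 x 5 grid.
grid = [(float(i), float(j)) for i in range(5) for j in range(5)]
targets = [3 * x - 2 * y for x, y in grid]
gradient = local_partial_derivatives(grid, targets, (2.0, 2.0), k=9)
signs = [qualitative_sign(g) for g in gradient]
print(signs)
```

The resulting signs could then serve as class labels for a standard learner that induces a qualitative model; that step is left out of the sketch.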

COBISS.SI-ID: 8324436
3.
iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution

We developed individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) followed by high-throughput sequencing to study protein-RNA interactions. We developed algorithms and the iCount pipeline for mapping iCLIP sequence reads to the human genome, quality-control filtering, removal of PCR duplicates and quantification of binding using random barcodes, generation of crosslink maps, identification of significant clusters of crosslinks and analysis of enriched pentanucleotides. We then studied the positioning of hnRNP C particles and their role in alternative splicing.
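
The step of removing PCR duplicates and quantifying binding with random barcodes can be sketched as follows. This is not the iCount implementation: the function name and the input format (already-mapped reads given as chromosome, strand, crosslink position and barcode) are hypothetical, and deriving the crosslink position from a read alignment is outside the sketch.

```python
from collections import defaultdict


def count_unique_cdnas(reads):
    """Collapse PCR duplicates and count unique cDNAs per crosslink site.

    `reads` is an iterable of (chrom, strand, position, barcode) tuples
    (a hypothetical input format). Reads at the same site that carry the
    same random barcode are treated as PCR copies of one cDNA.
    """
    barcodes_at_site = defaultdict(set)
    for chrom, strand, position, barcode in reads:
        barcodes_at_site[(chrom, strand, position)].add(barcode)
    # The number of distinct barcodes is the cDNA count for that site.
    return {site: len(barcodes) for site, barcodes in barcodes_at_site.items()}


# Usage with made-up reads: two of the three reads at chr1:+:100 share a barcode.
example_reads = [
    ("chr1", "+", 100, "ACG"),
    ("chr1", "+", 100, "ACG"),
    ("chr1", "+", 100, "TTA"),
    ("chr1", "-", 100, "ACG"),
    ("chr2", "+", 5, "GGC"),
]
crosslink_map = count_unique_cdnas(example_reads)
```

The resulting dictionary is a simple crosslink map; cluster detection and pentanucleotide enrichment would operate on counts of this kind.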

COBISS.SI-ID: 7800916
4.
An efficient explanation of individual classifications using game theory

We developed a general method for explaining individual classifications that is independent of the classifier model. The explanation detects interactions among attributes that the model uses for its predictions. We showed a connection to game theory that enables an efficient implementation of the explanation, which reliably searches an otherwise exponential space in linear time.
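
A minimal sketch of how a game-theoretic explanation of a single prediction can be approximated without enumerating all attribute subsets. It assumes that the game-theoretic quantity is the Shapley value and that it is estimated by sampling attribute orderings; the function name, the `background` reference data and the sample count are illustrative assumptions, not the published algorithm.

```python
import random


def sampled_contributions(predict, instance, background, n_samples=1000, seed=0):
    """Estimate each attribute's contribution to predict(instance).

    For every sample, a random attribute ordering and a random reference
    row are drawn. Attributes preceding attribute i in the ordering take
    the instance's values, the rest take the reference's values; the
    contribution is the average change in the prediction when attribute i
    is switched from the reference value to the instance value.
    """
    rng = random.Random(seed)
    n = len(instance)
    contributions = []
    for i in range(n):
        total = 0.0
        for _ in range(n_samples):
            order = list(range(n))
            rng.shuffle(order)
            reference = rng.choice(background)
            preceding = set(order[:order.index(i)])
            with_i = [instance[j] if (j in preceding or j == i) else reference[j]
                      for j in range(n)]
            without_i = [instance[j] if j in preceding else reference[j]
                         for j in range(n)]
            total += predict(with_i) - predict(without_i)
        contributions.append(total / n_samples)
    return contributions


# Usage with a toy additive model; any callable model could be substituted.
def toy_model(x):
    return 2 * x[0] + 0 * x[1] - x[2]


contribs = sampled_contributions(toy_model, [1, 1, 1], [[0, 0, 0]], n_samples=200)
```

With a fixed number of samples, the cost is proportional to the number of attributes times the number of samples, rather than to the number of attribute subsets. Because the model is treated only as a callable, the sketch does not depend on the classifier type.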

COBISS.SI-ID: 7543636
5.
Combining human analysis and machine data mining to obtain credible data relations

It is known that a decision-tree model can contain relations that are statistically significant but, in reality, meaningless to a human. When the task is domain analysis, meaningless relations are problematic, since they can lead to wrong conclusions and consequently undermine a human’s trust in data mining (DM) programs. To eliminate problematic relations from the conclusions of an analysis, we propose an interactive method called Human–Machine Data Mining (HMDM). The method constructs multiple models in a specific way so that a human can reexamine the relations in different contexts and, based on the observed evidence, conclude which relations and models are credible, that is, both meaningful and of high quality. Based on the extracted credible relations and models, the human can construct correct overall conclusions about the domain. The method is demonstrated in two complex domains, extracting credible relations and models that indicate which segments of the higher education sector and of the research and development sector influence the economic welfare of a country. An experimental evaluation shows that the method is capable of finding important relations and models that are better in both meaning and quality than those constructed solely by the DM programs.

COBISS.SI-ID: 27888167