This is a preprint submitted to Nucleic Acids Research


Through ‘combinatorial splicing’, RNA metabolism may create enormous structural diversity in the proteome. Functional interactions among multiple alternative domains can have a disproportionate impact on the phenotype, requiring integrated RNA-level regulation of molecular composition. Splicing correlations within molecules expressed from a single gene, where these effects would be greatest, provide valuable clues to functional relationships and targets for splicing regulation. We present tools to visualize complex splicing patterns in full-length cDNA libraries. Developmental changes in pair-wise correlations are displayed vectorially in ‘clock plots’ and linkage grids. Higher-order correlations are assessed via a log-linear model and Monte Carlo analysis with an empirical Bayes estimate of unobserved probabilities. Log-linear coefficients are visualized in a ‘spliceprint,’ a signature of splice correlations in the transcriptome. We introduce two novel metrics: the developmental linkage index, which measures the directional change in pair-wise correlation with tissue differentiation, and the accuracy index, a very simple goodness-of-fit metric that, when applied to sparsely populated tables, is more sensitive than the integrated squared error and, unlike chi-square, does not diverge at low variance. Considerable attention is given to sparse contingency tables, which are characteristic of single-gene libraries, but the methods apply to transcriptome analysis in general.


Bioinformatics | Computational Biology