As described in Spencer & Zeller, et al. (2011) Genome Research.
Sample production
Nematode culture
C. elegans strains were maintained as described (Brenner 1974). We used N2 as the wild-type strain. Other strains used in this study are listed in Supplemental Table S1.
Construction of cell-specific 3XFLAG::PAB-1 plasmids
To express 3XFLAG-tagged PAB-1 in specific cell types, promoters were amplified and cloned into the pSV41 (Pgateway::3XFLAG::PAB-1 + unc-119 minigene) plasmid using the Gateway cloning system (Invitrogen). Transgenics were obtained by microparticle bombardment or by microinjection (see Supplemental Protocols SP1–SP4).
Isolation of cell-specific RNA by the mRNA-tagging method
Cell-specific RNA was isolated from transgenics expressing 3XFLAG-tagged PAB-1 using the mRNA-tagging strategy (Roy et al. 2002) described by Von Stetina et al. (2007; see Supplemental Protocol SP5).
Preparation and primary cell culture of embryonic cells and isolation of fluorescently labeled embryonic cells by FACS
Embryonic cells were isolated by FACS as previously described (Christensen et al. 2002; Fox et al. 2005, 2007). Cell types were sorted to a fractional purity ranging from 80%–97% (Supplemental Table S1). RNA was extracted from sorted cells in TRIzol LS, treated with DNase I, and purified using the DNA-free RNA kit from Zymo Research (see Supplemental Protocols SP6–SP8).
RNA amplification and microarray hybridization
RNA from sorted cells and mRNA-tagging lines was amplified and labeled using the WT-Ovation Pico, WT-Ovation Exon, and Encore Biotin kits from NuGEN for application to C. elegans tiling arrays (Affymetrix). Pearson correlation coefficients between replicates were determined to confirm consistent microarray data quality (see Supplemental Protocols SP9, SP10).
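The replicate check described above can be illustrated with a minimal sketch. It assumes that probe intensities for two replicate arrays are available as equal-length lists; the function name `pearson` and the example values are hypothetical and are not taken from the Supplemental Protocols.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 probe intensities for two replicate arrays.
rep1 = [7.1, 8.4, 10.2, 6.9, 12.5]
rep2 = [7.3, 8.1, 10.0, 7.2, 12.1]
r = pearson(rep1, rep2)
```

A coefficient close to 1 indicates that the two replicates rank and scale probes similarly; the cutoff used to accept a replicate pair is not stated here and is left to the reader.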
Computational analyses of tiling array data
Array annotation
Tiling array features were mapped to the C. elegans genome and WormBase gene annotation (Rogers et al. 2008). Additionally, repetitive tiling probes were flagged (see Supplemental Protocol SP14). Based on annotated protein-coding gene models, tiling probes were assigned to exonic, intronic, intergenic, and ambiguous categories.
Normalization and transcript identification
Raw tiling array data were normalized to correct for uneven background (Borevitz et al. 2003; Zeller et al. 2009), between-array variability with quantile normalization (Bolstad et al. 2003), and probe-sequence effects with transcript normalization (Zeller et al. 2008). We evaluated the extent to which normalizing for probe-sequence effects improved subsequent transcript recognition in comparison to DNA reference normalization (Huber et al. 2006) on the basis of the above probe annotation (Fig. 2A). In this context, we defined sensitivity as the percentage of annotated exon probes with signal above a cutoff (true-positives, TP) among all annotated exon probes, including undetected ones (false-negatives, FN): Sn = TP/(TP + FN). Precision was defined as the percentage of annotated exon probes (TP) among all probes with signal above the cutoff (including true- and false-positives, FP): Pr = TP/(TP + FP). Varying the cutoff across the whole range of measured array intensities resulted in curves showing different trade-offs between precision and sensitivity (Fig. 2A; see Supplemental Protocol SP15).
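The two definitions, Sn = TP/(TP + FN) and Pr = TP/(TP + FP), can be traced in a short sketch. It assumes each probe carries a normalized intensity and a Boolean exon annotation, and it treats "above a cutoff" as strictly greater; the function name and the toy data are hypothetical, and how ambiguous probes were handled is not covered.

```python
def precision_sensitivity_curve(intensities, is_exon, cutoffs):
    """Return a list of (cutoff, sensitivity, precision) tuples.

    TP: annotated exon probe with signal above the cutoff
    FN: annotated exon probe with signal at or below the cutoff
    FP: non-exon probe with signal above the cutoff
    """
    curve = []
    for cutoff in cutoffs:
        tp = fn = fp = 0
        for signal, exon in zip(intensities, is_exon):
            above = signal > cutoff
            if exon and above:
                tp += 1
            elif exon:
                fn += 1
            elif above:
                fp += 1
        # Guard against empty denominators at extreme cutoffs.
        sn = tp / (tp + fn) if (tp + fn) else float("nan")
        pr = tp / (tp + fp) if (tp + fp) else float("nan")
        curve.append((cutoff, sn, pr))
    return curve


# Hypothetical probes: intensity and whether the probe lies in an annotated exon.
intensities = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0]
is_exon = [True, True, False, True, False, False]
curve = precision_sensitivity_curve(intensities, is_exon, cutoffs=[4.5, 6.5])
```

Raising the cutoff from 4.5 to 6.5 in this toy example lowers sensitivity (one exon probe drops out) while raising precision (one fewer non-exon probe passes), which is the trade-off the curves display.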
For de novo identification of transcriptionally active regions (TARs) from tiling array data, we employed mSTAD (margin-based segmentation of tiling array data), a machine-learning–based method (Laubinger et al. 2008; Zeller et al. 2008). Its internal parameters were trained on hybridization patterns and tiling probe annotations in regions around experimentally confirmed genes. Genome-wide TAR predictions were generated in a two-fold cross-validation scheme. Cross-validation accuracy was assessed with respect to annotated genes confirmed by full-length cDNA sequences (Fig. 2B,C) as well as to the modENCODE integrated transcript model (Supplemental Fig. S6; Supplemental Protocols SP16, SP17; Gerstein et al. 2010). Additionally, accuracy was compared to modMine TARs (Supplemental Fig. S6; Supplemental Protocol SP17; Supplemental Result SR4).
Identification of new transcripts
TARs were called “unannotated” if they overlapped by <20 nt with exons of the coding genes, noncoding genes, and pseudogenes annotated in WormBase. If, additionally, a given TAR did not overlap by ≥20 nt with exons of the integrated transcript model, we called it a novel transcript (Fig. 2D–F; see Supplemental Protocol SP18). Taking the per-nucleotide union of TARs obtained in individual samples, we obtained nonredundant (nr) TARs (analogously for expressed nrTARs, differentially expressed nrTARs, unannotated nrTARs, and novel nrTARs) (Fig. 2F; see Supplemental Protocol SP18). For each position within expressed nrTARs, we counted the number of samples in which a TAR was detected to generate histograms of sample specificity (Fig. 4C; Supplemental Fig. S17; see Supplemental Protocol SP18).
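The overlap rule and the per-nucleotide union can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Supplemental Protocol SP18: intervals are half-open `(start, end)` pairs on a single chromosome and strand, overlap is counted as the total number of TAR nucleotides covered by any exon, and all function names are hypothetical.

```python
def union_intervals(intervals):
    """Per-nucleotide union of half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or directly adjacent: extend the previous interval.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def overlap_nt(tar, exons):
    """Number of nucleotides of `tar` covered by the union of `exons`."""
    start, end = tar
    return sum(max(0, min(end, e) - max(start, s))
               for s, e in union_intervals(exons))


def classify_tar(tar, annotated_exons, integrated_exons, min_overlap=20):
    """Label a TAR as 'annotated', 'unannotated', or 'novel'.

    'novel' TARs are also unannotated; 'unannotated' is returned here only
    for TARs that still overlap the integrated transcript model.
    """
    if overlap_nt(tar, annotated_exons) >= min_overlap:
        return "annotated"
    if overlap_nt(tar, integrated_exons) >= min_overlap:
        return "unannotated"
    return "novel"


# Hypothetical TARs from two samples, merged into nonredundant TARs.
nr_tars = union_intervals([(10, 50), (100, 120), (40, 80)])
```

For example, `classify_tar((100, 200), [(90, 115)], [])` yields `"novel"` under these assumptions, because only 15 nt of the TAR are covered by an annotated exon and none by the integrated model.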
Detection of expressed transcripts and significant expression differences
For each annotated protein-coding gene and predicted TAR, we constructed a probe set for expression summarization (see Supplemental Protocol SP19). Subsequently, transcript expression was estimated using a customized RMA pipeline (see Supplemental Protocol SP19; Bolstad et al. 2003; Irizarry et al. 2003; Gautier et al. 2004). A Mann-Whitney U test with an empirical background model and FDR correction for multiple testing was used to detect expressed transcripts (Benjamini and Hochberg 1995). Genes and TARs with an FDR ≤0.05 were reported as expressed above background (see Table 2; Supplemental Fig. S9A; see also Supplemental Protocol SP20). We detected differentially expressed transcripts using a method based on linear models (Smyth 2004). Genes and TARs were called differentially expressed if the FDR was ≤0.05 and the fold change (FC) ≥2.0 (Table 2; Fig. 4A; Supplemental Figs. S9B,C, S12; see Supplemental Protocol SP21). To more strictly correct for potential false-positives resulting from multiple sample comparisons, we divided individual FDR estimates by the number of samples or sample comparisons, respectively. This resulted in an adjusted FDR of 1.3 × 10⁻⁴ for expression above background and of 7.4 × 10⁻⁴ for differential expression (Table 2; Supplemental Fig. S12; Supplemental Protocol SP22). We called genes “selectively enriched” in a given tissue (see Results) if they met the following requirements: (1) enriched expression in a given tissue (FDR ≤0.05 and FC ≥2.0), (2) fold change versus reference among the upper 40% of the positive FC range observed for this gene across all tissues, and (3) fold-change entropy among the lower 40% of the distribution observed for all genes (see Supplemental Protocol SP23; Schug et al. 2005).
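The FDR and fold-change thresholds can be illustrated with a minimal sketch of the Benjamini-Hochberg adjustment followed by the two-part call. It assumes per-transcript p-values and log2 fold changes are already available (the linear-model fit itself is not shown) and that FC ≥2.0 applies in either direction; the function names and example values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_differential(pvalues, log2_fold_changes, fdr_cutoff=0.05, min_fc=2.0):
    """True where FDR <= fdr_cutoff and the fold change is >= min_fc.

    Assumption: the fold-change threshold is applied to the absolute
    log2 fold change, i.e. to enrichment and depletion alike.
    """
    fdr = benjamini_hochberg(pvalues)
    min_log2 = math.log2(min_fc)
    return [q <= fdr_cutoff and abs(lfc) >= min_log2
            for q, lfc in zip(fdr, log2_fold_changes)]


# Hypothetical p-values and log2 fold changes for four transcripts.
pvals = [0.01, 0.04, 0.03, 0.005]
log2_fc = [1.5, 0.5, -2.0, 0.9]
fdr = benjamini_hochberg(pvals)
calls = call_differential(pvals, log2_fc)
```

In this toy example all four transcripts pass the FDR cutoff, but only the first and third also reach a two-fold change, so only those two are called. A stricter cutoff can be tried by passing a smaller `fdr_cutoff`.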