The Sequencing core provides the first steps of data analysis as part of the sequencing service. The deliverables from this primary analysis include:
- Image Analysis: raw TIFF image files are processed for cluster position, intensity, and noise. These data are used for base calling.
- Base Calling: cluster intensities are used to output the sequence of bases from each cluster. A quality score is given for each base. Data will be delivered in standard Sanger FASTQ format.
- QC: data are reviewed for cluster density, error rate, percentage of clusters passing filter, intensity over time, and sequence composition.
- Demultiplexing: if you chose to multiplex samples using Illumina indexing, the barcode information will be used to extract the individual sample data.
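For readers who have not worked with FASTQ files before, the sketch below shows how a delivered file might be read and how its quality scores can be interpreted. It is an illustration only, not the core's pipeline. It assumes four-line records (header, sequence, "+" line, quality string) without line wrapping, and the Sanger encoding in which each quality character is the Phred score plus 33. The names `FastqRecord`, `read_fastq`, and `phred_to_error_probability` are made up for this example.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO

SANGER_OFFSET = 33  # Sanger FASTQ stores each Phred score as chr(score + 33)


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream (assumed unwrapped)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        scores = [ord(char) - SANGER_OFFSET for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


def phred_to_error_probability(score: int) -> float:
    """Convert a Phred score Q to the estimated error probability 10^(-Q/10)."""
    return 10 ** (-score / 10)


if __name__ == "__main__":
    example = io.StringIO("@read1\nGATTACA\n+\nIIII#II\n")
    for record in read_fastq(example):
        mean_quality = sum(record.qualities) / len(record.qualities)
        print(record.name, record.sequence, record.qualities, round(mean_quality, 1))
```

In this encoding the character `I` corresponds to a Phred score of 40 (an estimated error probability of 1 in 10,000) and `#` to a score of 2. For real analyses, an established parser such as the one in Biopython is usually a safer choice than hand-written code.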
Further Data Analysis
Following the delivery of data generated by Illumina sequencing and the above primary analysis, the data require extensive secondary and tertiary analyses for interpretation. The exact pipeline varies depending on experiment type. We highly recommend that you develop collaborations with bioinformatics experts so that the analysis can be tailored to your research goals.
Some available tools and resources for data alignment are listed below. This list is not exhaustive, but is provided as a starting point for further investigation. If you run across any particularly useful tools that are not on this list, let us know!
http://www.cbcb.umd.edu/software/ (Tuxedo suite)