2. Lab Practicals
2. Data analysis – Dry Lab
The data from the Illumina sequencing run was then transferred to the UPPMAX server. Each of us created an UPPMAX account and used it to access the data on the server.
To get an idea of the data analysis pipeline, on the first day of the exercise we were given sample data to practise using the various tools and to get used to the command line.
On the second day, we got our own sample data from the Illumina sequencing run. The Illumina software converts the image files from the run into base calls and provides them as FASTQ files. We then followed the general flow of the data analysis from the practice exercise and the tutorial provided.
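To make the FASTQ format concrete, here is a minimal Python sketch of how such a file is laid out: four lines per read (identifier, sequence, a `+` separator and a quality string). It assumes the Phred+33 quality encoding and one sequence line per record; the two reads and the helper names `read_fastq` and `mean_phred` are made up for illustration and are not part of the course tutorial.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def mean_phred(qual, offset=33):
    """Average Phred score of a quality string (assumes Phred+33 encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


# Two invented reads: 'I' encodes Phred 40, '!' encodes Phred 0.
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n"
records = list(read_fastq(StringIO(example)))
for read_id, seq, qual in records:
    print(read_id, seq, mean_phred(qual))
```

A per-read average like this is only a crude stand-in for the per-base quality plots that FastQC produces, but it shows where the quality information lives in the file.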
We started by checking the quality of the data using the FastQC tool available on UPPMAX. We then used the BWA tool to map the reads to the reference genome. This step generates output in the human-readable ‘SAM’ format.
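As an illustration of why SAM is called human readable, the sketch below splits one alignment line into its eleven mandatory tab-separated fields and decodes two bits of the FLAG field. The alignment line, the chromosome name `chrVII` and the helper `parse_sam_line` are invented for the example; they are not taken from our data.

```python
# Names of the 11 mandatory SAM columns, in order.
SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]


def parse_sam_line(line):
    """Parse the mandatory fields of one SAM alignment line into a dict."""
    parts = line.rstrip("\n").split("\t")
    rec = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        rec[key] = int(rec[key])
    rec["is_unmapped"] = bool(rec["flag"] & 0x4)   # FLAG bit 0x4: read unmapped
    rec["is_reverse"] = bool(rec["flag"] & 0x10)   # FLAG bit 0x10: reverse strand
    return rec


# Invented example: a 4-base read mapped to the reverse strand at position 1000.
line = "read1\t16\tchrVII\t1000\t60\t4M\t*\t0\t0\tACGT\tIIII"
rec = parse_sam_line(line)
print(rec["rname"], rec["pos"], rec["cigar"], rec["is_reverse"])
```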
This was then converted to ‘BAM’, i.e. the binary format, using Picard tools. This reduces the file size and increases the processing efficiency. Further, most of the tools used downstream accept the binary format as input.
The BAM files were then used for variant calling with the GATK tool. GATK looks for differences between the nucleotides present in the mapped reads and those in the reference. These differences were stored in a file format called ‘VCF’ (Variant Call Format). GATK was instructed to look for SNPs as well as indels (insertions and deletions) using different commands.
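The distinction between SNPs and indels can be read directly from the REF and ALT columns of a VCF record. The following sketch shows one simple way to do this, based only on allele lengths; it is not how GATK itself classifies variants, and the three records and the helper names are invented for illustration.

```python
def classify_variant(ref, alt):
    """Classify a single REF/ALT pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # e.g. a multi-nucleotide substitution


def parse_vcf(lines):
    """Yield (chrom, pos, ref, alt, kind) for each ALT allele in VCF data lines."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # a record may list several ALT alleles
            yield chrom, int(pos), ref, alt, classify_variant(ref, alt)


# Invented records; only the first five VCF columns are used here.
vcf_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrA\t120\t.\tA\tG\t50\tPASS\t.",
    "chrA\t300\t.\tT\tTCA\t50\tPASS\t.",
    "chrB\t45\t.\tGAT\tG\t50\tPASS\t.",
]
variants = list(parse_vcf(vcf_lines))
for v in variants:
    print(v)
```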
Once this was done, the reads were recalibrated and another round of variant calling was performed. The recalibration improves the alignment near indel regions, so that variants are detected more efficiently and accurately.
Finally, this file was used to visualise the regions containing indels graphically, using the IGV tool. This tool enabled us to navigate to a specific gene (LYS2, LYS5 and LYS14 in our case) and look for mutations, i.e. variants.
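Navigating to a gene and looking for variants amounts to asking which variant positions fall inside that gene's coordinates. A minimal sketch of this check is given below. The coordinates in `GENE_REGIONS` are placeholders and NOT the real yeast coordinates of these genes, and the variant positions are likewise invented.

```python
# Placeholder intervals (chromosome, start, end), 1-based inclusive.
# These are NOT the real coordinates of the genes in the yeast reference.
GENE_REGIONS = {
    "LYS2": ("chrA", 100, 500),
    "LYS5": ("chrB", 200, 900),
    "LYS14": ("chrC", 50, 400),
}


def variants_in_genes(variants, regions):
    """Group (chrom, pos) variants by the gene interval they fall into."""
    hits = {gene: [] for gene in regions}
    for chrom, pos in variants:
        for gene, (g_chrom, start, end) in regions.items():
            if chrom == g_chrom and start <= pos <= end:
                hits[gene].append((chrom, pos))
    return hits


# Invented variant positions for the example.
called = [("chrA", 50), ("chrB", 250), ("chrB", 1000), ("chrC", 400)]
hits = variants_in_genes(called, GENE_REGIONS)
for gene, found in hits.items():
    print(gene, found)
```

In practice IGV does this lookup interactively once a gene name or locus is entered, so the sketch only illustrates the logic behind it.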
We then used the loci information from the reference to look up information in the SNP databases for yeast and other similar resources.
Finally, by comparing the information obtained from the IGV analysis with the complementation test, we were able to confirm that the mutation was in the LYS5 gene (for our group, named Cocolocos).