# Use cases

The use cases below illustrate how to run the TRiCoLOR modules. A test set is provided for this purpose in the TRiCoLOR/test folder. It contains BAM files simulated with VISOR (accuracy ~ 0.9, read length ~ 8000 bp, substitutions:insertions:deletions ratio ~ 45:25:30, coverage ~ 50X) for a trio (SON, PARENT1 and PARENT2). The BAM files for SON and PARENT1 contain a heterozygous TR expansion. Specifically, the normal TR is chr20:17553794-17553824 (15xAC), taken from the list of TRs available for the GRCh38 human reference genome, whereas the expanded TR is 115xAC (an expansion of 100 motifs).
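
As a quick sanity check on the numbers above, the sizes implied by the motif and copy counts can be worked out directly in the shell. This is only arithmetic on the values stated in this section; the variable names are illustrative.

```shell
# Values taken from the test-set description: motif AC, 15 vs 115 copies.
motif="AC"
normal_copies=15
expanded_copies=115

motif_len=${#motif}                              # length of the motif in bp
normal_bp=$(( normal_copies * motif_len ))       # size of the normal TR
expanded_bp=$(( expanded_copies * motif_len ))   # size of the expanded TR
added_bp=$(( expanded_bp - normal_bp ))          # sequence gained by the expansion
ref_span=$(( 17553824 - 17553794 ))              # span of the reference interval

echo "normal: ${normal_bp} bp (reference span: ${ref_span} bp)"
echo "expanded: ${expanded_bp} bp (gained: ${added_bp} bp)"
```

The normal TR size matches the span of the reference interval, which is the size to keep in mind when choosing a minimum TR size later on.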

# TRiCoLOR SENSoR

We can identify repetitive regions in SON using the SENSoR module. As described in the General usage section, SENSoR requires haplotype-resolved or haplotype-tagged BAM files. For instance, using the haplotype-tagged BAM generated by VISOR, SENSoR can be run as follows:

#from TRiCoLOR/test
TRiCoLOR SENSoR -bam son/sim.srt.bam -o sensor_son

This will produce a gzipped BED file in the output folder, containing the repetitive regions identified in the input BAM file. As we are interested in further profiling a single TR, which is known to occur at chr20:17553794-17553824, we can ignore the other regions.

#from TRiCoLOR/test
zgrep -P "^chr20\t17553" sensor_son/TRiCoLOR.srt.bed.gz > sensor_son/TRiCoLOR.srt.fltrd.bed
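
The prefix match on 17553 keeps every chr20 region whose start coordinate begins with those digits. If a stricter selection is wanted, an interval-overlap filter is one alternative. The sketch below runs on a small toy BED with made-up rows (not real SENSoR output) and assumes only the three standard BED columns (chromosome, start, end).

```shell
# Toy gzipped BED standing in for the SENSoR output; the rows are made up.
tmpdir=$(mktemp -d)
printf 'chr20\t17553790\t17553830\nchr20\t17553100\t17553150\nchr20\t30000000\t30000100\nchr1\t17553794\t17553824\n' \
  | gzip -c > "$tmpdir/toy.bed.gz"

# Keep only the regions on chr20 that overlap the interval 17553794-17553824.
kept=$(gzip -dc "$tmpdir/toy.bed.gz" \
  | awk -v chrom="chr20" -v start=17553794 -v end=17553824 \
      'BEGIN { FS = OFS = "\t" } $1 == chrom && $2 < end && $3 > start')

echo "$kept"
rm -rf "$tmpdir"   # remove the temporary toy files
```

On the toy input only the first row survives: the second row starts with 17553 but does not overlap the TR, so a prefix match would have kept it.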

# TRiCoLOR REFER

We can profile the TR included in the BED file previously generated using the REFER module. As described in the General usage section, in addition to the haplotype-tagged BAM file and the BED file we already generated, REFER requires a reference genome. The one used to simulate the BAM files included in the TRiCoLOR/test folder is the GRCh38 human reference genome.

#from TRiCoLOR/test
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
samtools faidx GRCh38_full_analysis_set_plus_decoy_hla.fa

Then, REFER can be run as follows. As the normal TR size is 30 bp and by default TRiCoLOR only outputs TRs larger than 50 bp, we lower this threshold by setting the -s/--size parameter to 20.

#from TRiCoLOR/test
TRiCoLOR REFER -g GRCh38_full_analysis_set_plus_decoy_hla.fa -bam son/sim.srt.bam -bed sensor_son/TRiCoLOR.srt.fltrd.bed -o refer_son -s 20 --samplename SON

This will produce a bgzipped VCF file containing the variant TR.

#from TRiCoLOR/test
bcftools view refer_son/TRiCoLOR.srt.vcf.gz

As can be seen from the VCF file, the first haplotype contains a normal TR (H1M:AC; H1N:15) whereas the second haplotype contains an expanded TR (H2M:AC; H2N:117), which closely approximates the ground truth (H1M:AC; H1N:15; H2M:AC; H2N:115). Together with the VCF file, TRiCoLOR outputs in BED format the TRs found in the reference (refer_son/reference/TRiCoLOR.srt.bed.gz) and in the individual's haplotypes (refer_son/haplotype1/TRiCoLOR.srt.bed.gz, refer_son/haplotype2/TRiCoLOR.srt.bed.gz), as well as the haplotype-resolved consensus alignments (refer_son/haplotype1/TRiCoLOR.srt.bam, refer_son/haplotype2/TRiCoLOR.srt.bam). These additional files can be used to interactively visualize the identified TRs in their sequence context.
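
To put the difference between the estimated and the simulated copy number in perspective, the deviation can be expressed in motifs, base pairs and percent. This is plain arithmetic on the values reported above (117 estimated, 115 simulated, motif AC); the variable names are illustrative.

```shell
# Copy numbers for the second haplotype: estimate from the VCF vs ground truth.
estimated=117
truth=115
motif_len=2   # length of the AC motif in bp

diff_motifs=$(( estimated - truth ))        # deviation in motif copies
diff_bp=$(( diff_motifs * motif_len ))      # deviation in base pairs
# Relative deviation in percent, computed with awk for floating-point support.
rel_pct=$(awk -v e="$estimated" -v t="$truth" 'BEGIN { printf "%.1f", 100 * (e - t) / t }')

echo "deviation: ${diff_motifs} motifs (${diff_bp} bp, ${rel_pct}% of the true copy number)"
```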

# TRiCoLOR ApP

We can interactively visualize the identified TRs using the ApP module. As described in the General usage section, ApP requires the reference genome, the BED files and the consensus BAM files generated by the REFER module, and a region specifying the TR of interest.

#from TRiCoLOR/test
TRiCoLOR ApP -g GRCh38_full_analysis_set_plus_decoy_hla.fa -bam refer_son/haplotype1/TRiCoLOR.srt.bam refer_son/haplotype2/TRiCoLOR.srt.bam -o app_son -gb refer_son/reference/TRiCoLOR.srt.bed.gz -h1b refer_son/haplotype1/TRiCoLOR.srt.bed.gz -h2b refer_son/haplotype2/TRiCoLOR.srt.bed.gz chr20:17553795-17553825

This will produce an HTML file in the output folder, which can be opened with the default web browser.

# TRiCoLOR SAGE

As described in the General usage section, TRiCoLOR also provides the SAGE module, which checks the Mendelian consistency of the TRs identified in a child when haplotype-resolved or haplotype-tagged alignments are available for both parents. SAGE can be run as follows.

#from TRiCoLOR/test
TRiCoLOR SAGE -vcf refer_son/TRiCoLOR.srt.vcf.gz -bam parent1/sim.srt.bam parent2/sim.srt.bam -o sage_trio --samplename PARENT1 PARENT2 --mendel

This will produce a bgzipped multi-sample VCF file in which the parents are also genotyped.

#from TRiCoLOR/test
bcftools view sage_trio/TRiCoLOR.srt.vcf.gz

As can be seen from the VCF file, PARENT1 contains a heterozygous TR expansion while PARENT2 is homozygous reference. The TR identified in the child is therefore considered Mendelian consistent (MENDEL:0).
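
The idea behind a Mendelian consistency check can be sketched in a few lines of shell. This is a simplified illustration, not SAGE's actual implementation: it assumes exact copy numbers (the simulated ground truth of 15 and 115 copies) and declares a child consistent when one of its alleles is found in one parent and the other allele in the other parent. The function names `in_parent` and `consistent` are hypothetical.

```shell
# in_parent ALLELE PARENT_A PARENT_B
# Succeeds when ALLELE equals one of the two parental alleles.
in_parent() { [ "$1" -eq "$2" ] || [ "$1" -eq "$3" ]; }

# consistent CHILD_A CHILD_B P1_A P1_B P2_A P2_B
# Succeeds when one child allele comes from parent 1 and the other from parent 2.
consistent() {
  { in_parent "$1" "$3" "$4" && in_parent "$2" "$5" "$6"; } ||
  { in_parent "$2" "$3" "$4" && in_parent "$1" "$5" "$6"; }
}

# Simulated trio: child 15/115, PARENT1 15/115, PARENT2 15/15.
if consistent 15 115 15 115 15 15; then
  echo "consistent"
else
  echo "inconsistent"
fi
```

With real estimates, such as the 117 copies reported for the child above, an exact comparison would be too strict, so a practical check would need some tolerance on the copy number.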