HomozygosityMapper

Technical documentation

 
database schema
Figure: HomozygosityMapper database schema

HomozygosityMapper uses a very simple, strictly relational database schema. Besides a group of administrative tables (green), the database is split into two distinct parts: marker data (light blue) and project data. Each project is stored in its own set of tables, which reduces table sizes and hence access times and also makes it easy to delete a single project. To speed up data input, the tables within a project are not linked by foreign key constraints; since all data processing is done by the application, referential integrity is enforced there rather than by the database.
The 'markers' table is shared with GeneDistiller and MutationTaster.
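
The per-project layout described above can be illustrated with a small Python sketch. It uses Python's built-in sqlite3 module purely for illustration; the table names, column names and helper functions are hypothetical and are not taken from the actual HomozygosityMapper schema.

```python
import sqlite3


def create_project_tables(conn, project_id):
    """Create one set of tables for a project (hypothetical names).

    No FOREIGN KEY constraints are declared; the application is
    expected to keep sample and marker numbers consistent itself.
    """
    prefix = f"project_{int(project_id)}"
    conn.execute(
        f"CREATE TABLE {prefix}_samples ("
        "sample_no INTEGER PRIMARY KEY, sample_id TEXT)"
    )
    conn.execute(
        f"CREATE TABLE {prefix}_genotypes ("
        "sample_no INTEGER, marker_no INTEGER, "
        "genotype INTEGER, block_length INTEGER, "
        "PRIMARY KEY (sample_no, marker_no))"
    )
    return prefix


def drop_project(conn, project_id):
    """Delete a single project by dropping only its own tables."""
    prefix = f"project_{int(project_id)}"
    for suffix in ("samples", "genotypes"):
        conn.execute(f"DROP TABLE IF EXISTS {prefix}_{suffix}")


def list_tables(conn):
    """Return all table names in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

Because each project lives in its own tables, removing a project in this sketch is a matter of dropping those tables; the shared marker table and all other projects are left untouched.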

data analysis

In a first step, conducted directly after data upload, HomozygosityMapper detects homozygous stretches in all samples. Pseudo-code example:

for sample (samples){
   for chromosome (chromosomes){
      get genotypes from DB ordered by position (chromosome, sample)
         as array (index: pos, value: genotype)
      pos=1;
      while (pos <= number of elements (genotypes)) {
         if (genotypes[pos]=='heterozygous'){
            # heterozygous markers are not part of any block
            blocklength[pos]=0;
            pos++;
         }
         else {
            # extend the block until DetectBlockEnd reports its end
            pos2=pos;
            while (pos2 <= number of elements (genotypes) && ! DetectBlockEnd(pos,pos2)){
               pos2++;
            }
            length=pos2-pos;
            # every marker in the block gets the length of the whole block
            for my pos3 (pos..(pos2-1)){
               blocklength[pos3]=length;
            }
            pos=pos2;
         }
      }
   }
}

The function DetectBlockEnd (not shown here) returns true when a genotype is heterozygous and is not flanked by 7 or more homozygous or unknown genotypes on each side. In other words, an isolated heterozygous call inside a long homozygous stretch is treated as a likely genotyping error and does not end the block. Although this significantly slows down block detection, it makes HomozygosityMapper more robust against genotyping errors, which may occur with a frequency of 2% with low-quality DNA or inappropriate genotype calling settings.

The choice of 7 markers reflects marker spacing and the genotyping error rate (which is higher on Affymetrix high-density arrays).
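
The block detection described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the HomozygosityMapper implementation: the function names, the string encoding of genotypes, the 0-based indexing and the single-argument form of detect_block_end are illustrative choices. It is also an assumption that a heterozygous call with fewer than 7 neighbours on one side (near a chromosome end) ends a block.

```python
HET = "heterozygous"


def detect_block_end(genotypes, pos, min_flank=7):
    """Return True if the genotype at pos ends a homozygous block.

    A heterozygous genotype is tolerated (returns False) only when it
    has at least min_flank non-heterozygous genotypes on each side.
    Anything that is not HET counts as homozygous or unknown.
    """
    if genotypes[pos] != HET:
        return False
    left = genotypes[max(0, pos - min_flank):pos]
    right = genotypes[pos + 1:pos + 1 + min_flank]
    flanked = (
        len(left) == min_flank
        and len(right) == min_flank
        and HET not in left
        and HET not in right
    )
    return not flanked


def block_lengths(genotypes, min_flank=7):
    """Assign to each marker the length (in markers) of its block.

    genotypes must be ordered by position on one chromosome
    for one sample.
    """
    n = len(genotypes)
    lengths = [0] * n
    pos = 0
    while pos < n:
        if genotypes[pos] == HET:
            pos += 1  # heterozygous marker: block length stays 0
            continue
        end = pos
        while end < n and not detect_block_end(genotypes, end, min_flank):
            end += 1
        for i in range(pos, end):
            lengths[i] = end - pos
        pos = end
    return lengths
```

With these assumptions, a single heterozygous call flanked by eight homozygous markers on each side yields one block of 17 markers, whereas the same call flanked by only three markers on each side splits the stretch into two blocks of length 3.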

 

The second step is the user-defined data analysis, in which cases and controls can be specified. Users can also enter a block length limit; blocks longer than this limit are set to the limit when the homozygosity score is calculated. This prevents very long homozygous stretches in single individuals from inflating the homozygosity scores. By default, limits suited to the marker density are used.

Score calculation is quite simple:

HomScore (marker) = SUM (samples) (MIN (BlockLength (marker, sample), maximum_block_length))

In pseudo-code:

for marker (markers){
   score[marker]=0;
   for sample (samples){
      # block lengths above the limit count only as the limit
      if (blocklength[marker][sample] > maximum_block_length){
         score[marker]+=maximum_block_length;
      }
      else {
         score[marker]+=blocklength[marker][sample];
      }
   }
}
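
The score calculation can also be written as a short Python sketch. The function name hom_scores and the input layout (a dictionary mapping each sample to its list of per-marker block lengths) are assumptions made for this example; they are not taken from the HomozygosityMapper source.

```python
def hom_scores(block_lengths_by_sample, max_block_length):
    """Sum the capped block lengths over all samples for each marker.

    block_lengths_by_sample maps a sample name to a list holding one
    block length per marker; all lists must have the same length.
    """
    per_sample = list(block_lengths_by_sample.values())
    if not per_sample:
        return []
    n_markers = len(per_sample[0])
    return [
        sum(min(lengths[m], max_block_length) for lengths in per_sample)
        for m in range(n_markers)
    ]


example = {
    "sample1": [3, 3, 3, 0],
    "sample2": [10, 10, 10, 10],
}
print(hom_scores(example, max_block_length=5))  # prints [8, 8, 8, 5]
```

In the example, the block of length 10 in sample2 contributes only 5 to each marker, so a single very long stretch cannot dominate the score.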