DOT MATRIX COMPARISON OF SEQUENCES

 

Dot-matrix analysis was the first method of sequence comparison, proposed in the 1970s. It is a simple graphical method. Sequence-1 is written horizontally across the top of a sheet of graph paper (one base/residue per column) and sequence-2 vertically down the left side (one base/residue per row). Each base/residue in sequence-2 is compared with every base/residue in sequence-1, and wherever the two are identical a dot is placed at the intersection of that row and column. Adjoining dots (diagonally, vertically or horizontally) are then joined, but only where at least three dots adjoin. This produces lines (diagonals, inverted diagonals, vertical and horizontal lines). The same analysis can also be done with a computer program, which makes it practical to compare much longer sequences (you will do this later). The following types of lines are possible:

· A diagonal line (from the upper-left to the lower-right corner), representing sequence identity

· Diagonal lines parallel to the main diagonal but above or below it, representing repeats

· Inverted diagonal lines (running from upper right to lower left), representing inversions

· Horizontal and vertical lines, representing tandem repeats
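The manual procedure above translates directly into a few lines of code. The sketch below is a minimal illustration, assuming DNA sequences given as plain Python strings and a window size of one; the function names `dot_matrix` and `render` are hypothetical, chosen only for this example. It prints the matrix in the same text layout used in this section, with sequence-1 across the top and sequence-2 down the left side.

```python
def dot_matrix(seq1, seq2):
    """Window size 1: cell [i][j] is True when base i of seq2 equals base j of seq1."""
    return [[b2 == b1 for b1 in seq1] for b2 in seq2]


def render(seq1, seq2, matrix):
    """Lay the matrix out as text: seq1 across the top, seq2 down the left side."""
    lines = ["  " + " ".join(seq1)]
    for base, row in zip(seq2, matrix):
        cells = " ".join("*" if hit else " " for hit in row)
        lines.append(base + " " + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    seq = "GCAATCGCAGCCACGTGCA"
    print(render(seq, seq, dot_matrix(seq, seq)))  # self-comparison
```

Using booleans rather than characters keeps the matrix easy to analyse further (for example, to look for lines); `render` is only for display.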

 

  G C A A T C G C A G C C A C G T G C A
G *           *     *         *   *    
C   *       *   *     * *   *       *  
A     * *         *       *           *
A     * *         *       *           *
T         *                     *      
C   *       *   *     * *   *       *  
G *           *     *         *   *    
C   *       *   *     * *   *       *  
A     * *         *       *           *
G *           *     *         *   *    
C   *       *   *     * *   *       *  
C   *       *   *     * *   *       *  
A     * *         *       *           *
C   *       *   *     * *   *       *  
G *           *     *         *   *    
T         *                     *      
G *           *     *         *   *    
C   *       *   *     * *   *       *  
A     * *         *       *           *

This is a self-comparison, i.e. a sequence is compared with itself. Note the consecutive dots running diagonally from the upper-left corner to the lower-right corner. Joining these dots gives a diagonal line, which indicates sequence identity between the two sequences.
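The rule of joining only runs of at least three adjoining dots can be sketched as a scan along each diagonal. This is an illustrative sketch, not a standard routine: `diagonal_runs` and its `min_len` parameter are hypothetical names, and it only follows diagonals running from upper left to lower right.

```python
def dot_matrix(seq1, seq2):
    """Window size 1: cell [i][j] is True when seq2[i] == seq1[j]."""
    return [[b2 == b1 for b1 in seq1] for b2 in seq2]


def diagonal_runs(matrix, min_len=3):
    """Runs of at least min_len dots from upper left to lower right.

    Returns (start_row, start_col, length) tuples, 0-based.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    runs = []
    for i in range(n_rows):
        for j in range(n_cols):
            # Start only at a dot whose upper-left neighbour is empty (or off the grid).
            if not matrix[i][j] or (i > 0 and j > 0 and matrix[i - 1][j - 1]):
                continue
            length = 0
            while i + length < n_rows and j + length < n_cols and matrix[i + length][j + length]:
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


if __name__ == "__main__":
    seq = "GCAATCGCAGCCACGTGCA"
    for run in diagonal_runs(dot_matrix(seq, seq)):
        print(run)
```

For the self-comparison, `(0, 0, 19)` is the full main diagonal. Shorter runs such as `(0, 6, 3)` lie off the centre and come from the stretch GCA occurring more than once in the sequence, which is the "repeat" pattern listed earlier.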

 

  G C A A T C G C A G C C A C G T G C A
G *           *     *         *   *    
C   *       *   *     * *   *       *  
A     * *         *       *           *
A     * *         *       *           *
T         *                     *      
C   *       *   *     * *   *       *  
G *           *     *         *   *    
C   *       *   *     * *   *       *  
A     * *         *       *           *
C   *       *   *     * *   *       *  
C   *       *   *     * *   *       *  
A     * *         *       *           *
C   *       *   *     * *   *       *  
G *           *     *         *   *    
T         *                     *      
G *           *     *         *   *    
C   *       *   *     * *   *       *  
A     * *         *       *           *

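This is a repeat of the above comparison, but with a single-base deletion in the second sequence (the G at position 10 of sequence-1 is missing). The deletion breaks the diagonal line: after the gap, the diagonal continues one column to the right.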
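The effect of the deletion can be checked by following one diagonal at a time. In this hypothetical helper, `offset` is the column index minus the row index (0 for the main diagonal), and positions are 0-based. It is a sketch assuming the two sequences shown in the matrix above.

```python
def diagonal_segments(seq1, seq2, offset, min_len=3):
    """Runs of matches along one diagonal, where column = row + offset.

    Returns (start_row, length) pairs, 0-based, keeping runs of >= min_len.
    """
    segments = []
    start = None
    for i in range(len(seq2)):
        j = i + offset
        hit = 0 <= j < len(seq1) and seq2[i] == seq1[j]
        if hit and start is None:
            start = i  # a new run begins
        elif not hit and start is not None:
            if i - start >= min_len:
                segments.append((start, i - start))
            start = None
    if start is not None and len(seq2) - start >= min_len:
        segments.append((start, len(seq2) - start))  # run reaching the last row
    return segments


if __name__ == "__main__":
    seq1 = "GCAATCGCAGCCACGTGCA"
    seq2 = "GCAATCGCACCACGTGCA"  # the G at position 10 of seq1 deleted
    print(diagonal_segments(seq1, seq2, 0))
    print(diagonal_segments(seq1, seq2, 1))
```

On the main diagonal (offset 0) the run stops after nine bases, `[(0, 9)]`; the remaining nine bases continue on the diagonal one column to the right (offset 1), `[(9, 9)]`.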

 

  G C A A T C G C A G C C A C G T G C A
G *           *     *         *   *    
C   *       *   *     * *   *       *  
A     * *         *       *           *
A     * *         *       *           *
T         *                     *      
C   *       *   *     * *   *       *  
G *           *     *         *   *    
C   *       *   *     * *   *       *  
A     * *         *       *           *
C   *       *   *     * *   *       *  
C   *       *   *     * *   *       *  
A     * *         *       *           *
C   *       *   *     * *   *       *  
G *           *     *         *   *    
T         *                     *      
G *           *     *         *   *    
C   *       *   *     * *   *       *  
A     * *         *       *           *

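This is the same dot matrix as the previous one (sequence-2 with the deletion). Here you can observe 7 consecutive dots running perpendicular to the original diagonal, from row 12 at column 19 down to row 18 at column 13. Joining them gives a reverse (inverted) diagonal; such reverse diagonals indicate the presence of inversions. In this example the stretch ACGTGCA reads the same forwards and backwards, which is why it produces a reverse diagonal as well as lying on a forward one.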
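Reverse diagonals can be found the same way as forward ones, by stepping down one row and left one column at a time. The sketch below is illustrative only; `antidiagonal_runs` is a hypothetical name, positions are 0-based, and it uses the simple identity rule of this section.

```python
def antidiagonal_runs(seq1, seq2, min_len=3):
    """Runs of matching bases from upper right to lower left.

    A run starting at (i, j) covers (i, j), (i+1, j-1), (i+2, j-2), ...
    Returns (start_row, start_col, length) tuples, 0-based.
    """
    def hit(i, j):
        return 0 <= i < len(seq2) and 0 <= j < len(seq1) and seq2[i] == seq1[j]

    runs = []
    for i in range(len(seq2)):
        for j in range(len(seq1)):
            # Only start counting at the upper-right end of a run.
            if not hit(i, j) or hit(i - 1, j + 1):
                continue
            length = 0
            while hit(i + length, j - length):
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


if __name__ == "__main__":
    seq1 = "GCAATCGCAGCCACGTGCA"
    seq2 = "GCAATCGCACCACGTGCA"  # sequence-2 with the single-base deletion
    print(antidiagonal_runs(seq1, seq2, min_len=7))
```

With `min_len=7` it reports the single run `(11, 18, 7)`, i.e. row 12, column 19 in the 1-based numbering of the text. Note that for double-stranded DNA, inversions are usually searched for by comparing against the reverse complement; this sketch keeps to plain identity, as the matrices here do.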

In the three matrices above we used a window size of one, i.e. we took one base of sequence-2 at a time and compared it with the bases of sequence-1. The window size can be increased by taking two consecutive bases of sequence-2 at a time and comparing them with each pair of consecutive bases in sequence-1; a dot is entered (at the first base of the window) only when both bases match. This reduces the noise in the picture: chance single-base matches no longer produce dots, so far fewer isolated dots appear and the lines stand out more clearly. (See the matrix below.)

  G C A A T C G C A G C C A C G T G C A
G *           *     *             *    
C   *           *       *           *  
A     *                                
A       *                              
T         *                            
C           *               *          
G *           *     *             *    
C   *           *       *           *  
A                         *            
C                     *                
C   *           *       *           *  
A                         *            
C           *               *          
G                             *        
T                               *      
G *           *     *             *    
C   *           *       *           *  
A                                      
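A window of more than one base can be handled by comparing substrings instead of single characters. The sketch below is a minimal version under stated assumptions: `windowed_dot_matrix` is a hypothetical name, a dot requires an exact match of the whole window, and the dot is placed at the first base of each window, as in the matrix above.

```python
def windowed_dot_matrix(seq1, seq2, window):
    """Dot at [i][j] when seq2[i:i+window] equals seq1[j:j+window] exactly.

    The dot is placed at the first base of each window. Windows that would
    run past the end of either sequence never match, so the matrix keeps
    the full len(seq2) x len(seq1) shape used in the text.
    """
    matrix = []
    for i in range(len(seq2)):
        word2 = seq2[i:i + window]
        row = []
        for j in range(len(seq1)):
            word1 = seq1[j:j + window]
            row.append(len(word2) == window and word1 == word2)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    seq1 = "GCAATCGCAGCCACGTGCA"
    seq2 = "GCAATCGCACCACGTGCA"  # sequence-2 with the deletion
    for base, row in zip(seq2, windowed_dot_matrix(seq1, seq2, 2)):
        print(base, " ".join("*" if hit else " " for hit in row))
```

With a window of 2, the row for the A at position 9 of sequence-2 (followed by C) gets a dot only under column 13, where sequence-1 also reads AC, and the last row gets no dots because no full window starts there. A window of 1 gives back the ordinary dot matrix.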

 

The same matrix with a window size of three is shown below. Note the further reduction in noise.

 

 

  G C A A T C G C A G C C A C G T G C A
G *           *                   *    
C   *                                  
A     *                                
A       *                              
T         *                            
C           *                          
G *           *                   *    
C                       *              
A                                      
C                     *                
C                       *              
A                         *            
C                           *          
G                             *        
T                               *      
G *           *                   *    
C                                      
A                                      
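One way to quantify the reduction in noise is simply to count the dots for each window size. The sketch below is an illustration assuming exact matching of whole windows; `count_dots` is a hypothetical name.

```python
def count_dots(seq1, seq2, window):
    """Total number of dots when windows of `window` bases must match exactly."""
    words1 = [seq1[j:j + window] for j in range(len(seq1) - window + 1)]
    return sum(words1.count(seq2[i:i + window])
               for i in range(len(seq2) - window + 1))


if __name__ == "__main__":
    seq1 = "GCAATCGCAGCCACGTGCA"
    seq2 = "GCAATCGCACCACGTGCA"  # sequence-2 with the deletion
    for window in (1, 2, 3):
        print(window, count_dots(seq1, seq2, window))
```

For the two sequences used in these matrices the count falls from 98 dots (window 1) to 40 (window 2) and 21 (window 3), while the dots along the true diagonals remain.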

 

  G C A A T C G C A G C A A A A A A C A
G *           *     *                  
C   *       *   *     *             *  
A     * *         *     * * * * * *   *
A     * *         *     * * * * * *   *
T         *                            
C   *       *   *     *             *  
G *           *     *                  
C   *       *   *     *             *  
C   *       *   *     *             *  
C   *       *   *     *             *  
C   *       *   *     *             *  
C   *       *   *     *             *  
C   *       *   *     *             *  
C   *       *   *     *             *  
G *           *     *                  
T         *                            
G *           *     *                  
C   *       *   *     *             *  
A     * *         *     * * * * * *   *

 

In the above matrix you can see runs of consecutive dots that form horizontal and vertical lines. Horizontal lines represent tandem repeats in sequence-1 (here the run of six As at positions 12 to 17), while vertical lines represent tandem repeats in sequence-2 (here the run of seven Cs in rows 8 to 14).
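Horizontal and vertical lines can be picked out by scanning each row and each column for consecutive dots. This is a sketch under assumptions: window size 1, a hypothetical minimum line length `min_len=4`, and hypothetical function names; positions are 0-based.

```python
from itertools import groupby


def runs_of_true(flags):
    """Yield (start, length) for each run of consecutive True values."""
    pos = 0
    for value, group in groupby(flags):
        length = sum(1 for _ in group)
        if value:
            yield pos, length
        pos += length


def tandem_repeat_lines(seq1, seq2, min_len=4):
    """Return (horizontal, vertical) runs of at least min_len dots (window size 1).

    horizontal: (row, start_col, length); vertical: (col, start_row, length); 0-based.
    """
    matrix = [[b2 == b1 for b1 in seq1] for b2 in seq2]
    horizontal = [(i, start, n)
                  for i, row in enumerate(matrix)
                  for start, n in runs_of_true(row) if n >= min_len]
    vertical = [(j, start, n)
                for j in range(len(seq1))
                for start, n in runs_of_true([row[j] for row in matrix]) if n >= min_len]
    return horizontal, vertical


if __name__ == "__main__":
    seq1 = "GCAATCGCAGCAAAAAACA"  # contains the run AAAAAA
    seq2 = "GCAATCGCCCCCCCGTGCA"  # contains the run CCCCCCC
    horizontal, vertical = tandem_repeat_lines(seq1, seq2)
    print("horizontal:", horizontal)
    print("vertical:", vertical)
```

Every A row of sequence-2 gives a horizontal line six dots long (the As of sequence-1), and every C column of sequence-1 gives a vertical line seven dots long (the Cs of sequence-2).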

 

What you have understood so far:

How to write the sequences to create the matrix, and how to fill the matrix using different window sizes. You have also learned what diagonals, inverted diagonals, and horizontal and vertical lines mean.

In the above matrices we took a window (a number of consecutive bases from sequence-2) and slid it along sequence-1. This is called a sliding window. The concept is used in many sequence analysis methods: we take a stretch of sequence of a particular length (the window size) and compare it with another sequence by sliding it along that sequence.

Stringency of comparison:

Because biological sequences undergo variation, it is advisable to allow for some variation when comparing two sequences; otherwise we mostly end up concluding that "the two sequences are different". Stringency is the level of identity required to enter a dot. For example, with a window size of 4: 100 % stringency means all four bases in the window must match as we slide it along the other sequence; 75 % stringency means a dot is entered if three of the four bases match; 50 % stringency means a dot is entered if two of the four bases match. In this way we can detect relatedness between sequences that have diverged.
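Stringency can be added to the windowed comparison by counting matching positions instead of requiring the whole window to be identical. The sketch below assumes stringency is given as a whole-number percentage and that the required number of matches is rounded up; `stringency_dot_matrix` is a hypothetical name, and the example sequences are made up for illustration.

```python
def stringency_dot_matrix(seq1, seq2, window, stringency_percent):
    """Dot at [i][j] when at least stringency_percent % of the bases in
    seq2[i:i+window] match the aligned bases in seq1[j:j+window].

    Only full-length windows are compared, so the matrix has
    len(seq2) - window + 1 rows and len(seq1) - window + 1 columns.
    """
    # Required matches, rounded up with integer arithmetic (75 % of 4 -> 3).
    min_matches = (window * stringency_percent + 99) // 100
    matrix = []
    for i in range(len(seq2) - window + 1):
        row = []
        for j in range(len(seq1) - window + 1):
            matches = sum(a == b for a, b in zip(seq2[i:i + window], seq1[j:j + window]))
            row.append(matches >= min_matches)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    seq1, seq2 = "ACGTACGT", "ACGA"  # illustrative sequences, not from the text
    for percent in (100, 75, 50, 25):
        print(percent, stringency_dot_matrix(seq1, seq2, 4, percent))
```

The window ACGA differs from ACGT at one position, so it gets no dot at 100 % stringency but does at 75 %. Lowering the stringency further lets weaker similarities through, and with them more noise.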