AbCheck - How it works

AbCheck consists of two phases:

  1. The sequence you provide is aligned to map it to the Kabat numbering scheme.
  2. The aligned sequence is scanned against the Kabat sequence database using the KabatMan database program.


Alignment is performed using a program written in C.


For simplicity and speed, the alignment is done against a single consensus antibody sequence rather than a multiple alignment. This can cause problems with the positioning of gaps and makes the result very sensitive to the exact consensus sequence. For example, at the start of the light chain, a strict consensus will often cause the gap to appear at the very start rather than at position 9, which is the true consensus gap position. The consensus sequence (derived from the antibodies in the January 1995 PDB files) has therefore been hand-modified to ensure that gaps occur in the correct positions.

The following codes are used in addition to the standard one-letter amino acid codes:

~ Acidic
! Hydrophobic
@ Amido
# Aromatic
$ Basic
% Hydroxyl-containing
^ Proline
& Sulphur-containing

Alignment is done using a simple Needleman and Wunsch algorithm with the Dayhoff-78 matrix normalised to a value of 10. Scores for the special consensus characters listed above were obtained by approximately averaging the scores of the contributing residues. The gap penalty is 15.
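To illustrate how a class code can be scored, the sketch below averages a residue-residue score over the residues a class code stands for. The text does not list the members of each class, so the CLASS_MEMBERS sets are assumptions, and score is a placeholder for the normalised Dayhoff-78 lookup rather than the real matrix. AbCheck uses precomputed approximate averages; this sketch simply computes the mean on the fly.

```python
# Hypothetical class memberships: the text names each class code but not
# which residues it covers, so these sets are assumptions.
CLASS_MEMBERS = {
    "~": "DE",      # acidic
    "!": "AVLIMF",  # hydrophobic
    "@": "NQ",      # amido
    "#": "FWY",     # aromatic
    "$": "KRH",     # basic
    "%": "ST",      # hydroxyl-containing
    "^": "P",       # proline
    "&": "CM",      # sulphur-containing
}


def consensus_score(residue, consensus_char, score):
    """Score one query residue against one consensus character.

    `score(a, b)` is a placeholder for a residue-residue substitution score
    (AbCheck uses Dayhoff-78 normalised to 10). Ordinary one-letter codes are
    scored directly; class codes get the mean score over their members.
    """
    members = CLASS_MEMBERS.get(consensus_char)
    if members is None:
        return score(residue, consensus_char)
    return sum(score(residue, m) for m in members) / len(members)
```

With a toy identity score (10 for a match, 0 otherwise), scoring D against ~ gives 5.0, because D matches one of the two assumed acidic residues.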

Gaps are provided in the loops to allow the maximum length CDRs observed in the January 1995 Kabat sequence database.

Insertions within the loops must be positioned according to the standard Kabat numbering scheme, which a normal alignment will not do. For example, in CDR-H2 a normal alignment fills the insertion positions at 52 in the order 52C, 52B, 52A rather than 52A, 52B, 52C.

The alignment must account for the fact that either an Fv or an Fab may be supplied, and that the end point of a supplied Fv sequence may vary. Because the positioning of gaps in the constant region of the Fab is unimportant, a simple hand-derived consensus is used for this part of the chains. If the supplied sequence is at least 75% of the length of the consensus Fab, the whole sequence is used; otherwise both the supplied sequence and the consensus are truncated at the end of the Fv.


When truncation is needed, the consensus is cut at the end of the Fv, and the supplied sequence is truncated at FGxGT+13 (pos >90) in the light chain and at WGxG+11 (pos >90) in the heavy chain. These motifs are almost invariant and appear just after CDR-3. If they are not found, a warning is issued to indicate that the sequence could not be truncated.
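A minimal sketch of the length check and truncation, assuming the query is a plain one-letter string. The names truncate_to_fv, prepare_query and FV_END_MOTIFS are hypothetical; x in the motifs is treated as any residue, and the +13/+11 offsets are read as the number of residues kept counting from the start of the motif, which is an assumption about the notation.

```python
import re
import warnings

# Chain type -> (end-of-Fv motif, residues kept counting from the motif start).
# Reading "FGxGT+13" and "WGxG+11" as "keep 13 (or 11) residues starting at
# the motif" is an assumption.
FV_END_MOTIFS = {
    "light": (re.compile(r"FG.GT"), 13),
    "heavy": (re.compile(r"WG.G"), 11),
}
MIN_MOTIF_INDEX = 90  # 0-based index 90 is position 91, i.e. "pos > 90"


def truncate_to_fv(seq, chain):
    """Cut `seq` at the end of the Fv, or return it unchanged with a warning."""
    pattern, keep = FV_END_MOTIFS[chain]
    # Only matches starting at or after MIN_MOTIF_INDEX are considered.
    match = pattern.search(seq, MIN_MOTIF_INDEX)
    if match is None:
        warnings.warn(f"{chain} chain: end-of-Fv motif not found; sequence not truncated")
        return seq
    return seq[: match.start() + keep]


def prepare_query(seq, chain, fab_consensus_length):
    """Apply the 75% rule: long sequences are used whole, short ones truncated."""
    # 4 * len >= 3 * length is the integer form of len >= 0.75 * length.
    if 4 * len(seq) >= 3 * fab_consensus_length:
        return seq
    return truncate_to_fv(seq, chain)
```

The consensus Fab length is passed in as a parameter because the text does not give it.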

The supplied sequence is then aligned against the consensus sequence using the Needleman and Wunsch algorithm.
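The alignment step can be sketched as a textbook Needleman and Wunsch global alignment with a linear gap penalty of 15. needleman_wunsch and its score argument are hypothetical names; in AbCheck the scores would come from the normalised Dayhoff-78 matrix and the averaged class-code values, which this sketch does not reproduce. End gaps are penalised like any other gap here, which is an assumption about the original program. Gaps are written as '-'.

```python
GAP_PENALTY = 15


def needleman_wunsch(query, consensus, score, gap=GAP_PENALTY):
    """Global alignment with a linear gap penalty.

    Returns (aligned_query, aligned_consensus) with '-' marking gaps.
    `score(a, b)` scores a query residue against a consensus character.
    """
    n, m = len(query), len(consensus)
    # F[i][j] = best score for aligning query[:i] with consensus[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -gap * i
    for j in range(1, m + 1):
        F[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(query[i - 1], consensus[j - 1]),
                F[i - 1][j] - gap,  # query residue opposite a consensus gap
                F[i][j - 1] - gap,  # consensus position with no query residue
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(query[i - 1], consensus[j - 1]):
            a.append(query[i - 1])
            b.append(consensus[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - gap:
            a.append(query[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(consensus[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))
```

With a toy identity score (10 for a match, 0 otherwise), aligning ACDE against ACE gives ACDE over AC-E: one gap (-15) costs less than the mismatch plus trailing gap that the alternative needs.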

The program then checks for unusual insertions, i.e. anything that has caused the consensus sequence to expand in length. These may result from very unusual sequences or from very long loops (longer than the consensus sequence and numbering scheme allow).
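Because the consensus already provides room for the longest observed loops, a gap that the aligner has to open in the consensus row marks a query residue with no position to go to. A minimal sketch of the check, assuming '-' appears in the aligned consensus only where the aligner opened a gap (not at the consensus's own pre-placed loop positions); unusual_insertions is a hypothetical name.

```python
def unusual_insertions(aligned_query, aligned_consensus, gap="-"):
    """Return (column, residue) for each query residue aligned to a gap
    opened in the consensus row, i.e. where the consensus had to expand."""
    return [
        (col, q)
        for col, (q, c) in enumerate(zip(aligned_query, aligned_consensus))
        if c == gap and q != gap
    ]
```

An empty list means the sequence fitted into the consensus without expanding it.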

The aligned sequence is 're-packed' to place insertions that occur in the loops at the standard Kabat positions. To do this, the Kabat numbering is stored in a file as alternating groups representing sections that should be numbered verbatim (framework) and those that should be treated specially (loops). Note that the regions defined as loops may not match the standard definitions (e.g. CDR-H2), because gaps may not be handled properly if the alignment places insertions at what is normally defined as the C-terminus of the framework. Within loop regions, the residues are numbered from the N-terminus until an insertion-code label is met. Numbering then proceeds from the C-terminus until all residues have a label or an insertion code is met. Finally, any remaining residues are given the insertion-code labels in order from the N- to the C-terminus.
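The three-pass loop labelling can be sketched as follows. repack_loop and has_insertion_code are hypothetical names; a label is taken to carry an insertion code when it ends in a letter (e.g. 52A), and the loop's label list is assumed to be in N- to C-terminal order, as it would be read from the numbering file.

```python
def has_insertion_code(label):
    """True for labels such as '52A' that carry an insertion letter."""
    return label[-1].isalpha()


def repack_loop(residues, labels):
    """Label the residues of one loop region with Kabat labels.

    `labels` lists every label available in the loop, N- to C-terminus.
    Returns (label, residue) pairs in sequence order.
    """
    n = len(residues)
    if n > len(labels):
        raise ValueError("loop longer than the available labels")
    assigned = [None] * n

    # Pass 1: from the N-terminus until the first insertion-code label.
    i = 0
    while i < n and not has_insertion_code(labels[i]):
        assigned[i] = labels[i]
        i += 1

    # Pass 2: from the C-terminus until every residue is labelled
    # or an insertion-code label is met.
    k, j = n - 1, len(labels) - 1
    while k >= i and not has_insertion_code(labels[j]):
        assigned[k] = labels[j]
        k -= 1
        j -= 1

    # Pass 3: remaining residues take the unused labels (the insertion
    # codes, for a loop with one insertion site) from N- to C-terminus.
    for r, label in zip(range(i, k + 1), labels[i:j + 1]):
        assigned[r] = label

    return list(zip(assigned, residues))
```

For illustrative CDR-H2 labels 50, 51, 52, 52A, 52B, 52C, 53 ... 65 and a 17-residue loop, pass 1 assigns 50 to 52, pass 2 assigns 65 back to 53, and the single remaining residue becomes 52A, the Kabat placement, rather than 52C.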

This means that there is a slight deviation from the Kabat numbering scheme when very long loops occur (especially in CDR-H3). When the insertions are more extensive than allowed by the Kabat insertion letters, additional letters are supplied. The Kabat database may insert these extra residues between, for example, H100G and H100H and does not specify how they should be named. This alignment procedure simply labels with letters alphabetically throughout the insertion region. The KabatMan database program adopts the same scheme.
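A sketch of how a loop's label list might be extended when a CDR has more insertions than Kabat provides letters for, continuing alphabetically after the last existing code. extend_insertions is hypothetical, it assumes the existing codes run A, B, C... without gaps, and raising an error past Z is an assumption since the text does not say what happens there. The extended list can then be used as the label list when re-packing the loop.

```python
import string


def extend_insertions(labels, base, n_residues):
    """Add insertion labels after `base` until there are n_residues labels.

    New letters continue alphabetically after the last existing code, e.g.
    100K is followed by 100L. Returns a new list; `labels` is not modified.
    """
    labels = list(labels)
    extra = n_residues - len(labels)
    if extra <= 0:
        return labels
    # Insertion labels already defined for this position, e.g. 100A..100K
    # (assumed to be contiguous from A).
    existing = [lab for lab in labels if lab[:-1] == base and lab[-1].isalpha()]
    # New labels go straight after the last existing one (or after `base`).
    anchor = labels.index(existing[-1] if existing else base) + 1
    start = len(existing)
    letters = string.ascii_uppercase
    if start + extra > len(letters):
        raise ValueError("more insertions than single letters allow")
    new = [base + letters[start + x] for x in range(extra)]
    return labels[:anchor] + new + labels[anchor:]
```

For labels 99, 100, 100A, 100B, 101 and a 7-residue loop, it returns 99, 100, 100A, 100B, 100C, 100D, 101.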

A fudge routine is applied to move a deletion at L9 to L10. The alignment of sequences clearly places the deletion site at L9, but the Kabat scheme places it at L10.
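A minimal sketch of the L9 to L10 adjustment, assuming the numbered light chain is held as a list of (label, residue) pairs with '-' marking a deletion. The function name and the "L9"/"L10" label format are hypothetical.

```python
def move_l9_deletion(numbered):
    """Move a deletion at L9 to L10, as the Kabat scheme expects.

    `numbered` is a list of (label, residue) pairs; '-' marks a deletion.
    Returns a new list; other positions are untouched.
    """
    numbered = list(numbered)
    index = {label: i for i, (label, _) in enumerate(numbered)}
    i9, i10 = index.get("L9"), index.get("L10")
    if i9 is None or i10 is None:
        return numbered
    if numbered[i9][1] == "-" and numbered[i10][1] != "-":
        # The residue aligned at L10 moves back to L9; L10 becomes the gap.
        numbered[i9] = ("L9", numbered[i10][1])
        numbered[i10] = ("L10", "-")
    return numbered
```

Sequence order is unchanged; only the position of the gap moves.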


Using the Kabat-numbered sequence generated by the above procedure, a program written in Perl builds a control file for the KabatMan database program. For each Kabat position, this file requests the number of times the amino acid found at that position in your sequence occurs at the same position in the database. The Perl program runs KabatMan on this control file and then parses the output to find residues that occur in less than 1% of the chains in the Kabat database. These residues are reported to the user.
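The final filtering step can be sketched as below. The KabatMan control-file and output formats are not described here, so the sketch starts from already-parsed counts; the layout of observed, the function name and the choice of denominator (total) are assumptions, and the counts in the test are made up.

```python
def rare_residues(observed):
    """Return (label, residue) pairs seen in fewer than 1% of database chains.

    `observed` maps a Kabat label to (residue in the query,
    number of database chains with that residue at that label,
    number of database chains counted at that label).
    """
    rare = []
    for label, (residue, count, total) in observed.items():
        # count / total < 0.01, written with integers to avoid rounding issues
        if total > 0 and 100 * count < total:
            rare.append((label, residue))
    return rare
```

A residue at exactly 1% is not reported, matching "less than 1%".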

Copyright (c) 1995, Andrew C.R. Martin, UCL