PDBSprotEC: Dump Details

The PDBSprotEC data may be obtained in two forms: a flat file and an XML file.
The flat file format has 6 columns:
  • PDB code
  • the chain identifier (or blank)
  • first residue in the PDB file mapped to SwissProt
  • last residue in the PDB file mapped to SwissProt
  • the UniProt accession (SwissProt or TrEMBL)
  • the EC number
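As a rough illustration of the six-column layout, the following Python sketch reads one flat-file row. The column delimiter is not stated here, so the sketch assumes whitespace- or tab-separated fields and treats a five-field row as one whose chain identifier is blank. The names `Mapping` and `parse_flat_line` are hypothetical, and residue numbers are kept as strings in case they carry insertion codes.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    pdb: str        # PDB code
    chain: str      # chain identifier, "" when blank
    first_res: str  # first residue mapped to SwissProt
    last_res: str   # last residue mapped to SwissProt
    accession: str  # UniProt accession
    ec: str         # EC number, e.g. "0.0.0.0" for a non-enzyme


def parse_flat_line(line: str) -> Mapping:
    # Assumption: fields are separated by whitespace (spaces or tabs).
    fields = line.split()
    if len(fields) == 5:
        # Assumption: a missing field means the chain identifier is blank.
        fields.insert(1, "")
    if len(fields) != 6:
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}: {line!r}")
    return Mapping(*fields)
```

If the real dump uses fixed-width columns instead, the splitting step would need to slice by position rather than by whitespace.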
The XML format contains the same data, structured as follows:
<pdbsprotec>
  <pdb_sprot_ec pdb='xxxx'>
      <chain id='X'>
          <region res1='nnn' res2='nnn' sprot='ssssss'>
              <ec ec1='n' ec2='n' ec3='n' ec4='n'>
                  n.n.n.n
              </ec>
          </region>
      </chain>
  </pdb_sprot_ec>
</pdbsprotec>
You can download example code to parse this XML file using Perl/DOM.
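For readers who prefer Python to Perl, a minimal sketch using the standard-library `xml.etree.ElementTree` module is given here. It is not the downloadable example code. The function name `iter_mappings` is hypothetical, and the values in `SAMPLE` (PDB code, accession, EC number) are invented placeholders that only mirror the element and attribute names shown above.

```python
import xml.etree.ElementTree as ET

# Invented sample data; only the tag and attribute names come from the format.
SAMPLE = """<pdbsprotec>
  <pdb_sprot_ec pdb='1abc'>
    <chain id='A'>
      <region res1='1' res2='120' sprot='P00000'>
        <ec ec1='3' ec2='4' ec3='21' ec4='4'>
          3.4.21.4
        </ec>
      </region>
    </chain>
  </pdb_sprot_ec>
</pdbsprotec>"""


def iter_mappings(xml_text):
    """Yield (pdb, chain, res1, res2, sprot, [ec, ...]) for each region."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("pdb_sprot_ec"):
        for chain in entry.iter("chain"):
            for region in chain.iter("region"):
                # The EC number is the element text, padded with whitespace.
                ecs = [(ec.text or "").strip() for ec in region.iter("ec")]
                yield (entry.get("pdb"), chain.get("id"),
                       region.get("res1"), region.get("res2"),
                       region.get("sprot"), ecs)


for record in iter_mappings(SAMPLE):
    print(record)
```

For a full dump it may be preferable to stream the file with `ET.iterparse` rather than load it in one string, since this sketch holds the whole tree in memory.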
Note that one PDB chain may contain more than one region (indicated by the residue range in the flat file or the res1 and res2 attributes of the <region> tag in the XML).
In addition, one region may be assigned more than one EC number. This will appear as multiple rows in the flat file dump and as multiple <ec> tags in the XML.
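Because a region can span several flat-file rows, it can help to regroup rows by region before using them. The sketch below assumes each row has already been split into the six columns as a tuple of strings. `group_ec_by_region` is a hypothetical helper name.

```python
from collections import defaultdict


def group_ec_by_region(rows):
    """Map (pdb, chain, first, last, accession) -> list of EC numbers."""
    grouped = defaultdict(list)
    for pdb, chain, first, last, accession, ec in rows:
        # Rows that differ only in the EC column describe the same region.
        grouped[(pdb, chain, first, last, accession)].append(ec)
    return dict(grouped)
```

A chain with two regions produces two keys that share the PDB code and chain identifier but differ in residue range.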
An EC number of 0.0.0.0 indicates that this protein is not an enzyme. In other words, it appears in UniProtKB, but there is no EC number specified either there or in the Enzyme database.
Where we don't know whether a protein is an enzyme, no EC indication is given. These may be protein chains that appear in UniProt/TrEMBL, but not in UniProtKB or Enzyme. Alternatively, they may be short peptides or non-protein chains.
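The three cases described here (a real EC number, 0.0.0.0, and no EC indication) can be told apart with a small helper. This is a sketch with the hypothetical name `enzyme_status`. It assumes the EC numbers for one region have already been collected into a list of strings, with an empty list meaning no EC indication was given.

```python
def enzyme_status(ec_numbers):
    """Classify a region from its list of EC number strings."""
    if not ec_numbers:
        # No EC indication at all: enzyme status is not known.
        return "unknown"
    if all(ec == "0.0.0.0" for ec in ec_numbers):
        # 0.0.0.0 is the explicit marker for a non-enzyme.
        return "non-enzyme"
    return "enzyme"
```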
Note that there is a difference between the data in the flat file and the XML file.
The flat file contains rows with EC numbers of 0.0.0.0 where we have evidence that a protein is not an enzyme, but it does not contain rows for the chains where we have no information.
In the XML file, we also have EC numbers of 0.0.0.0 where we have evidence that a protein is not an enzyme. Unlike the flat file, the XML file does include PDB chains for which we have no EC information. However, these entries have no <ec> tag.
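To find the chains that exist only in the XML file, one option is to look for <chain> elements with no <ec> descendant. The sketch below uses Python's standard `xml.etree.ElementTree`. The function name `split_chains_by_ec_info` is hypothetical, and the sketch relies only on the absence of an <ec> tag. It makes no assumption about whether such chains still carry a <region> element.

```python
import xml.etree.ElementTree as ET


def split_chains_by_ec_info(xml_text):
    """Return ([chains with an <ec> tag], [chains without one])."""
    root = ET.fromstring(xml_text)
    with_ec, without_ec = [], []
    for entry in root.iter("pdb_sprot_ec"):
        for chain in entry.iter("chain"):
            key = (entry.get("pdb"), chain.get("id"))
            # A chain has EC information if any <ec> element sits below it.
            has_ec = next(chain.iter("ec"), None) is not None
            (with_ec if has_ec else without_ec).append(key)
    return with_ec, without_ec
```

Chains in the first list should correspond to rows in the flat file, while chains in the second list have no flat-file counterpart.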