SIFTS database

The SIFTS database [1] provides Enzyme Commission (EC) number annotations for entries in the Protein Data Bank (PDB). Several models have been evaluated on it, including IEConv [2]. I downloaded the SIFTS summary of the EC number(s) for each processed PDB chain. In total, it contains 268,992 associations between 218,471 protein chains and 3,657 EC numbers.
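As a rough illustration of how such totals could be derived, the sketch below counts associations, distinct chains, and distinct EC numbers from a chain-level CSV summary. The column names (`PDB`, `CHAIN`, `ACCESSION`, `EC_NUMBER`), the leading `#` comment line, and the `?` placeholder for a missing EC number are assumptions about the file layout, and the sample rows are invented for the example. Whether the figures above were computed with exactly this deduplication is also an assumption.

```python
import csv
import io


def summarize_ec_associations(lines):
    """Count unique (chain, EC) associations, chains, and EC numbers.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumed layout: optional '#' comment lines, then a CSV header with
    the columns PDB, CHAIN and EC_NUMBER.
    """
    # Drop comment lines so the first remaining line is the CSV header.
    reader = csv.DictReader(line for line in lines if not line.startswith("#"))

    associations = set()
    for row in reader:
        ec = row["EC_NUMBER"].strip()
        # Assumption: '?' or an empty field marks a chain without an EC number.
        if not ec or ec == "?":
            continue
        associations.add((row["PDB"].strip().lower(), row["CHAIN"].strip(), ec))

    chains = {(pdb, chain) for pdb, chain, _ in associations}
    ec_numbers = {ec for _, _, ec in associations}
    return {
        "associations": len(associations),
        "chains": len(chains),
        "ec_numbers": len(ec_numbers),
    }


# Invented sample rows, used only to exercise the function.
sample = """# hypothetical excerpt, not real SIFTS data
PDB,CHAIN,ACCESSION,EC_NUMBER
0aaa,A,X00001,?
0bbb,B,X00002,3.4.21.5
0bbb,E,X00002,3.4.21.5
0ccc,A,X00003,1.1.1.1
0ccc,A,X00003,2.7.1.1
"""

summary = summarize_ec_associations(io.StringIO(sample))
print(summary)
```

For a real download, the same function can be given an open text handle (for example, one returned by `gzip.open(path, "rt")` if the file is compressed). A chain with several EC numbers contributes several associations but only one chain, which is why the number of associations exceeds the number of chains.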

  1. Dana, J. M., Gutmanas, A., Tyagi, N., Qi, G., O’Donovan, C., Martin, M., & Velankar, S. (2019). SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Research, 47(D1), D482-D489.

  2. Hermosilla Casajus, P., Schäfer, M., Lang, M., Fackelmann, G., Vázquez Alcocer, P. P., Kozliková, B., … & Ropinski, T. (2021). Intrinsic-extrinsic convolution and pooling for learning on 3D protein structures. In International Conference on Learning Representations, ICLR 2021: Vienna, Austria, May 04 2021 (pp. 1-16). OpenReview.net.