Abstract:
Cryo-electron microscopy (cryo-EM) has become an important technique for protein structure determination. Depositions of cryo-EM protein structures in the Worldwide Protein Data Bank (wwPDB) increased steadily from 11 in 2000 to 1423 in 2019. However, we note that many protein structures solved by cryo-EM in the Protein Data Bank (PDB) contain suspicious defects. These potential structural errors may stem from the lack of cryo-EM-specific features in the current wwPDB validation pipeline. We designed a validation tool, with several built-in unsupervised machine learning models, to identify these defects. We also designed a graphical user interface (GUI) for the ChimeraX protein visualization platform to help structural biologists identify how conformational defects are introduced during protein modeling.