Cystatins and lipocalins have attracted considerable interest as non-immunoglobulin scaffolds for protein engineering. In the present study, potential homologs of both families were screened computationally from a non-redundant protein sequence database using overlapped conserved residue (OCR)-fingerprints, which can detect protein families with low sequence identity, such as cystatins and lipocalins. Two types of OCR-fingerprints were designed for each family, and both showed very high detection efficiency (>90%). Scanning the protein sequence database with these fingerprints yielded hypothetical cystatin and lipocalin sequences. These hypothetical sequences were further validated on the basis of their sequence motifs and structural models, which allowed the identification of potential homologs of cystatins and lipocalins.
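The scanning step described above can be illustrated with a minimal sketch. It assumes a fingerprint can be approximated by a regular expression over conserved positions. The pattern below is the well-known cystatin QxVxG motif, used only as a stand-in, not the OCR-fingerprint designed in this study. The function names and all sequences are hypothetical and made up for illustration.

```python
import re

# Toy fingerprint: conserved residues separated by single variable positions.
# Assumption: QxVxG (a well-known cystatin motif) stands in for a real
# OCR-fingerprint, whose exact definition is not given here.
TOY_FINGERPRINT = re.compile(r"Q.V.G")


def scan(sequences, fingerprint):
    """Return the ids of sequences that contain at least one fingerprint match."""
    return [seq_id for seq_id, seq in sequences.items() if fingerprint.search(seq)]


def detection_efficiency(known_members, fingerprint):
    """Fraction of known family members that the fingerprint recovers."""
    if not known_members:
        return 0.0
    return len(scan(known_members, fingerprint)) / len(known_members)


# Made-up sequences standing in for annotated family members.
known = {
    "member_a": "MKTLLAAQVVAGKYF",
    "member_b": "GSPRQLVSGWNEA",
    "member_c": "MAAAKKLLPPGG",
}

# Made-up sequences standing in for the database to be scanned.
database = {
    "db_1": "TTTQIVAGLLL",
    "db_2": "MNNNPPPKKK",
}

candidates = scan(database, TOY_FINGERPRINT)
efficiency = detection_efficiency(known, TOY_FINGERPRINT)
```

In this sketch, `candidates` plays the role of the hypothetical sequences, which would still need validation by sequence motifs and structural models, and `efficiency` corresponds to the detection efficiency measured on known family members.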