
ThermoSeek

ThermoSeek is a user-friendly platform built around a repository of thermophilic and psychrophilic protein sequences and structures. Leveraging this database, the platform offers diverse functionalities, including sequence alignment and the search for similar motifs and fold structures. This suite of tools is a promising resource for investigating functional proteins, modifying thermal stability, and optimizing enzyme structures. These attributes make ThermoSeek particularly useful for designing protein sequences and structures with elevated thermal stability. For additional details, please refer to our publication:

Zhihang Chen, HaoJie Wang, RenXiao Wang and Yifei Qi. ThermoSeek: An Integrated Resource for Sequence and Structural Analysis of Proteins from Thermophilic Species. Submitted

Motif search

The functional and biophysical characteristics of proteins are closely tied to the conformation their amino acid sequences adopt on folding, and in particular to specific local geometric arrangements of crucial residues. The motif search module uses an inverted index to encode and store combinations of residues (motifs), along with their relative geometric configurations, for the thermophilic proteins in the database. A minimum spanning tree (MST) algorithm then trims the search space for query motifs, and the module retrieves the database proteins that harbor motifs matching the one specified in the target protein.
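The idea of combining an inverted index with a minimum spanning tree can be sketched in a few lines of Python. This is only an illustrative sketch, not the ThermoSeek implementation: the key layout (sorted residue types plus a binned distance), the 1 Å bin width, the 10 Å cutoff, and all function names are assumptions made for this example. Each residue is represented by one 3D point, pairs of residues are indexed by type and distance, and the query looks up only the MST edges of the motif rather than all residue pairs.

```python
import math
from collections import defaultdict
from itertools import combinations


def minimum_spanning_tree(coords):
    """Prim's algorithm on the complete graph of motif residues.

    coords maps a node label to an (x, y, z) tuple.
    Returns a list of edges (u, v, distance).
    """
    nodes = list(coords)
    in_tree = {nodes[0]}
    edges = []
    while len(in_tree) < len(nodes):
        # Cheapest edge that connects the tree to a node outside it.
        best = min(
            ((u, v, math.dist(coords[u], coords[v]))
             for u in in_tree for v in nodes if v not in in_tree),
            key=lambda edge: edge[2],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges


def pair_key(type_a, type_b, dist, bin_width=1.0):
    """Hypothetical index key: sorted residue types plus a distance bin."""
    a, b = sorted((type_a, type_b))
    return (a, b, int(dist // bin_width))


def build_index(proteins, bin_width=1.0, cutoff=10.0):
    """Inverted index: pair key -> set of protein IDs containing that pair.

    proteins maps a protein ID to a list of (res_type, res_id, (x, y, z)).
    """
    index = defaultdict(set)
    for protein_id, residues in proteins.items():
        for (t1, _, c1), (t2, _, c2) in combinations(residues, 2):
            d = math.dist(c1, c2)
            if d <= cutoff:
                index[pair_key(t1, t2, d, bin_width)].add(protein_id)
    return index


def candidate_proteins(index, motif, bin_width=1.0):
    """Intersect index hits for the MST edges of the query motif only."""
    coords = {i: c for i, (_, _, c) in enumerate(motif)}
    hits = None
    for u, v, d in minimum_spanning_tree(coords):
        found = index.get(pair_key(motif[u][0], motif[v][0], d, bin_width), set())
        hits = found if hits is None else hits & found
    return hits or set()


# Toy data: P1 contains the query geometry, P2 does not.
proteins = {
    "P1": [("SER", 1, (0, 0, 0)), ("HIS", 2, (3, 0, 0)), ("ASP", 3, (3, 4, 0))],
    "P2": [("SER", 1, (0, 0, 0)), ("HIS", 2, (8, 0, 0)), ("ASP", 3, (8, 9, 0))],
}
index = build_index(proteins)
print(candidate_proteins(index, proteins["P1"]))
```

A motif of n residues has n(n-1)/2 residue pairs but only n-1 MST edges, which is why the tree reduces the number of index lookups. In this sketch a query distance close to a bin boundary can miss a true hit, and the surviving candidates would still need a geometric check such as superposition and RMSD.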

motif search page

The motif search module includes task input, submission, and result presentation pages. Using the active site of PETase as an example, the steps to use this module are as follows:

1. Upload the PDB file containing the query motif.

2. Alternatively, provide the PDB ID to download the structure from the RCSB PDB database.

3. Input the query motif residues in the format "Chain ResID" with at least three residues separated by ",".

4. Select the protein database you want to query and click the Search button.

5. Query motif information.

6. The Job ID of the task. You can wait on this page or use the Job ID to access the result page at the bottom of our homepage later.

7. Query motif information.

8. 3D structure of the query motif.

9. Query results are displayed, including the UniProt ID of the hit protein, the motif residues similar to the query motif, and the motif overlay RMSD in angstroms.

10. Download the result information as a CSV file.
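The "Chain ResID" input format from step 3 can be checked before submission with a small helper. The sketch below is an assumption about how such a string might be validated, not the server's own parser: the function name is hypothetical, residue IDs are assumed to be plain integers (PDB insertion codes are not handled), and the residue numbers in the example are illustrative only.

```python
def parse_motif(text):
    """Parse a motif string such as "A 160,A 206,A 237".

    Returns a list of (chain, residue_number) tuples and raises
    ValueError if the string does not follow the expected format.
    """
    residues = []
    for item in text.split(","):
        parts = item.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'Chain ResID', got {item!r}")
        chain, res_id = parts
        # int() raises ValueError for non-numeric residue IDs.
        residues.append((chain, int(res_id)))
    if len(residues) < 3:
        raise ValueError("a motif needs at least three residues")
    return residues


print(parse_motif("A 160,A 206,A 237"))
```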

Sequence alignment

The sequence alignment module draws upon the well-established correlation between thermostability and evolvability, a relationship explored extensively in previous studies (Finch and Kim, 2018). Introducing mutations into thermophilic proteins has been shown to strongly influence protein evolution. Through sequence similarity comparisons, this module provides homology information that clarifies the evolutionary relationships between input sequences and thermostable protein sequences.

BLASTp and MMseqs2 serve as robust tools for protein-protein alignment, primarily using sequence identity as the basis for comparison.
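As a rough illustration of what sequence identity means for a pairwise alignment, the sketch below counts identical aligned positions and divides by the alignment length, gaps included. The function name and the toy alignment are assumptions made for this example, and individual tools may normalize identity differently.

```python
def percent_identity(aligned_query, aligned_hit):
    """Percent identity of two gapped sequences of equal length.

    Identical non-gap columns are counted and divided by the
    alignment length (gap columns included in the denominator).
    """
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_query, aligned_hit) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Toy alignment: 5 identical columns out of 7.
print(round(percent_identity("MKT-AYI", "MKTLAYV"), 1))
```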

sequence alignment page

The sequence alignment module includes task input, submission, and result presentation pages. Using the sequence of IsPETase as an example, the steps to use this module are as follows:

1. Upload the query protein sequence.

2. Select the protein database you want to query.

3. Upload the FASTA file containing the protein sequence.

4. Or provide the UniProt ID of the query protein and click the Search button.

5. Query sequence information.

6. Query results are displayed, including the UniProt ID of the hit protein, sequences similar to the query, and their sequence identities.

7. Download the results as a CSV file along with the pairwise alignment file.
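Before uploading a FASTA file, it can help to confirm locally that the file parses as expected. The following is a minimal sketch of a FASTA reader, not part of ThermoSeek: the function name is hypothetical, and the short sequence in the example is made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is everything after ">".
            header = line[1:].strip()
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[header].append(line)
    # Join the wrapped sequence lines of each record.
    return {name: "".join(parts) for name, parts in records.items()}


example = ">query_1 illustrative record\nMKTAYIAK\nQRQISFVK\n"
print(read_fasta(example))
```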

Sequence alignment tools used in this module:

BLAST (Basic Local Alignment Search Tool) is a commonly used bioinformatics tool based on a local alignment algorithm. It compares input nucleotide or protein sequences with known sequences in a database to obtain sequence similarity and other information, which helps determine the origin or evolutionary relationships of a sequence. Reference: Camacho, C., Coulouris, G., Avagyan, V. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009). https://doi.org/10.1186/1471-2105-10-421

MMseqs2 (Many-against-Many sequence searching) is a software suite for searching and clustering huge protein and nucleotide sequence sets. MMseqs2 includes Linclust, the first clustering algorithm whose runtime scales linearly with input set size. With Linclust, its authors clustered 1.6 billion metagenomic sequence fragments in 10 h on a single server to 50% sequence identity. Reference: Steinegger, M., Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol 35, 1026–1028 (2017). https://doi.org/10.1038/nbt.3988

Fold search

Examining three-dimensional folds and retrieving homologous structures among thermostable proteins by their distinctive folding patterns is a crucial approach. It facilitates the identification of structural similarities and relationships among thermostable proteins, thereby enriching our understanding of their functional and evolutionary characteristics.

Foldseek is used as the tool for protein structure search and alignment.

fold search page

The fold search module includes task input, submission, and result presentation pages. Using the fold structure of IsPETase as an example, the steps to use this module are as follows:

1. Upload the structure of the query protein.

2. Or provide the PDB ID of the query protein.

3. Select the protein database.

4. Select the query mode parameters for Foldseek and click the Search button.

5. Query protein information.

6. Query results are displayed, including the UniProt ID of the hit protein, folds similar to the query, the alignment length, and their TM-scores and RMSD in angstroms.

7. Download the results as a CSV file, along with the Foldseek result as an HTML file.
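To help interpret the TM-score and RMSD columns, the sketch below computes both from the distances between aligned residue pairs after superposition, using the standard TM-score definition with d0(L) = 1.24 (L - 15)^(1/3) - 1.8. It is a simplified illustration, not how the platform or Foldseek computes its values: it scores a single given superposition rather than searching for the optimal one, and the function names and the 0.5 Å floor on d0 for short proteins are assumptions of this example.

```python
import math


def tm_d0(length, floor=0.5):
    """Length-dependent distance scale d0 in angstroms.

    The floor for short proteins is an assumption of this sketch.
    """
    if length <= 15:
        return floor
    return max(floor, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, norm_length):
    """TM-score for one superposition.

    distances holds the distance (in angstroms) of each aligned residue
    pair; norm_length is the protein length used for normalization.
    """
    d0 = tm_d0(norm_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / norm_length


def rmsd(distances):
    """Root-mean-square deviation over the aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# 80 aligned pairs, each 2 angstroms apart, normalized by a 100-residue protein.
pair_distances = [2.0] * 80
print(round(tm_score(pair_distances, 100), 3), round(rmsd(pair_distances), 3))
```

The TM-score lies between 0 and 1 and is normalized by protein length, whereas RMSD is an absolute distance over the aligned pairs only. The two can therefore rank hits differently: a short, tight alignment can have a low RMSD but also a low TM-score.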

Structural alignment tools used in this module:

Foldseek is a software suite for searching and clustering protein structures. Its authors report that it decreases computation times by four to five orders of magnitude relative to structural aligners such as Dali, TM-align and CE, allowing millions of structures to be queried in seconds. Reference: van Kempen, M., Kim, S.S., Tumescheit, C. et al. Fast and accurate protein structure search with Foldseek. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-023-01773-0

We cordially invite your feedback and communication concerning any issues or inquiries encountered while utilizing the platform. Your input is invaluable in our continuous efforts to enhance the functionality and utility of our work.