, 2006, 2007; Petkun et al., 2010). Surprisingly, several CBM3s appeared not to be associated with the cellulolytic system of this bacterium. Among these proteins, we discovered that Cthe_0059, Cthe_0267 and Cthe_0404 shared similar N-terminal segments (∼165 residues) that resembled those of the B. subtilis σI-modulating factor RsgI (Fig. S1) and RsgI-like proteins in certain Firmicutes species
(data not shown). These ∼165-residue domains of the C. thermocellum hypothetical proteins were termed ‘RsgI-like domains’ here, and their sequences were used further in this study as queries in sequence similarity searches against the C. thermocellum genome databases (see next section). In lieu of a signal peptide motif, all nine RsgI-like proteins were predicted to contain three subdomains
– an ∼50- to 60-residue N-terminal region located inside the cell, followed by a single transmembrane helix (TMH) and a C-terminal region predicted to be localized on the cell exterior (Fig. 1). Putative TMHs were found to be located approximately at residues 55–85 in eight RsgI-like proteins. In one exception (Cthe_0260), a TMH carrying an ∼95 amino acid (aa) insert was located at residues 150–172, and the gene encoding this protein is likely to be monocistronic without an upstream sigI-like gene (Fig. 2). Comparative sequence analysis of the RsgI-like domains from C. thermocellum with those of RsgI-like proteins from Bacillus and several other Clostridium species revealed a relatively high sequence divergence. Nevertheless, the three above-mentioned subdomains were consistently predicted in all N-terminal sequences of the identified RsgI-like proteins (Fig. S1). Within the context of the present work, the N-terminal sequences that constitute the intracellular domain of approximately 40 different RsgI-like proteins were aligned to establish a novel Pfam family, designated PF12791 or RsgI_N. Using this motif, approximately 150 RsgI-like proteins can be found in public protein databases (data not shown). Two other N-terminal subdomains of the RsgI-like proteins, a
TMH and a part of the predicted extracellular-sensing domain, also share very weak but recognizable conservation (Fig. S1). Analysis of the C. thermocellum ATCC 27405 genome (GenBank accession numbers CP000568 and NC_009012), using the ∼165 aa N-terminal sequences of the B. subtilis RsgI and its three C. thermocellum homologues as BLAST queries, revealed the presence of six additional ORFs (Fig. S1). Eight of the nine rsgI-like genes appeared to form bicistronic operons downstream of genes encoding proteins that bear strong similarity to the B. subtilis σI factor (Fig. 2). Similar findings for the sigI- and corresponding rsgI-like genes were evident from analysis of the genomes of two other C. thermocellum strains: DSM 4150 (JW20) and DSM 2360 (LQR1). Extensive analysis of the B. subtilis σI and its putative C. thermocellum homologues revealed an atypical domain organization.