However, this would over-condense the data. To reduce the loss of information, at each step we record not only the bottom edge but also the edge to its right. A 10 × 10 decagonal isometries matrix (DIM) is thus constructed, whose entry a(i,j) is the number of times edge i appears at the bottom of the decagon with edge j to its right as the transformations dictated by a given sequence are applied to the decagon. Because the edge to the right of the bottom edge is always one of its two neighbours, DIM can have non-zero entries only immediately above or immediately below the main diagonal, plus the two corner entries a(9,0) and a(0,9). These 20 positions (nine above the diagonal, nine below it, and the two corners) allow us to transform DIM into a 20-dimensional vector by listing all potentially non-zero entries in a fixed order. This 20-dimensional vector then serves as the descriptor of each encoded protein segment.
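The counting and flattening steps can be sketched as follows. This sketch assumes the edges are labelled 0 to 9 in cyclic order and that the (bottom edge, right edge) pair observed after each transformation is already available. The transformation rules themselves are not reproduced. The function names and the order of the 20 entries are hypothetical, because the text fixes an order without specifying it.

```python
from typing import Iterable, List, Tuple

N_EDGES = 10  # a decagon has ten edges, assumed labelled 0..9 in cyclic order


def build_dim(observations: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Count how often edge i is at the bottom with edge j to its right."""
    dim = [[0] * N_EDGES for _ in range(N_EDGES)]
    for bottom, right in observations:
        # The right-hand neighbour of an edge must be one of its two
        # cyclically adjacent edges, i.e. (bottom +/- 1) mod 10.
        if (right - bottom) % N_EDGES not in (1, N_EDGES - 1):
            raise ValueError(f"edges {bottom} and {right} are not adjacent")
        dim[bottom][right] += 1
    return dim


def dim_to_vector(dim: List[List[int]]) -> List[int]:
    """Flatten the 20 potentially non-zero entries in a fixed (hypothetical) order."""
    vec = []
    for i in range(N_EDGES):
        vec.append(dim[i][(i + 1) % N_EDGES])  # a(i, i+1); wraps to a(9,0) for i = 9
        vec.append(dim[i][(i - 1) % N_EDGES])  # a(i, i-1); wraps to a(0,9) for i = 0
    return vec
```

Any other fixed order works equally well, as long as every segment is flattened the same way.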

2.3. Representing Transmembrane Regions

The amino acid adjacency (AA) matrix and the decagonal isometries matrix (DIM) are used independently to encode the transmembrane and non-transmembrane protein segments, and the associated matrix invariants characterize each membrane protein segment mathematically. Each representation is then used on its own to distinguish between the transmembrane and non-transmembrane segments of membrane-spanning proteins. For this purpose, the transmembrane protein sequences are segmented into transmembrane and non-transmembrane regions, and the non-transmembrane regions are further divided into polypeptide segments of 20 residues.
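The segmentation step can be sketched as follows. The sketch assumes the transmembrane regions of each sequence are given as 0-based, half-open index intervals that are sorted and non-overlapping. The function name `split_regions` is hypothetical. Discarding a trailing non-transmembrane remainder shorter than 20 residues is also an assumption, because the text does not say how such remainders are handled.

```python
from typing import List, Tuple

SEGMENT_LEN = 20  # length of the non-transmembrane segments, as stated in the text


def split_regions(
    sequence: str, tm_spans: List[Tuple[int, int]], seg_len: int = SEGMENT_LEN
) -> Tuple[List[str], List[str]]:
    """Return (transmembrane segments, non-transmembrane segments) for one protein.

    tm_spans: hypothetical 0-based, half-open (start, end) intervals of the
    annotated transmembrane regions, assumed sorted and non-overlapping.
    """
    tm_segments = [sequence[s:e] for s, e in tm_spans]
    non_tm_segments: List[str] = []
    prev_end = 0
    # The gaps before, between and after the TM spans are the non-TM regions;
    # a sentinel span at the end captures the C-terminal region.
    for start, end in list(tm_spans) + [(len(sequence), len(sequence))]:
        region = sequence[prev_end:start]
        # Cut the region into consecutive, non-overlapping seg_len pieces;
        # a shorter remainder is discarded (an assumption).
        for k in range(0, len(region) - seg_len + 1, seg_len):
            non_tm_segments.append(region[k:k + seg_len])
        prev_end = end
    return tm_segments, non_tm_segments
```

For example, a 96-residue sequence with one transmembrane span at positions 45 to 65 yields one transmembrane segment and three 20-residue non-transmembrane segments. The leftover 5 residues of the N-terminal region are dropped.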

Keeping the non-transmembrane segments similar in length to the transmembrane segments is essential for training the classification models well. All transmembrane and non-transmembrane segments are then encoded independently using the AA matrix and DIM, and the encoded segments are divided into training and test sets. Table 1 lists the number of segments of each type in each set.

Table 1: Training and test sets.

We perform principal component analysis (PCA) on the descriptors derived from the AA matrix to check whether these numerical descriptors can discriminate the transmembrane segments from the non-transmembrane ones. Because PCA projects multidimensional data onto a coordinate system defined by the principal components, it provides an initial validation of the choice of descriptors.
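The PCA projection step can be sketched without external libraries, assuming each segment is already encoded as a numeric descriptor vector. The sketch uses power iteration with deflation to keep it self-contained. A real analysis would normally call a linear-algebra library instead. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _centre_and_covariance(rows: Sequence[Sequence[float]]) -> Tuple[Matrix, Matrix]:
    """Return (mean-centred data, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centred, cov


def _dominant_eigenpair(mat: Matrix, iters: int = 1000) -> Tuple[float, List[float]]:
    """Power iteration on a symmetric positive semi-definite matrix."""
    d = len(mat)
    v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary, non-symmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in the matrix
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v = variance along that direction
    lam = sum(v[i] * mat[i][j] * v[j] for i in range(d) for j in range(d))
    return lam, v


def pca_project(rows: Sequence[Sequence[float]], k: int = 2):
    """Project rows onto their first k principal components."""
    centred, cov = _centre_and_covariance(rows)
    d = len(cov)
    components, variances = [], []
    for _ in range(k):
        lam, v = _dominant_eigenpair(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: subtract lam * v v^T so the next pass finds the next component
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(c[j] * pc[j] for j in range(d)) for pc in components]
              for c in centred]
    return scores, components, variances
```

Plotting the first two score columns, with each point coloured by segment type, gives the kind of visual check on the descriptors that the text describes.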

Next, two independent counter-propagation neural network (CPNN) models, one based on the invariants of each matrix, are developed to distinguish between the transmembrane and non-transmembrane segments of the protein sequences.

3. Results and Discussion

3.1. Amino Acid Adjacency Matrix

To check whether the row sum vector derived from the AA matrix characterizes the transmembrane segments well numerically, we perform PCA and develop a CPNN model.
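The CPNN step can be illustrated with a deliberately simplified sketch. It assumes each segment is a numeric descriptor vector labelled 1 (transmembrane) or 0 (non-transmembrane). The sketch uses a winner-take-all Kohonen layer followed by a Grossberg output layer. The text does not give the network topology, neighbourhood function or learning schedule used in this work. The class name `SimpleCPNN`, the initialisation and all hyperparameters are therefore hypothetical.

```python
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SimpleCPNN:
    """Winner-take-all counter-propagation network (no neighbourhood function)."""

    def __init__(self, n_units: int) -> None:
        self.n_units = n_units
        self.kohonen: List[List[float]] = []  # input-side weight vector per unit
        self.grossberg: List[float] = []      # output-side weight (predicted label) per unit

    def _winner(self, x: Sequence[float]) -> int:
        # The Kohonen unit whose weights are closest to x (squared Euclidean distance)
        return min(range(self.n_units), key=lambda k: _sq_dist(self.kohonen[k], x))

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float],
            epochs: int = 20, alpha0: float = 0.5, beta: float = 0.2) -> "SimpleCPNN":
        n = len(X)
        if self.n_units > n:
            raise ValueError("need at least as many training vectors as units")
        # Deterministic initialisation: copy evenly spaced training vectors
        self.kohonen = [list(X[i * n // self.n_units]) for i in range(self.n_units)]
        self.grossberg = [0.0] * self.n_units
        for epoch in range(epochs):
            alpha = alpha0 * (1.0 - epoch / epochs)  # linearly decaying Kohonen rate
            for x, target in zip(X, y):
                k = self._winner(x)
                w = self.kohonen[k]
                for d in range(len(w)):
                    w[d] += alpha * (x[d] - w[d])  # pull the winner toward the input
                # Pull the winner's output weight toward the target label
                self.grossberg[k] += beta * (target - self.grossberg[k])
        return self

    def predict(self, x: Sequence[float]) -> float:
        return self.grossberg[self._winner(x)]
```

Prediction returns the output weight of the winning unit, so thresholding at 0.5 gives a class label. A fuller CPNN would typically arrange the Kohonen units on a 2-D grid and also update the winner's neighbours.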
