INFO MARCOIL ON THE NET VERSION C1

Marcoil (C)Mauro Delorenzi
Prediction of coiled-coil domains in protein sequences by posterior probabilities generated by the Hidden Markov Model MARCOIL

Go to the submission form

Publication
Delorenzi M. and Speed T., 2002.
An HMM model for coiled-coil domains and a comparison with PSSM-based predictions.
Bioinformatics, 18(4):617-625.

You can find here a short description and code for MARCOIL.
For a concise but fairly precise description of the prediction method, please refer to the paper and to my master thesis (which describes in detail a preliminary version of Marcoil).
The web interface is offered for ease of use when analysing a small number of sequences.
For large jobs, please download the program and run it locally.

OPTIONS

Coiled-Coil Emission P. Matrix
One of the three matrices accessible through the web interface is the one used in the paper, trained on 9 "families" of proteins (9FAM). It is a matrix of amino-acid probabilities derived from a large dataset of coiled-coil domains.
It is unspecific, as the dataset contains all kinds of domains, which differ in the number of helices, the orientation, the length and the hydrophobicity. The matrix is meant for first-pass genomic screenings.

It generalises the two matrices proposed by A. Lupas and collaborators and used by the program COILS. These matrices are MTIDK, derived from five "families" of proteins, and MTK, derived from three.
Those are matrices of frequency ratios; from them, using an estimate of absolute amino-acid frequencies, we computed the other two matrices that can be used.
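
One plausible reading of this conversion, offered as a sketch rather than the exact procedure of the paper: if a frequency ratio is the frequency of an amino acid at a heptad position divided by its overall (background) frequency, then multiplying by an estimated background frequency and renormalising gives a probability distribution for that position. The function name, the three-letter alphabet and all numbers below are illustrative assumptions.

```python
def ratios_to_probabilities(ratios, background):
    """Convert frequency ratios for one heptad position into probabilities.

    ratios:     dict amino acid -> frequency ratio (assumed to mean
                position frequency divided by background frequency)
    background: dict amino acid -> estimated absolute frequency
    Both inputs are hypothetical; real matrices cover all 20 amino acids.
    """
    unnormalised = {aa: ratios[aa] * background[aa] for aa in ratios}
    total = sum(unnormalised.values())
    if total <= 0:
        raise ValueError("ratios and background must give a positive total")
    # Renormalise so that the probabilities for this position sum to 1.
    return {aa: value / total for aa, value in unnormalised.items()}


# Toy three-letter alphabet with made-up numbers, for illustration only.
toy_ratios = {"L": 3.0, "E": 1.5, "G": 0.2}
toy_background = {"L": 0.5, "E": 0.3, "G": 0.2}
toy_probs = ratios_to_probabilities(toy_ratios, toy_background)
```

The renormalisation step matters because the estimated background frequencies need not match those implicit in the original ratios, so the raw products will not in general sum to exactly 1.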

We hope in future to offer the use of matrices that are more specific for subclasses of coiled-coil domains.

HMM Transition P. Matrix
The two matrices we describe in the publication can be used here. MARCOIL-H gives higher posterior probabilities than MARCOIL-L, but the number of true positives for a given number of false positives is similar.
See the section on the interpretation of the posterior probabilities.

Alternatively, you can define your own transition probabilities. In the paper we describe a parameterisation by which all the transition probabilities are computed from three numbers: i, r and t.

Increasing i raises all the coiled-coil probabilities but, in relative terms, favours short domains more than long ones.

Increasing t has a very similar effect.

Together, i and t control how smooth the probability profile is.
Larger values result in a higher sensitivity to the local "propensity" for a coiled-coil structure and can be used to identify the portion with the strongest prediction.
Smaller values give a smoother profile, where the probability attributed to a position depends more on the neighbouring sequence.

The default value of r sets a stringent requirement for the domains to respect the heptad pattern.
Setting r to 0 excludes any deviation from a perfect heptad pattern; increasing r reduces this stringency, and for r=1 the requirement for the pattern disappears completely.
A moderate increase in r (perhaps 10-fold) can help identify coiled-coil domains with irregularities in the typical pattern.

An inappropriate combination of the three parameters can produce probabilities outside the [0, 1] range. In this case the program halts with an error message.
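
The kind of check described above can be illustrated on a generic transition matrix. The sketch below does not reproduce the i, r, t parameterisation from the paper; the function names, the list-of-rows representation, the tolerance and the additional row-sum check are all assumptions made for illustration.

```python
def validate_transition_matrix(matrix, tol=1e-9):
    """Raise ValueError if `matrix` is not a valid transition matrix.

    matrix: list of rows, each a list of floats (hypothetical representation).
    """
    for i, row in enumerate(matrix):
        for j, p in enumerate(row):
            # Every entry must be a probability.
            if p < -tol or p > 1.0 + tol:
                raise ValueError(
                    f"transition probability [{i}][{j}] = {p} is outside [0, 1]"
                )
        # Assumed extra check: the outgoing probabilities of a state sum to 1.
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"row {i} sums to {sum(row)}, expected 1")


def is_valid(matrix):
    """Return True if the matrix passes validation, False otherwise."""
    try:
        validate_transition_matrix(matrix)
    except ValueError:
        return False
    return True


good = [[0.9, 0.1], [0.2, 0.8]]
bad = [[1.2, -0.2], [0.2, 0.8]]  # rows sum to 1, but entries leave [0, 1]
```

Note that the `bad` matrix would pass a row-sum check alone, which is why the range check on individual entries is needed.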

OUTPUT

PROBABILITY PROFILE
The posterior probabilities that are reported and plotted depend in a complex way on the emission and transition probabilities. The values obtained with the precomputed matrices are, loosely speaking, useful estimates of the confidence level of a correct prediction.
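
For readers unfamiliar with posterior probabilities in an HMM, the sketch below computes them with the standard forward-backward algorithm on a toy two-state model (background versus coiled-coil). It is not the MARCOIL model, which additionally tracks the heptad pattern and is described in the paper; the state labels, the two-letter alphabet and all numbers are assumptions chosen for illustration.

```python
def posterior_profile(seq, start, trans, emit):
    """Return P(state = 1 | whole sequence) for each position of `seq`.

    start: start[k]     = P(first state is k)
    trans: trans[k][l]  = P(next state is l | current state is k)
    emit:  emit[k][sym] = P(symbol sym | state k)
    State 0 is "background", state 1 is "coiled-coil" (toy labels).
    """
    if not seq:
        return []
    n, states = len(seq), range(len(start))

    # Forward pass, rescaled at each position to avoid numerical underflow.
    fwd = []
    for t in range(n):
        if t == 0:
            raw = [start[k] * emit[k][seq[0]] for k in states]
        else:
            raw = [
                emit[l][seq[t]] * sum(fwd[t - 1][k] * trans[k][l] for k in states)
                for l in states
            ]
        scale = sum(raw)
        fwd.append([v / scale for v in raw])

    # Backward pass, rescaled in the same way.
    bwd = [[1.0] * len(start) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        raw = [
            sum(trans[k][l] * emit[l][seq[t + 1]] * bwd[t + 1][l] for l in states)
            for k in states
        ]
        scale = sum(raw)
        bwd[t] = [v / scale for v in raw]

    # The posterior is proportional to forward * backward at each position,
    # so the per-position rescaling cancels out after normalisation.
    profile = []
    for t in range(n):
        joint = [fwd[t][k] * bwd[t][k] for k in states]
        profile.append(joint[1] / sum(joint))
    return profile


# Toy parameters: "h" is favoured in state 1, "p" in state 0 (made-up values).
start = [0.9, 0.1]
trans = [[0.95, 0.05], [0.10, 0.90]]
emit = [{"h": 0.3, "p": 0.7}, {"h": 0.7, "p": 0.3}]
profile = posterior_profile("pppphhhhhhhhpppp", start, trans, emit)
```

Even in this toy model, each value in `profile` depends on the whole sequence and on both parameter sets, which is why the scale of the output shifts when the transition probabilities are changed.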

Nonetheless, you generally need to run at least a few examples of known coiled-coils and of proteins with no coiled-coils to get a feeling for the meaning of the scale. (The databases described in our paper are available via the link to the code.)
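
One simple way to get such a feeling for the scale is to score a set of known coiled-coil proteins and a set of proteins without coiled-coils, then count how many of each exceed a candidate threshold. The sketch below assumes each protein has already been reduced to its maximum posterior probability; the function name and the scores are made up for illustration and are not MARCOIL output.

```python
def count_hits(positive_scores, negative_scores, threshold):
    """Count true and false positives at a given probability threshold.

    positive_scores: per-protein maximum probability for known coiled-coil proteins
    negative_scores: the same for proteins known to lack coiled-coils
    """
    true_pos = sum(score >= threshold for score in positive_scores)
    false_pos = sum(score >= threshold for score in negative_scores)
    return true_pos, false_pos


# Made-up scores, for illustration only.
known_coiled_coils = [0.98, 0.91, 0.75, 0.40]
known_negatives = [0.55, 0.20, 0.05, 0.01]

# Map each candidate threshold to (true positives, false positives).
table = {
    t: count_hits(known_coiled_coils, known_negatives, t)
    for t in (0.1, 0.5, 0.9)
}
```

Repeating this tabulation after changing the transition parameters shows how the threshold that gives an acceptable number of false positives moves with the scale.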

The scale is roughly comparable to that of the COILS program (but not to that of the PAIRCOIL program). COILS is more specific for parallel dimeric coiled-coils and has a scale that generally attributes lower probabilities to false as well as to true coiled-coils.
If you use your own transition probabilities, then the scale changes and you will need a number of examples to calibrate the probability scale. If, for example, you increase the i and t values considerably, you can get deceptively high probabilities. Conversely, as i or t approaches zero, even the strongest coiled-coil domains approach probabilities of zero.

Last modified: Feb 11, 2008
