/***********************************************
************************************************
**                                            **
**                 hmm MAMOT                  **
**         a program for HMM modelling        **
**          2002, Mauro C. Delorenzi          **
**                                            **
************************************************
************************************************/

**************************************************************************

IMPLEMENTED FUNCTIONALITIES

mamot -G    Generation of random sequences
mamot -B    Baum-Welch (BW, EM) LEARNING
mamot -V    Viterbi LEARNING (not yet tested)
mamot -P    FB (forward(-backward)) PROBABILITY
mamot -D    Forward-backward "posterior" DECODING
mamot -Q    Viterbi probability and DECODING

*************************************************************************

USAGE - EXAMPLES

General options
---------------
-h : display help page
-v : verbose output
-m : specify the file containing the model (default: INPUT/R1.model)
-s : specify the file containing the sequences (default: SEQUENCES/seqFile)
-f : write additional information to a file (depending on the command)

-G: Generation of random sequences
----------------------------------
-r n : seed for the random number generator (default 1)
-n n : number of sequences to generate

mamot -G -vf -m modelfile -n 200
mamot -Gf -m R1 > generatedsequence
mamot -G -n 1000 -m R1 > generatedsequences
mamot -G -vf -m R4model
mamot -G -vf -m R4model -n 200

-B and -V: Learning a model
---------------------------
mamot -Batpv -j 2 -i 2 -w 0.1 -m model.hmm data.seq
mamot -V -j 2 -i 2 -d 55 -m modelfile seqFile > results.txt
mamot -Batpv -j 2 -i 2 -w 0.1 -m nf1_full1.hmm nf1_selex5.seq
mamot -Ba -j 3 -i 25 -d 55 -w 1.5 -m R1 seqFile
mamot -Ba -k 5000 -l 40 -j 1 -i 1 -d 55 -m R1

-P: Computing the Probability given the model
----------------------------------------------
mamot -P -m modelfile seqFile > probabilityfile.txt

-D and -Q: Decoding
-------------------
mamot -D -m modelfile seqFile
mamot -Q -m modelfile seqFile

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

INPUT

seqFile:    file with sequences, name at most 100 characters,
            multi-FASTA format
modelfile:  file with the model, name at most 100 characters,
            please see the example file
            lines cannot be longer than 1000 characters

INFORMATION

-G      => the random sequences are written to stdout.
           With option -f, also writes to the file in tab-delimited
           format that includes the state sequence as well.
-P      => writes sequences with added log Prob. to the file FBprob
           additionally log Prob
           -P is the default, can be omitted
-D      => writes posterior state probabilities to the file PMatrix
           and also the same output as -P to the file FBprob
-B, -V  => writes the new model to the file BWprot
           {should be in a format that can be used as model input file}
           and (for now) some "controls" to stdout

//***********************************************************************//

ADDITIONAL OPTIONS
-------------------
-p  (in Baum-Welch only) tie emission distributions of pairs of
    complementary states (pool contributions in Baum-Welch)
-t  (in Baum-Welch only) tie emission distributions of states in the
    same tie group (pool contributions in Baum-Welch)
-u  use both strands of a DNA sequence: use both independently when
    learning a model, or apply decoding to both
-e  use both strands of a DNA sequence conjointly (in Baum-Welch only)
-m  filename of the ModelFile
-s  filename of the Sequences File
-d  threshold on the absolute value of the change in total log
    likelihood, used to stop BW or Viterbi learning (vMINdifftotLogLik)
    default is kMINdifftotLogLik (here a signed number)
-n  number of sequences to be generated, default: 1
-a  also writes intermediate BW model results (after each round) to
    file (alloutput = true)
-f  allows "additional" output to a file (bfileoutput = true)
-g  limits the "additional" output of -f to values above a cutoff
    (vMINProbPrint, with default kMINProbPrint)
-i  maximal number of iterations in BW (vMAXnbITERATIONS)
    default is kMAXnbITERATIONS
-j  minimal number of iterations in BW (vMINnbITERATIONS)
    default is kMINnbITERATIONS
-k  store sequences in memory when doing Baum-Welch after the first
    reading; followed by the number of sequences, and by -l:
-l  when using -k, maximal length of the sequences that have to be
    used (for memory assignment)
-b  in Baum-Welch and Viterbi learning, do not update transition
    probabilities
-c  in Baum-Welch and Viterbi learning, do not update emission
    probabilities
-w  number (double) as weight for pseudocounts, 1 for the standard
    pseudocount scheme, default is 0: no pseudocounts added

in alphabetical order:
-----------------------
-a  also writes intermediate BW model results (after each round) to
    file (alloutput = true)
-b  in Baum-Welch and Viterbi learning, do not update transition
    probabilities
-c  in Baum-Welch and Viterbi learning, do not update emission
    probabilities
-d  threshold on the absolute value of the change in total log
    likelihood, used to stop BW or Viterbi learning (vMINdifftotLogLik)
    default is kMINdifftotLogLik (here a signed number)
-e  use both strands of a DNA sequence conjointly (in Baum-Welch only)
-f  allows "additional" output to a file (bfileoutput = true)
-g  limits the "additional" output of -f to values above a cutoff
    (vMINProbPrint, with default kMINProbPrint)
-i  maximal number of iterations in BW (vMAXnbITERATIONS)
    default is kMAXnbITERATIONS
-j  minimal number of iterations in BW (vMINnbITERATIONS)
    default is kMINnbITERATIONS
-k  store sequences in memory when doing Baum-Welch after the first
    reading; followed by the number of sequences, and by -l:
-l  when using -k, maximal length of the sequences that have to be
    used (for memory assignment)
-m  filename of the ModelFile
-n  number of sequences to be generated, default: 1
-p  (in Baum-Welch only) tie emission distributions of pairs of
    complementary states (pool contributions in Baum-Welch)
-r  (Generation only) specify the seed for the random generator
-s  filename of the Sequences File
-t  (in Baum-Welch only) tie emission distributions of states in the
    same tie group (pool contributions in Baum-Welch)
-u  use both strands of a DNA sequence: use both independently when
    learning a model, or apply decoding to both
-w  number (double) as weight for pseudocounts, 1 for the standard
    pseudocount scheme, default is 0: no pseudocounts added

//****************************************************************************//

SPECIFICATION / LIMITATIONS
----------------------------
This is still a working version. Through repeated use most bugs have
likely been removed, but others certainly still exist, and we are glad
if these are communicated to us.
The user should carefully respect the specifications, for example the
syntax of the model definition file. The code does not extensively check
the assumptions it makes, so errors such as segmentation faults can
happen if the input specifications are not respected.
Please report errors by e-mail to mauro.delorenzi@isrecch
indicating the command line used and sending the input files.

HIDDEN MARKOV MODEL

states: any number, any name (within reason), up to 25 characters,
        limit characters to letters and numbers
Emission symbols (alphabet): only single characters,
        currently can handle only max. 30 capital "latin" letters properly
Emission probabilities must be completely listed in the SAME order
        given in the line that defines the alphabet

//**************************************************************************//
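
Because the program does little checking of its input, it can help to
test a model against the limits listed under HIDDEN MARKOV MODEL before
running mamot. The following is a minimal sketch in Python, not part of
MAMOT. The function name check_model and the in-memory layout (a list of
state names, a list of symbols, a dict of emission rows) are hypothetical
and do not reflect the model file syntax, which is defined by the example
file. The sum-to-one test is an added assumption, not a documented rule.

```python
import re

MAX_STATE_NAME_LEN = 25   # "up to 25 characters" in the specification
MAX_ALPHABET_SIZE = 30    # stated maximum number of emission symbols

_STATE_NAME = re.compile(r"[A-Za-z0-9]+")   # letters and numbers only
_SYMBOL = re.compile(r"[A-Z]")              # one capital Latin letter


def check_model(states, alphabet, emissions):
    """Return a list of problems found; an empty list means none.

    states    -- list of state names
    alphabet  -- list of emission symbols, in alphabet-line order
    emissions -- dict: state name -> list of emission probabilities
                 (hypothetical layout, not MAMOT's file format)
    """
    problems = []

    for name in states:
        if len(name) > MAX_STATE_NAME_LEN:
            problems.append(f"state name too long: {name!r}")
        if not _STATE_NAME.fullmatch(name):
            problems.append(f"state name not alphanumeric: {name!r}")

    if len(alphabet) > MAX_ALPHABET_SIZE:
        problems.append(f"alphabet too large: {len(alphabet)} symbols")
    for symbol in alphabet:
        if not _SYMBOL.fullmatch(symbol):
            problems.append(f"symbol is not one capital letter: {symbol!r}")
    if len(set(alphabet)) != len(alphabet):
        problems.append("alphabet contains duplicate symbols")

    for name in states:
        row = emissions.get(name)
        # The list must be complete: one probability per alphabet symbol.
        if row is None or len(row) != len(alphabet):
            problems.append(f"state {name!r}: incomplete emission list")
        # Assumption (not stated in the specification): rows sum to 1.
        elif abs(sum(row) - 1.0) > 1e-6:
            problems.append(f"state {name!r}: emissions do not sum to 1")

    return problems
```

Calling check_model(["A1"], ["A", "C", "G", "T"], {"A1": [0.25] * 4})
returns an empty list. The check cannot confirm that the probabilities
are in the same order as the alphabet line; that remains up to the user.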