HMM
The HMM method assesses essentiality across the entire genome, rather than only at the gene level as the other methods do. In addition to the more common essential and non-essential categories, it can identify regions with unusually low or unusually high read counts (i.e. growth-defect or growth-advantage regions).
Note
Intended only for Himar1 datasets.
How does it work?
Usage
> python3 transit.py hmm <comma-separated .wig files> <annotation .prot_table or GFF3> <output_BASE_filename>
(will create 2 output files: BASE.sites.txt and BASE.genes.txt)
Optional Arguments:
-r <string> := How to handle replicates. Sum, Mean. Default: -r Mean
-l := Perform LOESS correction; helps remove possible genomic position bias. Default: Off.
-iN <float> := Ignore TAs occurring at given percentage (as integer) of the N terminus. Default: -iN 0
-iC <float> := Ignore TAs occurring at given percentage (as integer) of the C terminus. Default: -iC 0
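When the command is scripted, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only the flags listed above; the helper name `build_hmm_command` and the file names are hypothetical, and nothing is executed here.

```python
def build_hmm_command(wig_files, annotation, output_base,
                      replicates="Mean", loess=False,
                      ignore_n=0, ignore_c=0):
    """Return the argv list for a `transit.py hmm` run (hypothetical helper)."""
    if replicates not in ("Sum", "Mean"):
        raise ValueError("replicates must be 'Sum' or 'Mean'")
    cmd = [
        "python3", "transit.py", "hmm",
        ",".join(wig_files),      # all replicates go in one comma-separated argument
        annotation,
        output_base,
        "-r", replicates,
        "-iN", str(ignore_n),
        "-iC", str(ignore_c),
    ]
    if loess:
        cmd.append("-l")          # LOESS correction stays off unless requested
    return cmd


# Hypothetical input files, for illustration only.
cmd = build_hmm_command(["rep1.wig", "rep2.wig"], "genome.prot_table",
                        "hmm_out", loess=True, ignore_n=5)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `transit.py`.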
Parameters
The HMM method automatically estimates the necessary statistical parameters from the datasets. You can change how the method handles replicate datasets:
Replicates: Determines how the HMM deals with replicate datasets, either by averaging or by summing read counts across datasets. For regular datasets (i.e. mean read count > 100), the recommended setting is to average read counts. For sparse datasets, summing read counts may produce more accurate results.
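The difference between the two replicate settings comes down to simple per-site arithmetic. The sketch below illustrates it on made-up read counts; it is not TRANSIT's own code, and it leaves out any normalization the tool may apply before combining replicates.

```python
def combine_replicates(replicates, method="Mean"):
    """Collapse per-site read counts from several replicates into one track."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    n_sites = len(replicates[0])
    if any(len(rep) != n_sites for rep in replicates):
        raise ValueError("all replicates must cover the same TA sites")
    # zip(*replicates) groups the counts observed at the same TA site.
    totals = [sum(site_counts) for site_counts in zip(*replicates)]
    if method == "Sum":
        return totals
    if method == "Mean":
        return [total / len(replicates) for total in totals]
    raise ValueError("method must be 'Sum' or 'Mean'")


# Made-up read counts at four TA sites in two replicates.
rep1 = [0, 12, 0, 3]
rep2 = [0, 8, 2, 5]
mean_counts = combine_replicates([rep1, rep2], "Mean")  # [0.0, 10.0, 1.0, 4.0]
sum_counts = combine_replicates([rep1, rep2], "Sum")    # [0, 20, 2, 8]
print(mean_counts, sum_counts)
```

Summing keeps more signal per site when individual replicates are sparse, which is consistent with the recommendation above.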
Output and Diagnostics
Column # | Column Definition |
---|---|
1 | Coordinate of TA site |
2 | Observed Read Counts |
3 | Probability for ES state |
4 | Probability for GD state |
5 | Probability for NE state |
6 | Probability for GA state |
7 | State Classification (ES = Essential, GD = Growth Defect, NE = Non-Essential, GA = Growth Advantage) |
8 | Gene(s) that share(s) the TA site. |
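The per-site columns above can be loaded into records for downstream filtering. This is a minimal sketch, assuming the table describes the per-site output file (BASE.sites.txt), that the file is tab-delimited, and that comment lines begin with `#`; none of these details are stated above, so check them against an actual output file. The `SiteRecord` type and the sample rows are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class SiteRecord(NamedTuple):
    coord: int      # column 1: coordinate of the TA site
    reads: float    # column 2: observed read count
    p_es: float     # columns 3-6: state probabilities
    p_gd: float
    p_ne: float
    p_ga: float
    state: str      # column 7: ES, GD, NE or GA
    genes: str      # column 8: gene(s) sharing the site


def parse_site_line(line):
    fields = line.rstrip("\n").split("\t")
    # Assumption: column 8 may be absent for sites outside any gene.
    fields += [""] * (8 - len(fields))
    return SiteRecord(
        coord=int(fields[0]),
        reads=float(fields[1]),
        p_es=float(fields[2]),
        p_gd=float(fields[3]),
        p_ne=float(fields[4]),
        p_ga=float(fields[5]),
        state=fields[6],
        genes=fields[7],
    )


def read_sites(lines):
    # Skip blank lines and (assumed) '#' comment lines.
    return [parse_site_line(l) for l in lines
            if l.strip() and not l.startswith("#")]


# Made-up rows in the assumed layout.
sample = [
    "# hypothetical header line\n",
    "60\t0\t0.95\t0.03\t0.02\t0.0\tES\tgeneA\n",
    "72\t15\t0.01\t0.04\t0.9\t0.05\tNE\tgeneA\n",
]
records = read_sites(sample)
state_counts = Counter(r.state for r in records)
print(state_counts)
```

In practice, `read_sites(open("BASE.sites.txt"))` would replace the in-memory sample.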
Column Header | Column Definition |
---|---|
Orf | Gene ID |
Name | Gene Name |
Desc | Gene Description |
N | Number of TA sites |
n0 | Number of sites labeled ES (Essential) |
n1 | Number of sites labeled GD (Growth-Defect) |
n2 | Number of sites labeled NE (Non-Essential) |
n3 | Number of sites labeled GA (Growth-Advantage) |
Avg. Insertions | Mean insertion rate within the gene |
Avg. Reads | Mean read count within the gene |
State Call | State Classification (ES = Essential, GD = Growth Defect, NE = Non-Essential, GA = Growth Advantage) |
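A common follow-up is to group genes by their state call. The sketch below assumes the table above describes the per-gene output file (BASE.genes.txt), that the file is tab-delimited with columns in the order listed, and that comment lines begin with `#`; these are assumptions to verify against a real output file. The gene IDs and values are made up.

```python
from collections import defaultdict

# Column order taken from the table above.
GENE_COLUMNS = ["Orf", "Name", "Desc", "N", "n0", "n1", "n2", "n3",
                "Avg. Insertions", "Avg. Reads", "State Call"]


def genes_by_state(lines):
    """Map each state call (ES, GD, NE, GA) to the list of gene IDs with that call."""
    grouped = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        record = dict(zip(GENE_COLUMNS, line.rstrip("\n").split("\t")))
        grouped[record["State Call"]].append(record["Orf"])
    return dict(grouped)


# Made-up rows in the assumed layout.
sample_genes = [
    "# hypothetical header line\n",
    "geneA\tnameA\tdescription A\t30\t30\t0\t0\t0\t0.0\t0.0\tES\n",
    "geneB\tnameB\tdescription B\t12\t0\t1\t11\t0\t0.7\t85.2\tNE\n",
    "geneC\tnameC\tdescription C\t8\t0\t0\t8\t0\t0.8\t91.0\tNE\n",
]
grouped = genes_by_state(sample_genes)
print(grouped)
```

Splitting on tabs instead of using a CSV parser avoids misreading quotation marks that may appear in gene descriptions.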