transit package


Submodules

pytransit.norm_tools module

class pytransit.norm_tools.AdaptiveBGCNorm[source]

Bases: pytransit.norm_tools.NormMethod

cleaninfgeom(rho)[source]

Returns a ‘clean’ output from the geometric distribution.

ecdf(x)[source]

Calculates an empirical CDF of the given data.

name = 'aBGC'
static normalize(data, wigList=[], annotationPath='', doTotReads=True, bgsamples=200000)[source]

Returns the normalized data using the aBGC method.

Parameters:
  • data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
  • doTotReads (bool) – Boolean specifying whether to do TTR normalization as well.
  • bgsamples (int) – Integeer specifying how many samples to take.
Returns:

Array with the normalized data.

Return type:

numpy array

Example:
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> normdata = norm_tools.aBGC_norm(data)
>>> print normdata
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])

See also

normalize_data

class pytransit.norm_tools.BetaGeomNorm[source]

Bases: pytransit.norm_tools.NormMethod

cleaninfgeom(rho)[source]

Returns a ‘clean’ output from the geometric distribution.

ecdf(x)[source]

Calculates an empirical CDF of the given data.

name = 'betageom'
static normalize(data, wigList=[], annotationPath='', doTTR=True, bgsamples=200000)[source]

Returns normalized data according to the BGC method.

Parameters:
  • data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
  • doTTR (bool) – Boolean specifying whether to do TTR norm as well.
  • bgsamples (int) – Integer specifying how many samples to take.
Returns:

Array with the data normalized using the betageom method.

Return type:

numpy array

Example:
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> normdata = norm_tools.betageom_norm(data)
>>> print normdata
[[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]

See also

normalize_data

class pytransit.norm_tools.EmpHistNorm[source]

Bases: pytransit.norm_tools.NormMethod

static Fzinfnb(params, args)[source]

Objective function for the zero-inflated NB method.

name = 'emphist'
static normalize(data, wigList=[], annotationPath='')[source]

Returns the normalized data, using the empirical hist method.

Parameters:
  • wigList (list) – List of paths to wig formatted datasets.
  • annotationPath (str) – Path to annotation in .prot_table or GFF3 format.
Returns:

Array with the normalization factors for the emphist method.

Return type:

numpy array

Example:
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.emphist_factors(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"], "transit/genomes/H37Rv.prot_table")
>>> print factors
array([[ 1.        ],
       [ 0.63464722]])

See also

normalize_data

pytransit.norm_tools.Fzinfnb(params, args)[source]

Objective function for the zero-inflated NB method.

class pytransit.norm_tools.NZMeanNorm[source]

Bases: pytransit.norm_tools.NormMethod

name = 'nzmean'
static normalize(data, wigList=[], annotationPath='')[source]

Returns the normalization factors for the data, using the NZMean method.

Parameters:

data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

Returns:

Array with the normalization factors for the nzmean method.

Return type:

numpy array

Example:
>>> import pytransit._tools.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.nzmean_factors(data)
>>> print factors
array([[ 1.14836149],
       [ 0.88558737]])

See also

normalize_data

class pytransit.norm_tools.NoNorm[source]

Bases: pytransit.norm_tools.NormMethod

name = 'nonorm'
static normalize(data, wigList=[], annotationPath='')[source]
class pytransit.norm_tools.NormMethod[source]
name = 'undefined'
static normalize()[source]
class pytransit.norm_tools.QuantileNorm[source]

Bases: pytransit.norm_tools.NormMethod

name = 'quantile'
static normalize(data, wigList=[], annotationPath='')[source]

Performs Quantile Normalization as described by Bolstad et al. 2003

Parameters:

data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

Returns:

Array with the data normalized by the quantile normalization method.

Return type:

numpy array

Example:
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> normdata = norm_tools.quantile_norm(data)
>>> print normdata

See also

normalize_data

class pytransit.norm_tools.TTRNorm[source]

Bases: pytransit.norm_tools.NormMethod

empirical_theta()[source]

Calculates the observed density of the data.

This is used as an estimate insertion density by some normalization methods. May be improved by more sophisticated ways later on.

Parameters:

data (numpy array) –

  1. numpy array defining read-counts at N sites.

Returns:

Density of the given dataset.

Return type:

float

Example:
>>> import pytransit.tnseq_tools as tnseq_tools
>>> import pytransit.norm_tools as norm_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> theta = norm_tools.empirical_theta(data)
>>> print theta
0.467133570136

See also

TTR_factors

name = 'emphist'
static normalize(data, wigList=[], annotationPath='', thetaEst=<function empirical_theta>, muEst=<function trimmed_empirical_mu>, target=100.0)[source]

Returns the normalization factors for the data, using the TTR method.

Parameters:
  • data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
  • thetaEst (function) – Function used to estimate density. Should take a list of counts as input.
  • muEst (function) – Function used to estimate mean count. Should take a list of counts as input.
Returns:

Array with the normalization factors for the TTR method.

Return type:

numpy array

Example:
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.TTR_factors(data)
>>> print factors
array([[ 1.        ],
       [ 0.62862886]])

See also

normalize_data

trimmed_empirical_mu(t=0.05)[source]

Estimates the trimmed mean of the data.

This is used as an estimate of mean count by some normalization methods. May be improved by more sophisticated ways later on.

Parameters:
  • data (numpy array) –
    1. numpy array defining read-counts at N sites.
  • t (float) – Float specifying fraction of start and end to trim.
Returns:

(Trimmed) Mean of the given dataset.

Return type:

float

Example:
>>> import pytransit.tnseq_tools as tnseq_tools
>>> import pytransit.norm_tools as norm_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> mu = norm_tools.trimmed_empirical_mu(data)
>>> print mu
120.73077107

See also

TTR_factors

class pytransit.norm_tools.TotReadsNorm[source]

Bases: pytransit.norm_tools.NormMethod

name = 'totreads'
static normalize(data, wigList=[], annotationPath='')[source]

Returns the normalization factors for the data, using the total reads method.

Parameters:

data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

Returns:

Array with the normalization factors for the totreads method.

Return type:

numpy array

Example:
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.totreads_factors(data)
>>> print factors
array([[ 1.2988762],
       [ 0.8129396]])

See also

normalize_data

class pytransit.norm_tools.ZeroInflatedNBNorm[source]

Bases: pytransit.norm_tools.NormMethod

name = 'zinfb'
static normalize(data, wigList=[], annotationPath='')[source]

Returns the normalization factors for the data using the zero-inflated negative binomial method.

Parameters:

data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

Returns:

Array with the normalization factors for the zinfnb method.

Return type:

numpy array

Example:
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.zinfnb_factors(data)
>>> print factors
[[ 0.0121883 ]
 [ 0.00747111]]

See also

normalize_data

pytransit.norm_tools.cleaninfgeom(x, rho)[source]

Returns a ‘clean’ output from the geometric distribution.

pytransit.norm_tools.ecdf(S, x)[source]

Calculates an empirical CDF of the given data.

pytransit.norm_tools.empirical_theta(X)[source]

Calculates the observed density of the data.

This is used as an estimate insertion density by some normalization methods. May be improved by more sophisticated ways later on.

Parameters:

data (numpy array) –

  1. numpy array defining read-counts at N sites.

Returns:

Density of the given dataset.

Return type:

float

Example:
>>> import pytransit.tnseq_tools as tnseq_tools
>>> import pytransit.norm_tools as norm_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> theta = norm_tools.empirical_theta(data)
>>> print theta
0.467133570136

See also

TTR_factors

pytransit.norm_tools.norm_to_target(data, target)[source]

Returns factors to normalize the data to the given target value.

Parameters:
  • data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
  • target (float) – Floating point specifying the target for the mean of the data/
Returns:

Array with the factors necessary to normalize mean to target.

Return type:

numpy array

Example:
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.norm_to_target(data, 100)
>>> print factors
[[ 1.8548104 ]
 [ 1.16088726]]

See also

normalize_data

pytransit.norm_tools.normalize_data(data, method='nonorm', wigList=[], annotationPath='')[source]

Normalizes the numpy array by the given normalization method.

Parameters:
  • data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
  • method (str) – Name of the desired normalization method.
  • wigList (list) – List of paths for the desired wig-formatted datasets.
  • annotationPath (str) – Path to the prot_table annotation file.
Returns:

Array with the normalized data. list: List containing the normalization factors. Empty if not used.

Return type:

numpy array

Example:
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
(normdata, normfactors) = norm_tools.normalize_data(data, "TTR")   # Some methods require annotation and path to wig files.
>>> print normfactors
array([[ 1.        ],
       [ 0.62862886]])
>> print normdata
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])

Note

Some normalization methods require the wigList and annotationPath arguments.

pytransit.norm_tools.trimmed_empirical_mu(X, t=0.05)[source]

Estimates the trimmed mean of the data.

This is used as an estimate of mean count by some normalization methods. May be improved by more sophisticated ways later on.

Parameters:
  • data (numpy array) –
    1. numpy array defining read-counts at N sites.
  • t (float) – Float specifying fraction of start and end to trim.
Returns:

(Trimmed) Mean of the given dataset.

Return type:

float

Example:
>>> import pytransit.tnseq_tools as tnseq_tools
>>> import pytransit.norm_tools as norm_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> mu = norm_tools.trimmed_empirical_mu(data)
>>> print mu
120.73077107

See also

TTR_factors

pytransit.norm_tools.zinfnb_factors(data)[source]

Returns the normalization factors for the data using the zero-inflated negative binomial method.

Parameters:

data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

Returns:

Array with the normalization factors for the zinfnb method.

Return type:

numpy array

Example:
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.zinfnb_factors(data)
>>> print factors
[[ 0.0121883 ]
 [ 0.00747111]]

See also

normalize_data

pytransit.stat_tools module

pytransit.stat_tools.BH_fdr_correction(X)[source]

Adjusts p-values using the Benjamini Hochberg procedure

pytransit.stat_tools.FWER_Bayes(X)[source]
pytransit.stat_tools.F_mean_diff_dict(*args, **kwargs)[source]
pytransit.stat_tools.F_mean_diff_flat(*args, **kwargs)[source]
pytransit.stat_tools.F_shuffle_dict_libraries(*args, **kwargs)[source]
pytransit.stat_tools.F_shuffle_flat(*args, **kwargs)[source]
pytransit.stat_tools.F_sum_diff_dict(*args, **kwargs)[source]
pytransit.stat_tools.F_sum_diff_flat(*args, **kwargs)[source]
pytransit.stat_tools.HDI_from_MCMC(posterior_samples, credible_mass=0.95)[source]
pytransit.stat_tools.bFDR(X)[source]
pytransit.stat_tools.bayesian_ess_thresholds(Z_raw, ALPHA=0.05)[source]

Returns Essentiality Thresholds using a BH-like procedure

pytransit.stat_tools.binom(k, n, p)[source]

Binomial distribution. Uses Normal approximation for large ‘n’

pytransit.stat_tools.binom_cdf(k, n, p)[source]

CDF of the binomial distribution

pytransit.stat_tools.binom_test(k, n, p, type='two-sided')[source]

Does a binomial test given success, trials and probability.

pytransit.stat_tools.boxcoxTable(X, minlambda, maxlambda, dellambda)[source]

Returns a table of (loglik function, lambda) pairs for the data.

pytransit.stat_tools.boxcoxtransform(x, lambdax)[source]

Performs a box-cox transformation to data vector X. WARNING: elements of X should be all positive! Fixed: ‘>’ has changed to ‘<’

pytransit.stat_tools.comb(n, k)[source]
pytransit.stat_tools.comb1(n, k)[source]
pytransit.stat_tools.combine_lib_dicts(L1, L2)[source]
pytransit.stat_tools.cumulative_average(new_x, n, prev_avg)[source]
pytransit.stat_tools.dberndiff(d, peq, p01, p10)[source]
pytransit.stat_tools.dbinomdiff(d, n, P)[source]
pytransit.stat_tools.fact(n)[source]
pytransit.stat_tools.get_lib_data_dict(data1, ctrl_lib_str, data2, exp_lib_str, nTAs)[source]
pytransit.stat_tools.isEven(x)[source]
pytransit.stat_tools.loess(X, Y, h=10000)[source]
pytransit.stat_tools.loess_correction(X, Y, h=10000, window=100)[source]
pytransit.stat_tools.log_fac(n)[source]
pytransit.stat_tools.loglik(X, lambdax)[source]

Computes the log-likelihood function for a transformed vector Xtransform.

pytransit.stat_tools.multinomial(K, P)[source]
pytransit.stat_tools.my_perm(d, n)[source]
pytransit.stat_tools.norm(x, mu, sigma)[source]

Normal distribution

pytransit.stat_tools.parse_lib_index(nData, libstr, nTAs)[source]
pytransit.stat_tools.phi_coefficient(X, Y)[source]

Calculates the phi-coefficient for two bool arrays

pytransit.stat_tools.qberndiff(d, peq, p01, p10)[source]
pytransit.stat_tools.qbinomdiff(d, n, peq, p01, p10)[source]
pytransit.stat_tools.regress(X, Y)[source]

Performs linear regression given two vectors, X, Y.

pytransit.stat_tools.resampling(data1, data2, S=10000, testFunc=<function F_mean_diff_flat>, permFunc=<function F_shuffle_flat>, adaptive=False, lib_str1='', lib_str2='')[source]

Does a permutation test on two sets of data.

Performs the resampling / permutation test given two sets of data using a function defining the test statistic and a function defining how to permute the data.

Parameters:
  • ar – List or numpy array with the first set of observations.
  • data2 – List or numpy array with the second set of observations.
  • S – Number of permutation tests (or samples) to obtain.
  • testFunc – Function defining the desired test statistic. Should accept two lists as arguments. Default is difference in means between the observations.
  • permFunc – Function defining the way to permute the data. Should accept one argument, the combined set of data. Default is random shuffle.
  • adaptive – Cuts-off resampling early depending on significance.
Returns:

Tuple with described values
  • test_obs – Test statistic of observation.
  • mean1 – Arithmetic mean of first set of data.
  • mean2 – Arithmetic mean of second set of data.
  • log2FC – Normalized log2FC the means.
  • pval_ltail – Lower tail p-value.
  • pval_utail – Upper tail p-value.
  • pval_2tail – Two-tailed p-value.
  • test_sample – List of samples of the test statistic.

Example:
>>> import pytransit.stat_tools as stat_tools
>>> import numpy
>>> X = numpy.random.random(100)
>>> Y = numpy.random.random(100)
>>> (test_obs, mean1, mean2, log2fc, pval_ltail, pval_utail, pval_2tail, test_sample) = stat_tools.resampling(X,Y)
>>> pval_2tail
0.2167
>>> test_sample[:3]
[0.076213992904990535, -0.0052513291091412784, -0.0038425140184765172]
pytransit.stat_tools.sample_trunc_norm_post(data, S, mu0, s20, k0, nu0)[source]
pytransit.stat_tools.text_histogram(X, nBins=20, resolution=200, obs=None)[source]
pytransit.stat_tools.transformToRange(X, new_min, new_max, old_min=None, old_max=None)[source]
pytransit.stat_tools.tricoeff(N, S)[source]
pytransit.stat_tools.tricube(X)[source]

pytransit.tnseq_tools module

pytransit.tnseq_tools.ExpectedRuns(n, pnon)[source]

Expected value of the run of non=insertions (Schilling, 1990):

ER_n = log(1/p)(nq) + gamma/ln(1/p) -1/2 + r1(n) + E1(n)
Parameters:
  • n (int) – Integer representing the number of sites.
  • pins (float) – Floating point number representing the probability of non-insertion.
Returns:

Size of the expected maximum run.

Return type:

float

class pytransit.tnseq_tools.Gene(orf, name, desc, reads, position, start=0, end=0, strand='')[source]

Class defining a gene with useful attributes for TnSeq analysis.

This class helps define a “gene” with attributes that facilitate TnSeq analysis. Here “gene” can be defined to be any genomic region. The Genes class (with an s) can be used to define list of Gene objects with more useful operations on the “genome” level.

orf

A string defining the ID of the gene.

name

A string with the human readable name of the gene.

desc

A string with the description of the gene.

reads

List of lists of read-counts in possible site replicate dataset.

position

List of coordinates of the possible sites.

start

An integer defining the start coordinate for the gene.

end

An integer defining the end coordinate for the gene.

strand

A string defining the strand of the gene.

Example:
>>> import pytransit.tnseq_tools as tnseq_tools
>>> G = tnseq_tools.Gene("Rv0001", "dnaA", "DNA Replication A", [[0,0,0,0,1,3,0,1]],  [1,21,32,37,45,58,66,130], strand="+" )
>>> print G
Rv0001  (dnaA)  k=3 n=8 r=4 theta=0.37500
>>> print G.phi()
0.625
>>> print G.tosses
array([ 0.,  0.,  0.,  0.,  1.,  1.,  0.,  1.])

See also

Genes

__eq__(other)[source]

Compares against other gene object.

Returns:True if the gene objects have same orf id.
Return type:bool
__ge__(other)

x.__ge__(y) <==> x>=y

__getitem__(i)[source]

Return read-counts at position i.

Parameters:i (int) – integer of the index of the desired site.
Returns:Reads at position i.
Return type:list
__gt__(other)

x.__gt__(y) <==> x>y

__le__(other)

x.__le__(y) <==> x<=y

__lt__(other)[source]

Compares against other gene object.

Returns:True if the gene object id is less than the other.
Return type:bool
__str__()[source]

Return a string representation of the object.

Returns:Human readable string with some of the attributes.
Return type:str
get_gap_span()[source]

Returns the span of the maxrun of the gene (i.e. number of nucleotides).

Returns:Number of nucleotides spanned by the max run.
Return type:int
get_gene_span()[source]

Returns the number of nucleotides spanned by the gene.

Returns:Number of nucleotides spanned by the gene’s sites.
Return type:int
phi()[source]

Return the non-insertion density (“phi”) for the gene.

Returns:Non-insertion density (i.e. 1 - theta)
Return type:float
theta()[source]

Return the insertion density (“theta”) for the gene.

Returns:Density of the gene (i.e. k/n )
Return type:float
total_reads()[source]

Return the total reads for the gene.

Returns:Total sum of read-counts.
Return type:float
class pytransit.tnseq_tools.Genes(wigList, annotation, norm='nonorm', reps='All', minread=1, ignoreCodon=True, nterm=0.0, cterm=0.0, include_nc=False, data=[], position=[], genome='', transposon='himar1')[source]

Class defining a list of Gene objects with useful attributes for TnSeq analysis.

This class helps define a list of Gene objects with attributes that facilitate TnSeq analysis. Includes methods that calculate useful statistics and even rudamentary analysis of essentiality.

wigList

List of paths to datasets in .wig format.

protTable

String with path to annotation in .prot_table format.

norm

String with the normalization used/

reps

String with information on how replicates were handled.

minread

Integer with the minimum magnitude of read-count considered.

ignoreCodon

Boolean defining whether to ignore the start/stop codon.

nterm

Float number of the fraction of the N-terminus to ignore.

cterm

Float number of the fraction of the C-terminus to ignore.

include_nc

Boolean determining whether to include non-coding areas.

orf2index

Dictionary of orf id to index in the genes list.

genes

List of the Gene objects.

Example:
>>> import pytransit.tnseq_tools as tnseq_tools
>>> G = tnseq_tools.Genes(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"], "transit/genomes/H37Rv.prot_table", norm="TTR")
>>> print G
Genes Object (N=3990)
>>> print G.global_theta()
0.40853707222816626
>>> print G["Rv0001"]   # Lookup like dictionary
Rv0001  (dnaA)  k=0 n=31    r=31    theta=0.00000
>>> print G[2]          # Lookup like list
Rv0003  (recF)  k=5 n=35    r=14    theta=0.14286
>>> print G[2].reads
[[  62.            0.            0.            0.            0.            0.
 0.            0.            0.            0.            0.            0.
 0.            0.           63.            0.            0.           13.
46.            0.            1.            0.            0.            0.
 0.            0.            0.            0.            0.            0.
 0.            0.            0.            0.            0.        ]
 [   3.14314432   67.26328843    0.            0.            0.            0.
 0.            0.            0.           35.20321637    0.            0.
 0.            0.           30.80281433    0.          101.20924707
 0.           23.25926796    0.           16.97297932    8.17217523
 0.            0.            2.51451546    3.77177318    0.62862886
 0.            0.           69.14917502    0.            0.            0.
 0.            0.        ]]

See also

Gene

__contains__(item)[source]

Defines __contains__ to check if gene exists in the list.

Parameters:item (str) – String with the id of the gene.
Returns:Boolean with True if item is in the list.
Return type:bool
__getitem__(i)[source]

Defines __getitem__ method so that it works as dictionary and list.

Parameters:i (int) – Integer or string defining index or orf ID desired.
Returns:A gene with the index or ID equal to i.
Return type:Gene
__len__()[source]

Defines __len__ returning number of genes.

Returns:Number of genes in the list.
Return type:int
__str__()[source]

Defines __str__ to print a generic str with the size of the list.

Returns:Human readable string with number of genes in object.
Return type:str
global_insertion()[source]

Returns total number of insertions, i.e. sum of ‘k’ over all genes.

Returns:Total sum of reads across all genes.
Return type:float
global_phi()[source]

Returns global non-insertion frequency, of the library.

Returns:Complement of global theta i.e. 1.0-theta
Return type:float
global_reads()[source]

Returns the reads among the library.

Returns:List of all the data.
Return type:list
global_run()[source]

Returns the run assuming all genes were concatenated together.

Returns:Max run across all genes.
Return type:int
global_sites()[source]

Returns total number of sites, i.e. sum of ‘n’ over all genes.

Returns:Total number of sites across all genes.
Return type:int
global_theta()[source]

Returns global insertion frequency, of the library.

Returns:Total sites with insertions divided by total sites.
Return type:float
local_gap_span()[source]

Returns numpy array with the span of nucleotides of the largest gap, ‘s’, for each gene.

Returns:Numpy array with the span of gap for all genes.
Return type:narray
local_gene_span()[source]

Returns numpy array with the span of nucleotides of the gene, ‘t’, for each gene.

Returns:Numpy array with the span of gene for all genes.
Return type:narray
local_insertions()[source]

Returns numpy array with the number of insertions, ‘k’, for each gene.

Returns:Numpy array with the number of insertions for all genes.
Return type:narray
local_phis()[source]

Returns numpy array of non-insertion frequency, ‘phi’, for each gene.

Returns:Numpy array with the complement of density for all genes.
Return type:narray
local_reads()[source]

Returns numpy array of lists containing the read counts for each gene.

Returns:Numpy array with the list of reads for all genes.
Return type:narray
local_runs()[source]

Returns numpy array with maximum run of non-insertions, ‘r’, for each gene.

Returns:Numpy array with the max run of non-insertions for all genes.
Return type:narray
local_sites()[source]

Returns numpy array with total number of TA sites, ‘n’, for each gene.

Returns:Numpy array with the number of sites for all genes.
Return type:narray
local_thetas()[source]

Returns numpy array of insertion frequencies, ‘theta’, for each gene.

Returns:Numpy array with the density for all genes.
Return type:narray
tosses()[source]

Returns list of bernoulli trials, ‘tosses’, representing insertions in the gene.

Returns:Sites represented as bernoulli trials with insertions as true.
Return type:list
total_reads()[source]

Returns total reads among the library.

Returns:Total sum of read-counts accross all genes.
Return type:float
pytransit.tnseq_tools.GumbelCDF(x, u, B)[source]

CDF of the Gumbel distribution:

e^(-e^( (u-x)/B))
Parameters:
  • x (int) – Length of the max run.
  • u (float) – Location parameter of the Gumbel dist.
  • B (float) – Scale parameter of the Gumbel dist.
Returns:

Cumulative probability o the Gumbel distribution.

Return type:

float

pytransit.tnseq_tools.VarR(n, pnon)[source]

Variance of the expected run of non-insertons (Schilling, 1990):

\[VarR_n = (pi^2)/(6*ln(1/p)^2) + 1/12 + r2(n) + E2(n)\]
Parameters:
  • n (int) – Integer representing the number of sites.
  • pnon (float) – Floating point number representing the probability of non-insertion.
Returns:

Variance of the length of the maximum run.

Return type:

float

pytransit.tnseq_tools.check_wig_includes_zeros(wig_list)[source]

Returns boolean list showing whether the given files include empty sites (zero) or not.

Parameters:wig_list (list) – List of paths to wig files.
Returns:List of boolean values.
Return type:list
pytransit.tnseq_tools.combine_replicates(data, method='Sum')[source]

Returns list of data merged together.

Parameters:
  • data (list) – List of numeric (replicate) data to be merged.
  • method (str) – How to combine the replicate dataset.
Returns:

List of numeric dataset now merged together.

Return type:

list

pytransit.tnseq_tools.getE1(n)[source]

Small Correction term. Defaults to 0.01 for now

pytransit.tnseq_tools.getE2(n)[source]

Small Correction term. Defaults to 0.01 for now

pytransit.tnseq_tools.getGamma()[source]

Euler-Mascheroni constant ~ 0.577215664901

pytransit.tnseq_tools.getR1(n)[source]

Small Correction term. Defaults to 0.000016 for now

pytransit.tnseq_tools.getR2(n)[source]

Small Correction term. Defaults to 0.00006 for now

pytransit.tnseq_tools.get_coordinate_map(galign_path, reverse=False)[source]

Attempts to get mapping of coordinates from galign file.

Parameters:
  • path (str) – Path to .galign file.
  • reverse (bool) – Boolean specifying whether to do A to B or B to A.
Returns:

Dictionary of coordinate in one file to another file.

Return type:

dict

pytransit.tnseq_tools.get_data(wig_list)[source]
Returns a tuple of (data, position) containing a matrix of raw read-counts
, and list of coordinates.
Parameters:

wig_list (list) – List of paths to wig files.

Returns:

Two lists containing data and positions of the wig files given.

Return type:

tuple

Example:
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["data/glycerol_H37Rv_rep1.wig", "data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
pytransit.tnseq_tools.get_data_stats(reads)[source]
pytransit.tnseq_tools.get_data_w_genome(wig_list, genome)[source]
pytransit.tnseq_tools.get_data_zero_fill(wig_list)[source]
Returns a tuple of (data, position) containing a matrix of raw read counts,
and list of coordinates. Positions that are missing are filled in as zero.
Parameters:wig_list (list) – List of paths to wig files.
Returns:Two lists containing data and positions of the wig files given.
Return type:tuple
pytransit.tnseq_tools.get_extended_pos_hash_gff(path, N=None)[source]
pytransit.tnseq_tools.get_extended_pos_hash_pt(path, N=None)[source]
pytransit.tnseq_tools.get_file_types(wig_list)[source]

Returns the transposon type (himar1/tn5) of the list of wig files.

Parameters:wig_list (list) – List of paths to wig files.
Returns:List of transposon type (“himar1” or “tn5”).
Return type:list
pytransit.tnseq_tools.get_gene_info(path)[source]

Returns a dictionary that maps gene id to gene information.

Parameters:path (str) – Path to annotation in .prot_table or GFF3 format.
Returns:
Dictionary of gene id to tuple of information:
  • name
  • description
  • start coordinate
  • end coordinate
  • strand
Return type:dict
pytransit.tnseq_tools.get_gene_info_gff(path)[source]

Returns a dictionary that maps gene id to gene information.

Parameters:path (str) – Path to annotation in GFF3 format.
Returns:
Dictionary of gene id to tuple of information:
  • name
  • description
  • start coordinate
  • end coordinate
  • strand
Return type:dict
pytransit.tnseq_tools.get_gene_info_pt(path)[source]

Returns a dictionary that maps gene id to gene information.

Parameters:path (str) – Path to annotation in .prot_table format.
Returns:
Dictionary of gene id to tuple of information:
  • name
  • description
  • start coordinate
  • end coordinate
  • strand
Return type:dict
pytransit.tnseq_tools.get_genes_in_range(pos_hash, start, end)[source]

Returns list of genes that occur in a given range of coordinates.

Parameters:
  • pos_hash (dict) – Dictionary of position to list of genes.
  • start (int) – Start coordinate of the desired range.
  • end (int) – End coordinate of the desired range.
Returns:

List of genes that fall within range.

Return type:

list

pytransit.tnseq_tools.get_pos_hash(path)[source]

Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.

Parameters:path (str) – Path to annotation in .prot_table or GFF3 format.
Returns:Dictionary of position to list of genes that share that position.
Return type:dict
pytransit.tnseq_tools.get_pos_hash_gff(path)[source]

Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.

Parameters:path (str) – Path to annotation in GFF3 format.
Returns:Dictionary of position to list of genes that share that position.
Return type:dict
pytransit.tnseq_tools.get_pos_hash_pt(path)[source]

Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.

Parameters:path (str) – Path to annotation in .prot_table format.
Returns:Dictionary of position to list of genes that share that position.
Return type:dict
pytransit.tnseq_tools.get_unknown_file_types(wig_list, transposons)[source]
pytransit.tnseq_tools.get_wig_stats(path)[source]

Returns statistics for the given wig file with read-counts.

Parameters:path (str) – String with the path to the wig file of interest.
Returns:
Tuple with the following statistical measures:
  • density
  • mean read
  • non-zero mean
  • non-zero median
  • max read
  • total reads
  • skew
  • kurtosis
Return type:tuple
pytransit.tnseq_tools.griffin_analysis(genes_obj, pins)[source]

Implements the basic Gumbel analysis of runs of non-insertion, described in Griffin et al. 2011.

This analysis method calculates a p-value of observing the maximun run of TA sites without insertions in a row (i.e. a “run”, r). Unusually long runs are indicative of an essential gene or protein domain. Assumes that there is a constant, global probability of observing an insertion (tantamount to a Bernoulli probability of success).

Parameters:
  • genes_obj (Genes) – An object of the Genes class defining the genes.
  • pins (float) – The probability of insertion.
Returns:

List of lists with results and information for the genes. The elements of the list are as follows:
  • ORF ID.
  • Gene Name.
  • Gene Description.
  • Number of TA sites with insertions.
  • Number of TA sites.
  • Length of largest run of non-insertion.
  • Expected run for a gene this size.
  • p-value of the observed run.

Return type:

list

pytransit.tnseq_tools.maxrun(lst, item=0)[source]

Returns the length of the maximum run an item in a given list.

Parameters:
  • lst (list) – List of numeric items.
  • item (float) – Number to look for consecutive runs of.
Returns:

Length of the maximum run of consecutive instances of item.

Return type:

int

pytransit.tnseq_tools.read_combined_wig(fname)[source]

Read the combined wig-file generated by Transit :: Filename -> Tuple([Site], [WigData], [Filename]) Site :: Integer WigData :: [Integer] Filename :: String

pytransit.tnseq_tools.read_genes(fname, descriptions=False)[source]

(Filename, Options) -> [Gene] Gene :: {start, end, rv, gene, strand}

pytransit.tnseq_tools.read_genome(path)[source]

Reads in FASTA formatted genome file.

Parameters:path (str) – Path to .galign file.
Returns:String with the genomic sequence.
Return type:string
pytransit.tnseq_tools.runindex(runs)[source]

Returns a list of the indexes of the start of the runs; complements runs().

Parameters:runs (list) – List of numeric data.
Returns:List of the index of the runs of non-insertions. Non-zero sites are treated as runs of zero.
Return type:list
pytransit.tnseq_tools.runs(data)[source]

Return list of all the runs of consecutive non-insertions.

Parameters:data (list) – List of numeric data.
Returns:List of the length of the runs of non-insertions. Non-zero sites are treated as runs of zero.
Return type:list
pytransit.tnseq_tools.runs_w_info(data)[source]

Return list of all the runs of consecutive non-insertions with the start and end locations.

Parameters:data (list) – List of numeric data to check for runs.
Returns:List of dictionary from run to length and position information of the tun.
Return type:list
pytransit.tnseq_tools.rv_siteindexes_map(genes, TASiteindexMap)[source]

([Gene], {TAsite: Siteindex}) -> {Rv: Siteindex}

pytransit.tnseq_tools.tossify(data)[source]

Reduces the data into Bernoulli trials (or ‘tosses’) based on whether counts were observed or not.

Parameters:data (list) – List of numeric data.
Returns:Data represented as bernoulli trials with >0 as true.
Return type:list

pytransit.transit_tools module

pytransit.transit_tools.ShowAskWarning(MSG='')[source]
pytransit.transit_tools.ShowError(MSG='')[source]
pytransit.transit_tools.ShowMessage(MSG='')[source]
pytransit.transit_tools.aton(aa)[source]
pytransit.transit_tools.basename(filepath)[source]
pytransit.transit_tools.cleanargs(rawargs)[source]

Returns a list and a dictionary with positional and keyword arguments.

-This function assumes flags must start with a “-” and and cannot be a
number (but can include them).
-Flags should either be followed by the value they want to be associated
with (i.e. -p 5) or will be assigned a value of True in the dictionary.
-The dictionary will map flags to the name given minus ONE “-” sign in
front. If you use TWO minus signs in the flag name (i.e. –verbose), the dictionary key will be the name with ONE minus sign in front (i.e. {“-verbose”:True}).
Parameters:rawargs (list) – List of positional/keyword arguments. As obtained from sys.argv.
Returns:
List of positional arguments (i.e. arguments without flags),
in order provided.
dict: Dictionary mapping flag (key is flag minus the first “-“) and
their values.
Return type:list
pytransit.transit_tools.convertToCombinedWig(dataset_list, annotationPath, outputPath, normchoice='nonorm')[source]

Normalizes the input datasets and outputs the result in CombinedWig format.

Parameters:
  • dataset_list (list) – List of paths to datasets in .wig format
  • annotationPath (str) – Path to annotation in .prot_table or GFF3 format.
  • outputPath (str) – Desired output path.
  • normchoice (str) – Choice for normalization method.
pytransit.transit_tools.convertToGeneCountSummary(dataset_list, annotationPath, outputPath, normchoice='nonorm')[source]

Normalizes the input datasets and outputs the result in CombinedWig format.

Parameters:
  • dataset_list (list) – List of paths to datasets in .wig format
  • annotationPath (str) – Path to annotation in .prot_table or GFF3 format.
  • outputPath (str) – Desired output path.
  • normchoice (str) – Choice for normalization method.
pytransit.transit_tools.convertToIGV(self, dataset_list, annotationPath, path, normchoice=None)[source]
pytransit.transit_tools.dirname(filepath)[source]
pytransit.transit_tools.fetch_name(filepath)[source]
pytransit.transit_tools.getTabTableData(path, colnames)[source]
pytransit.transit_tools.get_extended_pos_hash(path)[source]

Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.

Parameters:path (str) – Path to annotation in .prot_table or GFF3 format.
Returns:Dictionary of position to list of genes that share that position.
Return type:dict
pytransit.transit_tools.get_gene_info(path)[source]

Returns a dictionary that maps gene id to gene information.

Parameters:path (str) – Path to annotation in .prot_table or GFF3 format.
Returns:
Dictionary of gene id to tuple of information:
  • name
  • description
  • start coordinate
  • end coordinate
  • strand
Return type:dict
pytransit.transit_tools.get_pos_hash(path)[source]

Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.

Parameters:path (str) – Path to annotation in .prot_table or GFF3 format.
Returns:Dictionary of position to list of genes that share that position.
Return type:dict
pytransit.transit_tools.get_validated_data(wig_list, wxobj=None)[source]
Returns a tuple of (data, position) containing a matrix of raw read-counts
, and list of coordinates.
Parameters:
  • wig_list (list) – List of paths to wig files.
  • wxobj (object) – wxPython GUI object for warnings
Returns:

Two lists containing data and positions of the wig files given.

Return type:

tuple

Example:
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_validated_data(["data/glycerol_H37Rv_rep1.wig", "data/glycerol_H37Rv_rep2.wig"])
>>> print data
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])

See also

get_file_types combine_replicates get_data_zero_fill pytransit.norm_tools.normalize_data

pytransit.transit_tools.parseCoords(strand, aa_start, aa_end, start, end)[source]
pytransit.transit_tools.transit_error(text)[source]
pytransit.transit_tools.transit_message(msg='', prefix='')[source]
pytransit.transit_tools.validate_annotation(annotation)[source]
pytransit.transit_tools.validate_both_datasets(ctrldata, expdata)[source]
pytransit.transit_tools.validate_control_datasets(ctrldata)[source]
pytransit.transit_tools.validate_filetypes(datasets, transposons, justWarn=True)[source]
pytransit.transit_tools.validate_transposons_used(datasets, transposons, justWarn=True)[source]
pytransit.transit_tools.validate_wig_format(wig_list, wxobj=None)[source]

Module contents