Python: module thinkbayes2

thinkbayes2

/home/downey/thinkstats/trunk/ThinkBayes2/code/thinkbayes2.py

Modules

bisect
copy
logging
math
numpy
pandas
random
re
scipy
thinkplot

Classes



__builtin__.object
    Beta
    Cdf
    Dirichlet
    FixedWidthVariables
    HypothesisTest
    Interpolator
    Pdf
        EstimatedPdf
        ExponentialPdf
        NormalPdf
exceptions.Exception(exceptions.BaseException)
    UnimplementedMethodException
_DictWrapper(__builtin__.object)
    Hist
    Pmf
        Joint
        Suite

class Beta(__builtin__.object)

    Represents a Beta distribution. See http://en.wikipedia.org/wiki/Beta_distribution

Methods defined here:

EvalPdf(self, x)
Evaluates the PDF at x.

MakeCdf(self, steps=101)
Returns the CDF of this distribution.

MakePmf(self, steps=101, label=None)
Returns a Pmf of this distribution. Note: Normally, we just evaluate the PDF at a sequence of points and treat the probability density as a probability mass. But if alpha or beta is less than one, we have to be more careful because the PDF goes to infinity at x=0 (when alpha is less than one) or at x=1 (when beta is less than one). In that case we evaluate the CDF and compute differences.

Mean(self)
Computes the mean of this distribution.

Random(self)
Generates a random variate from this distribution.

Sample(self, n)
Generates a random sample from this distribution. n: int sample size

Update(self, data)
Updates a Beta distribution. data: pair of int (heads, tails)

__init__(self, alpha=1, beta=1, label=None)
Initializes a Beta distribution.

Data descriptors defined here:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)
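
The Update and Mean entries above can be illustrated with a minimal sketch of the conjugate Beta update. This is not the thinkbayes2 source: the helper names beta_update and beta_mean are hypothetical, and the sketch assumes that heads are added to alpha and tails to beta, as in the standard Beta-binomial update.

```python
def beta_update(alpha, beta, data):
    """Return updated (alpha, beta) after observing data = (heads, tails)."""
    heads, tails = data
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return float(alpha) / (alpha + beta)


# Start from the uniform prior Beta(1, 1), matching the default arguments
# of __init__, then observe 140 heads and 110 tails (made-up data).
alpha, beta = beta_update(1, 1, (140, 110))
posterior_mean = beta_mean(alpha, beta)
```

Because the update only adds counts to the parameters, repeated updates can be applied in any order and give the same posterior.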

class Cdf(__builtin__.object)

    Represents a cumulative distribution function.

    Attributes:
        xs: sequence of values
        ps: sequence of probabilities
        label: string used as a graph label

Methods defined here:

ConfidenceInterval = CredibleInterval(self, percentage=90)

Copy(self, label=None)
Returns a copy of this Cdf. label: string label for the new Cdf

CredibleInterval(self, percentage=90)
Computes the central credible interval. If percentage=90, computes the 90% CI. Args:     percentage: float between 0 and 100 Returns:     sequence of two floats, low and high

Items(self)
Returns a sorted sequence of (value, probability) pairs. Note: in Python 3, returns an iterator.

MakePmf(self, label=None)
Makes a Pmf.

Max(self, k)
Computes the CDF of the maximum of k selections from this dist. k: int returns: new Cdf

Mean(self)
Computes the mean of a CDF. Returns:     float mean

Percentile(self, p)
Returns the value that corresponds to percentile p. Args:     p: number in the range [0, 100] Returns:     number value

PercentileRank(self, x)
Returns the percentile rank of the value x. x: potential value in the CDF returns: percentile rank in the range 0 to 100

Prob(self, x)
Returns CDF(x), the probability that corresponds to value x. Args:     x: number Returns:     float probability

ProbArray = Probs(self, xs)

Probs(self, xs)
Gets probabilities for a sequence of values. xs: any sequence that can be converted to NumPy array returns: NumPy array of cumulative probabilities

Random(self)
Chooses a random value from this distribution.

Render(self, **options)
Generates a sequence of points suitable for plotting. An empirical CDF is a step function; linear interpolation can be misleading. Note: options are ignored Returns:     tuple of (xs, ps)

Sample(self, n)
Generates a random sample from this distribution. Args:     n: int length of the sample

Scale(self, factor)
Multiplies the xs by a factor. factor: what to multiply by

Shift(self, term)
Adds a term to the xs. term: how much to add

Value(self, p)
Returns InverseCDF(p), the value that corresponds to probability p. Args:     p: number in the range [0, 1] Returns:     number value

ValueArray(self, ps)
Returns InverseCDF(ps), the values that correspond to the probabilities ps. Args:     ps: NumPy array of numbers in the range [0, 1] Returns:     NumPy array of values

Values(self)
Returns a sorted list of values.

__delitem__(self)

__eq__(self, other)

__getitem__(self, x)

__init__(self, obj=None, ps=None, label=None)
Initializes. If ps is provided, obj must be the corresponding list of values. obj: Hist, Pmf, Cdf, Pdf, dict, pandas Series, list of pairs ps: list of cumulative probabilities label: string label

__len__(self)

__repr__ = __str__(self)

__setitem__(self)

__str__(self)

Data descriptors defined here:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)
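
As a rough illustration of how Prob, Value, CredibleInterval, and Max relate to the xs and ps attributes, here is a minimal sketch built on the standard bisect module. The function names and the six-sided-die data are hypothetical, and the lookup rules are an assumption about reasonable behavior, not a copy of the thinkbayes2 implementation.

```python
import bisect

# Hypothetical data: a fair six-sided die.
xs = [1, 2, 3, 4, 5, 6]                  # sorted values
ps = [i / 6.0 for i in range(1, 7)]      # cumulative probabilities


def cdf_prob(x):
    """CDF(x): probability of drawing a value less than or equal to x."""
    if x < xs[0]:
        return 0.0
    index = bisect.bisect_right(xs, x)
    return ps[index - 1]


def cdf_value(p):
    """InverseCDF(p): smallest value whose cumulative probability is >= p."""
    if p < 0 or p > 1:
        raise ValueError("p must be in the range [0, 1]")
    index = bisect.bisect_left(ps, p)
    return xs[index]


def credible_interval(percentage=90):
    """Central credible interval as a (low, high) pair."""
    tail = (1 - percentage / 100.0) / 2
    return cdf_value(tail), cdf_value(1 - tail)


def max_ps(k):
    """Cumulative probabilities for the maximum of k independent draws."""
    return [p ** k for p in ps]
```

The last helper relies on the fact that the maximum of k independent draws is at most x exactly when every draw is at most x, so each cumulative probability is raised to the power k.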

class Dirichlet(__builtin__.object)

    Represents a Dirichlet distribution. See http://en.wikipedia.org/wiki/Dirichlet_distribution

Methods defined here:

Likelihood(self, data)
Computes the likelihood of the data. Selects a random vector of probabilities from this distribution. Returns: float probability

LogLikelihood(self, data)
Computes the log likelihood of the data. Selects a random vector of probabilities from this distribution. Returns: float log probability

MarginalBeta(self, i)
Computes the marginal distribution of the ith element. See http://en.wikipedia.org/wiki/Dirichlet_distribution#Marginal_distributions i: int Returns: Beta object

PredictivePmf(self, xs, label=None)
Makes a predictive distribution. xs: values to go into the Pmf Returns: Pmf that maps from x to the mean prevalence of x

Random(self)
Generates a random variate from this distribution. Returns: normalized vector of fractions

Update(self, data)
Updates a Dirichlet distribution. data: sequence of observations, in order corresponding to params

__init__(self, n, conc=1, label=None)
Initializes a Dirichlet distribution. n: number of dimensions conc: concentration parameter (smaller yields more concentration) label: string label

Data descriptors defined here:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)
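
The relationship between Update and MarginalBeta can be sketched as follows. The marginal of the ith component of a Dirichlet distribution with parameters alpha_1..alpha_n is a Beta distribution with parameters (alpha_i, alpha_0 - alpha_i), where alpha_0 is the sum of all parameters. The function names are hypothetical, and the sketch assumes that data is a sequence of counts aligned with the parameters.

```python
def dirichlet_update(params, data):
    """Return new parameters after adding observed counts to each one."""
    return [a + x for a, x in zip(params, data)]


def marginal_beta_params(params, i):
    """Return (alpha, beta) of the Beta marginal for the ith component."""
    alpha0 = sum(params)
    alpha = params[i]
    return alpha, alpha0 - alpha


# Three categories with concentration 1 each, then made-up counts.
params = dirichlet_update([1, 1, 1], [3, 2, 1])
alpha, beta = marginal_beta_params(params, 0)
```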

class EstimatedPdf(Pdf)

    Represents a PDF estimated by KDE.

Method resolution order:

EstimatedPdf

Pdf

__builtin__.object

Methods defined here:

Density(self, xs)
Evaluates this Pdf at xs. returns: float or NumPy array of probability density

GetLinspace(self)
Get a linspace for plotting. Returns: numpy array

__init__(self, sample, label=None)
Estimates the density function based on a sample. sample: sequence of data label: string

__str__(self)

Methods inherited from Pdf:

Items(self)
Generates a sequence of (value, probability) pairs.

MakePmf(self, **options)
Makes a discrete version of this Pdf. options can include label: string low: low end of range high: high end of range n: number of places to evaluate Returns: new Pmf

Render(self, **options)
Generates a sequence of points suitable for plotting. Returns:     tuple of (xs, densities)

Data descriptors inherited from Pdf:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class ExponentialPdf(Pdf)

    Represents the PDF of an exponential distribution.

Method resolution order:

ExponentialPdf

Pdf

__builtin__.object

Methods defined here:

Density(self, xs)
Evaluates this Pdf at xs. xs: scalar or sequence of floats returns: float or NumPy array of probability density

GetLinspace(self)
Get a linspace for plotting. Returns: numpy array

__init__(self, lam=1, label=None)
Constructs an exponential Pdf with given parameter. lam: rate parameter label: string

__str__(self)

Methods inherited from Pdf:

Items(self)
Generates a sequence of (value, probability) pairs.

MakePmf(self, **options)
Makes a discrete version of this Pdf. options can include label: string low: low end of range high: high end of range n: number of places to evaluate Returns: new Pmf

Render(self, **options)
Generates a sequence of points suitable for plotting. Returns:     tuple of (xs, densities)

Data descriptors inherited from Pdf:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class FixedWidthVariables(__builtin__.object)

    Represents a set of variables in a fixed width file.

Methods defined here:

ReadFixedWidth(self, filename, **options)
Reads a fixed width ASCII file. filename: string filename returns: DataFrame

__init__(self, variables, index_base=0)
Initializes. variables: DataFrame index_base: are the indices 0 or 1 based? Attributes: colspecs: list of (start, end) index tuples names: list of string variable names

Data descriptors defined here:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class Hist(_DictWrapper)

    Represents a histogram, which is a map from values to frequencies. Values can be any hashable type; frequencies are integer counters.

Method resolution order:

Hist

_DictWrapper

__builtin__.object

Methods defined here:

Freq(self, x)
Gets the frequency associated with the value x. Args:     x: number value Returns:     int frequency

Freqs(self, xs)
Gets frequencies for a sequence of values.

IsSubset(self, other)
Checks whether the values in this histogram are a subset of the values in the given histogram.

Subtract(self, other)
Subtracts the values in the given histogram from this histogram.
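
A minimal sketch of the ideas behind Freq, IsSubset, and Subtract, using collections.Counter as a stand-in for Hist. The function names are hypothetical, and unlike the Subtract method described above, the subtract helper here returns a new counter instead of modifying its argument.

```python
from collections import Counter

hist = Counter("allen")    # a:1, l:2, e:1, n:1
other = Counter("lane")    # l:1, a:1, n:1, e:1


def freq(h, x):
    """Frequency of x, or 0 if x does not appear."""
    return h.get(x, 0)


def is_subset(h, candidate_superset):
    """True if every value in h appears at least as often in the other map."""
    return all(f <= candidate_superset.get(v, 0) for v, f in h.items())


def subtract(h, to_remove):
    """Return a copy of h with the frequencies in to_remove subtracted."""
    result = Counter(h)
    for val, f in to_remove.items():
        result[val] -= f
    return result


remaining = subtract(hist, other)
```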

Methods inherited from _DictWrapper:

Copy(self, label=None)
Returns a copy. Makes a shallow copy of d. If you want a deep copy of d, use copy.deepcopy on the whole object. label: string label for the new Hist returns: new _DictWrapper with the same type

Exp(self, m=None)
Exponentiates the probabilities. m: how much to shift the ps before exponentiating If m is None, normalizes so that the largest prob is 1.

GetDict(self)
Gets the dictionary.

Incr(self, x, term=1)
Increments the freq/prob associated with the value x. Args:     x: number value     term: how much to increment by

Items(self)
Gets an unsorted sequence of (value, freq/prob) pairs.

Largest(self, n=10)
Returns the largest n values, with frequency/probability. n: number of items to return

Log(self, m=None)
Log transforms the probabilities. Removes values with probability 0. Normalizes so that the largest logprob is 0.

MakeCdf(self, label=None)
Makes a Cdf.

MaxLike(self)
Returns the largest frequency/probability in the map.

Mult(self, x, factor)
Scales the freq/prob associated with the value x. Args:     x: number value     factor: how much to multiply by

Print(self)
Prints the values and freqs/probs in ascending order.

Remove(self, x)
Removes a value. Throws an exception if the value is not there. Args:     x: value to remove

Render(self, **options)
Generates a sequence of points suitable for plotting. Note: options are ignored Returns:     tuple of (sorted value sequence, freq/prob sequence)

Scale(self, factor)
Multiplies the values by a factor. factor: what to multiply by Returns: new object

Set(self, x, y=0)
Sets the freq/prob associated with the value x. Args:     x: number value     y: number freq or prob

SetDict(self, d)
Sets the dictionary.

Smallest(self, n=10)
Returns the smallest n values, with frequency/probability. n: number of items to return

Total(self)
Returns the total of the frequencies/probabilities in the map.

Values(self)
Gets an unsorted sequence of values. Note: one source of confusion is that the keys of this dictionary are the values of the Hist/Pmf, and the values of the dictionary are frequencies/probabilities.

__contains__(self, value)

__delitem__(self, value)

__eq__(self, other)

__getitem__(self, value)

__hash__(self)

__init__(self, obj=None, label=None)
Initializes the distribution. obj: Hist, Pmf, Cdf, Pdf, dict, pandas Series, list of pairs label: string label

__iter__(self)

__len__(self)

__repr__ = __str__(self)

__setitem__(self, value, prob)

__str__(self)

iterkeys(self)
Returns an iterator over keys.

Data descriptors inherited from _DictWrapper:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class HypothesisTest(__builtin__.object)

    Represents a hypothesis test.

Methods defined here:

MakeModel(self)
Build a model of the null hypothesis.

MaxTestStat(self)
Returns the largest test statistic seen during simulations.

PValue(self, iters=1000)
Computes the distribution of the test statistic and p-value. iters: number of iterations returns: float p-value

PlotCdf(self, label=None)
Draws a Cdf with vertical lines at the observed test stat.

RunModel(self)
Run the model of the null hypothesis. returns: simulated data

TestStatistic(self, data)
Computes the test statistic. data: data in whatever form is relevant

__init__(self, data)
Initializes. data: data in whatever form is relevant

Data descriptors defined here:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)
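
The methods above describe a simulation-based test: build a model of the null hypothesis, run it repeatedly, and report the fraction of simulated test statistics that are at least as large as the observed one. The class below is a self-contained sketch of that pattern for a difference in means under permutation. It does not inherit from HypothesisTest, and the class name, the seed argument, and the choice of test statistic are assumptions made for illustration.

```python
import random


class DiffMeansPermute(object):
    """Permutation test for a difference in means between two groups."""

    def __init__(self, data, seed=17):
        self.data = data
        self.rng = random.Random(seed)   # seeded for reproducibility
        self.MakeModel()
        self.actual = self.TestStatistic(data)
        self.test_stats = []

    def TestStatistic(self, data):
        group1, group2 = data
        mean1 = float(sum(group1)) / len(group1)
        mean2 = float(sum(group2)) / len(group2)
        return abs(mean1 - mean2)

    def MakeModel(self):
        # Null hypothesis: both groups come from one pooled population.
        group1, group2 = self.data
        self.n = len(group1)
        self.pool = list(group1) + list(group2)

    def RunModel(self):
        shuffled = self.pool[:]
        self.rng.shuffle(shuffled)
        return shuffled[:self.n], shuffled[self.n:]

    def PValue(self, iters=1000):
        self.test_stats = [self.TestStatistic(self.RunModel())
                           for _ in range(iters)]
        count = sum(1 for x in self.test_stats if x >= self.actual)
        return float(count) / iters

    def MaxTestStat(self):
        return max(self.test_stats)


same = DiffMeansPermute(([5, 5, 5], [5, 5, 5]))
different = DiffMeansPermute((list(range(15)), list(range(15, 30))))
p_same = same.PValue(iters=200)
p_different = different.PValue(iters=200)
```

When the two groups are identical, every simulated statistic equals the observed one, so the p-value is 1. When the groups do not overlap at all, almost no permutation reproduces the observed difference, so the p-value is close to 0.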

class Interpolator(__builtin__.object)

    Represents a mapping between sorted sequences; performs linear interpolation.

    Attributes:
        xs: sorted list
        ys: sorted list

Methods defined here:

Lookup(self, x)
Looks up x and returns the corresponding value of y.

Reverse(self, y)
Looks up y and returns the corresponding value of x.

__init__(self, xs, ys)

Data descriptors defined here:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)
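
Lookup and Reverse can both be expressed with one helper that interpolates linearly between neighboring points, swapping the roles of xs and ys for the reverse direction. The sketch below is an illustration under that assumption; the function name interpolate is hypothetical, and clamping to the end points for out-of-range inputs is a design choice of this sketch.

```python
import bisect


def interpolate(x, xs, ys):
    """Map x to y by linear interpolation between sorted xs and ys."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect(xs, x)
    frac = float(x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + frac * (ys[i] - ys[i - 1])


xs = [0, 10, 20]
ys = [0.0, 0.5, 1.0]

lookup = interpolate(5, xs, ys)       # forward: x to y
reverse = interpolate(0.75, ys, xs)   # reverse: y to x
```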

class Joint(Pmf)

    Represents a joint distribution. The values are sequences (usually tuples).

Method resolution order:

Joint

Pmf

_DictWrapper

__builtin__.object

Methods defined here:

Conditional(self, i, j, val, label=None)
Gets the conditional distribution of the indicated variable. Distribution of vs[i], conditioned on vs[j] = val. i: index of the variable we want j: which variable is conditioned on val: the value the jth variable has to have Returns: Pmf

Marginal(self, i, label=None)
Gets the marginal distribution of the indicated variable. i: index of the variable we want Returns: Pmf

MaxLikeInterval(self, percentage=90)
Returns the maximum-likelihood credible interval. If percentage=90, computes a 90% CI containing the values with the highest likelihoods. percentage: float between 0 and 100 Returns: list of values from the suite
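
To make Marginal and Conditional concrete, the following sketch stores a joint distribution as a plain dict from tuples to probabilities. The function names and the example probabilities are hypothetical; the real methods return Pmf objects rather than dicts.

```python
from collections import defaultdict

# Made-up joint distribution over (number, letter) pairs.
joint = {(0, "a"): 0.1, (0, "b"): 0.3, (1, "a"): 0.2, (1, "b"): 0.4}


def marginal(joint, i):
    """Distribution of the ith variable, summing over all the others."""
    pmf = defaultdict(float)
    for vs, prob in joint.items():
        pmf[vs[i]] += prob
    return dict(pmf)


def conditional(joint, i, j, val):
    """Distribution of vs[i] given vs[j] == val, renormalized to sum to 1."""
    pmf = defaultdict(float)
    for vs, prob in joint.items():
        if vs[j] == val:
            pmf[vs[i]] += prob
    total = sum(pmf.values())
    return dict((v, p / total) for v, p in pmf.items())


numbers = marginal(joint, 0)
numbers_given_a = conditional(joint, 0, 1, "a")
```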

Methods inherited from Pmf:

AddConstant(self, other)
Computes the Pmf of the sum of a constant and values from self. other: a number returns: new Pmf

AddPmf(self, other)
Computes the Pmf of the sum of values drawn from self and other. other: another Pmf returns: new Pmf

CredibleInterval(self, percentage=90)
Computes the central credible interval. If percentage=90, computes the 90% CI. Args:     percentage: float between 0 and 100 Returns:     sequence of two floats, low and high

Max(self, k)
Computes the CDF of the maximum of k selections from this dist. k: int returns: new Cdf

MaximumLikelihood(self)
Returns the value with the highest probability. Returns: value from the Pmf

Mean(self)
Computes the mean of a PMF. Returns:     float mean

Normalize(self, fraction=1.0)
Normalizes this PMF so the sum of all probs is fraction. Args:     fraction: what the total should be after normalization Returns: the total probability before normalizing

Percentile(self, percentage)
Computes a percentile of a given Pmf. Note: this is not super efficient.  If you are planning to compute more than a few percentiles, compute the Cdf. percentage: float 0-100 returns: value from the Pmf

Prob(self, x, default=0)
Gets the probability associated with the value x. Args:     x: number value     default: value to return if the key is not there Returns:     float probability

ProbGreater(self, x)
Probability that a sample from this Pmf exceeds x. x: number returns: float probability

ProbLess(self, x)
Probability that a sample from this Pmf is less than x. x: number returns: float probability

Probs(self, xs)
Gets probabilities for a sequence of values.

Random(self)
Chooses a random element from this PMF. Note: this is not very efficient.  If you plan to call this more than a few times, consider converting to a CDF. Returns:     float value from the Pmf

SubPmf(self, other)
Computes the Pmf of the diff of values drawn from self and other. other: another Pmf returns: new Pmf

Var(self, mu=None)
Computes the variance of a PMF. mu: the point around which the variance is computed; if omitted, the mean is used returns: float variance

__add__(self, other)
Computes the Pmf of the sum of values drawn from self and other. other: another Pmf or a scalar returns: new Pmf

__ge__(self, obj)
Greater than or equal. obj: number or _DictWrapper returns: float probability

__gt__(self, obj)
Greater than. obj: number or _DictWrapper returns: float probability

__le__(self, obj)
Less than or equal. obj: number or _DictWrapper returns: float probability

__lt__(self, obj)
Less than. obj: number or _DictWrapper returns: float probability

__sub__(self, other)
Computes the Pmf of the diff of values drawn from self and other. other: another Pmf returns: new Pmf

Methods inherited from _DictWrapper:

Copy(self, label=None)
Returns a copy. Makes a shallow copy of d. If you want a deep copy of d, use copy.deepcopy on the whole object. label: string label for the new Hist returns: new _DictWrapper with the same type

Exp(self, m=None)
Exponentiates the probabilities. m: how much to shift the ps before exponentiating If m is None, normalizes so that the largest prob is 1.

GetDict(self)
Gets the dictionary.

Incr(self, x, term=1)
Increments the freq/prob associated with the value x. Args:     x: number value     term: how much to increment by

Items(self)
Gets an unsorted sequence of (value, freq/prob) pairs.

Largest(self, n=10)
Returns the largest n values, with frequency/probability. n: number of items to return

Log(self, m=None)
Log transforms the probabilities. Removes values with probability 0. Normalizes so that the largest logprob is 0.

MakeCdf(self, label=None)
Makes a Cdf.

MaxLike(self)
Returns the largest frequency/probability in the map.

Mult(self, x, factor)
Scales the freq/prob associated with the value x. Args:     x: number value     factor: how much to multiply by

Print(self)
Prints the values and freqs/probs in ascending order.

Remove(self, x)
Removes a value. Throws an exception if the value is not there. Args:     x: value to remove

Render(self, **options)
Generates a sequence of points suitable for plotting. Note: options are ignored Returns:     tuple of (sorted value sequence, freq/prob sequence)

Scale(self, factor)
Multiplies the values by a factor. factor: what to multiply by Returns: new object

Set(self, x, y=0)
Sets the freq/prob associated with the value x. Args:     x: number value     y: number freq or prob

SetDict(self, d)
Sets the dictionary.

Smallest(self, n=10)
Returns the smallest n values, with frequency/probability. n: number of items to return

Total(self)
Returns the total of the frequencies/probabilities in the map.

Values(self)
Gets an unsorted sequence of values. Note: one source of confusion is that the keys of this dictionary are the values of the Hist/Pmf, and the values of the dictionary are frequencies/probabilities.

__contains__(self, value)

__delitem__(self, value)

__eq__(self, other)

__getitem__(self, value)

__hash__(self)

__init__(self, obj=None, label=None)
Initializes the distribution. obj: Hist, Pmf, Cdf, Pdf, dict, pandas Series, list of pairs label: string label

__iter__(self)

__len__(self)

__repr__ = __str__(self)

__setitem__(self, value, prob)

__str__(self)

iterkeys(self)
Returns an iterator over keys.

Data descriptors inherited from _DictWrapper:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class NormalPdf(Pdf)

    Represents the PDF of a Normal distribution.

Method resolution order:

NormalPdf

Pdf

__builtin__.object

Methods defined here:

Density(self, xs)
Evaluates this Pdf at xs. xs: scalar or sequence of floats returns: float or NumPy array of probability density

GetLinspace(self)
Get a linspace for plotting. Returns: numpy array

__init__(self, mu=0, sigma=1, label=None)
Constructs a Normal Pdf with given mu and sigma. mu: mean sigma: standard deviation label: string

__str__(self)

Methods inherited from Pdf:

Items(self)
Generates a sequence of (value, probability) pairs.

MakePmf(self, **options)
Makes a discrete version of this Pdf. options can include label: string low: low end of range high: high end of range n: number of places to evaluate Returns: new Pmf

Render(self, **options)
Generates a sequence of points suitable for plotting. Returns:     tuple of (xs, densities)

Data descriptors inherited from Pdf:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class Pdf(__builtin__.object)

    Represents a probability density function (PDF).

Methods defined here:

Density(self, x)
Evaluates this Pdf at x. Returns: float or NumPy array of probability density

GetLinspace(self)
Get a linspace for plotting. Not all subclasses of Pdf implement this. Returns: numpy array

Items(self)
Generates a sequence of (value, probability) pairs.

MakePmf(self, **options)
Makes a discrete version of this Pdf. options can include label: string low: low end of range high: high end of range n: number of places to evaluate Returns: new Pmf

Render(self, **options)
Generates a sequence of points suitable for plotting. Returns:     tuple of (xs, densities)

Data descriptors defined here:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)
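
MakePmf turns a continuous density into a discrete distribution. One common way to do that, sketched below, is to evaluate the density at n equally spaced points between low and high and normalize the results. The function names are hypothetical, the normal density stands in for any Density method, and the default n=101 is an assumption of this sketch.

```python
import math


def normal_density(x, mu=0.0, sigma=1.0):
    """Probability density of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))


def make_pmf(density, low, high, n=101):
    """Evaluate density at n equally spaced points and normalize."""
    step = float(high - low) / (n - 1)
    xs = [low + i * step for i in range(n)]
    weights = [density(x) for x in xs]
    total = sum(weights)
    return dict((x, w / total) for x, w in zip(xs, weights))


pmf = make_pmf(normal_density, low=-4, high=4, n=101)
mode = max(pmf, key=pmf.get)
```

The resulting probabilities depend on the chosen range and the number of points, so they approximate the density only up to the normalizing constant.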

class Pmf(_DictWrapper)

    Represents a probability mass function. Values can be any hashable type; probabilities are floating-point. Pmfs are not necessarily normalized.

Method resolution order:

Pmf

_DictWrapper

__builtin__.object

Methods defined here:

AddConstant(self, other)
Computes the Pmf of the sum of a constant and values from self. other: a number returns: new Pmf

AddPmf(self, other)
Computes the Pmf of the sum of values drawn from self and other. other: another Pmf returns: new Pmf

CredibleInterval(self, percentage=90)
Computes the central credible interval. If percentage=90, computes the 90% CI. Args:     percentage: float between 0 and 100 Returns:     sequence of two floats, low and high

Max(self, k)
Computes the CDF of the maximum of k selections from this dist. k: int returns: new Cdf

MaximumLikelihood(self)
Returns the value with the highest probability. Returns: value from the Pmf

Mean(self)
Computes the mean of a PMF. Returns:     float mean

Normalize(self, fraction=1.0)
Normalizes this PMF so the sum of all probs is fraction. Args:     fraction: what the total should be after normalization Returns: the total probability before normalizing

Percentile(self, percentage)
Computes a percentile of a given Pmf. Note: this is not super efficient.  If you are planning to compute more than a few percentiles, compute the Cdf. percentage: float 0-100 returns: value from the Pmf

Prob(self, x, default=0)
Gets the probability associated with the value x. Args:     x: number value     default: value to return if the key is not there Returns:     float probability

ProbGreater(self, x)
Probability that a sample from this Pmf exceeds x. x: number returns: float probability

ProbLess(self, x)
Probability that a sample from this Pmf is less than x. x: number returns: float probability

Probs(self, xs)
Gets probabilities for a sequence of values.

Random(self)
Chooses a random element from this PMF. Note: this is not very efficient.  If you plan to call this more than a few times, consider converting to a CDF. Returns:     float value from the Pmf

SubPmf(self, other)
Computes the Pmf of the diff of values drawn from self and other. other: another Pmf returns: new Pmf

Var(self, mu=None)
Computes the variance of a PMF. mu: the point around which the variance is computed; if omitted, the mean is used returns: float variance

__add__(self, other)
Computes the Pmf of the sum of values drawn from self and other. other: another Pmf or a scalar returns: new Pmf

__ge__(self, obj)
Greater than or equal. obj: number or _DictWrapper returns: float probability

__gt__(self, obj)
Greater than. obj: number or _DictWrapper returns: float probability

__le__(self, obj)
Less than or equal. obj: number or _DictWrapper returns: float probability

__lt__(self, obj)
Less than. obj: number or _DictWrapper returns: float probability

__sub__(self, other)
Computes the Pmf of the diff of values drawn from self and other. other: another Pmf returns: new Pmf
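
The arithmetic behind Normalize, Mean, Var, AddPmf, and the comparison operators can be sketched with plain dicts that map values to probabilities. The function names and the dice example are hypothetical, and the code illustrates the computations only, not the thinkbayes2 source.

```python
def normalize(pmf, fraction=1.0):
    """Scale probabilities in place to sum to fraction; return old total."""
    total = sum(pmf.values())
    if total == 0:
        raise ValueError("total probability is zero")
    factor = fraction / total
    for x in pmf:
        pmf[x] *= factor
    return total


def mean(pmf):
    return sum(p * x for x, p in pmf.items())


def var(pmf, mu=None):
    if mu is None:
        mu = mean(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())


def add_pmf(pmf1, pmf2):
    """Distribution of the sum of independent draws from pmf1 and pmf2."""
    result = {}
    for v1, p1 in pmf1.items():
        for v2, p2 in pmf2.items():
            result[v1 + v2] = result.get(v1 + v2, 0.0) + p1 * p2
    return result


def prob_greater(pmf1, pmf2):
    """Probability that a draw from pmf1 exceeds a draw from pmf2."""
    return sum(p1 * p2
               for v1, p1 in pmf1.items()
               for v2, p2 in pmf2.items()
               if v1 > v2)


die = dict((x, 1.0) for x in range(1, 7))   # unnormalized weights
total = normalize(die)                      # total before normalizing
two_dice = add_pmf(die, die)
```

Enumerating all pairs of values costs time proportional to the product of the two sizes, which is acceptable for small distributions such as the dice used here.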

Methods inherited from _DictWrapper:

Copy(self, label=None)
Returns a copy. Makes a shallow copy of d. If you want a deep copy of d, use copy.deepcopy on the whole object. label: string label for the new Hist returns: new _DictWrapper with the same type

Exp(self, m=None)
Exponentiates the probabilities. m: how much to shift the ps before exponentiating If m is None, normalizes so that the largest prob is 1.

GetDict(self)
Gets the dictionary.

Incr(self, x, term=1)
Increments the freq/prob associated with the value x. Args:     x: number value     term: how much to increment by

Items(self)
Gets an unsorted sequence of (value, freq/prob) pairs.

Largest(self, n=10)
Returns the largest n values, with frequency/probability. n: number of items to return

Log(self, m=None)
Log transforms the probabilities. Removes values with probability 0. Normalizes so that the largest logprob is 0.

MakeCdf(self, label=None)
Makes a Cdf.

MaxLike(self)
Returns the largest frequency/probability in the map.

Mult(self, x, factor)
Scales the freq/prob associated with the value x. Args:     x: number value     factor: how much to multiply by

Print(self)
Prints the values and freqs/probs in ascending order.

Remove(self, x)
Removes a value. Throws an exception if the value is not there. Args:     x: value to remove

Render(self, **options)
Generates a sequence of points suitable for plotting. Note: options are ignored Returns:     tuple of (sorted value sequence, freq/prob sequence)

Scale(self, factor)
Multiplies the values by a factor. factor: what to multiply by Returns: new object

Set(self, x, y=0)
Sets the freq/prob associated with the value x. Args:     x: number value     y: number freq or prob

SetDict(self, d)
Sets the dictionary.

Smallest(self, n=10)
Returns the smallest n values, with frequency/probability. n: number of items to return

Total(self)
Returns the total of the frequencies/probabilities in the map.

Values(self)
Gets an unsorted sequence of values. Note: one source of confusion is that the keys of this dictionary are the values of the Hist/Pmf, and the values of the dictionary are frequencies/probabilities.

__contains__(self, value)

__delitem__(self, value)

__eq__(self, other)

__getitem__(self, value)

__hash__(self)

__init__(self, obj=None, label=None)
Initializes the distribution. obj: Hist, Pmf, Cdf, Pdf, dict, pandas Series, list of pairs label: string label

__iter__(self)

__len__(self)

__repr__ = __str__(self)

__setitem__(self, value, prob)

__str__(self)

iterkeys(self)
Returns an iterator over keys.

Data descriptors inherited from _DictWrapper:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class Suite(Pmf)

    Represents a suite of hypotheses and their probabilities.

Method resolution order:

Suite

Pmf

_DictWrapper

__builtin__.object

Methods defined here:

Likelihood(self, data, hypo)
Computes the likelihood of the data under the hypothesis. hypo: some representation of the hypothesis data: some representation of the data

LogLikelihood(self, data, hypo)
Computes the log likelihood of the data under the hypothesis. hypo: some representation of the hypothesis data: some representation of the data

LogUpdate(self, data)
Updates a suite of hypotheses based on new data. Modifies the suite directly; if you want to keep the original, make a copy. Note: unlike Update, LogUpdate does not normalize. Args:     data: any representation of the data

LogUpdateSet(self, dataset)
Updates each hypothesis based on the dataset. Modifies the suite directly; if you want to keep the original, make a copy. dataset: a sequence of data returns: None

MakeOdds(self)
Transforms from probabilities to odds. Values with prob=0 are removed.

MakeProbs(self)
Transforms from odds to probabilities.

Print(self)
Prints the hypotheses and their probabilities.

Update(self, data)
Updates each hypothesis based on the data. data: any representation of the data returns: the normalizing constant

UpdateSet(self, dataset)
Updates each hypothesis based on the dataset. This is more efficient than calling Update repeatedly because it waits until the end to Normalize. Modifies the suite directly; if you want to keep the original, make a copy. dataset: a sequence of data returns: the normalizing constant
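
The update cycle described by Likelihood, Update, and UpdateSet can be sketched with a dict from hypotheses to probabilities. The example assumes a box of dice with 4, 6, 8, 12, and 20 sides, where each hypothesis is the number of sides and each datum is an observed roll. The function names and this likelihood are assumptions for illustration, not the library's code.

```python
def likelihood(data, hypo):
    """Probability of rolling data on a die with hypo sides."""
    if data > hypo:
        return 0.0
    return 1.0 / hypo


def normalize(suite):
    """Scale probabilities in place to sum to 1; return the old total."""
    total = sum(suite.values())
    for hypo in suite:
        suite[hypo] /= total
    return total


def update(suite, data):
    """Multiply each prior by its likelihood, then normalize."""
    for hypo in suite:
        suite[hypo] *= likelihood(data, hypo)
    return normalize(suite)


def update_set(suite, dataset):
    """Apply every datum first and normalize once at the end."""
    for data in dataset:
        for hypo in suite:
            suite[hypo] *= likelihood(data, hypo)
    return normalize(suite)


sides = [4, 6, 8, 12, 20]
suite = dict((h, 1.0 / len(sides)) for h in sides)
norm = update(suite, 6)

suite2 = dict((h, 1.0 / len(sides)) for h in sides)
update_set(suite2, [6, 8])
```

After a roll of 6, the four-sided die is ruled out and the six-sided die becomes the most probable hypothesis, because smaller dice assign higher probability to each roll they can produce.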

Methods inherited from Pmf:

AddConstant(self, other)
Computes the Pmf of the sum of a constant and values from self. other: a number returns: new Pmf

AddPmf(self, other)
Computes the Pmf of the sum of values drawn from self and other. other: another Pmf returns: new Pmf

CredibleInterval(self, percentage=90)
Computes the central credible interval. If percentage=90, computes the 90% CI. Args:     percentage: float between 0 and 100 Returns:     sequence of two floats, low and high

Max(self, k)
Computes the CDF of the maximum of k selections from this dist. k: int returns: new Cdf

MaximumLikelihood(self)
Returns the value with the highest probability. Returns: value from the Pmf

Mean(self)
Computes the mean of a PMF. Returns:     float mean

Normalize(self, fraction=1.0)
Normalizes this PMF so the sum of all probs is fraction. Args:     fraction: what the total should be after normalization Returns: the total probability before normalizing

Percentile(self, percentage)
Computes a percentile of a given Pmf. Note: this is not super efficient.  If you are planning to compute more than a few percentiles, compute the Cdf. percentage: float 0-100 returns: value from the Pmf

Prob(self, x, default=0)
Gets the probability associated with the value x. Args:     x: number value     default: value to return if the key is not there Returns:     float probability

ProbGreater(self, x)
Probability that a sample from this Pmf exceeds x. x: number returns: float probability

ProbLess(self, x)
Probability that a sample from this Pmf is less than x. x: number returns: float probability

Probs(self, xs)
Gets probabilities for a sequence of values.

Random(self)
Chooses a random element from this PMF. Note: this is not very efficient.  If you plan to call this more than a few times, consider converting to a CDF. Returns:     float value from the Pmf

SubPmf(self, other)
Computes the Pmf of the diff of values drawn from self and other. other: another Pmf returns: new Pmf

Var(self, mu=None)
Computes the variance of a PMF. mu: the point around which the variance is computed; if omitted, the mean is used returns: float variance

__add__(self, other)
Computes the Pmf of the sum of values drawn from self and other. other: another Pmf or a scalar returns: new Pmf

__ge__(self, obj)
Greater than or equal. obj: number or _DictWrapper returns: float probability

__gt__(self, obj)
Greater than. obj: number or _DictWrapper returns: float probability

__le__(self, obj)
Less than or equal. obj: number or _DictWrapper returns: float probability

__lt__(self, obj)
Less than. obj: number or _DictWrapper returns: float probability

__sub__(self, other)
Computes the Pmf of the diff of values drawn from self and other. other: another Pmf returns: new Pmf

Methods inherited from _DictWrapper:

Copy(self, label=None)
Returns a copy. Makes a shallow copy of d. If you want a deep copy of d, use copy.deepcopy on the whole object. label: string label for the new Hist returns: new _DictWrapper with the same type

Exp(self, m=None)
Exponentiates the probabilities. m: how much to shift the ps before exponentiating If m is None, normalizes so that the largest prob is 1.

GetDict(self)
Gets the dictionary.

Incr(self, x, term=1)
Increments the freq/prob associated with the value x. Args:     x: number value     term: how much to increment by

Items(self)
Gets an unsorted sequence of (value, freq/prob) pairs.

Largest(self, n=10)
Returns the largest n values, with frequency/probability. n: number of items to return

Log(self, m=None)
Log transforms the probabilities. Removes values with probability 0. Normalizes so that the largest logprob is 0.
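
Example (a rough sketch, assuming a dict in place of the wrapped dictionary): the Log and Exp entries describe a round trip between probabilities and log-probabilities, with a shift m that keeps the largest entry at log 0 (or probability 1). The function names log_transform and exp_transform are hypothetical, and returning new dicts rather than modifying in place is a choice made for this illustration.

```python
import math

def log_transform(pmf, m=None):
    """Replace each probability p with log(p / m), dropping zero-probability values."""
    if m is None:
        m = max(pmf.values())          # largest log-probability becomes 0
    return {x: math.log(p / m) for x, p in pmf.items() if p > 0}

def exp_transform(log_pmf, m=None):
    """Replace each log-probability lp with exp(lp - m)."""
    if m is None:
        m = max(log_pmf.values())      # largest probability becomes 1
    return {x: math.exp(lp - m) for x, lp in log_pmf.items()}

pmf = {"a": 0.5, "b": 0.25, "c": 0.0}
logs = log_transform(pmf)
back = exp_transform(logs)
```

Note that the result of the round trip is unnormalized: the largest entry is 1, so the values generally need to be renormalized before being treated as probabilities again.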

MakeCdf(self, label=None)
Makes a Cdf.

MaxLike(self)
Returns the largest frequency/probability in the map.

Mult(self, x, factor)
Scales the freq/prob associated with the value x. Args:     x: number value     factor: how much to multiply by

Remove(self, x)
Removes a value. Throws an exception if the value is not there. Args:     x: value to remove

Render(self, **options)
Generates a sequence of points suitable for plotting. Note: options are ignored Returns:     tuple of (sorted value sequence, freq/prob sequence)

Scale(self, factor)
Multiplies the values by a factor. factor: what to multiply by Returns: new object

Set(self, x, y=0)
Sets the freq/prob associated with the value x. Args:     x: number value     y: number freq or prob

SetDict(self, d)
Sets the dictionary.

Smallest(self, n=10)
Returns the smallest n values, with frequency/probability. n: number of items to return

Total(self)
Returns the total of the frequencies/probabilities in the map.

Values(self)
Gets an unsorted sequence of values. Note: one source of confusion is that the keys of this dictionary are the values of the Hist/Pmf, and the values of the dictionary are frequencies/probabilities.

__contains__(self, value)

__delitem__(self, value)

__eq__(self, other)

__getitem__(self, value)

__hash__(self)

__init__(self, obj=None, label=None)
Initializes the distribution. obj: Hist, Pmf, Cdf, Pdf, dict, pandas Series, list of pairs label: string label

__iter__(self)

__len__(self)

__repr__ = __str__(self)

__setitem__(self, value, prob)

__str__(self)

iterkeys(self)
Returns an iterator over keys.

Data descriptors inherited from _DictWrapper:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class UnimplementedMethodException(exceptions.Exception)

    Exception if someone calls a method that should be overridden.

Method resolution order:

UnimplementedMethodException

exceptions.Exception

exceptions.BaseException

__builtin__.object

Data descriptors defined here:

__weakref__

list of weak references to the object (if defined)

Methods inherited from exceptions.Exception:

__init__(...)
x.__init__(...) initializes x; see help(type(x)) for signature

Data and other attributes inherited from exceptions.Exception:

__new__ = <built-in method __new__ of type object>
T.__new__(S, ...) -> a new object with type S, a subtype of T

Methods inherited from exceptions.BaseException:

__delattr__(...)
x.__delattr__('name') <==> del x.name

__getattribute__(...)
x.__getattribute__('name') <==> x.name

__getitem__(...)
x.__getitem__(y) <==> x[y]

__getslice__(...)
x.__getslice__(i, j) <==> x[i:j] Use of negative indices is not supported.

__reduce__(...)

__repr__(...)
x.__repr__() <==> repr(x)

__setattr__(...)
x.__setattr__('name', value) <==> x.name = value

__setstate__(...)

__str__(...)
x.__str__() <==> str(x)

__unicode__(...)

Data descriptors inherited from exceptions.BaseException:

__dict__

args

message

Functions


BinomialCoef(n, k)
Compute the binomial coefficient "n choose k". n: number of trials k: number of successes Returns: float

CentralMoment(xs, k)
Computes the kth central moment of xs.

CoefDetermination(ys, res)
Computes the coefficient of determination (R^2) for given residuals. Args:     ys: dependent variable     res: residuals      Returns:     float coefficient of determination

CohenEffectSize(group1, group2)
Compute Cohen's d. group1: Series or NumPy array group2: Series or NumPy array returns: float
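
Example (a sketch of one common definition, which may differ in detail from what this function does): Cohen's d is the difference in group means divided by a pooled standard deviation. Here the pooled variance weights each group's population variance by its size; that weighting, and the name cohen_d, are assumptions of this illustration.

```python
import math

def cohen_d(group1, group2):
    """Difference in means divided by a size-weighted pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    # Population variances (divide by n), an assumption of this sketch.
    v1 = sum((x - m1) ** 2 for x in group1) / n1
    v2 = sum((x - m2) ** 2 for x in group2) / n2
    pooled_var = (n1 * v1 + n2 * v2) / (n1 + n2)
    return (m1 - m2) / math.sqrt(pooled_var)

d = cohen_d([2, 4, 6], [1, 3, 5])
```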

Corr(xs, ys)
Computes Corr(X, Y). Args:     xs: sequence of values     ys: sequence of values Returns:     Corr(X, Y)

CorrelatedGenerator(rho)
Generates standard normal variates with serial correlation. rho: target coefficient of correlation Returns: iterable

CorrelatedNormalGenerator(mu, sigma, rho)
Generates normal variates with serial correlation. mu: mean of variate sigma: standard deviation of variate rho: target coefficient of correlation Returns: iterable

Cov(xs, ys, meanx=None, meany=None)
Computes Cov(X, Y). Args:     xs: sequence of values     ys: sequence of values     meanx: optional float mean of xs     meany: optional float mean of ys Returns:     Cov(X, Y)

CredibleInterval(pmf, percentage=90)
Computes a credible interval for a given distribution. If percentage=90, computes the 90% CI. Args:     pmf: Pmf object representing a posterior distribution     percentage: float between 0 and 100 Returns:     sequence of two floats, low and high
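
Example (a minimal sketch, assuming a dict-based posterior): a central credible interval can be read off the cumulative distribution by cutting equal probability from each tail. The helper credible_interval and its inner value_at are hypothetical names, and the rule "smallest value whose cumulative probability reaches p" is an assumption about how quantiles are looked up.

```python
def credible_interval(pmf, percentage=90):
    """Return (low, high) enclosing the central `percentage` of the probability."""
    tail = (1 - percentage / 100) / 2
    items = sorted(pmf.items())

    def value_at(p):
        # Smallest value whose cumulative probability is at least p.
        total = 0.0
        for x, prob in items:
            total += prob
            if total >= p:
                return x
        return items[-1][0]

    return value_at(tail), value_at(1 - tail)

# Uniform posterior over 1..8; the central 50% runs from 2 to 6 under this rule.
posterior = {k: 0.125 for k in range(1, 9)}
low, high = credible_interval(posterior, 50)
```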

EvalBinomialPmf(k, n, p)
Evaluates the binomial pmf. Returns the probability of k successes in n trials with probability p.
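
Example (a sketch of the textbook formula, not necessarily how this function is implemented): the binomial pmf is C(n, k) * p^k * (1-p)^(n-k). The name binomial_pmf is hypothetical, and math.comb requires Python 3.8 or later.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

two_heads_in_four = binomial_pmf(2, 4, 0.5)  # 6 * 0.5**4 = 0.375
```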

EvalExponentialCdf(x, lam)
Evaluates CDF of the exponential distribution with parameter lam.

EvalExponentialPdf(x, lam)
Computes the exponential PDF. x: value lam: parameter lambda in events per unit time returns: float probability density

EvalLognormalCdf(x, mu=0, sigma=1)
Evaluates the CDF of the lognormal distribution. x: float or sequence mu: mean parameter sigma: standard deviation parameter Returns: float or sequence

EvalNormalCdf(x, mu=0, sigma=1)
Evaluates the CDF of the normal distribution. Args:     x: float     mu: mean parameter     sigma: standard deviation parameter Returns:     float

EvalNormalCdfInverse(p, mu=0, sigma=1)
Evaluates the inverse CDF of the normal distribution. See http://en.wikipedia.org/wiki/Normal_distribution#Quantile_function Args:     p: float     mu: mean parameter     sigma: standard deviation parameter Returns:     float

EvalNormalPdf(x, mu, sigma)
Computes the unnormalized PDF of the normal distribution. x: value mu: mean sigma: standard deviation returns: float probability density

EvalPoissonPmf(k, lam)
Computes the Poisson PMF. k: number of events lam: parameter lambda in events per unit time returns: float probability
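
Example (a sketch of the standard formula, offered as an illustration rather than the library's code): the Poisson PMF is lam^k * exp(-lam) / k!. The name poisson_pmf is hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

no_events = poisson_pmf(0, 2.0)  # equals exp(-2)
```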

FitLine(xs, inter, slope)
Fits a line to the given data. xs: sequence of x returns: tuple of numpy arrays (sorted xs, fit ys)

IQR(xs)
Computes the interquartile range of a sequence. xs: sequence or anything else that can initialize a Cdf returns: pair of floats

Jitter(values, jitter=0.5)
Jitters the values by adding a uniform variate in (-jitter, jitter). values: sequence jitter: scalar magnitude of jitter returns: new numpy array

LeastSquares(xs, ys)
Computes a linear least squares fit for ys as a function of xs. Args:     xs: sequence of values     ys: sequence of values Returns:     tuple of (intercept, slope)
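
Example (a minimal sketch in pure Python, assuming the usual closed-form solution): for a simple linear fit, the slope is Cov(X, Y) / Var(X) and the intercept follows from the means. The name least_squares is hypothetical; the return order (intercept, slope) mirrors the description above.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for ys versus xs."""
    n = len(xs)
    meanx = sum(xs) / n
    meany = sum(ys) / n
    cov = sum((x - meanx) * (y - meany) for x, y in zip(xs, ys)) / n
    varx = sum((x - meanx) ** 2 for x in xs) / n
    slope = cov / varx
    inter = meany - slope * meanx
    return inter, slope

# Points that lie exactly on y = 1 + 2x.
inter, slope = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
```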

LogBinomialCoef(n, k)
Computes the log of the binomial coefficient. See http://math.stackexchange.com/questions/64716/approximating-the-logarithm-of-the-binomial-coefficient n: number of trials k: number of successes Returns: float

MakeCdfFromDict(d, label=None)
Makes a CDF from a dictionary that maps values to frequencies. Args:    d: dictionary that maps values to frequencies.    label: string label for the data. Returns:     Cdf object

MakeCdfFromHist(hist, label=None)
Makes a CDF from a Hist object. Args:    hist: Pmf.Hist object    label: string label for the data. Returns:     Cdf object

MakeCdfFromItems(items, label=None)
Makes a cdf from an unsorted sequence of (value, frequency) pairs. Args:     items: unsorted sequence of (value, frequency) pairs     label: string label for this CDF Returns:     cdf: list of (value, fraction) pairs

MakeCdfFromList(seq, label=None)
Creates a CDF from an unsorted sequence. Args:     seq: unsorted sequence of sortable values     label: string label for the cdf Returns:    Cdf object

MakeCdfFromPmf(pmf, label=None)
Makes a CDF from a Pmf object. Args:    pmf: Pmf.Pmf object    label: string label for the data. Returns:     Cdf object

MakeExponentialPmf(lam, high, n=200)
Makes a PMF discrete approx to an exponential distribution. lam: parameter lambda in events per unit time high: upper bound n: number of values in the Pmf returns: normalized Pmf

MakeHistFromDict(d, label=None)
Makes a histogram from a map from values to frequencies. Args:     d: dictionary that maps values to frequencies     label: string label for this histogram Returns:     Hist object

MakeHistFromList(t, label=None)
Makes a histogram from an unsorted sequence of values. Args:     t: sequence of numbers     label: string label for this histogram Returns:     Hist object

MakeJoint(pmf1, pmf2)
Joint distribution of values from pmf1 and pmf2. Assumes that the PMFs represent independent random variables. Args:     pmf1: Pmf object     pmf2: Pmf object Returns:     Joint pmf of value pairs
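
Example (an illustrative sketch, assuming dict-based PMFs): under independence, the joint probability of a pair is the product of the two marginal probabilities. The helper make_joint is a hypothetical stand-in that returns a plain dict keyed by (v1, v2) tuples rather than a Joint object.

```python
def make_joint(pmf1, pmf2):
    """Joint distribution of independent draws, keyed by (v1, v2) pairs."""
    return {
        (v1, v2): p1 * p2
        for v1, p1 in pmf1.items()
        for v2, p2 in pmf2.items()
    }

coin = {"H": 0.5, "T": 0.5}
die = {k: 1 / 6 for k in range(1, 7)}
joint = make_joint(coin, die)
```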

MakeMixture(metapmf, label='mix')
Make a mixture distribution. Args:   metapmf: Pmf that maps from Pmfs to probs.   label: string label for the new Pmf. Returns: Pmf object.
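
Example (a sketch under stated assumptions): a mixture adds up each component's probabilities, scaled by the weight of that component. Because plain dicts cannot be used as dictionary keys, this illustration passes the meta-distribution as a list of (pmf, weight) pairs; that representation and the name make_mixture are hypothetical.

```python
from collections import defaultdict

def make_mixture(weighted_pmfs):
    """Combine (pmf, weight) pairs into a single dict-based PMF."""
    mix = defaultdict(float)
    for pmf, weight in weighted_pmfs:
        for x, p in pmf.items():
            mix[x] += weight * p
    return dict(mix)

d4 = {k: 1 / 4 for k in range(1, 5)}
d6 = {k: 1 / 6 for k in range(1, 7)}
# Pick a four-sided or six-sided die with equal probability, then roll it.
mix = make_mixture([(d4, 0.5), (d6, 0.5)])
```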

MakeNormalPmf(mu, sigma, num_sigmas, n=201)
Makes a PMF discrete approx to a Normal distribution. mu: float mean sigma: float standard deviation num_sigmas: how many sigmas to extend in each direction n: number of values in the Pmf returns: normalized Pmf

MakePmfFromDict(d, label=None)
Makes a PMF from a map from values to probabilities. Args:     d: dictionary that maps values to probabilities     label: string label for this PMF Returns:     Pmf object

MakePmfFromHist(hist, label=None)
Makes a normalized PMF from a Hist object. Args:     hist: Hist object     label: string label Returns:     Pmf object

MakePmfFromItems(t, label=None)
Makes a PMF from a sequence of value-probability pairs Args:     t: sequence of value-probability pairs     label: string label for this PMF Returns:     Pmf object

MakePmfFromList(t, label=None)
Makes a PMF from an unsorted sequence of values. Args:     t: sequence of numbers     label: string label for this PMF Returns:     Pmf object

MakePoissonPmf(lam, high, step=1)
Makes a PMF discrete approx to a Poisson distribution. lam: parameter lambda in events per unit time high: upper bound of the Pmf returns: normalized Pmf

MakeSuiteFromDict(d, label=None)
Makes a suite from a map from values to probabilities. Args:     d: dictionary that maps values to probabilities     label: string label for this suite Returns:     Suite object

MakeSuiteFromHist(hist, label=None)
Makes a normalized suite from a Hist object. Args:     hist: Hist object     label: string label Returns:     Suite object

MakeSuiteFromList(t, label=None)
Makes a suite from an unsorted sequence of values. Args:     t: sequence of numbers     label: string label for this suite Returns:     Suite object

MakeUniformPmf(low, high, n)
Make a uniform Pmf. low: lowest value (inclusive) high: highest value (inclusive) n: number of values

MapToRanks(t)
Returns a list of ranks corresponding to the elements in t. Args:     t: sequence of numbers Returns:     list of integer ranks, starting at 1
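
Example (a minimal sketch; how ties are handled here is an assumption): each element's rank is its 1-based position in sorted order. The name map_to_ranks is hypothetical, and ties are broken by original position because Python's sort is stable.

```python
def map_to_ranks(t):
    """Return the 1-based rank of each element of t, in the original order."""
    # Indices of t arranged so that the corresponding values are ascending.
    order = sorted(range(len(t)), key=lambda i: t[i])
    ranks = [0] * len(t)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

ranks = map_to_ranks([10, 30, 20])  # [1, 3, 2]
```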

Mean(xs)
Computes mean. xs: sequence of values returns: float mean

MeanVar(xs, ddof=0)
Computes mean and variance. Based on http://stackoverflow.com/questions/19391149/numpy-mean-and-variance-from-single-function xs: sequence of values ddof: delta degrees of freedom returns: pair of float, mean and var

Median(xs)
Computes the median (50th percentile) of a sequence. xs: sequence or anything else that can initialize a Cdf returns: float

NormalProbability(ys, jitter=0.0)
Generates data for a normal probability plot. ys: sequence of values jitter: float magnitude of jitter added to the ys returns: numpy arrays xs, ys

NormalProbabilityPlot(sample, label=None, fit_color='0.8')
Makes a normal probability plot with a fitted line. sample: sequence of numbers label: string label for the data fit_color: color string for the fitted line

Odds(p)
Computes odds for a given probability. Example: p=0.75 means 75 for and 25 against, or 3:1 odds in favor. Note: when p=1, the formula for odds divides by zero, which is normally undefined.  But I think it is reasonable to define Odds(1) to be infinity, so that's what this function does. p: float 0-1 Returns: float odds

PearsonMedianSkewness(xs)
Computes the Pearson median skewness.

PercentileRow(array, p)
Selects the row from a sorted array that maps to percentile p. p: float 0--100 returns: NumPy array (one row)

PercentileRows(ys_seq, percents)
Selects rows from a sequence that map to percentiles. ys_seq: sequence of unsorted rows percents: list of percentiles (0-100) to select returns: list of NumPy arrays

PmfProbEqual(pmf1, pmf2)
Probability that a value from pmf1 equals a value from pmf2. Args:     pmf1: Pmf object     pmf2: Pmf object Returns:     float probability

PmfProbGreater(pmf1, pmf2)
Probability that a value from pmf1 is greater than a value from pmf2. Args:     pmf1: Pmf object     pmf2: Pmf object Returns:     float probability

PmfProbLess(pmf1, pmf2)
Probability that a value from pmf1 is less than a value from pmf2. Args:     pmf1: Pmf object     pmf2: Pmf object Returns:     float probability
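
Example (an illustrative sketch, assuming independent draws and dict-based PMFs): the three PmfProb comparisons can be pictured as sums over all value pairs that satisfy the comparison. The helper compare_pmfs and its return layout are invented for this example.

```python
def compare_pmfs(pmf1, pmf2):
    """Return (P(v1 < v2), P(v1 == v2), P(v1 > v2)) for independent draws."""
    less = equal = greater = 0.0
    for v1, p1 in pmf1.items():
        for v2, p2 in pmf2.items():
            if v1 < v2:
                less += p1 * p2
            elif v1 == v2:
                equal += p1 * p2
            else:
                greater += p1 * p2
    return less, equal, greater

die = {k: 1 / 6 for k in range(1, 7)}
less, equal, greater = compare_pmfs(die, die)
```

For two fair dice the three outcomes have probabilities 15/36, 6/36 and 15/36, which sum to 1.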

Probability(o)
Computes the probability corresponding to given odds. Example: o=2 means 2:1 odds in favor, or 2/3 probability o: float odds, strictly positive Returns: float probability

Probability2(yes, no)
Computes the probability corresponding to given odds. Example: yes=2, no=1 means 2:1 odds in favor, or 2/3 probability. yes, no: int or float odds in favor
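
Example (a sketch of the conversions described in the Odds, Probability and Probability2 entries): odds in favor are p / (1 - p), and the inverse is o / (o + 1). The lowercase function names are hypothetical stand-ins, written for Python 3 division.

```python
def odds(p):
    """Odds in favor for probability p; defined as infinity when p == 1."""
    if p == 1:
        return float("inf")
    return p / (1 - p)

def probability(o):
    """Probability corresponding to odds o in favor."""
    return o / (o + 1)

def probability2(yes, no):
    """Probability corresponding to odds of yes:no in favor."""
    return yes / (yes + no)

three_to_one = odds(0.75)  # 0.75 / 0.25 = 3.0
```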

RandomSeed(x)
Initialize the random and np.random generators. x: int seed

RandomSum(dists)
Chooses a random value from each dist and returns the sum. dists: sequence of Pmf or Cdf objects returns: numerical sum

RawMoment(xs, k)
Computes the kth raw moment of xs.

ReadStataDct(dct_file, **options)
Reads a Stata dictionary file. dct_file: string filename options: dict of options passed to open() returns: FixedWidthVariables object

RenderExpoCdf(lam, low, high, n=101)
Generates sequences of xs and ps for an exponential CDF. lam: parameter low: float high: float n: number of points to render returns: numpy arrays (xs, ps)

RenderNormalCdf(mu, sigma, low, high, n=101)
Generates sequences of xs and ps for a Normal CDF. mu: parameter sigma: parameter low: float high: float n: number of points to render returns: numpy arrays (xs, ps)

RenderParetoCdf(xmin, alpha, low, high, n=50)
Generates sequences of xs and ps for a Pareto CDF. xmin: parameter alpha: parameter low: float high: float n: number of points to render returns: numpy arrays (xs, ps)

Resample(xs, n=None)
Draws a sample of size n from xs; by default the sample has the same length as xs. xs: sequence n: sample size (default: len(xs)) returns: NumPy array

ResampleRows(df)
Resamples rows from a DataFrame. df: DataFrame returns: DataFrame

ResampleRowsWeighted(df, column='finalwgt')
Resamples a DataFrame using probabilities proportional to given column. df: DataFrame column: string column name to use as weights returns: DataFrame

Residuals(xs, ys, inter, slope)
Computes residuals for a linear fit with parameters inter and slope. Args:     xs: independent variable     ys: dependent variable     inter: float intercept     slope: float slope Returns:     list of residuals

SampleRows(df, nrows, replace=False)
Choose a sample of rows from a DataFrame. df: DataFrame nrows: number of rows replace: whether to sample with replacement returns: DataFrame

SampleSum(dists, n)
Draws a sample of sums from a list of distributions. dists: sequence of Pmf or Cdf objects n: sample size returns: new Pmf of sums

SerialCorr(series, lag=1)
Computes the serial correlation of a series. series: Series lag: integer number of intervals to shift returns: float correlation

Skewness(xs)
Computes skewness.

Smooth(xs, sigma=2, **options)
Smooths a NumPy array with a Gaussian filter. xs: sequence sigma: standard deviation of the filter

SpearmanCorr(xs, ys)
Computes Spearman's rank correlation. Args:     xs: sequence of values     ys: sequence of values Returns:     float Spearman's correlation

StandardNormalCdf(x)
Evaluates the CDF of the standard Normal distribution. See http://en.wikipedia.org/wiki/Normal_distribution#Cumulative_distribution_function Args:     x: float Returns:     float
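
Example (a sketch based on the error-function identity from the linked article, not a copy of this module's code): the standard normal CDF can be written as (erf(x / sqrt(2)) + 1) / 2, and a general normal CDF follows by standardizing x. The lowercase function names are hypothetical; ROOT2 mirrors the constant listed in the Data section.

```python
import math

ROOT2 = math.sqrt(2)

def standard_normal_cdf(x):
    """CDF of the standard normal distribution via the error function."""
    return (math.erf(x / ROOT2) + 1) / 2

def normal_cdf(x, mu=0, sigma=1):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return standard_normal_cdf((x - mu) / sigma)

at_mean = normal_cdf(10, mu=10, sigma=3)  # half the mass lies below the mean
```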

StandardizedMoment(xs, k)
Computes the kth standardized moment of xs.

Std(xs, mu=None, ddof=0)
Computes standard deviation. xs: sequence of values mu: optional known mean ddof: delta degrees of freedom returns: float

Trim(t, p=0.01)
Trims the largest and smallest elements of t. Args:     t: sequence of numbers     p: fraction of values to trim off each end Returns:     sequence of values

TrimmedMean(t, p=0.01)
Computes the trimmed mean of a sequence of numbers. Args:     t: sequence of numbers     p: fraction of values to trim off each end Returns:     float
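
Example (a minimal sketch; the rounding of the trim count is an assumption): trimming sorts the values, drops a fraction p from each end, and the trimmed mean averages what remains. The names trim and trimmed_mean are hypothetical, and int(p * len(t)) is one plausible way to turn the fraction into a count.

```python
def trim(t, p=0.01):
    """Return a sorted copy of t with a fraction p removed from each end."""
    n = int(p * len(t))
    t = sorted(t)
    return t[n:len(t) - n]

def trimmed_mean(t, p=0.01):
    """Mean of the values that survive trimming."""
    kept = trim(t, p)
    return sum(kept) / len(kept)

data = [100, 1, 2, 3, 4, 5, 6, -50]
kept = trim(data, 0.25)         # drops the two smallest and two largest values
mean = trimmed_mean(data, 0.25)
```

Unlike the side effect noted for TrimmedMeanVar, this sketch leaves the input list unsorted because sorted() returns a new list.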

TrimmedMeanVar(t, p=0.01)
Computes the trimmed mean and variance of a sequence of numbers. Side effect: sorts the list. Args:     t: sequence of numbers     p: fraction of values to trim off each end Returns:     pair of floats, mean and variance

Var(xs, mu=None, ddof=0)
Computes variance. xs: sequence of values mu: optional known mean ddof: delta degrees of freedom returns: float

main()

Data

ROOT2 = 1.4142135623730951
division = _Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192)
print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 65536)
