Quick Start

The principal pipeline that JMCTF is designed to streamline is the following:

  1. Combine independent “analyses” into a joint distribution
  2. Sample from the joint distribution
  3. Fit MLEs to the samples under a hypothesis (or many hypotheses)
  4. Build and analyse test statistics

Getting more advanced, we can loop over the whole procedure to simultaneously test many alternate hypotheses and compute trial corrections, as well as define new analysis classes to extend JMCTF with new component probability distributions.

A fast introduction to the package, then, is to see an example of this pipeline in action. So let’s speed through this! For details please see the longer descriptions given at the links above (or under the Basic usage header in the navigation bar).

First, we create some “Analysis” objects and combine them into a JointDistribution object for joint treatment:

>>> import numpy as np
>>> import tensorflow as tf
>>> from JMCTF import NormalAnalysis, BinnedAnalysis, JointDistribution
>>> norm = NormalAnalysis("Test normal", 5, 2) # (name, obs., std.dev.)
>>> bins = [("SR1", 10, 9, 2), # (name, obs., b, sigma_b)
...         ("SR2", 50, 55, 4)]
>>> binned = BinnedAnalysis("Test binned", bins)
>>> joint = JointDistribution([norm,binned])

Next, set some input (e.g. “null”) parameters for the analyses:

>>> free, fixed, nuis = joint.get_parameter_structure()
>>> print("free:", free)
free: {'Test normal': {'mu': 1}, 'Test binned': {'s': 2}}

>>> null = {'Test normal': {'mu': [[0.]], 'nuisance': None},
...         'Test binned': {'s': [[0., 0.]], 'nuisance': None}}
>>> joint_null = joint.fix_parameters(null)

and sample from the joint distribution:

>>> samples = joint_null.sample(int(1e6))
>>> # Inspect shapes
>>> print({k: v.shape for k,v in samples.items()})
{'Test normal::x_theta': TensorShape([1000000, 1]),
 'Test normal::x': TensorShape([1000000, 1]),
 'Test binned::x': TensorShape([1000000, 1, 2]),
 'Test binned::n': TensorShape([1000000, 1, 2])}

>>> # Plot all sample distributions
>>> import matplotlib.pyplot as plt
>>> from JMCTF.plotting import plot_sample_dist
>>> fig = plot_sample_dist(samples)
>>> plt.show()
[Figure: distributions of the samples from each analysis]

Now let us find the maximum likelihood estimators for all parameters in the JointDistribution for each of these samples, and plot those too:

>>> from JMCTF.plotting import plot_MLE_dist
>>> q_fit, joint_fitted_to_null_samples, all_pars_fit = joint_null.fit_all(samples)
>>> fig = plot_MLE_dist(all_pars_fit)
>>> plt.show()
[Figure: distributions of the fitted MLEs for each parameter]

And the distribution of a log-likelihood-ratio test statistic:

>>> q_null = -2*joint_null.log_pdf(samples)
>>> LLR = q_null - q_fit
>>> _ = plt.hist(LLR.numpy().flatten(), bins=100)
>>> plt.show()
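
If Wilks' theorem applies, this LLR should asymptotically follow a chi-squared distribution with degrees of freedom equal to the number of free parameters (here three: 'mu' plus the two components of 's'). As a rough sketch (this comparison is not part of the JMCTF API, and assumes the asymptotic conditions hold), we can overlay the chi-squared density on the Monte Carlo histogram:

>>> # Sketch: compare the empirical LLR distribution to a chi-squared
>>> # distribution with 3 degrees of freedom, assuming Wilks' theorem applies
>>> from scipy.stats import chi2
>>> x = np.linspace(0, 20, 200)
>>> _ = plt.hist(LLR.numpy().flatten(), bins=100, density=True)
>>> _ = plt.plot(x, chi2.pdf(x, df=3))
>>> plt.show()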

Fit MLEs to samples under a hypothesis (or many hypotheses)

In the above example, we generated samples from a distribution that was fit to some data that we manually invented. But that is a weird thing to do. More conventionally, we want to generate samples under a pre-defined null hypothesis, and use them to understand the distribution of some test statistic under that hypothesis. To do this, it is the parameter values that we want to fix manually, not the samples. We can supply these to the constructor of a JointDistribution object, but we can also introspect a parameter-less JointDistribution to determine what parameters are required, in the event that we want to fix them:

>>> free, fixed, nuis = joint.get_parameter_structure()
>>> print(free)
{'Test normal': {'mu': 1}, 'Test binned': {'s': 2}}

As with the sample shapes we inspected earlier, this dictionary tells us which free parameters we need to supply, and what their dimensions should be. The other dictionaries, fixed and nuis, tell us the structure of the fixed and nuisance parameters respectively. The analyses themselves know sensible null hypothesis values for these, so we don't have to supply them, though we do have to explicitly ask for the default values by setting the 'nuisance' dictionary key to None. See TODO for more details.

So, let us define a null hypothesis, sample from it, and then find MLEs for all parameters for all those samples:

>>> #TODO: Internally ensure float32 format to avoid this verbosity?
>>> null = {'Test normal': {'mu': np.array([[0.]],dtype='float32'), 'nuisance': None},
...         'Test binned': {'s': np.array([[0., 0.]],dtype='float32'), 'nuisance': None}}
>>> joint_null = joint.fix_parameters(null)
>>> # or alternatively one can supply parameters to the constructor:
>>> # joint_null = JointDistribution([norm,binned],null)
>>> samples = joint_null.sample(3)
>>> q_null, joint_fitted_null, all_pars_null = joint_null.fit_all(samples)
>>> from JMCTF.common import to_numpy  # helper to convert output to numpy (import path assumed)
>>> print(to_numpy(all_pars_null))
{'Test normal':
 {'mu': array([[[-2.4318216]],
              [[ 0.7213395]],
              [[-1.833349 ]]], dtype=float32),
  'theta': array([[[0.]],
                  [[0.]],
                  [[0.]]], dtype=float32),
  'sigma_t': 0.0},
 'Test binned':
 {'s': array([[[ 0.68332815,  1.2073289 ]],
              [[-0.5019169 ,  0.8478795 ]],
              [[ 1.0781676 , -0.9483687 ]]], dtype=float32),
  'theta': array([[[-0.7318874, -1.5432839]],
                  [[-0.0951565, -1.0360898]],
                  [[-1.4436939,  1.2477738]]], dtype=float32)}}
>>> # Inspect shapes
>>> print({k1: {k2: v2.shape for k2,v2 in v1.items()}
...  for k1,v1 in to_numpy(all_pars_null).items()})
{'Test normal': {'mu': (3, 1, 1),
                 'theta': (3, 1, 1),
                 'sigma_t': ()},
 'Test binned': {'s': (3, 1, 2),
                 'theta': (3, 1, 2)}}

We now need to start discussing the input/output array shapes more carefully. You will notice that we have supplied the parameters as two-dimensional arrays, even though it seems like one dimension should be enough. This is because we can use the JointDistribution class to collect and fit samples from many hypotheses simultaneously. That is, we could supply many alternate input parameters at once (we will do this in section TODO). On the output side, the structure of the fitted MLEs also reflects this possibility. Take, for example, the shape of the MLE for the parameter 's' from the Test binned analysis. This is (3, 1, 2): 3 is the sample dimension (we have three samples per hypothesis to fit), 1 is the hypothesis dimension (we only provided one input hypothesis), and 2 is the fundamental dimension of 's' (we have two bins in the analysis, each characterised by a single parameter).
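
For example (a sketch only, using arbitrary parameter values), testing two hypotheses at once just means stacking them along the first, hypothesis, dimension:

>>> # Sketch: two hypotheses stacked along the hypothesis dimension,
>>> # so 'mu' has shape (2, 1) and 's' has shape (2, 2)
>>> two_hyps = {'Test normal': {'mu': np.array([[0.], [1.]], dtype='float32'),
...                             'nuisance': None},
...             'Test binned': {'s': np.array([[0., 0.], [1., 1.]], dtype='float32'),
...                             'nuisance': None}}
>>> joint_two = joint.fix_parameters(two_hyps)

Fitting three samples per hypothesis should then return 's' MLEs of shape (3, 2, 2) rather than (3, 1, 2).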

For simplicity, JMCTF is restricted to this three-dimensional form: only one dimension for hypotheses, one for samples, and one for parameters is permitted. If you want to create e.g. a matrix of parameters in a custom Analysis class, it will need to be flattened when returning it to the higher-level JMCTF classes, and if you want a multidimensional array of samples then you will need to sample them in 1D and reshape afterwards. Likewise, arrays of input hypotheses need to be flattened appropriately, as sketched below.
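
For instance, if the hypotheses you want to test naturally live on a 2D grid, flatten the grid into the single hypothesis dimension before handing it to JMCTF, and reshape any fitted results back afterwards (a plain numpy sketch):

>>> # Sketch: flatten a 5x4 grid of 'mu' values into a single hypothesis
>>> # dimension of length 20 (one 'mu' parameter per hypothesis)
>>> grid = np.linspace(-1, 1, 20, dtype='float32').reshape(5, 4)
>>> mu_flat = grid.reshape(-1, 1)  # (hypotheses, parameters)
>>> mu_flat.shape
(20, 1)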

TODO refer to more detailed shape discussion elsewhere?

Build and analyse test statistics

Now that we understand the basic machinery, we can start to do some statistics!
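
For instance, a simple Monte Carlo p-value can be estimated directly from the LLR samples generated earlier. This is a numpy-only sketch: LLR_obs here is a hypothetical stand-in for the test statistic evaluated on actually-observed data, which we have not computed above.

>>> # Sketch: empirical p-value from the Monte Carlo LLR distribution.
>>> # LLR_obs is a hypothetical observed value of the test statistic.
>>> LLR_null = LLR.numpy().flatten()
>>> LLR_obs = 4.2  # hypothetical
>>> p_value = np.mean(LLR_null >= LLR_obs)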