Definitions
Mean or Expected Value: $\mu_x = \langle x \rangle = \frac{1}{N}\sum_{i=1}^{N} x_i$
Sample Variance: $S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(x_i - \langle x \rangle\right)^2$
Correlated Sample Variance (Covariance): $S_{uv}^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(u_i - \langle u \rangle\right)\left(v_i - \langle v \rangle\right)$
Standard Deviation: $\sigma_x = S_x \;\; (N \to \infty)$
Degrees of Freedom: $\mathrm{DOF} = N(\text{measurements}) - N(\text{distribution parameters})$
"Uncertainty of Mean": $\sigma/\sqrt{N}$
"Variance of Mean": $\sigma^2/N$
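As a concrete illustration, the definitions above can be computed directly in Python. The sample values below are made up for illustration; NumPy's built-in `np.var(x, ddof=1)` and `np.cov(u, v)` give the same results as the explicit sums.

```python
import numpy as np

# Hypothetical repeated measurements of one quantity (illustrative values).
x = np.array([4.9, 5.1, 5.0, 4.8, 5.2, 5.0])
N = len(x)

mean = x.sum() / N                          # mu_x = <x>
s2 = ((x - mean) ** 2).sum() / (N - 1)      # sample variance S_x^2
s = np.sqrt(s2)                             # sample standard deviation S_x
sigma_mean = s / np.sqrt(N)                 # "uncertainty of the mean"

# Covariance of two correlated samples u, v (illustrative values).
u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([2.1, 3.9, 6.2, 7.8])
s2_uv = ((u - u.mean()) * (v - v.mean())).sum() / (len(u) - 1)
```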
Sample Distributions
Sample distributions of measurement values are used to infer the parent distribution of a physical quantity.
Poisson: Similar to the binomial distribution, except that the probability of any single outcome is small ($p \to 0$; see the Poisson Limit Theorem) and the number of possible outcomes is large. A discrete distribution used to model random, rare events such as photon counts in a PMT. The probability of observing $x$ events when the expected value is $\lambda$ is $P(x;\lambda) = \frac{\lambda^x}{x!}e^{-\lambda}$, with $\mu = \lambda$ and $\sigma^2 = \lambda$. If the distribution is presented as an un-normalized histogram, each event contributes a count of 1 to its bin ($x$ events are observed or they are not), so the standard deviation of each bin is $\sigma_\text{bin} = \sqrt{n_\text{bin}}$.
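A quick numerical sketch of these properties; the rate and sample size below are arbitrary choices, not values from any particular experiment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated photon counts per interval, with expected value lam.
lam = 5.0
counts = rng.poisson(lam, size=10_000)

# For a Poisson parent distribution, mean and variance are both lambda.
print(counts.mean(), counts.var(ddof=1))   # both close to 5

# Un-normalized histogram: the statistical uncertainty in each bin
# is the square root of the bin count.
n_bin, edges = np.histogram(counts, bins=np.arange(0, 16) - 0.5)
sigma_bin = np.sqrt(n_bin)
```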
Gaussian: Continuous distribution of measurement values, commonly used to describe physical quantities subject to broadening by thermal (versus collisional) processes, such as Doppler shifts. The probability of observing the measurement value $x$ (not the number of observations) of a quantity with a presumed Gaussian parent distribution with mean $\mu$ and standard deviation $\sigma$ is $P(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$, with $\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma \approx 2.355\,\sigma$. If the sample size is small, the mean and standard deviation of the parent distribution are necessarily poorly determined, and a Student's t-distribution should be used. Typically, we replace the leading $\sigma$-dependent coefficient with a new fitting parameter, for a total of three parameters, since Gaussian distributions fit to raw data are always scaled by some number and are not normalized.
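The three-parameter fit described above might be sketched with `scipy.optimize.curve_fit` as follows; the data are synthetic and the parameter values are arbitrary:

```python
import numpy as np
from scipy.optimize import curve_fit

# Three-parameter Gaussian: amplitude A replaces the 1/(sigma*sqrt(2*pi))
# normalization, because raw data are scaled rather than normalized.
def gaussian(x, A, mu, sigma):
    return A * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Synthetic noisy data drawn from a known Gaussian (illustrative values).
rng = np.random.default_rng(1)
x = np.linspace(-5, 5, 200)
y = gaussian(x, 10.0, 0.5, 1.2) + rng.normal(0, 0.2, x.size)

popt, pcov = curve_fit(gaussian, x, y, p0=[8.0, 0.0, 1.0])
perr = np.sqrt(np.diag(pcov))                  # 1-sigma parameter uncertainties
fwhm = 2 * np.sqrt(2 * np.log(2)) * popt[2]    # FWHM ~ 2.355 * sigma
```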
Lorentzian: Continuous distribution of measurement values, used to describe physical quantities subject to broadening by a random (Poisson) reset process such as collisions in a gas. Also known as the Cauchy distribution. The probability of observing the measurement value $x$ of a quantity whose parent distribution has peak location $\mu$ and FWHM of $2\gamma$ is $P(x;\mu,\gamma) = \frac{1}{\pi\gamma}\,\frac{\gamma^2}{(x-\mu)^2 + \gamma^2}$. Again, we typically use a three-parameter fit, as for the Gaussian, since we are fitting a physical measurement rather than a normalized probability distribution. Note also that the standard deviation is not defined for the Lorentzian, so we use the FWHM instead.
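A minimal sketch of the corresponding three-parameter Lorentzian model, where the amplitude `A` is the extra scale parameter and the example values are arbitrary:

```python
import numpy as np

# Three-parameter Lorentzian: amplitude A replaces the 1/(pi*gamma)
# normalization; gamma is the half-width, so FWHM = 2*gamma.
def lorentzian(x, A, mu, gamma):
    return A * gamma**2 / ((x - mu) ** 2 + gamma**2)

# At x = mu +/- gamma the profile falls to half its peak value A,
# which is exactly the FWHM = 2*gamma statement above.
x = np.array([0.0, 1.0, -1.0])
print(lorentzian(x, A=2.0, mu=0.0, gamma=1.0))   # [2.0, 1.0, 1.0]
```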
Voigt: The line shape resulting from the convolution of a Gaussian and a Lorentzian is the Voigt function, which has no closed-form expression. There is a simple approximate relationship between the FWHMs of the three profiles, $\frac{f_V}{f_G} \approx 0.5346\left(\frac{f_L}{f_G}\right) + \sqrt{1 + 0.2166\left(\frac{f_L}{f_G}\right)^2}$, which can be used to estimate the ratio of the Lorentzian and Gaussian FWHMs if a measurement technique provides a knob to increase and decrease the collisional-broadening process relative to a roughly constant thermal-broadening process (which may come from the measurement method itself).
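The FWHM relation can be sketched as a small helper function, written here in the equivalent un-ratioed form $f_V \approx 0.5346\,f_L + \sqrt{0.2166\,f_L^2 + f_G^2}$; the two limiting cases provide a quick sanity check:

```python
import numpy as np

# Approximate Voigt FWHM f_V from the Lorentzian (f_L) and
# Gaussian (f_G) component FWHMs.
def voigt_fwhm(f_L, f_G):
    return 0.5346 * f_L + np.sqrt(0.2166 * f_L**2 + f_G**2)

# Limiting cases: a pure Gaussian or pure Lorentzian recovers its own width.
print(voigt_fwhm(0.0, 1.0))   # 1.0 exactly (pure Gaussian)
print(voigt_fwhm(1.0, 0.0))   # 0.5346 + sqrt(0.2166), about 1.000 (pure Lorentzian)
```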
Fitting Data
F Test: A future post will address this test and ANOVA in great detail.
Maximum-Likelihood: A future post will address this in great detail.
Orthogonal Distance Regression: ODR is a very useful fitting method for data with known uncertainties in the independent as well as the dependent variables. Usually, we would like the uncertainty in the independent variables to be small enough to ignore, but it never hurts to be thorough. In principle, ODR provides a more accurate estimate of the uncertainty of the model fitting parameters without the trouble of programming a Monte Carlo simulation that varies the independent variables. There is a very nice package for ODR in SciPy. My experience has been that SciPy ODR needs an initial guess nearly on top of the minimized solution, so running SciPy curve_fit or similar to provide an initial fit is almost always necessary.
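A minimal sketch of this workflow with `scipy.odr`, using synthetic linear data with made-up uncertainties, and `curve_fit` seeding the ODR initial guess as described above:

```python
import numpy as np
from scipy import odr
from scipy.optimize import curve_fit

# Synthetic linear data with uncertainty in both x and y (illustrative values).
rng = np.random.default_rng(3)
x_true = np.linspace(0, 10, 25)
x = x_true + rng.normal(0, 0.2, x_true.size)
y = 2.0 * x_true + 1.0 + rng.normal(0, 0.5, x_true.size)

# Ordinary least squares first, to seed ODR near the minimum.
p0, _ = curve_fit(lambda x, m, b: m * x + b, x, y)

# ODR: the model function takes the parameter vector beta first, then x.
model = odr.Model(lambda beta, x: beta[0] * x + beta[1])
data = odr.RealData(x, y, sx=0.2, sy=0.5)
out = odr.ODR(data, model, beta0=p0).run()

print(out.beta)      # fitted [slope, intercept]
print(out.sd_beta)   # their standard errors
```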
References
Bevington, P.; Robinson, D. K. Data Reduction and Error Analysis for the Physical Sciences, 3rd ed.; McGraw Hill: Boston, 2003.
Casella, G.; Berger, R. L. Statistical Inference, 2nd ed.; Duxbury: Pacific Grove, CA, 2001.