Macro Library stats
A library of Stats functions. Version 1.30, Nov 20, 2021
nCr(n,r)
The Choose function
nPr(n,r)
The Permutations function
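As a quick illustration of what nCr and nPr compute, here is a minimal Python sketch. It is not the library's code; it relies on Python's standard math.comb and math.perm and reuses the names nCr and nPr only for readability.

```python
import math

def nCr(n, r):
    # Number of ways to choose r items from n when order does not matter.
    return math.comb(n, r)

def nPr(n, r):
    # Number of ordered arrangements of r items drawn from n.
    return math.perm(n, r)

print(nCr(5, 2))  # 10
print(nPr(5, 2))  # 20
```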
mean(array,[weights])
Finds the mean of an array of numbers. Optionally you can provide a
corresponding array of weights or frequencies to do a weighted mean.
variance(array,[weights])
the (sample) variance of an array of numbers. Optionally you can provide a
corresponding array of weights or frequencies to do a weighted variance.
stdev(array, [weights])
the (sample) standard deviation of an array of numbers. Optionally you can provide a
corresponding array of weights or frequencies to do a weighted stdev.
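The weighted versions of mean, variance, and stdev can be sketched as below. This is a Python illustration, not the library's implementation, and it assumes the weights are frequencies (counts), so the sample variance divides by the total count minus 1. The function names are hypothetical, and how the library treats non-integer weights is not covered here.

```python
import math

def weighted_mean(values, weights=None):
    # With no weights, every value counts once.
    if weights is None:
        weights = [1] * len(values)
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def weighted_sample_variance(values, weights=None):
    if weights is None:
        weights = [1] * len(values)
    m = weighted_mean(values, weights)
    n = sum(weights)  # total number of observations (assumes frequency weights)
    return sum(w * (x - m) ** 2 for x, w in zip(values, weights)) / (n - 1)

def weighted_sample_stdev(values, weights=None):
    return math.sqrt(weighted_sample_variance(values, weights))

# Values 1, 2, 3 with frequencies 1, 2, 1 stand for the data set 1, 2, 2, 3.
print(weighted_mean([1, 2, 3], [1, 2, 1]))  # 2.0
```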
absmeandev(array)
the absolute mean deviation of an array of numbers
percentile(array,percentile)
example: percentile($a,30) would find the 30th percentile of the data
Calculates using the p/100*(N) method (e.g. Triola)
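The p/100*(N) method, as it is usually presented in textbooks such as Triola's, locates a 1-based position L = p/100 * N in the sorted data. If L is a whole number, the result is the average of the L-th and (L+1)-th values; otherwise L is rounded up and that value is returned. The Python sketch below follows that description. The name percentile_pn and the clamping at the ends are assumptions, and the library may handle edge cases differently.

```python
import math

def percentile_pn(data, p):
    values = sorted(data)
    n = len(values)
    pos = p * n / 100  # 1-based position L = p/100 * N
    if pos == int(pos):
        k = int(pos)
        # Average the L-th and (L+1)-th values, clamped to the data range.
        lo = values[max(k - 1, 0)]
        hi = values[min(k, n - 1)]
        return (lo + hi) / 2
    # Not a whole number: round up and take that value.
    return values[math.ceil(pos) - 1]

data = list(range(1, 11))
print(percentile_pn(data, 30))  # 3.5
print(percentile_pn(data, 25))  # 3
```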
interppercentile(array, percentile, [mode])
Interpolated percentile. Finds the percentile using an interpolated method.
mode=1 (def): Matches Excel's PERCENTILE.EXC, JMP, and recommended by NIST
except that this function will return the lowest/highest value if needed.
mode=2: Matches Excel's PERCENTILE.INC (and older percentile)
mode=3: Matches MATLAB's prctile function
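The three modes differ only in how the 1-based position in the sorted data is computed before linear interpolation. The Python sketch below uses the rank formulas commonly documented for those tools: p(N+1) for PERCENTILE.EXC, p(N-1)+1 for PERCENTILE.INC, and pN+0.5 for prctile, with p as a fraction. The function name is hypothetical, and the clamping to the lowest/highest value mirrors the note under mode=1; whether the library clamps in exactly this way is an assumption.

```python
def interp_percentile(data, p, mode=1):
    values = sorted(data)
    n = len(values)
    frac = p / 100
    if mode == 1:
        pos = frac * (n + 1)      # PERCENTILE.EXC-style rank
    elif mode == 2:
        pos = frac * (n - 1) + 1  # PERCENTILE.INC-style rank
    else:
        pos = frac * n + 0.5      # prctile-style rank
    pos = min(max(pos, 1), n)     # fall back to the lowest/highest value
    k = int(pos)                  # whole part of the position
    d = pos - k                   # fractional part used for interpolation
    if k >= n:
        return values[n - 1]
    return values[k - 1] + d * (values[k] - values[k - 1])

data = [1, 2, 3, 4, 5]
print(interp_percentile(data, 25, 1))  # 1.5
print(interp_percentile(data, 25, 2))  # 2.0
print(interp_percentile(data, 25, 3))  # 1.75
```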
Nplus1percentile(array,percentile)
example: Nplus1percentile($a,30) would find the 30th percentile of the data
Calculates using the p/100*(N+1) method (e.g. OpenStax).
quartile(array,quartile)
finds the 0 (min), 1st, 2nd (median), 3rd, or 4th (max) quartile of an
array of numbers. Calculates using percentiles.
TIquartile(array,quartile)
finds the 0 (min), 1st, 2nd (median), 3rd, or 4th (max) quartile of an
array of numbers. Calculates using the TI-84 method.
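The TI-84 method is usually described as follows: Q2 is the median, Q1 is the median of the values below the median position, and Q3 is the median of the values above it, with the middle value left out of both halves when N is odd. A minimal Python sketch of that description (the function name is hypothetical, and this is not the library's code):

```python
from statistics import median

def ti_quartiles(data):
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]            # values below the median position
    upper = values[half + (n % 2):]  # skips the middle value when n is odd
    return median(lower), median(values), median(upper)

print(ti_quartiles([1, 2, 3, 4, 5, 6, 7]))  # (2, 4, 6)
```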
Excelquartile(array,quartile)
finds the 0 (min), 1st, 2nd (median), 3rd, or 4th (max) quartile of an
array of numbers. Calculates using the Excel method, matching the older
QUARTILE function or the newer QUARTILE.INC function.
Excelquartileexc(array,quartile)
finds the 0 (min), 1st, 2nd (median), 3rd, or 4th (max) quartile of an
array of numbers. Calculates using the Excel method, matching
QUARTILE.EXC function.
Nplus1quartile(array,quartile)
finds the 0 (min), 1st, 2nd (median), 3rd, or 4th (max) quartile of an
array of numbers. Calculates using the N+1 method, which is like
percentiles, but calculated using N+1 (OpenStax).
allquartile(array,quartile)
finds the 0 (min), 1st, 2nd (median), 3rd, or 4th (max) quartile of an
array of numbers. Uses all the quartile methods, and returns an "or" joined
string of all unique answers.
median(array)
returns the median of an array of numbers
modes(array)
Returns the mode or modes of the data array as a comma-separated list.
If all the values have the same frequency it returns DNE.
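The rule described for modes can be sketched in Python as below. This follows the description literally (every distinct value having the same frequency gives DNE) and is an illustration rather than the library's implementation.

```python
from collections import Counter

def modes(data):
    counts = Counter(data)
    top = max(counts.values())
    # If every distinct value occurs equally often, there is no mode.
    if all(c == top for c in counts.values()):
        return "DNE"
    winners = sorted(v for v, c in counts.items() if c == top)
    return ",".join(str(v) for v in winners)

print(modes([1, 2, 2, 3, 3, 4]))  # 2,3
print(modes([1, 2, 3]))           # DNE
```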
forceonemode(array)
Returns the mode of the data array. If the data does not have one unique
mode, the data array will be altered to only have one mode. Because this
function can alter the given array, be sure to call it before any functions that
use the array.
freqdist(array,label,start,classwidth)
display macro. Returns an HTML table that is a frequency distribution of
the data
array: array of data values
label: name of data values
start: first lower class limit
classwidth: width of the classes
frequency(array,start,classwidth, [end])
Returns an array of frequencies for the data grouped into classes
array: array of data values
start: first lower class limit
classwidth: width of the classes
end: end of last class. optional, but recommended to ensure the resulting array
includes the last class.
countif(array,condition)
Returns count of items in array that meet condition
array: array of data values
condition: a condition, using x for data values
Example: countif($a,"x<3 && x>2")
histogram(array,label,start,classwidth,[labelstart,upper,width,height,showgrid,fill,stroke])
display macro. Creates a histogram from a data set
array: array of data values
label: name of data values
start: first lower class limit
classwidth: width of the classes
labelstart (optional): value to start axis labeling at. Defaults to start
upper (optional): first upper class limit. Defaults to start+classwidth
width,height (optional): width and height in pixels of graph
showgrid (optional): the horizontal grid lines; default is true to show; set false to hide
fill (optional) = the fill color of the bins; default is blue
stroke (optional) = the line color of the bins; default is black
fdhistogram(freqarray,label,start,cw,[labelstart,upper,width,height,showgrid,fill,stroke])
display macro. Creates a histogram from frequency array
freqarray: array of frequencies
label: name of data values
start: first lower class limit
cw: width of the classes
labelstart (optional): value to start axis labeling at. Defaults to start
upper (optional): first upper class limit. Defaults to start+classwidth
width,height (optional): width and height in pixels of graph
showgrid (optional): the horizontal grid lines; default is true to show; set false to hide
fill (optional) = the fill color of the bins; default is blue
stroke (optional) = the line color of the bins; default is black
fdbargraph(barlabels,freqarray,label,[width,height,options])
barlabels: array of labels for the bars
freqarray: array of frequencies/heights for the bars
label: general label for bars
width,height (optional): width and height for graph
options (optional): array of options:
options['valuelabels'] = array of value labels, to be placed above bars
options['showgrid'] = false to hide the horizontal grid lines
options['vertlabel'] = label for vertical axis. Defaults to none
options['gap'] = gap (0 ≤ gap < 1) between bars
options['toplabel'] = label for top of chart
options['fill'] = fill color of the bars; default is blue
options['stroke'] = line color of the bars; default is black
piechart(percents, labels, [width, height])
create a piechart
percents: array of pie percents (should total 100%)
labels: array of labels for each pie piece
uses Google Charts API
normrand(mu,sigma,n, [rnd, positive])
returns an array of n random numbers that are normally distributed with given
mean mu and standard deviation sigma. Uses the Box-Muller transform.
specify rnd to round to that many digits
set positive to true to not include negative values
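For reference, the Box-Muller transform turns two independent uniform random numbers u1, u2 into a standard normal value z = sqrt(-2 ln u1) * cos(2 pi u2), which is then scaled to mu + sigma*z. The Python sketch below illustrates this. Redrawing when positive is set and a negative value comes up is an assumption about how that option behaves, not something stated in this reference.

```python
import math
import random

def normrand(mu, sigma, n, rnd=None, positive=False):
    out = []
    while len(out) < n:
        u1 = 1.0 - random.random()  # in (0, 1], so log(u1) is defined
        u2 = random.random()
        z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
        x = mu + sigma * z
        if positive and x < 0:
            continue  # assumed behavior: discard negatives and draw again
        out.append(round(x, rnd) if rnd is not None else x)
    return out

sample = normrand(100, 15, 5, rnd=1)
print(sample)  # five values near 100, rounded to 1 decimal place
```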
expdistrand(mu, n, [rnd])
returns an array of n random numbers that are exponentially distributed
with given mean mu.
specify rnd to round to that many digits (default 3)
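One standard way to generate exponentially distributed values is the inverse transform: if U is uniform on [0, 1), then -mu * ln(1 - U) is exponential with mean mu. The Python sketch below uses that approach; this reference does not say which method the library uses, so treat the method as an assumption.

```python
import math
import random

def expdistrand(mu, n, rnd=3):
    # 1 - random.random() lies in (0, 1], so the logarithm is always defined.
    return [round(-mu * math.log(1.0 - random.random()), rnd) for _ in range(n)]

print(expdistrand(5, 4))  # four non-negative values with mean 5 in the long run
```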
boxplot(array,axislabel,[options])
draws a boxplot based on the data in array, with given axislabel
and optionally a datalabel (to the top left of the boxplot)
array can also be an array of data arrays to do comparative boxplots
options is an array of options:
"datalabels" = array of data labels for comparative boxplots
"showvals" = true to show 5 number summary above boxplot
"showoutliers" = true to put whiskers at values inside 1.5IQR fence and show outliers
"qmethod" = quartile method: "N", "TI", "Excel" or "Nplus1"
N: percentile method, using .25*n
Nplus1: percentile method, using .25*(n+1)
TI: TI calculator method, a mix of n and nplus1 methods
Excel: A method based on (n-1), with some linear interpolation
For backwards compatibility, options can also just be an array of datalabels
normalcdf(z,[dec])
calculates the area under the standard normal distribution to the left of the
z-value z, to dec decimals (defaults to 4)
based on someone else's code - can't remember whose!
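The quantity normalcdf returns can be cross-checked with the error function, since the standard normal CDF is 0.5 * (1 + erf(z / sqrt(2))). The Python sketch below uses that identity; it is not the algorithm the library uses, only a way to see what values to expect.

```python
import math

def normalcdf(z, dec=4):
    # Area under the standard normal curve to the left of z.
    return round(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))), dec)

print(normalcdf(0))      # 0.5
print(normalcdf(1.96))   # 0.975
print(normalcdf(-1.96))  # 0.025
```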
tcdf(t,df,[dec])
calculates the area under the t-distribution with "df" degrees of freedom
to the left of the t-value t
based on code from www.math.ucla.edu/~tom/distributions/tDist.html
invnormalcdf(p,[dec])
Inverse Normal CDF
finds the z-value with a left-tail area of p, to dec decimals (default 5)
from Odeh & Evans. 1974. AS 70. Applied Statistics. 23: 96-97
invtcdf(p,df,[dec])
the inverse Student's t-distribution
computes the t-value with a left-tail probability of p, with df degrees of freedom
to dec decimal places (default 4)
from Algorithm 396: Student's t-quantiles by G.W. Hill Comm. A.C.M., vol.13(10), 619-620, October 1970
linreg(xarray,yarray)
Computes the linear correlation coefficient, and slope and intercept of
regression line, based on array/list of x-values and array/list of y-values
Returns as array: r,slope,intercept
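The three returned values follow from the usual least-squares formulas: slope = Sxy/Sxx, intercept = mean(y) - slope*mean(x), and r = Sxy/sqrt(Sxx*Syy). A Python sketch of that computation, given only as an illustration of the formulas:

```python
import math

def linreg(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return [r, slope, intercept]

print(linreg([1, 2, 3, 4], [3, 5, 7, 9]))  # [1.0, 2.0, 1.0]
```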
expreg(xarray,yarray)
Computes the exponential correlation coefficient, and base and intercept of
regression exponential, based on array/list of x-values and array/list of y-values
Returns as array: r,base,intercept
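A common way to fit y = intercept * base^x is to run a linear regression of ln(y) on x and exponentiate the results. The Python sketch below assumes that approach, assumes all y-values are positive, and assumes "intercept" means the value of the fitted curve at x = 0; none of these details are confirmed by this reference.

```python
import math

def expreg(xs, ys):
    logs = [math.log(y) for y in ys]  # requires every y > 0
    n = len(xs)
    mx = sum(xs) / n
    ml = sum(logs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sll = sum((l - ml) ** 2 for l in logs)
    sxl = sum((x - mx) * (l - ml) for x, l in zip(xs, logs))
    slope = sxl / sxx
    r = sxl / math.sqrt(sxx * sll)       # correlation of x with ln(y)
    base = math.exp(slope)
    intercept = math.exp(ml - slope * mx)
    return [r, base, intercept]

# Data generated from y = 2 * 3^x, so base is about 3 and intercept about 2.
print(expreg([0, 1, 2, 3], [2, 6, 18, 54]))
```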
checklineagainstdata(xarray, yarray, student answer, [variable, alpha])
intended for checking a student answer for fitting a line to data. Determines
if the student answer is within the confidence bounds for the regression equation.
xarray, yarray: list/array of data values
student answer: the $stuanswers[$thisq] which is a line equation like "2x+3"
variable: defaults to "x"
alpha: for confidence bound. defaults to .05
return array(answer, showanswer) to be used to set $answer and $showanswer
checkdrawnlineagainstdata(xarray, yarray, student answer, [grade dots, alpha, grid])
intended for checking a student answer for drawing a line fit to data. Determines
if the student answer is within the confidence bounds for the regression equation.
xarray, yarray: list/array of data values
student answer from draw: the $stuanswers[$thisq]
grade dots: default false. If true, will grade that dots of xarray,yarray were plotted
alpha: for confidence bound. defaults to .05
grid: If you've modified the grid, include it here
return array(answer, showanswer) to be used to set $answer and $showanswer
binomialpdf(N,p,x)
Computes the probability of x successes out of N trials
where each trial has probability p of success
binomialcdf(N,p,x)
Computes the probability of <=x successes out of N trials
where each trial has probability p of success
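Both binomial functions follow directly from the formula P(X = x) = C(N, x) * p^x * (1 - p)^(N - x), with the cdf summing the pdf from 0 through x. A minimal Python sketch of those formulas:

```python
import math

def binomialpdf(N, p, x):
    # Probability of exactly x successes in N independent trials.
    return math.comb(N, x) * p ** x * (1 - p) ** (N - x)

def binomialcdf(N, p, x):
    # Probability of at most x successes.
    return sum(binomialpdf(N, p, k) for k in range(x + 1))

print(binomialpdf(4, 0.5, 2))  # 0.375
print(binomialcdf(4, 0.5, 2))  # 0.6875
```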
chi2teststat(m)
Computes the test stat sum((E-O)^2/E) given a matrix of values
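The Python sketch below shows one reading of this statistic: the matrix holds observed counts O for a contingency table, and each expected count E is (row total * column total) / grand total, as in a test of independence. That interpretation of the input is an assumption; the reference only says "a matrix of values".

```python
def chi2teststat(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total  # expected count
            stat += (e - o) ** 2 / e
    return stat

# Every expected count here is 15, so the statistic is 4 * 25/15.
print(chi2teststat([[10, 20], [20, 10]]))
```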
chi2cdf(x,df)
Computes the area to the left of x under the chi-squared distribution
with df degrees of freedom
invchi2cdf(p,df)
Computes the x value with left-tail probability p under the
chi-squared distribution with df degrees of freedom
fcdf(f,df1,df2)
Returns the area to right of the F-value f for the f-distribution
with df1 and df2 degrees of freedom (technically it's 1-CDF)
Algorithm is accurate to approximately 4-5 decimals
invfcdf(p,df1,df2)
Computes the f-value with probability of p to the right
with degrees of freedom df1 and df2
Algorithm is accurate to approximately 2-4 decimal places
Less accurate for smaller p-values
gamma_cdf(x,shape,[scale,offset])
Calculates the gamma cdf
gamma_inv(p,shape,[scale])
Calculates the inverse gamma cdf
beta_cdf(x,alpha,beta)
Calculates the beta cdf
beta_inv(p,alpha,beta)
Calculates the inverse beta cdf
mosaicplot(rowlabels, columnlabels, count matrix, [width, height])
creates a mosaic plot (See http://www.wamap.org/course/showlinkedtextpublic.php?cid=1383&id=82972)
rowlabels: an array of labels for the rows of the display
columnlabels: an array of labels for the columns of the display
count matrix: a 2-dimensional array. $m[1][5] will give the count for
rowlabel[1] and columnlabel[5]
width and height are optional, default to 300 by 300. Does not include labels
csvdownloadlink([filename],string,array,[string,array]...)
Creates a link that downloads the specified data in CSV format. For each column
provide a string header and an array of values. A filename (without the .csv) can
optionally be provided as a first argument.
dotplot(array,label,[dotspacing,labelspacing,width,height])
Display macro. Creates a dotplot from a data set
array: array of data values
label: title of the dotplot that will be placed below the horizontal axis
dotspacing (default 1): horiz spacing of dots; data will be rounded to nearest value
labelspacing (defaults to dotspacing): spacing of axis labels
width,height (default 300x150): width and height in pixels of graph
anova1way(arr1, arr2, [arr3, ...])
Function anova1way() performs one-way analysis of variance (ANOVA) on two or more groups and returns the ANOVA table as an array with each row corresponding to Factor A, error (residual), and totals.
Parameters:
- arr1, arr2, ...: Arrays in the form [2,3,4,5,...]; it also accepts unequal sample sizes.
Returns:
ANOVA table as an array in the following format.
array([SS_A, df_A, MS_A, F_A, P_A], [SS_E, df_E, MS_E], [SS_T, df_T])
where SS is the sum of squares, df is the degrees of freedom, MS is the mean square, F is the F ratio, and P is the P value.
And A, E, and T correspond to Factor A, error (residual), and total, respectively. This array can be used in anova_table() function to tabulate data for display.
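The entries of the one-way table come from the standard decomposition SS_T = SS_A + SS_E, with df_A = k - 1 and df_E = N - k for k groups and N observations in total. The Python sketch below computes everything except P_A, which needs an F-distribution CDF and is left out here. The function name is hypothetical and this is not the library's code.

```python
def anova1way_sums(*groups):
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_a = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_e = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_a = len(groups) - 1
    df_e = len(all_values) - len(groups)
    ms_a = ss_a / df_a
    ms_e = ss_e / df_e
    f = ms_a / ms_e
    # Same row layout as the table above, without the P value.
    return [[ss_a, df_a, ms_a, f], [ss_e, df_e, ms_e], [ss_a + ss_e, df_a + df_e]]

table = anova1way_sums([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(table)  # SS_A is about 26, SS_E is about 6, F is about 13
```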
anova1way_f(arr1, arr2, [arr3,...])
Function anova1way_f() performs one-way analysis of variance (ANOVA) on two or more groups and returns F ratio and the corresponding P value as an array.
Parameters:
- arr1, arr2, ...: Arrays in the form [2,3,4,5,...]; it also accepts unequal sample sizes.
Returns:
F ratio and the corresponding P value as an array in the form [F ratio, P value].
anova2way(arr, [replication = false])
Function anova2way() performs two-way analysis of variance (ANOVA) and returns ANOVA table as an array with each row corresponding to Factor A, Factor B,
their interaction (only with replication), error (residual), and totals.
Parameters:
- arr: An array in the following form:
for twoway WITH replication - example: arr=array([[4,5,6,5],[7,9,8,12],[10,12,11,9]],[[6,6,4,4],[13,15,12,12],[12,13,10,13]]) has two factors, Factor A with three and Factor B with two levels, and there are four replicates in each group.
and for twoway WITHOUT replication - example: arr=array([53,61,51], [47,55,51], [46,52,49], [50,58,54]) has two factors, Factor A with three and Factor B with four levels, and there is only one value in each group.
- replication: Optional - boolean (true or false)- it specifies whether the ANOVA with replication
(multiple observations for each group) or without replication (one observation per group)
is to be performed. The default is false - without replication.
Returns:
ANOVA table as an array in the following format. This array can be used in anova_table() to tabulate data for display.
[[SS_A, df_A, MS_A, F_A, P_A],[SS_B, df_B, MS_B, F_B, P_B],[SS_I, df_I, MS_I, F_I, P_I],[SS_E,df_E,MS_E],[SS_T,df_T]]
where SS is the sum of squares, df is the degrees of freedom, MS is the mean square, F is the F ratio, and P is the P value.
And A, B, I, E, and T correspond to Factor A, Factor B, their interaction (only with replication),
error (residual), and total, respectively.
anova2way_f(arr, [replication = false])
Function anova2way_f() performs two-way analysis of variance (ANOVA) and returns F ratio and the corresponding P value for Factor A, Factor B and their interaction (if replication is true).
Parameters:
- arr: An array in the following form:
for twoway WITH replication - example: arr=array([[4,5,6,5],[7,9,8,12],[10,12,11,9]],[[6,6,4,4],[13,15,12,12],[12,13,10,13]]) has two factors, Factor A with three and Factor B with two levels, and there are four replicates in each group.
and for twoway WITHOUT replication - example: arr=array([53,61,51], [47,55,51], [46,52,49], [50,58,54]) has two factors, Factor A with three and Factor B with four levels, and there is only one value in each group.
- replication: Optional - boolean (true or false) it specifies whether the ANOVA with replication
(multiple observations for each group) or without replication (one observation per group)
is to be performed. The default is false - without replication.
Returns:
F ratio and the corresponding P value for Factor A, Factor B and their Interaction (if replication is true)
as an array in the form array([F_A,P_A],[F_B,P_B],[F_I,P_I]).
anova_table(arr, [factor = 1, replication = false, roundto = 12, nameA = "Factor A", nameB = "Factor B"])
Function anova_table() returns ANOVA table for both oneway and twoway ANOVA - display only. The output of anova1way() and
anova2way() can be used as the input array for this function.
Parameters:
- arr: for oneway: arr=array([SS_A, df_A, MS_A, F_A, P_A],[SS_E,df_E,MS_E],[SS_T,df_T])
for twoway WITHOUT replication: arr=array([SS_A, df_A, MS_A, F_A, P_A],[SS_B, df_B, MS_B, F_B, P_B],[SS_E,df_E,MS_E],[SS_T,df_T])
and for twoway WITH replication: arr=array([SS_A, df_A, MS_A, F_A, P_A],[SS_B, df_B, MS_B, F_B, P_B],[SS_I, df_I, MS_I, F_I, P_I],[SS_E,df_E,MS_E],[SS_T,df_T])
- factor: number of factors considered in ANOVA - 1 for one-way and 2 for two-way. The default is 1, one-way ANOVA.
- replication: Optional - boolean (true or false)- it specifies whether the ANOVA with replication
(multiple observations for each group) or without replication (one observation per group)
is to be performed. The default is false - without replication.
- roundto: Optional - number of decimal places to which data should be rounded off;
the default is 12 decimal places.
- nameA: Optional - the name of factor A as string to be displayed in the table. Default is "Factor A".
- nameB: Optional - the name of factor B as string to be displayed in the table. Default is "Factor B".
Returns:
ANOVA table for displaying data.
student_t(arr1, arr2, [equalVar = false, paired = false, roundto = 12])
Function student_t() computes t statistic and corresponding P-value for two-sample student t-test.
Parameters:
- arr1, arr2: Arrays in the form [2,3,4,5,...]; unequal sample sizes are accepted for independent samples.
- equalVar: Optional - Boolean. Set to true for equal population variances; default is false.
- paired: Optional - Boolean. Set to true for paired (dependent) samples; default is false.
- roundto: Optional - number of decimal places to which data should be rounded off; default is 12 decimal places.
Returns:
t statistic, corresponding P-value (area to the right of t-value - one-tail), and degrees of freedom for two sample student t-test as an array in the form [t, P-value, df].