In statistics, a data sample is a set of data collected from a statistical population by a defined procedure. The population is the set of similar items or events that is of interest for some question or experiment.
In mathematical terms, given a probability distribution F, a random sample of size n (where n may be any positive integer) is a set of realizations of n independent, identically distributed (i.i.d.) random variables with distribution F.
A sample concretely represents the results of n experiments in which the same quantity is measured. For example, if we want to estimate the average height of members of a particular population, we measure the heights of n individuals. Each measurement is drawn from the probability distribution F characterizing the population, so each measured height $ x_i $ is the realization of a random variable $ X_i $ with distribution F. Note that a set of random variables (i.e., a set of measurable functions) must not be confused with the realizations of these variables (which are the values that these random variables take). In other words, $ X_i $ is a function representing the measurement at the i-th experiment and $ x_i=X_i(\omega) $ is the value obtained when making the measurement.
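The distinction between the random variables $ X_i $ and their realizations $ x_i $ can be made concrete with a small simulation. The following is only a minimal sketch: the function name `draw_sample`, the choice of a normal distribution for F, and the values 170 and 10 for its mean and standard deviation are assumptions made for illustration and do not come from any real data set.

```python
import random


def draw_sample(n, mean=170.0, sd=10.0, seed=None):
    """Return n realizations x_1..x_n of i.i.d. random variables X_1..X_n.

    F is assumed to be a normal distribution here, purely for illustration.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    rng = random.Random(seed)  # a private generator keeps the draw reproducible
    # Each call to rng.gauss plays the role of one experiment (one measurement).
    return [rng.gauss(mean, sd) for _ in range(n)]


# Hypothetical usage: measure n = 5 heights and estimate the average height.
heights = draw_sample(n=5, seed=42)
estimate = sum(heights) / len(heights)
print(heights)
print(estimate)
```

The list `heights` holds the realizations, while the repeated calls to the generator stand in for the random variables. Running the function again with a different seed gives a different sample from the same distribution.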
The determination of the sample size (the act of choosing the number of observations or replicates to include in a statistical sample) is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is determined by the cost of data collection and the need for sufficient statistical power. Larger sample sizes generally lead to increased precision when estimating unknown parameters.
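How precision grows with sample size can be illustrated with the standard error of a sample mean, which for n i.i.d. observations with standard deviation sd equals sd divided by the square root of n. The sketch below assumes a standard deviation of 10 purely as an example value; the function name `standard_error` is likewise chosen for illustration.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean of n i.i.d. observations with std. dev. sd."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return sd / math.sqrt(n)


# Assumed example value sd = 10: each tenfold increase in n shrinks
# the standard error by a factor of about 3.16 (the square root of 10).
for n in (10, 100, 1000, 10000):
    print(n, round(standard_error(10.0, n), 3))
```

Because of the square root, quadrupling the sample size only halves the standard error, which is one reason why the cost of data collection has to be weighed against the precision gained.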
For practical reasons, sample sizes within WikiProject Statistics are divided into four categories:
- Values should follow the official drop rates with very little deviation. Predictions will be exact.
- Significant deviations from official values almost surely indicate changes inside the game or wrong official values.
- Values usually follow the official drop rates. Predictions will be quite accurate.
- Significant deviations from official values often indicate changes inside the game or even wrong official values.
- Undocumented "hidden rules" can be regarded as proven.
- Values usually do not differ from official drop rates by more than 2 percent.
- Larger deviations may hint at undocumented rules.
- The "acceptable" status is the threshold for moving a box from the Candidates to the main section on Pro Kit Boxes with hidden rules.
- Values can deviate from official drop rates by several percent. Predictions bear the risk of being highly inaccurate.
- Values for smaller sub-groups like Engine or Tech cards can deviate even more, because a single additional item of this kind can change percentages by 10 percent or more.
- Boxes with a "critical" status that are suspected to have hidden rules can be included in the Candidates section on Pro Kit Boxes with hidden rules.
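The sensitivity of small sub-groups mentioned in the list above can be checked with simple arithmetic. The following sketch uses made-up counts (they are not taken from any actual box statistics), and the function names are hypothetical; it compares how much one additional item moves an observed percentage in a small sub-group and in a large one.

```python
def share_percent(count, total):
    """Observed share of one kind of item within a sub-group, in percent."""
    return 100.0 * count / total


def shift_from_one_more(count, total):
    """Change in the observed share, in percentage points, after one more item of that kind."""
    return share_percent(count + 1, total + 1) - share_percent(count, total)


# Hypothetical counts: 1 of 4 observed cards versus 100 of 400 observed cards.
print(shift_from_one_more(1, 4))      # small sub-group: 25 % becomes 40 %
print(shift_from_one_more(100, 400))  # large sub-group: 25 % barely moves
```

With only four observed cards, a single extra item shifts the share by 15 percentage points, whereas with 400 observed cards the same extra item shifts it by less than 0.2 percentage points.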