In statistics, a **data sample** is a set of data collected from a statistical population by a defined procedure. *Population* denotes a set of similar items or events which is of interest for some question or experiment.

## **Definition**

In mathematical terms, given a probability distribution *F*, a random sample of size *n* (where *n* may be any positive integer) is a set of realizations of *n* independent, identically distributed (i.i.d.) random variables with distribution *F*.
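The definition can be sketched in Python, assuming that a seeded pseudo-random generator stands in for genuine randomness. The helper name `draw_sample` and the choice of *F* (here an exponential distribution with rate 0.5) are illustrative assumptions, not part of the definition. Each call to the sampler produces one realization, and every call uses the same distribution, which mirrors the "identically distributed" requirement.

```python
import random
from typing import Callable, List


def draw_sample(sampler: Callable[[random.Random], float], n: int, seed: int = 0) -> List[float]:
    """Return n realizations of i.i.d. random variables with a common distribution F.

    `sampler` draws one value from F using the supplied generator; calling it
    n times on the same generator gives n (pseudo-)independent draws from F.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    rng = random.Random(seed)
    return [sampler(rng) for _ in range(n)]


def exponential_f(rng: random.Random) -> float:
    """Hypothetical choice of F: exponential distribution with rate 0.5."""
    return rng.expovariate(0.5)


sample = draw_sample(exponential_f, n=5, seed=42)
print(len(sample))  # 5
```

Fixing the seed makes the sample reproducible, which is convenient for illustration; it does not change the fact that the values are realizations of random variables with distribution *F*.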

A sample concretely represents the results of *n* experiments in which the same quantity is measured. For example, if we want to estimate the average height of members of a particular population, we measure the heights of *n* individuals. Each measurement is drawn from the probability distribution *F* characterizing the population, so each measured height $ x_i $ is the realization of a random variable $ X_i $ with distribution *F*. Note that a set of random variables (i.e., a set of measurable functions) must not be confused with the realizations of these variables (the values that these random variables take). In other words, $ X_i $ is a function representing the measurement at the *i*-th experiment, and $ x_i=X_i(\omega) $ is the value obtained when the measurement is actually made, for the outcome $ \omega $ that occurs.
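The distinction between $ X_i $ and $ x_i $ can be illustrated with a small sketch. Here, by assumption, an outcome $ \omega $ is modelled as an integer seed, and the population heights are assumed (hypothetically) to be normal with mean 170 cm and standard deviation 10 cm; neither choice comes from the text. `X` is a function of $ \omega $, while `x` is the list of plain numbers it returns for one particular outcome.

```python
import random
import statistics
from typing import List

MU_CM, SIGMA_CM = 170.0, 10.0  # hypothetical population parameters (not from the text)


def X(omega: int, n: int) -> List[float]:
    """The random vector (X_1, ..., X_n) as a function of the outcome omega.

    An outcome is modelled, by assumption, as an integer seed: the same
    omega always yields the same measured heights.
    """
    rng = random.Random(omega)
    return [rng.gauss(MU_CM, SIGMA_CM) for _ in range(n)]


omega = 2024                    # one particular outcome of the experiment
x = X(omega, n=100)             # realizations x_i = X_i(omega): ordinary numbers
estimate = statistics.mean(x)   # sample mean as an estimate of the average height
print(f"estimated average height: {estimate:.1f} cm")
```

Evaluating `X` at a different outcome gives a different list of realizations, even though the random variables themselves (the function `X`) are unchanged.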

## **Sample size**

The determination of the sample size (the act of choosing the number of observations or replicates to include in a statistical sample) is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined by balancing the expense of data collection against the need for sufficient statistical power. Larger sample sizes generally lead to increased precision when estimating unknown parameters.
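The link between sample size and precision can be made concrete for the sample mean. The sketch below assumes i.i.d. observations with a known population standard deviation `sigma` and uses the normal approximation; the function names are hypothetical. It addresses precision only: a full power analysis would additionally fix an effect size and the acceptable error rates.

```python
import math
from statistics import NormalDist


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean for n i.i.d. observations with SD sigma."""
    return sigma / math.sqrt(n)


def sample_size_for_margin(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n whose confidence-interval half-width z * sigma / sqrt(n) is <= margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
    return math.ceil((z * sigma / margin) ** 2)


for n in (25, 100, 400):
    print(n, standard_error(10.0, n))  # 2.0, 1.0, 0.5: quadrupling n halves the error

print(sample_size_for_margin(sigma=10.0, margin=1.0))  # 385
```

Because the standard error shrinks only with $ \sqrt{n} $, halving the margin of error requires roughly four times as many observations, which is why the cost of data collection weighs so heavily in the choice of sample size.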

## **WikiProject Statistics**

For practical reasons, sample sizes within WikiProject Statistics are divided into four categories:

Excellent: Sample size >= 1,000

- Values should follow the official drop rates with very little deviation. Predictions will be very accurate.
- Significant deviations from official values almost surely indicate changes inside the game or wrong official values.

- Values usually follow the official drop rates. Predictions will be quite accurate.
- Significant deviations from official values often indicate changes inside the game or even wrong official values.
- Undocumented "hidden rules" can be regarded as proven.

Acceptable: Sample size >= 200

- Values usually do not differ from official drop rates by more than 2 percent.
- Larger deviations may hint at undocumented rules.
- The "acceptable" status is the threshold for moving a box from the **Candidates** section to the main section on Pro Kit Boxes with hidden rules.

Critical: Sample size < 200

- Values can deviate from official drop rates by several percent. Predictions risk being highly inaccurate.
- Values for smaller sub-groups, such as Engine or Tech cards, can deviate even more, because a single additional item of this kind can shift the percentages by 10 percent or more.
- Boxes with a "critical" status that are suspected to have hidden rules can be included in the **Candidates** section on Pro Kit Boxes with hidden rules.
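Why the categories behave this way can be sketched with a simple binomial model. The sketch assumes that each box opening independently yields a given item with a fixed probability `p` (an assumption, not something the categories above state); the function names and the example numbers are hypothetical. `rate_standard_error` gives the typical spread of an observed drop rate, and `deviation_z` measures how far an observed count lies from an official rate in units of that spread.

```python
import math


def rate_standard_error(p: float, n: int) -> float:
    """Standard error of an observed drop rate, in percentage points.

    Assumes each of the n openings independently yields the item with probability p.
    """
    return 100 * math.sqrt(p * (1 - p) / n)


def deviation_z(observed: int, n: int, p_official: float) -> float:
    """z-score of an observed count relative to an official rate (normal approximation)."""
    p_hat = observed / n
    return (p_hat - p_official) / math.sqrt(p_official * (1 - p_official) / n)


# Typical spread of the observed rate for an item with a 10 % drop rate.
for n in (50, 200, 1000):
    print(n, round(rate_standard_error(0.10, n), 2))  # 4.24, 2.12, 0.95

# Hypothetical example: an item with an official rate of 10 % drops 140 times in 1,000 openings.
z = deviation_z(observed=140, n=1000, p_official=0.10)
print(round(z, 2))  # 4.22
```

A common rule of thumb treats $ |z| > 2 $ (roughly the 5 % level) as a significant deviation. The same formula also explains the sub-group warning: for Engine or Tech cards, `n` is only the number of observations within that sub-group, so the standard error is much larger than for the box as a whole.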