In statistics, a data sample is a set of data collected from a statistical population by a defined procedure. The population is the set of similar items or events that is of interest for some question or experiment.

## Definition

In mathematical terms, given a probability distribution F, a random sample of size n (where n may be any positive integer) is a set of realizations of n independent, identically distributed (i.i.d.) random variables with distribution F.

A sample concretely represents the results of n experiments in which the same quantity is measured. For example, if we want to estimate the average height of members of a particular population, we measure the heights of n individuals. Each measurement is drawn from the probability distribution F characterizing the population, so each measured height $x_i$ is the realization of a random variable $X_i$ with distribution F. Note that a set of random variables (i.e., a set of measurable functions) must not be confused with the realizations of these variables (which are the values that these random variables take). In other words, $X_i$ is a function representing the measurement in the i-th experiment, and $x_i=X_i(\omega)$ is the value obtained when the measurement is made, where $\omega$ denotes the outcome that actually occurred.
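
The distinction between the random variables $X_i$ and their realizations $x_i$ can be illustrated with a short simulation. This is only a sketch: the normal distribution with a mean of 170 cm and a standard deviation of 10 cm is an assumption chosen for illustration, and the function name `draw_sample` is hypothetical.

```python
import random
import statistics


def draw_sample(n, mean=170.0, sd=10.0, seed=0):
    """Return n independent draws from an assumed Normal(mean, sd) distribution.

    The distribution and its parameters stand in for the unknown F;
    they are illustrative assumptions, not real height data.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [rng.gauss(mean, sd) for _ in range(n)]


# heights holds the realizations x_1, ..., x_n of the i.i.d. variables X_1, ..., X_n
heights = draw_sample(1000)

# The sample mean is an estimate of the population mean (170.0 by assumption)
sample_mean = statistics.fmean(heights)
print(f"n = {len(heights)}, sample mean = {sample_mean:.2f}")
```

Calling `draw_sample` again with a different seed gives different realizations of the same random variables, which is exactly the difference between $X_i$ and $x_i$ described above.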

## Sample size

The determination of the sample size (the act of choosing the number of observations or replicates to include in a statistical sample) is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is determined by the expense of data collection and the need for sufficient statistical power. Larger sample sizes generally lead to increased precision when estimating unknown parameters.
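
How precision grows with sample size can be sketched for the case of an estimated proportion. The example below uses the normal approximation, under which the 95% margin of error is roughly $1.96\sqrt{p(1-p)/n}$. The proportion of 10% and the function name `margin_of_error` are assumptions made for illustration.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p estimated from n observations.

    Uses the normal approximation; z=1.96 corresponds to roughly 95% confidence.
    The approximation is rough for very small n or for p close to 0 or 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical proportion of 10%; quadrupling n halves the margin of error
for n in (100, 400, 1600):
    print(f"n = {n:5d}: +/- {100 * margin_of_error(0.10, n):.1f} percentage points")
```

Because the margin shrinks with the square root of n, each further gain in precision requires a disproportionately larger sample, which is why the cost of data collection matters when choosing a sample size.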

## WikiProject Statistics

For practical reasons, sample sizes within WikiProject Statistics are divided into four categories:

Excellent: Sample size >= 1,000

• Values should follow the official drop rates with very little deviation. Predictions will be highly accurate.
• Significant deviations from official values almost certainly indicate changes inside the game or wrong official values.

Good: Sample size >= 600

• Values usually follow the official drop rates. Predictions will be quite accurate.
• Significant deviations from official values often indicate changes inside the game or even wrong official values.
• Undocumented "hidden rules" can be regarded as proven.

Acceptable: Sample size >= 200

• Values usually do not differ from official drop rates by more than 2 percentage points.
• Larger deviations may hint at undocumented rules.
• The "acceptable" status is the threshold for moving a box from the Candidates to the main section on Pro Kit Boxes with hidden rules.

Critical: Sample size < 200

• Values can deviate from official drop rates by several percent. Predictions risk being highly inaccurate.
• Values for smaller sub-groups like Engine or Tech cards can deviate even more, because a single additional item of this kind can change the percentages by 10 percent or more.
• Boxes with a "critical" status that are suspected to have hidden rules can be included in the Candidates section on Pro Kit Boxes with hidden rules.
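
The four categories above can be expressed as a small helper. This is a minimal sketch: the function name `sample_size_category` is hypothetical, and only the thresholds are taken from the list above.

```python
def sample_size_category(n):
    """Map a sample size to the category names used in the list above."""
    if n < 0:
        raise ValueError("sample size cannot be negative")
    if n >= 1000:
        return "Excellent"
    if n >= 600:
        return "Good"
    if n >= 200:
        return "Acceptable"
    return "Critical"


# Example sample sizes, one on each side of the thresholds
for n in (150, 200, 599, 600, 1000):
    print(n, sample_size_category(n))
```

The checks run from the highest threshold downwards, so each sample size falls into the best category it qualifies for.
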