Changes: Sample

Latest revision as of 10:46, 22 July 2019

In statistics, a data sample is a set of data collected from a statistical population by a defined procedure. Population denotes a set of similar items or events which is of interest for some question or experiment.

Definition

In mathematical terms, given a probability distribution F, a random sample of length n (where n may be any positive integer) is a set of realizations of n independent, identically distributed (i.i.d.) random variables with distribution F.

A sample concretely represents the results of n experiments in which the same quantity is measured. For example, if we want to estimate the average height of members of a particular population, we measure the heights of n individuals. Each measurement is drawn from the probability distribution F characterizing the population, so each measured height $x_i$ is the realization of a random variable $X_i$ with distribution F. Note that a set of random variables (i.e., a set of measurable functions) must not be confused with the realizations of these variables (the values that these random variables take). In other words, $X_i$ is a function representing the measurement at the i-th experiment, and $x_i=X_i(\omega)$ is the value obtained when making the measurement, with $\omega$ denoting the outcome that actually occurred.
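The distinction between random variables and their realizations can be illustrated with a short simulation. The following is a minimal sketch, not part of any WikiProject tooling: it assumes, purely for illustration, that F is a normal distribution with a hypothetical mean of 170 cm and standard deviation of 10 cm, and the function name `draw_sample` is likewise hypothetical. Each independent draw from the same distribution plays the role of one measurement $X_i$, and the returned numbers are the realizations $x_i$.

```python
import random
from typing import List, Optional


def draw_sample(n: int, mu: float, sigma: float, seed: Optional[int] = None) -> List[float]:
    """Return realizations x_1, ..., x_n of n i.i.d. random variables X_i ~ N(mu, sigma^2).

    Assumption: a normal distribution stands in for the unknown population distribution F.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    rng = random.Random(seed)  # seeded generator: the same seed reproduces the same sample
    # Each rng.gauss call is one independent measurement from the same distribution;
    # the floats it returns are the realizations x_i, not the random variables X_i.
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical population: heights with mean 170 cm and standard deviation 10 cm.
heights = draw_sample(n=5, mu=170.0, sigma=10.0, seed=42)
print(heights)
print("sample mean:", sum(heights) / len(heights))
```

Calling the function again with the same seed yields the same list of realizations, while a different seed corresponds to a different outcome $\omega$ and therefore a different sample from the same distribution.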

Sample size

The determination of the sample size (the act of choosing the number of observations or replicates to include in a statistical sample) is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined by the expense of data collection and the need to have sufficient statistical power. Larger sample sizes generally lead to increased precision when estimating unknown parameters.
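How precision grows with sample size can be made concrete for a proportion such as a drop rate. The sketch below assumes that each observation is an independent Bernoulli trial with success probability p (a binomial model, which the text above does not state explicitly); under that assumption the standard error of the observed proportion is $\sqrt{p(1-p)/n}$, so it shrinks in proportion to $1/\sqrt{n}$. The function name and the 10 % rate are hypothetical, chosen only for illustration.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of the sample proportion for n i.i.d. Bernoulli(p) observations."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical drop rate of 10 %, evaluated at the WikiProject Statistics thresholds.
for n in (200, 600, 1000):
    se = proportion_standard_error(0.10, n)
    print(f"n = {n:4d}: standard error approx. {100 * se:.2f} percentage points")
```

Because the standard error scales with $1/\sqrt{n}$, quadrupling the sample size only halves it, which is one reason why each additional gain in precision becomes more expensive to obtain.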

WikiProject Statistics

For practical reasons, sample sizes within WikiProject Statistics are divided into four categories:

Excellent: Sample size >= 1,000

Values should follow the official drop rates with very little deviation. Predictions will be highly accurate.
Significant deviations from official values almost surely indicate changes inside the game or wrong official values.

Good: Sample size >= 600

Values usually follow the official drop rates. Predictions will be quite accurate.
Significant deviations from official values often indicate changes inside the game or even wrong official values.
Undocumented "hidden rules" can be regarded as proven.

Acceptable: Sample size >= 200

Values usually do not differ from official drop rates by more than 2 percent.
Larger deviations may hint at undocumented rules.
The "acceptable" status is the threshold for moving a box from the Candidates section to the main section of Pro Kit Boxes with hidden rules.

Critical: Sample size < 200

Values can deviate from official drop rates by several percent. Predictions bear the risk of being highly inaccurate.
Values for smaller sub-groups like Engine or Tech cards can deviate even more, because a single additional item of this kind can shift their percentages by 10 percentage points or more.
Boxes with a "critical" status that are suspected to have hidden rules can be included in the Candidates section on Pro Kit Boxes with hidden rules.
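The four categories above, and the idea of a "significant deviation" from an official drop rate, can be expressed as a short script. The following is a minimal sketch under stated assumptions: the category thresholds are taken from the list above, but the deviation check is a swapped-in technique, a standard two-sided z-test using the normal approximation to the binomial distribution, because the WikiProject does not specify which test, if any, it applies. The function names, the 5 % significance level (|z| > 1.96) and the example box data are all hypothetical.

```python
import math


def sample_size_category(n: int) -> str:
    """Map a sample size to its WikiProject Statistics category (thresholds from the list above)."""
    if n < 0:
        raise ValueError("sample size cannot be negative")
    if n >= 1000:
        return "Excellent"
    if n >= 600:
        return "Good"
    if n >= 200:
        return "Acceptable"
    return "Critical"


def deviation_z_score(observed: int, n: int, official_rate: float) -> float:
    """z-score of an observed count against an official drop rate.

    Assumes n independent openings and uses the normal approximation to the
    binomial, which becomes unreliable when n * official_rate is small.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0.0 < official_rate < 1.0:
        raise ValueError("official_rate must lie strictly between 0 and 1")
    expected = n * official_rate
    sd = math.sqrt(n * official_rate * (1.0 - official_rate))
    return (observed - expected) / sd


# Hypothetical box: opened 250 times, 40 items of one kind, official rate 10 %.
n, observed, official_rate = 250, 40, 0.10
print(sample_size_category(n))  # Acceptable
z = deviation_z_score(observed, n, official_rate)
print(f"z = {z:.2f}, significant at the 5 % level: {abs(z) > 1.96}")  # z = 3.16, ... True
```

In this example 25 items would be expected; observing 40 gives $z = 15/\sqrt{22.5} = \sqrt{10} \approx 3.16$, which would count as a significant deviation under the assumed 5 % level and might hint at an undocumented rule. For the small sub-groups mentioned under "Critical", the expected counts are low and the normal approximation should not be relied on.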
