This post is devoted to the calculation of the p-value of both the left-tailed and the right-tailed z-test. The basic definition of the p-value can be found in the post about the p-value of the two-tailed test.
We will start with the left-tailed test and we will use the same assignment and numbers as in the previous post. The level of significance of the test is α = 0.05 and the value of the statistic is −1.95.
In the case of the left-tailed z-test, the p-value is the area under the probability density function from −∞ to the value of the statistic, in our case from −∞ to −1.95.
The function which gives us the area under the curve up to any value is not the density but the cumulative distribution function. We will use Microsoft Excel to get the p-value, more specifically the function NORM.S.DIST (which we already used for the p-value of the two-tailed test). The only difference is that now we do not multiply the result by 2, so we write
=NORM.S.DIST(-1.95, TRUE)
to get 0.0256 as a result. The value is lower than 0.05, which confirms our decision to reject the null hypothesis.
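The same number can be reproduced outside Excel. Here is a minimal Python sketch using only the standard library (the standard normal distribution function written via math.erf); the statistic value −1.95 is an assumption consistent with the reported p-value 0.0256:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal distribution function Phi(z), written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Left-tailed p-value: area under the density from -infinity to z.
# z = -1.95 is an assumed statistic value consistent with p = 0.0256.
z = -1.95
p_value = norm_cdf(z)
print(round(p_value, 4))  # 0.0256
```
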
We also know that for any level of significance lower than 0.0256 (e.g. 0.01), the null hypothesis would not be rejected. This is why we must always state the level of significance together with the statement about rejecting the null hypothesis.
We stay at the same level of significance α = 0.05; the value of the statistic is now different. It is not surprising that the p-value is now the area under the curve of the probability density function from the value of the statistic to +∞ (i.e. to the right). You can see it in the figure below.
Getting the exact value is a little different now. The distribution function always measures the area to the left. We know that the total area under the curve equals 1. So we get the “remaining” area by subtracting the value of the distribution function at the statistic from 1. In Microsoft Excel, we will use the formula
=1-NORM.S.DIST(z, TRUE)
with the value of the statistic in place of z,
to get the result 0.5897. The value is higher than α but it is also higher than 0.5. It makes sense because it contains the whole area from 0 to +∞ (which is 0.5) plus something more.
If we did not subtract the value of the distribution function from 1, we would get “only” 0.4102, which is the “remaining” white area. So these figures may come in handy when checking the results of calculations.
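The complement trick is easy to check in code. A small Python sketch (standard library only; the statistic value used below is hypothetical, since the exact value is not shown in the text):

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def right_tail_p(z):
    """Right-tailed p-value: total area 1 minus the area to the left of z."""
    return 1.0 - norm_cdf(z)

# Sanity checks matching the text: the left and right areas always sum to 1,
# and a negative statistic gives a right-tailed p-value above 0.5.
z = -0.23  # hypothetical negative statistic value
assert abs(right_tail_p(z) + norm_cdf(z) - 1.0) < 1e-12
assert right_tail_p(z) > 0.5
```
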
In the first article about the z-test, we showed how the two-tailed test works. It is also possible to construct a one-tailed test. The one-tailed test differs in the sign of the alternative hypothesis. The two-tailed test had the inequality sign (≠) in the alternative hypothesis. For a one-tailed test, there are two options:
the left-tailed test, which has the less-than (<) sign,
the right-tailed test, which has the greater-than (>) sign.
The formula for the test statistic stays the same, whereas the critical region is now different. It lies on one side of the axis only. More specifically, the whole critical region is on the left side for the left-tailed test or on the right side for the right-tailed test (pretty straightforward if you ask me). The computation of the p-value also changes.
Let’s start with the left-tailed test. Our assignment stays, with a small modification. We will assume that the machine does not allow the length of the component to be set to more than 190 mm; it can be set to a shorter length, though. So we only need to check whether the components are on average shorter or not. We now have a new set of 20 observations.
As we said, the null hypothesis stays the same and the alternative has the < sign, so H₀: μ = 190 and H₁: μ < 190.
The critical region now consists of only one part, as you can see in the figure below. Because the area under the probability density function must still equal the significance level, the border value is closer to zero. You can compare the figure with the figure for the two-tailed test.
The formula for the critical region is W = {z : z ≤ u(α)}, where u denotes the quantile function of the standard normal distribution.
Please note that α is not divided by 2. We will keep the level of significance at 5 %, so the critical region for our example is W = {z : z ≤ −1.6449}.
We can get the value −1.6449 from a statistical table. Please note we use the value for the quantile 0.95 because the normal distribution is symmetric; we only add the minus sign.
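If no table is at hand, the quantile can also be obtained from Python’s standard library; a short sketch:

```python
from statistics import NormalDist

alpha = 0.05
# Tables list quantiles for 0.5 and above; by symmetry u(0.05) = -u(0.95).
u_95 = NormalDist().inv_cdf(1 - alpha)
print(round(u_95, 4))   # 1.6449
print(round(-u_95, 4))  # -1.6449 -> border value of the left-tailed critical region
```
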
The value of the statistic computed from the new sample lies in the critical region, so we reject the null hypothesis for α = 0.05. The conclusion of the test is that the expected value of the length of the components is lower than 190 mm, i.e. the machine was set incorrectly.
Now we will go through the last variant, which is called the right-tailed test. We will work with the same example with one modification: we assume the machine could not be set to produce components shorter than 190 mm, but it may by mistake have been set to produce longer ones. We will use a new sample of data.
The new hypotheses are H₀: μ = 190 and H₁: μ > 190.
It is not surprising that the critical region now lies in the right part of the x axis.
The formula for the critical region is W = {z : z ≥ u(1 − α)}
and if we stay at α = 0.05, we get W = {z : z ≥ 1.6449}.
The value of the statistic computed from the new sample does not lie in the critical region, so we do not reject the null hypothesis for α = 0.05.
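Both one-tailed decisions can be wrapped into a small helper function. A Python sketch (the sample data are not reproduced here, so the value of the statistic is passed in directly and the values below are purely illustrative):

```python
from statistics import NormalDist

def one_tailed_decision(z, alpha=0.05, tail="left"):
    """Return True if H0 is rejected by a one-tailed z-test."""
    u = NormalDist().inv_cdf(1 - alpha)  # e.g. 1.6449 for alpha = 0.05
    if tail == "left":
        return z <= -u   # critical region lies on the left
    return z >= u        # critical region lies on the right

# Hypothetical statistic values for illustration:
print(one_tailed_decision(-2.1, tail="left"))   # True  -> reject H0
print(one_tailed_decision(0.8, tail="right"))   # False -> do not reject H0
```
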
I have created a decision tree which can be used to select a proper statistical test for a given hypothesis. If we have one sample of data and we want to check a hypothesis about an expected value, we can use the z-test. This test assumes that the variance of the data is known. If the variance is not known, the one-sample t-test needs to be used. The other assumption is that the data have a normal distribution. The z-test is one of the simplest statistical tests, so we will use it to explain the main principles of hypothesis testing.
Let’s assume we are asked to solve this example: We have a machine which produces components of a specific length. The desired length is 190 mm. The inaccuracy of the machine is known; it is constant and characterized by its standard deviation σ (in mm). The machine was set up by an employee and we want to check whether it was set correctly. To check this, we measured the lengths of 20 sample components.
The measured values are in the following table.
Note: I tried to write this article as simply as possible. It contains links to more detailed information. It uses no special software. A short tutorial for Excel and Python will be written.
At first, we need to formulate hypotheses. Two hypotheses are usually formulated: a null hypothesis H₀ and an alternative hypothesis H₁ (or Hₐ). The null hypothesis usually has an equal sign and the alternative hypothesis always contradicts the null hypothesis. The hypotheses for our example are:
The null hypothesis H₀: The expected value of the length of the components is 190 mm. (μ = 190)
The alternative hypothesis H₁: The expected value of the length of the components is not 190 mm. (μ ≠ 190)
As we know, the lengths of the components will not be exactly 190 mm because they are affected by the inaccuracy of the machine. Even the average length will not be exactly 190 mm. But the key point of the testing is to say whether the difference could be explained by the inaccuracy or whether it must have been caused by an error in the machine setting.
For example, if the average length of the components was 150 mm, it would obviously be caused by an error. On the other hand, an average length of 190.01 mm would suggest a correct setting. But what about 189.6 or 190.9? In these cases, it is impossible to decide off the top of one’s head, and this is where hypothesis testing comes in handy.
Before starting the actual calculation, we need to realize one more thing. The outcome of our calculation is not necessarily right. The reason is that we base our decision only on a small sample (20 components), not on all of them. For example, we might select a lot of shorter components; then the average length would be significantly lower than 190 mm and we would be inclined to reject the null hypothesis even though it holds. This is called a Type I error.
On the other hand, another error may happen. If the configuration of the machine was only slightly different (for example 189.99 mm), we might not detect such a small difference. This situation is called a Type II error. You can see all possible situations in the table below.
If H₀ is true and is not rejected: correct decision.
If H₀ is true and is rejected: Type I error.
If H₀ is false and is not rejected: Type II error.
If H₀ is false and is rejected: correct decision.
The good news is that one can set the probability of a Type I error. The probability of this error is called the level of significance and it is denoted by α. On the other hand, the probability of a Type II error is unknown.
Computation of the Test
Now we will go through the computation itself. We will start with the classical method, which consists of the following steps:
Definition of hypotheses
Selection of a test statistics
Computation of a critical region
Computation of a value of the test statistics
Interpretation of the result
We have already defined the hypotheses so we will go to the second point.
Selection of a Test Statistics
The test statistic is basically a formula. Each statistical test has its own test statistic, so we basically select the formula by selecting the test. The test statistic of the z-test is
Z = (X̄ − μ₀) / (σ / √n)
where X̄ denotes the average of the sample data, μ₀ the hypothetical expected value (from H₀), σ the standard deviation and n the number of observations in the sample. We will use this formula in the 4th step.
Please note that the bigger the difference between the hypothetical expected value and the average, the further the value of the statistic is from 0.
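The formula can be written as a short Python function; a sketch with made-up numbers for illustration:

```python
from math import sqrt

def z_statistic(sample_mean, mu0, sigma, n):
    """z-test statistic: (sample mean - hypothetical mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

# Illustration with made-up numbers (mu0 = 190, sigma = 2, n = 20):
# when the mean equals mu0 the statistic is 0, and the further the mean
# is from mu0, the further the statistic is from 0.
print(z_statistic(190.0, 190.0, 2.0, 20))  # 0.0
print(abs(z_statistic(189.0, 190.0, 2.0, 20)) < abs(z_statistic(188.0, 190.0, 2.0, 20)))  # True
```
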
Computation of a Critical Region
The critical region is used to decide whether H₀ is rejected. There is a simple rule: if the value of the statistic is an element of the critical region, H₀ is rejected. Otherwise it is not rejected.
There is a simple logic behind the calculation of the critical region: assuming H₀ is true, the critical region contains the least probable values of the test statistic. To be more specific, if H₀ is true, then the measured average will probably be close to the theoretical expected value, while a big difference between them will be improbable. As we know, a big difference between these two values causes the value of the statistic to be far from 0. So the extremely low and extremely high values are improbable and thus they should be in the critical region.
So we will simply cut off the least probable values of the statistic. To do this, we need to know the statistical distribution of the statistic. The statistic of the z-test has the normal (Gaussian) distribution.
Let’s set the level of significance to α = 0.05. Both high and low values of the test statistic are suspicious, so we will split the critical region into two parts. The first part will contain the least probable low values and the second part will contain the least probable high values.
We will split the level of significance equally between the two regions. So we need to identify the lowest values with total probability 2.5 % and the highest values with total probability 2.5 %. To do this, we will use the quantile function, which we will denote by u. So we can write down the formula for the critical region as
W = {z : z ≤ u(α/2)} ∪ {z : z ≥ u(1 − α/2)}
There are many ways to get the values of the quantile function. We can use tabled values, which are part of every textbook, software like Microsoft Excel, or a programming language like Python or R. Let’s start with the most old-fashioned way – the statistical table. We can use the table here. The desired quantile is u(0.025). Because the normal distribution is symmetric, statistical tables contain values only for quantiles 0.5 and higher. So we look up the value for the quantile 1 − 0.025 = 0.975 and find the desired value: 1.96.
The lower border value of the critical region is therefore −1.96. Now we can write down the critical region:
W = {z : z ≤ −1.96} ∪ {z : z ≥ 1.96}
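The border values can also be checked in Python’s standard library; a quick sketch:

```python
from statistics import NormalDist

alpha = 0.05
upper = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for 0.975
print(round(upper, 2))   # 1.96
print(round(-upper, 2))  # -1.96 (lower border, by symmetry)
```
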
Computation of a Value of the Test Statistics
The computation is quite easy. We substitute the hypothetical expected value μ₀ = 190, the known standard deviation σ and n = 20 into the formula, together with the arithmetic mean of the measured values (you can check the mean in Excel or with a calculator), and get the value of the statistic.
Interpretation of the Result
The interpretation is simple. The value of the statistic is not an element of the critical region, so we do not reject H₀ (at α = 0.05).
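The whole classical procedure can be put together in a few lines of Python. A sketch with a made-up sample of 20 lengths and an assumed σ = 2 mm (the real measured values and σ from the assignment are not reproduced in this text):

```python
from math import sqrt
from statistics import NormalDist, mean

def two_tailed_z_test(data, mu0, sigma, alpha=0.05):
    """Return (z, rejected) for a two-tailed z-test with known sigma."""
    n = len(data)
    z = (mean(data) - mu0) / (sigma / sqrt(n))
    u = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    return z, abs(z) >= u

# Hypothetical sample of 20 lengths close to 190 mm, sigma assumed to be 2 mm:
sample = [190.1, 189.8, 190.3, 189.9, 190.0, 189.7, 190.2, 190.1, 189.9, 190.0,
          190.4, 189.6, 190.1, 189.8, 190.2, 190.0, 189.9, 190.1, 189.7, 190.3]
z, rejected = two_tailed_z_test(sample, mu0=190.0, sigma=2.0)
print(rejected)  # False -> H0 is not rejected
```
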
Note that we never say that H₀ is true. As we said earlier, our result may be wrong because of the possibility of a Type II error, and we do not know the probability that our outcome is correct.
Conclusion and Other Resources
This example was quite simple and many more things could be shown: one-tailed tests, calculation of the p-value and tests of many other hypotheses.