INTRODUCTION TO
NONPARAMETRIC
STATISTICAL METHODS
C. A. HESSE, BSc, MPhil, PhD.
Senior Lecturer of Statistics,
Methodist University College Ghana.
J. B. OFOSU, BSc, PhD, FSS.
Professor of Statistics,
Methodist University College Ghana.
E. N. NORTEY, BSc, MPhil, PhD.
Senior Lecturer of Statistics,
University of Ghana.
Copyright © 2017
Akrong Publications Ltd.
All rights reserved
No part of this publication may be reproduced, in part or in whole, stored in a retrievable
system, or transmitted in any form or by any means, electronic, mechanical, photocopying,
recording or otherwise, without prior permission of the publisher.
Published and Printed by
AKRONG PUBLICATIONS LIMITED
P. O. BOX M. 31
ACCRA, GHANA
(0244 648 757, 0264 648 757)
ISBN: 978-9988-2-6059-0
Published, 2017
akrongh@yahoo.com.
PREFACE
A statistical method is called non-parametric if it makes no assumption about the population
distribution or sample size. This is in contrast with most parametric methods in elementary
statistics, which assume that the data set used is quantitative, the population has a normal
distribution and the sample size is sufficiently large. In general, when the assumptions of a
parametric method are satisfied, non-parametric methods are less powerful than their parametric
counterparts. However, as non-parametric methods make fewer assumptions, they are more
flexible, more robust, and applicable to non-quantitative data.
This book is designed, first, to help students acquire the basic skills needed for solving real-life
problems in which the data meet only minimal assumptions and, secondly, to beef up their reading
list and provide them with a "one-stop shop" textbook on nonparametric statistics.
Our Approach
This book is an introduction to basic ideas and techniques of nonparametric statistical methods
and is intended to prepare students of the sciences, as well as the humanities, for a better
understanding of some underlying explanations of real-life situations. Researchers will find
the text useful since it provides a step-by-step presentation of procedures, use of more practical
data sets, and new problems from real-life situations. The book continues to emphasize the
importance of nonparametric methods as a significant branch of modern statistics and equips
readers with the conceptual and technical skills necessary to select and apply the appropriate
procedures for any given situation.
Written by leading statisticians, Introduction to Nonparametric Statistical Methods
provides readers with crucial nonparametric techniques in a variety of settings, emphasizing
the assumptions underlying the methods. The book provides an extensive array of examples
that clearly illustrate how to use nonparametric approaches for handling one- or two-sample
location and dispersion problems, dichotomous data, one-way analysis of variance, rank tests,
goodness-of-fit tests and tests of randomness.
A wide range of topics is covered in this text, although the treatment is limited to the
elementary level. There are solved, partly solved and unsolved assignments with every section
to familiarize the student or reader with the methods introduced.
C. A. Hesse
J. B. Ofosu
E. N. Nortey
July, 2017
CONTENTS
1. Preliminaries
1.1 Introduction
1.2 Parametric and nonparametric methods
1.3 Parametric versus nonparametric methods
1.4 Classes of nonparametric methods
1.5 When to use nonparametric procedures
1.6 Advantages of nonparametric statistics
1.7 Disadvantages of nonparametric tests
1.8 The scope of this book
1.9 Format and organization
2. One-Sample Nonparametric Methods
2.1 Introduction
2.2 The one-sample sign test
2.2.1 Assumptions
2.2.2 Hypotheses
2.2.3 Large sample approximation
2.2.4 Confidence interval for the median based on the sign test
2.3 The Wilcoxon signed-ranks test
2.3.1 Assumptions
2.3.2 Hypotheses
2.3.3 Test statistic
2.3.4 Carrying out the Wilcoxon signed-ranks test
2.3.5 Large sample approximation
2.3.6 Confidence interval for the median based on the Wilcoxon signed-ranks test
2.4 The binomial test
2.4.1 Assumptions
2.4.2 Hypotheses
2.4.3 Large sample approximation
2.4.4 Large sample confidence interval for p
2.5 The one-sample runs test for randomness
3. Procedures That Utilize Data from Two Independent Samples
3.1 Introduction
3.2 The median test
3.2.1 Assumptions
3.2.2 Hypotheses
3.2.3 Large sample approximation
3.3 The Mann-Whitney (Wilcoxon rank-sum) test
3.3.1 Assumptions
3.3.2 Hypotheses
3.3.3 Large-sample approximation
3.3.4 Confidence interval for difference between two population medians
3.4 The Wald-Wolfowitz two-sample runs test
3.5 The two-sample runs test for randomness
4. Procedures Using Data from Two Related Samples
4.1 Introduction
4.2 The sign test for two related samples
4.2.1 Introduction
4.2.2 Assumptions
4.2.3 Test procedure
4.2.4 Hypotheses
4.2.5 Confidence interval for the difference between the medians of two populations, based on the sign test
4.3 Wilcoxon matched-pairs signed-ranks test
4.3.1 Introduction
4.3.2 Assumptions
4.3.3 Test procedure
4.3.4 Hypotheses
4.3.5 Large sample approximation
4.3.7 Confidence interval for the median of population differences between pairs of measurements based on the matched-pair signed-ranks Wilcoxon test
4.4 A test for two related samples when the data consist of frequencies (the McNemar test)
5. Chi-Square Test of Homogeneity and Independence
5.1 Introduction
5.2 The chi-square test of homogeneity
5.3 The chi-square test of independence
6. Procedures Using Data from Three or More Independent Samples
6.1 Introduction
6.2 Extension of the median test
6.3 The Kruskal-Wallis one-way analysis of variance by ranks
6.4 The Jonckheere-Terpstra test for ordered alternatives
7. Procedures Using Data from Three or More Related Samples
7.1 Introduction
7.2 Data from a randomized complete block design
7.3 Friedman two-way analysis of variance by ranks
7.4 Page's test for ordered alternatives
8. Goodness-of-Fit Tests
8.1 Introduction
8.2 The chi-square goodness-of-fit test
8.3 Kolmogorov-Smirnov goodness-of-fit test
8.3.1 The Kolmogorov–Smirnov goodness-of-fit test for a single sample
8.3.2 The Kolmogorov–Smirnov two-sample test
9. Rank Correlation
9.1 Introduction
9.2 Spearman's rank correlation coefficient
9.3 Kendall's rank correlation coefficient
Answers to Exercises
Appendix
1.1 Introduction
Typical introductory courses in hypothesis testing and confidence intervals deal primarily
with parametric statistical procedures. A main feature of these statistical procedures is the
assumption that we are working with random samples from normal populations. These
procedures are known as parametric methods because they are based on a particular
parametric family of distributions – in this case, the normal. For example, given a set of
independent observations from a normal distribution, we often want to infer something about
the unknown parameters. Here the t-test is usually used to determine whether the
hypothesized value $\mu_0$ for the population mean should be rejected. More usefully, we
may construct a confidence interval for the 'true' population mean.
Parametric inference is sometimes inappropriate or even impossible. To assume that
samples come from any specified family of distributions may be unreasonable. For example,
we may not have examination marks for each candidate but know only the numbers of
candidates who obtained the ordered grades A, B+, B, B–, C+, C, D and F. Given these grade
distributions for two different courses, we may want to know if they indicate a difference in
performance between the two courses. In this case it is inappropriate to use the traditional
(parametric) method of analysis.
In this book we describe procedures called nonparametric and distribution-free methods.
Nonparametric methods provide an alternative series of statistical methods that require no or
very limited assumptions to be made about the data. These methods are most often used to
analyse data which do not meet the distributional requirements of parametric methods. In
particular, skewed data are frequently analysed by non-parametric methods, although data
transformation can sometimes make the data suitable for parametric analyses. These
procedures have considerable appeal. One of their advantages is that the data need not be
quantitative but can be categorical (such as yes or no) or rank data.
Generally, if both parametric and nonparametric methods are applicable to a particular
problem, we should use the more efficient parametric method.
Introducing Nonparametric Methods
1.2 Parametric and nonparametric methods
The word statistics has several meanings. It is used to describe a collection of data and also to
designate operations that may be performed with primary data. The scientific discipline called
statistical inference uses observed data – in this context called a sample – to make inferences
about a larger observable collection of data called a population. We associate distributions
with populations. For example, if the random variable which describes a population is normally
distributed, then we say that the population is normal.
Parametric methods are often those for which we know that the population is normal, or for which
we can use a normal approximation after invoking the central limit theorem.
Ultimately the classification of a method as parametric depends upon the assumptions that are
made about a population. Examples of parametric methods include tests of a statistical
hypothesis about a population mean under two different conditions:
1. when sampling is from a normally distributed population with known variance,
2. when sampling is from a normally distributed population with unknown variance.
Nonparametric methods, however, are not based on these distributional assumptions and
thus do not require a population's distribution to be described by specific parameters.
1.3 Parametric versus nonparametric methods
The analysis of data often begins by considering the appropriateness of the normal distribution
as a model for describing the distribution of the population. If this distribution is reasonable,
or if the normal approximation is deemed adequate, then the analysis will be carried out using
normal-theory methods. If the normal distribution is not appropriate, it is common to consider
the possibility of a transformation of the data. For instance, a simple transformation of the
data may yield values that are normally distributed, so that normal-theory methods
may be applied to the transformed data.
If neither of these approaches seems reasonable, there are two ways to proceed. It may be
possible to identify the type of distribution that is appropriate – say, exponential – and then
use the methods that specifically apply to that distribution. However, there may not be
sufficient data to ascertain the form of the distribution, or the data may come from a
distribution for which methods are not readily available. In such situations one hopes not to
make untenable assumptions, and this is where nonparametric methods come into play.
Nonparametric methods require minimal assumptions about the form of the distribution
of the population. For instance, it might be assumed that the data are from a population that
has a continuous distribution, but no other assumptions are made. Or it might be assumed that
the population distribution depends on location and scale parameters, but the functional form
of the distribution, whether normal or otherwise, is not specified. By contrast, parametric
methods require that the form of the population distribution be completely specified except for
a finite number of parameters. For instance, the familiar one-sample t-test for means assumes
that observations are selected from a population that has a normal distribution, and the only
values not known are the population mean and standard deviation. The simplicity of
nonparametric methods, the widespread availability of such methods in statistical packages,
and the desirable statistical properties of such methods make them attractive additions to the
data analyst's tool kit.
1.4 Classes of nonparametric methods
Nonparametric methods may be classified according to their function, such as two-sample
tests, tests for trends, and so on. This is generally how this book is organized. However,
methods may also be classified according to the statistical ideas upon which they are based.
Here, we consider the ideas that underlie the methods discussed in this book.
The typical introductory course in statistics examines primarily parametric statistical
procedures. Recall that these procedures include tests based on Student's t-distribution,
analysis of variance, correlation analysis and regression analysis. A characteristic of these
procedures is that the appropriateness of their use for the purpose of inference depends
on certain assumptions. Inferential procedures in analysis of variance, for example, assume
that samples have been drawn from normally distributed populations with equal variances.
Since populations do not always meet the assumptions underlying parametric tests, we
frequently need inferential procedures whose validity does not depend on rigid assumptions.
Nonparametric statistical procedures fill this need in many instances, since they are valid under
very general assumptions. As we shall discuss more fully later, nonparametric procedures also
satisfy other needs of the researcher.
By convention, two types of statistical procedures are treated as nonparametric:
(1) truly nonparametric procedures and (2) distribution-free procedures. Strictly speaking,
nonparametric procedures are not concerned with population parameters. For example, in this
book we shall discuss tests for randomness where we are concerned with some characteristic
other than the value of a population parameter. The validity of distribution-free procedures
does not depend on the functional form of the population from which the sample has been
drawn. It is customary to refer to both types of procedure as nonparametric. Kendall
and Sundrum (1953) discussed the differences between the terms nonparametric and
distribution-free.
1.5 When to use nonparametric procedures
The following are some situations in which the use of a nonparametric procedure is
appropriate.
1. The hypothesis to be tested does not involve a population parameter.
2. The data have been measured on a scale weaker than that required for the parametric
procedure that would otherwise be employed. For example, the data may consist of count
data or rank data, thereby precluding the use of some otherwise appropriate parametric
procedure.
3. The assumptions necessary for the valid use of a parametric procedure are not met. In
many instances, the design of a research project may suggest a certain parametric
procedure. Examination of the data, however, may reveal that one or more assumptions
underlying the test are grossly violated. In that case, a nonparametric procedure is
frequently the only alternative.
4. Results are needed in a hurry and calculations must be done by hand.
The literature in nonparametric statistics is extensive. A bibliography by Savage (1962)
contained some 3 000 entries. An up-to-date bibliography would undoubtedly contain many
times that number.
1.6 Advantages of nonparametric statistics
The following are some of the advantages of the available nonparametric statistical procedures.
1. Make fewer assumptions.
Nonparametric statistical procedures generally do not need rigid parametric assumptions
with regard to the populations from which the data are taken.
2. Wider scope.
Since fewer assumptions are made about the population being studied, nonparametric
statistics are usually wider in scope than parametric statistics, which assume a specific
distribution.
3. Need not involve population parameters.
Parametric tests involve specific probability distributions (e.g., the normal distribution)
and the tests involve estimation of the key parameters of that distribution (e.g., the mean
or difference in means) from the sample data. However, nonparametric tests need not
involve population parameters.
4. The chance of their being improperly used is small.
Since most nonparametric procedures depend on a minimum set of assumptions, the
chance of their being improperly used is small.
5. Applicable even when data are measured on a weak measurement scale.
For interval or ratio data, you may use a parametric test, depending on the shape of the
distribution. Nonparametric tests can be performed even when you are working with data
that are nominal or ordinal.
6. Easy to understand.
Researchers with minimum preparation in Mathematics and Statistics usually find
nonparametric procedures easy to understand.
7. Computations can quickly and easily be performed.
Nonparametric tests usually can be performed quickly and easily without automated
instruments (calculators and computers). They are designed for small data sets,
including counts, classifications and ratings.
1.7 Disadvantages of nonparametric tests
Nonparametric procedures are not without disadvantages. The following are some of the more
important disadvantages.
1. May waste information.
The researcher may waste information when parametric procedures are more appropriate
to use. If the assumptions of the parametric methods can be met, it is generally more
efficient to use them.
2. Difficult to compute by hand for large samples.
For large sample sizes, data manipulations tend to become more laborious, unless
computer software is available.
3. Tables not widely available.
Often special tables of critical values are needed for the test statistic, and these values
cannot always be generated by computer software. On the other hand, the critical values
for the parametric tests are readily available and generally easy to incorporate in computer
programs.
1.8 The scope of this book
The emphasis in this book is on the application of nonparametric statistical methods. Wherever
available, the examples and exercises use real data, gleaned primarily from the results of
research published in various journals. We hope that the use of real situations and real data
will make the book more interesting to you. We have included problems on each of the wide
variety of statistical techniques described. The techniques we discuss are those most likely to prove helpful to the researcher
and most likely to appear in the research literature. In this text we have covered not only
hypothesis testing, but interval estimation as well.
1.9 Format and organization
In presenting these statistical procedures, we have adopted a format designed to make it easy
for you to use the book. Each hypothesis-testing procedure is broken down into four
components: (1) assumptions, (2) hypotheses, (3) test statistic, and (4) decision rule.
Thus, for a given test, you can quickly determine the assumptions on which the test is
based, the hypotheses that are appropriate, how to compute the test statistic, and how to
determine whether to reject the null hypothesis. First, we discuss these topics in general, and
then we use an example to illustrate the application of the test.
Where appropriate for a given test, we discuss ties, the large-sample approximation, and
the power efficiency. For each procedure, we cite references that you may consult if you are
interested in learning more about the procedure or in further pursuing a related topic. Finally,
we provide exercises for each procedure. These exercises serve two purposes: they illustrate
appropriate uses of a test, and they give you a chance to determine whether you have mastered
the computational techniques and learned how to set up the hypotheses and use the applicable
decision rule.
In the remaining chapters, we cite two types of reference: those that are cited in the body
of the text and refer you to the statistical literature, and those that are cited in the examples and
exercises and refer you to the research literature.
References
Armitage, P. (1971). Statistical Methods in Medical Research. Oxford and Edinburgh: Blackwell Scientific Publications.
Colton, T. (1974). Statistics in Medicine. Boston: Little, Brown.
Dunn, O. J. (1964). Basic Statistics: A Primer for the Biomedical Sciences. New York: Wiley.
Kendall, M. G. and Sundrum (1953). Distribution-Free Methods and Order Properties. Rev. Int. Statist. Inst. 21, 124–134.
Remington, R. D. and Schork, M. A. (1970). Statistics with Applications to the Biological and Health Sciences. Englewood Cliffs, N.J.: Prentice-Hall.
Savage, I. R. (1962). Bibliography on Nonparametric Statistics. Cambridge, Mass.: Harvard University Press.
2.1 Introduction
In classical parametric tests (which assume that the population from which the sample data
have been drawn is normally distributed), the parameter of interest is the population mean. In
this chapter, we shall be concerned with the nonparametric analogues of the one-sample z and t
tests. These are nonparametric procedures (which utilize data consisting of a single set of
observations) that are appropriate when the location parameter is the median, rather than the
mean.
Several nonparametric procedures are available for making inferences about the median.
Two of the nonparametric tests that are useful in situations where the conditions for
the parametric z and t tests are not met are the one-sample sign test and the Wilcoxon
signed-ranks test.
Recall that the median of a set of data is defined as the middle value when the data are
arranged in order of magnitude. For continuous distributions, we define the median as the
point $M$ for which the probability that a value selected at random from the distribution is less
than $M$ and the probability that a value selected at random from the distribution is greater than
$M$ are both equal to 0.5; that is, $P(X < M) = P(X > M) = 0.5$. When the population from which
the sample has been drawn is symmetric, any conclusions about the median are applicable to the
mean, since in symmetric distributions the mean (when it exists) and the median coincide.
In this chapter, we shall also discuss procedures for making inferences concerning the
population proportion and testing for randomness and the presence of trend.
Wherever possible, we shall observe the following format in presenting the hypothesis-
testing procedures.
1. Assumptions
We list the assumptions necessary for the validity of the test, and describe the data on
which the calculations are based.
2. Parameter of interest
From the problem context, we identify the parameter of interest.
One-Sample Nonparametric Methods
3. Hypotheses
We state the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
4. Test statistic
We write down a formula or direction for computing the relevant test statistic. When we
give a formula, we describe the methodology for evaluating it.
5. Significance level
We choose a significance level $\alpha$.
6. Decision rule
We determine the critical region. The Appendix gives appropriate tables for the distribution
of the test statistic. From these tables, we can determine the critical values of the test statistic
corresponding to the chosen $\alpha$.
7. Value of the test statistic
We compute the value of the test statistic from the sample data.
8. Decision
If the computed value of the test statistic is as extreme as or more extreme than a critical
value, we reject $H_0$ and conclude that $H_1$ is true. If we cannot reject $H_0$, we conclude that
there is not enough evidence to establish that it is false.
2.2 The one-sample sign test
The sign test is perhaps the oldest of all nonparametric procedures. Let $X_1, X_2, \ldots, X_n$ be an
observed random sample of size n from a population with median $M$. The sign test utilizes only
the signs of the differences between the observed values $X_i$ and the hypothesized median $M_0$.
Thus, the data are converted into a series of plus (+) and minus (−) signs.
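The conversion of the observations into signs can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name `sign_counts` and the sample values are hypothetical, and zero differences are only counted here (their treatment is discussed under "Problem with zero differences").

```python
def sign_counts(data, m0):
    """Count the +, - and zero signs of the differences x - m0."""
    n_plus = sum(1 for x in data if x > m0)   # differences x - m0 > 0
    n_minus = sum(1 for x in data if x < m0)  # differences x - m0 < 0
    n_zero = len(data) - n_plus - n_minus     # differences exactly 0
    return n_plus, n_minus, n_zero


# Hypothetical sample and hypothesized median M0 = 5.1 (illustration only)
sample = [4.2, 5.1, 3.8, 6.0, 5.1, 4.9, 7.3]
print(sign_counts(sample, 5.1))  # (2, 3, 2)
```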
2.2.1 Assumptions
1. The sample available for analysis is a random sample of independent measurements from
a population with an unknown median $M$.
2. The variable of interest is measured on at least an ordinal scale.
3. The variable of interest is continuous.
2.2.2 Hypotheses
The hypothesis to be tested concerns the value of the population median. To test the hypothesis
is a specified median value, against a corresponding one-sided or two-sided
alternative, we use the Sign Test. The test statistic S depends on the alternative hypothesis,
(a) One-sided test
For a one-sided test, the alternative hypothesis is either $H_1: M < M_0$ or $H_1: M > M_0$.
(i) If we test $H_0: M = M_0$ against $H_1: M < M_0$, then the test statistic is defined by
$S = S_+$ = number of + signs when the differences $x_i - M_0$, $i = 1, 2, \ldots, n$, are computed.
If the alternative hypothesis is true, then we should expect the differences $x_i - M_0$ to
yield significantly fewer positive (+) signs than negative (−) signs. Thus, a small
number of (+) signs leads to the rejection of $H_0$. If $H_0$ is true, we expect the
number of (−) signs to equal the number of (+) signs, and hence
$$P(X > M_0) = P(X < M_0) = \tfrac{1}{2}.$$
Therefore, when $H_0$ is true, $S$ has the binomial distribution with parameters $n$ and $p = \tfrac{1}{2}$.
Decision rule
The p-value of the test is defined by
$$p = P(S \le s_o \mid H_0 \text{ is true}),$$
where $s_o$ is the observed value of the test statistic.
(ii) For a one-sided test of $H_0: M = M_0$ against $H_1: M > M_0$, the test statistic is
$S = S_-$ = number of observations less than $M_0$ = number of − signs when the differences
$x_i - M_0$, $i = 1, 2, \ldots, n$, are computed.
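If the alternative hypothesis is true, then we should expect the differences $x_i - M_0$ to yield fewer
negative (−) signs than would be expected if the null hypothesis were true, so a small value of
$S$ leads to the rejection of $H_0$. Likewise, when $H_0$ is true, $S$ has the binomial distribution
with parameters $n$ and $p = \tfrac{1}{2}$.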
Decision rule
The p-value of the test is defined by
$$p = P(S \le s_o \mid H_0 \text{ is true}),$$
where $s_o$ is the observed value of the test statistic.
(b) Two-sided test
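If we wish to test $H_0: M = M_0$ against $H_1: M \ne M_0$, then the test statistic is defined by
$$S = \min(S_-, S_+),$$
where $S_-$ is the number of − signs and $S_+$ is the number of + signs when the differences
$x_i - M_0$ are computed.
We should reject the null hypothesis if we have too few negative (–) signs or too few
positive (+) signs. When $H_0$ is true, $S_-$ and $S_+$ each have the binomial distribution with
parameters $n$ and $p = \tfrac{1}{2}$.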
Decision Rule
The p-value of the test is defined by
$$p = 2P(S \le s_o \mid H_0 \text{ is true}),$$
where $s_o$ is the observed value of the test statistic.
Problem with zero differences
We assume that the variable of interest is continuous. Therefore, in theory, no zero
differences should occur when we compute $x_i - M_0$. In practice, however, zero differences
do occur. The usual procedure is to discard observations leading to zero differences and
reduce $n$ accordingly. In that case the hypothesis may be restated in probability terms. For
example, in the two-sided case the null hypothesis becomes
$$H_0: P(X > M_0) = P(X < M_0) = 0.5.$$
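The exact sign test above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the book's procedure verbatim: the helper names `binom_cdf` and `sign_test` and the `alternative` labels are hypothetical, zero differences are dropped as just described, and tail probabilities are computed directly from the binomial distribution with $p = 0.5$ instead of being read from a table.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """Return P(S <= k) for S ~ Binomial(n, p); an empty sum (k < 0) gives 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def sign_test(data, m0, alternative="two-sided"):
    """Exact one-sample sign test of H0: M = m0 (hypothetical helper).

    alternative: "less" (H1: M < m0), "greater" (H1: M > m0) or "two-sided".
    Returns (s_obs, n, p_value), where n counts only non-zero differences.
    """
    diffs = [x - m0 for x in data if x != m0]   # discard zero differences
    n = len(diffs)
    n_plus = sum(1 for d in diffs if d > 0)     # S+ = number of + signs
    n_minus = n - n_plus                        # S- = number of - signs
    if alternative == "less":
        s = n_plus                              # few + signs support M < m0
        p_value = binom_cdf(s, n)
    elif alternative == "greater":
        s = n_minus                             # few - signs support M > m0
        p_value = binom_cdf(s, n)
    else:
        s = min(n_plus, n_minus)                # S = min(S-, S+)
        p_value = min(1.0, 2 * binom_cdf(s, n))
    return s, n, p_value


# Tail probabilities used in Examples 2.1 and 2.2
print(round(2 * binom_cdf(1, 10), 4))   # 0.0215 (two-sided, s_o = 1, n = 10)
print(round(binom_cdf(6, 14), 4))       # 0.3953 (one-sided, s_o = 6, n = 14)
```

The first printed value is 0.0215; Example 2.1 reports 0.0214 because it doubles the rounded table entry 0.0107.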
Example 2.1
Appearance transit times for 11 patients with significantly occluded right coronary arteries are
given below:
Can we conclude, at the 0.05 level of significance, that the median appearance transit time in
the population from which the data were drawn, is different from 3.50 seconds?
Solution
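The parameter of interest is $M$, the median appearance transit time in the population. We wish
to test the hypothesis $H_0: M = 3.50$ against $H_1: M \ne 3.50$ at the 0.05
level of significance. Since this is a two-sided test, the test statistic is $S = \min(S_-, S_+)$, where
$S_-$ is the number of observations less than 3.50 and $S_+$ is the number of observations
greater than 3.50. When $H_0$ is true, $S_-$ and $S_+$ each have the binomial distribution with
parameters $n$ and $p = 0.5$.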
Note: We discard one observation which has the same value as the hypothesized median,
leaving us with a usable sample size of 10.
Let $s_o$ be the observed value of the test statistic. We reject $H_0$ at the 0.05 level of significance
when $p \le 0.05$, where
$$p = 2P(S \le s_o \mid 10, 0.5).$$
The observed value of the test statistic is therefore $s_o = 1$.
Since this is a two-sided test, the p-value of the test is given by
$$p = 2P(S \le 1 \mid 10, 0.5) = 2(0.0107) = 0.0214.$$
Since the p-value of the test, 0.0214, is less than 0.05, we reject $H_0$ at the 0.05 level of
significance and conclude that the population median is not 3.50.
Example 2.2
The following data are IQs of arrested drug abusers who are aged 16 years or older. Is there
any evidence that the median IQ of drug abusers in the population is greater than 107?
Use α = 0.05.
Solution
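The parameter of interest is $M$, the median IQ of drug abusers in the population. We wish to
test the hypothesis $H_0: M = 107$ against $H_1: M > 107$ at the 0.05 level of significance. The
test statistic is $S = S_-$, the number of observations less than 107. When $H_0$ is true, $S$ has the
binomial distribution with parameters $n$ and $p = 0.5$.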
Note: We discard one observation which has the same value as the hypothesized median,
leaving us with a usable sample size of 14.
Let $s_o$ be the observed value of the test statistic. We reject $H_0$ at the 0.05 level of significance
when $p \le 0.05$, where the p-value of the test is given by
$$p = P(S \le s_o \mid 14, 0.5).$$
The following table gives the signs of the differences $x_i - 107$.
The observed value of the test statistic is $s_o = 6$.
Since this is a one-sided test, the p-value of the test is given by
$$p = P(S \le 6 \mid 14, 0.5) = 0.3953.$$
Since the p-value of the test, 0.3953, is greater than 0.05, we fail to reject $H_0$ at the 0.05 level
of significance. Hence, there is not enough evidence to conclude that the median IQ of the
subjects in the population is greater than 107.
2.2.3 Large sample approximation
If the sample size is larger than 15, we can use the normal approximation to the binomial
distribution with a continuity correction. If $n$ is large and $H_0$ is true, then it can be
shown that $S$ is approximately normally distributed with mean $\tfrac12 n$ and variance
$n\left(\tfrac12\right)\left(\tfrac12\right) = \tfrac14 n$.
Thus, for the sign test, when $H_0$ is true and $n > 15$, we can use the test statistic
$$Z = \frac{S - \tfrac12 n}{\sqrt{n\left(\tfrac12\right)\left(\tfrac12\right)}} = \frac{S - \tfrac12 n}{\tfrac12\sqrt{n}}. \tag{2.1}$$
When $H_0$ is true, $Z$ is approximately $N(0, 1)$. For the large sample approximation, it is
common to use a continuity correction, replacing $S$ by $S + 0.5$ in the definition of $Z$.
Equation (2.1) then becomes
$$Z = \frac{S + 0.5 - \tfrac12 n}{\tfrac12\sqrt{n}}. \tag{2.2}$$
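Equations (2.1) and (2.2) translate directly into code. The sketch below is illustrative only: `sign_test_z` is a hypothetical name, and the standard normal CDF comes from Python's standard-library `statistics.NormalDist`. As in the text, it assumes $S$ is defined so that small values lead to rejection.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_z(s, n, continuity=True):
    """Large-sample sign test statistic, equations (2.1) and (2.2).

    s is the sign-test statistic (small values lead to rejection) and n is the
    number of non-zero differences. With continuity=True, S is replaced by
    S + 0.5 as in equation (2.2).
    """
    s_adj = s + 0.5 if continuity else s
    z = (s_adj - 0.5 * n) / (0.5 * sqrt(n))
    p_lower = NormalDist().cdf(z)   # approximate one-sided p-value P(Z <= z)
    return z, p_lower


# Example 2.3: S = 5 plus signs among n = 20 differences
z, p = sign_test_z(5, 20)
print(round(z, 4))   # -2.0125
```

For a two-sided test, one would pass $S = \min(S_-, S_+)$ and double the returned tail probability.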
Example 2.3
The following data give the ages, in years, of a random sample of 20 students from Besease
Senior High School. It is believed that the median age of students in this school is smaller than
22 years. Based on these data, is there sufficient evidence to conclude that the median age of
students from Besease Senior High School is smaller than 22 years?
Solution
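The parameter of interest is $M$, the median age of students from Besease Senior High School.
We are interested in testing the null hypothesis $H_0: M = 22$ against the alternative hypothesis
$H_1: M < 22$. The test statistic is
$S$ = number of observations greater than 22 = number of + signs when the differences
$x_i - 22$, $i = 1, 2, \ldots, 20$, are computed.
When $H_0$ is true, $S$ has the binomial distribution with parameters $n = 20$ and $p = 0.5$.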
Since n > 15, we use the normal approximation to the
binomial distribution with a continuity correction. The test statistic then becomes
$$Z = \frac{S + 0.5 - 0.5(20)}{0.5\sqrt{20}}.$$
When $H_0$ is true, $Z$ is approximately $N(0, 1)$. Let $z_o$ denote the observed value of the test
statistic $Z$. We reject $H_0$ at the 0.05 level of significance when
$$z_o \le -z_{0.05} = -1.645.$$
The following table gives the signs of the differences $x_i - 22$.
Thus, the observed value of the statistic $S$ is 5. This gives
$$z_o = \frac{5 + 0.5 - 0.5(20)}{0.5\sqrt{20}} = -2.0125.$$
Since $-2.0125 < -1.645$, we reject $H_0$ at the 0.05 level of significance and conclude
that the median age of students of Besease Senior High School is less than 22 years.
2.2.4 Confidence interval for the median based on the sign test
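The $100(1 - \alpha)\%$ confidence interval for the population median $M$ consists of those values
of $M_0$ for which we would not reject a two-sided null hypothesis $H_0: M = M_0$ at the α
level of significance.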
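We designate the lower limit of our confidence interval by $M_L$ and the upper limit by $M_U$.
We determine the number $k$ of positive or negative signs (i.e. the value of $S$) for which
$P(S \le k \mid n, 0.5)$ is equal to, or as close as possible to, $\alpha/2$. When the data values are
arranged in order of magnitude, the $(k+1)$th observation is $M_L$. To find the upper limit of the
confidence interval, we count the ordered sample values backwards from the largest. The
$(k+1)$th observation from the largest value locates $M_U$. The confidence coefficient of the
resulting interval is $1 - 2P(S \le k \mid n, 0.5)$.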
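The sign-test interval can be sketched as below. The helper names are hypothetical, and `conservative_k` implements one possible rule (the largest $k$ whose tail probability does not exceed $\alpha/2$, so that the confidence coefficient is at least $1 - \alpha$); this rule is an assumption of the sketch, since the text also allows choosing a nearby $k$ with a narrower interval.

```python
from math import comb


def binom_cdf_half(k, n):
    """P(S <= k) for S ~ Binomial(n, 0.5), computed exactly from counts."""
    return sum(comb(n, i) for i in range(k + 1)) / 2**n


def sign_ci(data, k):
    """Interval from the (k+1)th smallest and (k+1)th largest observations.

    Returns (lower, upper, confidence coefficient 1 - 2 P(S <= k)).
    """
    x = sorted(data)
    n = len(x)
    if not 0 <= k < n // 2:
        raise ValueError("k must satisfy 0 <= k < n/2")
    return x[k], x[n - k - 1], 1 - 2 * binom_cdf_half(k, n)


def conservative_k(n, alpha=0.05):
    """Largest k with P(S <= k) <= alpha/2, or None if no such k exists."""
    best = None
    k = 0
    while k < n // 2 and binom_cdf_half(k, n) <= alpha / 2:
        best = k
        k += 1
    return best


print(conservative_k(16))   # 3
```

For $n = 16$, `conservative_k(16)` returns 3, giving the 4th and 13th ordered values with a coefficient of about 97.9%; calling `sign_ci(data, 4)` instead returns the 5th and 12th ordered values, the narrower choice made in Example 2.4.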
Example 2.4
Construct a 95% confidence interval for the median of the population from which the
following sample data have been drawn, using the sign test.
Solution
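The point estimate of the population median is the sample median, which is the mean of
the two middle values in the ordered array. Thus, the sample median is $\tfrac12\left(x_{(8)} + x_{(9)}\right)$,
where $x_{(i)}$ denotes the $i$th value in the ordered array of the $n = 16$ observations.
To construct the confidence interval, we consult a table of the binomial distribution with $n = 16$
and $p = 0.5$ and find that $P(S \le 3) = 0.0105$ and $P(S \le 4) = 0.0383$.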
Thus, we cannot obtain an exact 95% confidence interval for the median, since
100[1 – 2(0.0105)] = 97.9, which is larger than 95, and 100[1 – 2(0.0383)] = 92.34,
which is smaller than 95.
This method of constructing confidence intervals for the median does not usually yield
intervals with exactly the usual coefficients of 0.90, 0.95 and 0.99.
In practice, we choose between a wider interval with higher confidence and a narrower
interval with lower confidence.
Suppose we choose the narrower interval, with $k = 4$. Therefore the 5th value in the ordered
array is $M_L = 1.99$ and the 12th (i.e. 16 – 4) value in the ordered array is $M_U = 4.01$.
The confidence coefficient is therefore 100[1 – 2(0.0383)] = 92.34. We say that we are
92.34% confident that the population median is between 1.99 and 4.01.
Large Sample Approximation
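We find $k$ such that
$$P\left(\frac{S - \tfrac12 n}{\tfrac12\sqrt{n}} \le \frac{k - \tfrac12 n}{\tfrac12\sqrt{n}}\right) = \frac{\alpha}{2}.$$
Since $(S - \tfrac12 n)/(\tfrac12\sqrt{n})$ is approximately $N(0, 1)$, this requires
$(k - \tfrac12 n)/(\tfrac12\sqrt{n}) = -z_{\alpha/2}$.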
Making $k$ the subject of the above equation, we obtain
$$k = \tfrac12 n - \tfrac12 z_{\alpha/2}\sqrt{n} = \tfrac12\left(n - z_{\alpha/2}\sqrt{n}\right).$$
If the resulting value is not an integer, we use the closest integer.
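The large-sample value of $k$ is a one-line computation. In this sketch (hypothetical helper `sign_ci_k_large`), the exact normal quantile from `statistics.NormalDist` replaces the rounded table value 1.96, and Python's built-in `round` is used for the closest integer.

```python
from math import sqrt
from statistics import NormalDist


def sign_ci_k_large(n, conf=0.95):
    """Large-sample k = (n - z_{alpha/2} * sqrt(n)) / 2.

    Returns (raw k, k rounded to the nearest integer).
    """
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point, about 1.96 for 95%
    k = 0.5 * (n - z * sqrt(n))
    return k, round(k)


raw, k = sign_ci_k_large(20)
print(round(raw, 2), k)   # 5.62 6
```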
Example 2.5
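Refer to Example 2.3. Construct a 95% confidence interval for $M$, the median age of students
from Besease Senior High School.
Solution
Here $n = 20$ and $z_{0.025} = 1.96$, so that
$$k = \tfrac12\left(20 - 1.96\sqrt{20}\right) = 5.6 \approx 6.$$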
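Therefore the 6th observation in the ordered array is the lower limit $M_L$, and the 6th
observation from the largest (i.e. the 15th observation in the ordered array) is the upper
limit $M_U$. Hence the 95% confidence interval for $M$ is $(M_L, M_U)$.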
2.3 The Wilcoxon signed-ranks test
As we have seen, the sign test uses only the signs of the differences between the observed
values and the hypothesized median. For testing $H_0: M = M_0$, there is another procedure that
uses the magnitudes of the differences when these are available. The Wilcoxon signed-ranks
procedure makes use of this additional information by ranking the absolute differences between
the sample measurements and the hypothesized median. The Wilcoxon signed-ranks test uses more
information than the sign test, making it a more powerful test when the sampled population is
symmetric. However, the sign test is preferred when the sampled population is not symmetric.
2.3.1 Assumptions
1. The sample available for analysis is a random sample of size n from a population with an
unknown median $M$.
2. The variable of interest is measured on a continuous scale.
3. The sampled population is symmetric.
4. The scale of measurement is at least interval.
5. The observations are independent.
2.3.2 Hypotheses
The parameter of interest is $M$, the population median. To test the hypothesis $H_0: M = M_0$,
where $M_0$ is the hypothesized median, against a corresponding one-sided or two-sided
alternative, we can also use the Wilcoxon signed-ranks test.
2.3.3 Test statistic
To obtain the test statistic, we use the following procedure.
1. Subtract the hypothesized median $M_0$ from each observation; that is, for each
observation $x_i$, compute $D_i = x_i - M_0$.
2. If any observation $x_i$ is equal to the hypothesized median $M_0$, eliminate it from the
calculations and reduce the sample size accordingly.
3. Rank the absolute differences $|D_i|$ from the smallest to the largest, without regard to their
signs. If two or more $|D_i|$ are tied, assign each tied value the mean of the rank positions of
the tied differences.
4. Assign to each rank the sign of the difference that it ranks.
5. Obtain the sum of the ranks with positive signs; call it $W_+$.
6. Obtain the sum of the ranks with negative signs; call it $W_-$.
7. For a given sample, we do not expect
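Steps 1 to 6 can be sketched in Python as follows. The function name is hypothetical, and tied absolute differences are detected by exact equality, so floating-point data should first be rounded to their recorded precision (an assumption of this sketch, not part of the text). The data in the usage line are made up for illustration.

```python
def signed_rank_sums(data, m0):
    """Return (W_plus, W_minus, n) for the signed-ranks test of H0: M = m0.

    Observations equal to m0 are dropped; tied |D_i| receive the mean of
    the rank positions they occupy.
    """
    d = [x - m0 for x in data if x != m0]          # steps 1 and 2
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                                   # step 3: rank |D_i| with ties averaged
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        mean_rank = (i + j + 2) / 2                # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    # steps 4 to 6: attach signs and sum
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return w_plus, w_minus, n


print(signed_rank_sums([160, 172, 165, 158, 170, 180, 161], 165))   # (13.0, 8.0, 6)
```

As a check, $W_+ + W_- = n(n+1)/2$ always holds, here $13 + 8 = 21$.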
2.3.4 Carrying out the Wilcoxon signed ranks test
When the null hypothesis $H_0: M = M_0$ is true, we do not expect a great difference between
$W_+$ and $W_-$. Consequently, a sufficiently small value of $W_+$ or a sufficiently small value of
$W_-$ casts doubt on $H_0$.
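(a) One-sided test: To test $H_0: M = M_0$ against $H_1: M < M_0$ at the α level of significance.
Test statistic
A sufficiently small value of $W_+$ leads to the rejection of the null hypothesis $H_0$. The test
statistic therefore is $W = W_+$.
Decision rule
We reject $H_0$ if $w_o$, the observed value of $W$, is less than or equal to the tabulated $W$ value
for $n$ and a preselected value of α.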
(b) One-sided test: To test $H_0: M = M_0$ against $H_1: M > M_0$ at the α level of significance.
Test statistic
A sufficiently small value of $W_-$ leads to the rejection of $H_0$. The test statistic therefore is
$W = W_-$, since a small value causes us to reject the null hypothesis.
Decision rule
We reject $H_0$ if $w_o$, the observed value of $W$, is less than or equal to the tabulated $W$ value
for $n$ and a preselected value of α.
(c) For a two-sided test, we test $H_0: M = M_0$ against $H_1: M \ne M_0$ at the α level of significance.
Test statistic
The test statistic is $W = \min(W_+, W_-)$, since a small value of either $W_+$ or $W_-$ causes us to
reject the null hypothesis.
Decision rule
We reject $H_0$ if $w_o$, the observed value of $W$, is less than or equal to the tabulated $W$ value
for $n$ and a preselected value of α.
The distribution of W
1. The smallest value $W$ can take is zero (0) and the largest value that $W$ can take is the sum
of the integers from 1 to $n$, that is, $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$. $W$ is therefore a discrete
random variable whose support ranges between 0 and $\frac{n(n+1)}{2}$.
2. It can be shown that the probability mass function of the discrete random variable $W$ is
given by
$$P(W = w) = f(w) = \frac{c(w)}{2^n},$$
where $c(w)$ is the number of possible ways to assign a + sign or a − sign to the first $n$ integers
so that the sum of the ranks with + signs (or − signs) is equal to $w$. Under $H_0$, each of the
$2^n$ sign assignments is equally likely.
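The counts $c(w)$ can be obtained without listing all $2^n$ sign assignments: $c(w)$ equals the number of subsets of $\{1, 2, \ldots, n\}$ whose elements sum to $w$, and a standard subset-sum recurrence builds these counts one rank at a time. The sketch below (hypothetical names `wilcoxon_counts` and `wilcoxon_cdf`) assumes there are no ties, so that $W$ takes integer values.

```python
def wilcoxon_counts(n):
    """c(w) for w = 0, 1, ..., n(n+1)/2: number of subsets of {1..n} summing to w."""
    total = n * (n + 1) // 2
    c = [1] + [0] * total            # only the empty subset sums to 0 initially
    for r in range(1, n + 1):
        # iterate downwards so that rank r is used at most once per subset
        for w in range(total, r - 1, -1):
            c[w] += c[w - r]
    return c


def wilcoxon_cdf(w, n):
    """P(W <= w) under H0, using P(W = w) = c(w) / 2**n."""
    c = wilcoxon_counts(n)
    return sum(c[: int(w) + 1]) / 2**n


print(round(wilcoxon_cdf(17, 12), 4), round(wilcoxon_cdf(18, 12), 4))   # 0.0461 0.0549
```

Since $P(W \le 17) \approx 0.0461 \le 0.05 < P(W \le 18) \approx 0.0549$ for $n = 12$, 17 is the largest $w$ whose lower-tail probability does not exceed 0.05; this is how a tabulated one-sided critical value is obtained.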
Example 2.6
The following are the systolic blood pressures (mmHg) of 13 patients undergoing a drug
therapy for hypertension:
Can we conclude on the basis of these data that the median systolic blood pressure is less than
165 mmHg? Take α = 0.05.
Solution
Table 2.1: Computation of test statistic
The parameter of interest is $M$, the median systolic blood pressure of the population. We
wish to test the hypothesis $H_0: M = 165$ against $H_1: M < 165$ at the 0.05 level of significance.
Using the Wilcoxon signed-ranks test, the test statistic is $W = W_+$, where $W_+$ is the sum of the
ranks with positive signs.
We reject $H_0$ at the 0.05 level of significance if $w_o \le 17$, where $w_o$ is the observed value of
the test statistic.
From Table 2.1, the sum of the ranks with positive signs is 27.5. The value of the test statistic is
therefore $w_o = 27.5$.
Since 27.5 > 17, we fail to reject $H_0$. We conclude that there is not enough evidence that the
median systolic blood pressure of the subjects in the population is less than 165 mmHg.
Example 2.7
Refer to Example 2.2. Use the Wilcoxon signed-ranks test to determine if there is any evidence
that the median IQ of drug abusers in the population is different from 107. Use α = 0.05.
Solution
Table 2.2: Computation of test statistic
Let $M$ denote the median IQ of drug abusers who are aged 16 years or older. We wish to
test the hypothesis $H_0: M = 107$ against $H_1: M \ne 107$ at the 0.05 level of significance. The test
statistic is $W = \min(W_-, W_+)$, where $W_-$ and $W_+$ are the sums of the ranks with negative and
positive signs, respectively.
We reject $H_0$ at the 0.05 level of significance if $w_o \le 21$, where $w_o$ is the observed value of
the test statistic.
From Table 2.2, the smaller of the two rank sums is 40.5. The value of the test statistic is
therefore $w_o = 40.5$.
Since 40.5 > 21, we fail to reject $H_0$. We conclude that there is not enough evidence that the
median IQ of the subjects in the population differs from 107.
2.3.5 Large sample approximation
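Theorem 2.1
When $H_0$ is true, the mean and variance of $W$ are
$$E(W) = \frac{n(n+1)}{4} \quad\text{and}\quad V(W) = \frac{n(n+1)(2n+1)}{24}.$$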
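Proof
When $H_0$ is true, $W$ can be written as $W = \sum_{i=1}^{n} W_i$, where $W_i = i$ if the difference with
rank $i$ carries the sign counted by $W$, and $W_i = 0$ otherwise. When the null hypothesis is true,
the $W_i$ are independent and
$$P(W_i = i) = P(W_i = 0) = \tfrac12, \qquad i = 1, 2, \ldots, n.$$
Hence
$$E(W) = \sum_{i=1}^{n} E(W_i) = \sum_{i=1}^{n}\left[i\left(\tfrac12\right) + 0\left(\tfrac12\right)\right] = \tfrac12\sum_{i=1}^{n} i = \tfrac12 \cdot \frac{n(n+1)}{2} = \frac{n(n+1)}{4}.$$
Also,
$$V(W_i) = E(W_i^2) - [E(W_i)]^2 = i^2\left(\tfrac12\right) + 0^2\left(\tfrac12\right) - \left(\tfrac12 i\right)^2 = \tfrac14 i^2,$$
so that, by independence,
$$V(W) = \sum_{i=1}^{n} V(W_i) = \tfrac14\sum_{i=1}^{n} i^2 = \tfrac14 \cdot \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}.$$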
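Theorem 2.2
When $H_0$ is true and $n$ is large,
$$Z = \frac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$
is approximately $N(0, 1)$.
Proof
By Theorem 2.1, $W$ is a random variable with mean $\frac{n(n+1)}{4}$ and variance $\frac{n(n+1)(2n+1)}{24}$.
Since $W$ is a sum of independent random variables, by the central limit theorem $Z$ is
approximately $N(0, 1)$.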
Adjustment for Ties
We can incorporate an adjustment for ties among nonzero differences in the large sample
approximation in the following way.
Let $t$ be the number of absolute differences tied for a particular nonzero rank. Then the
correction factor is
$$\frac{\sum (t^3 - t)}{48},$$
where the sum is taken over all groups of tied nonzero absolute differences. We subtract this
quantity from the expression under the square root sign in the denominator of $Z$.
Thus the adjusted statistic for the large sample approximation is
$$Z = \frac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24} - \frac{\sum (t^3 - t)}{48}}}.$$
When the null hypothesis is true, for large $n$, this statistic follows an approximate standard
normal distribution N(0, 1).
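The tie-adjusted statistic can be sketched as follows. The function name and the `tie_sizes` argument (one entry $t$ per group of tied non-zero absolute differences) are hypothetical. In the usage line, the tie sizes 3, 3, 2, 2 are an inference, not data from the book: they are one configuration consistent with the correction term $(70 - 10)/48$ in Example 2.8, i.e. $\sum t^3 = 70$ and $\sum t = 10$, whose table is not reproduced here.

```python
from math import sqrt


def wilcoxon_z(w, n, tie_sizes=()):
    """Large-sample Z for the signed-ranks statistic W, with the tie adjustment.

    tie_sizes: sizes t of the groups of tied non-zero |D_i|; each group
    subtracts (t**3 - t)/48 from the null variance (untied values add 0).
    """
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_sizes) / 48
    return (w - mean) / sqrt(var)


# Example 2.8: n = 21, W = 23; tie sizes inferred from sum(t^3) = 70, sum(t) = 10
print(round(wilcoxon_z(23, 21, [3, 3, 2, 2]), 4))   # -3.2175
```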
We illustrate the calculation of an adjustment for ties in the following data:
Table 2.3: Computation of correction factor
Example 2.8
The following data show the life span, in years, of a random sample of 21 recorded deaths in
a certain country. In past years, the median life span in the country was known to be 50 years.
Can we conclude from these data that the median life span in the country has improved?
Use α = 0.05.
Solution
Table 2.4: Computation of test statistic
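Let $M$ denote the median life span in the population. We wish to test the hypothesis
$H_0: M = 50$ against $H_1: M > 50$ at the 0.05 level of significance. The test statistic is $W = W_-$,
the sum of the ranks with negative signs. Since there are ties among the absolute differences,
we use the large sample approximation with the adjustment for ties:
$$Z = \frac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24} - \frac{\sum (t^3 - t)}{48}}}.$$
When $H_0$ is true, $Z$ is approximately $N(0, 1)$.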
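We reject $H_0$ at the 0.05 level of significance if $z_o \le -z_{0.05} = -1.645$, where $z_o$ is the computed
value of $Z$. From Table 2.4, $n = 21$, $W_- = 23$, $\sum t^3 = 70$ and $\sum t = 10$.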
The value of the test statistic is
$$z_o = \frac{23 - \frac{21(22)}{4}}{\sqrt{\frac{21(22)(43)}{24} - \frac{70 - 10}{48}}} = -3.2175.$$
Since $-3.2175 < -1.645$, we reject $H_0$ at the 0.05 level of significance. We therefore conclude
that the median life span in the country has improved significantly.
2.3.6 Confidence Interval for the Median, based on the Wilcoxon Signed-Ranks Test
Arithmetic Procedure
Step 1: Find the means $u_{ij}$ of all possible pairs of observations $x_i$ and $x_j$ from the sample of
$n$ observations:
$$u_{ij} = \frac{x_i + x_j}{2}, \qquad 1 \le i \le j \le n.$$
There are $\frac{n(n+1)}{2}$ such averages, distributed symmetrically about the median.
Step 2: Arrange the $u_{ij}$ in increasing order of magnitude.
Step 3: The median of the $u_{ij}$ is a point estimate of the population median.
Step 4: Find, from the Wilcoxon signed-ranks test table, the value $t$ corresponding to the
sample size $n$ and the appropriate value of $p$ as determined by the desired confidence
level. When the confidence coefficient is $1 - \alpha$, we use $p = \alpha/2$. If the exact value
of $p$ cannot be found in the Wilcoxon signed-ranks test table, we choose the closest
neighbouring value.
Step 5: The end points of the confidence interval are the $k$th smallest and $k$th largest values of
the $u_{ij}$, where $k = t + 1$ and $t$ is the value in the column labelled T corresponding
to $n$ and the selected value of $p$ (see Wayne, 1978).
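The arithmetic procedure above can be sketched in Python. The names are hypothetical, and the table value $t$ is passed in by the caller rather than looked up, since the Wilcoxon table itself is not reproduced here. The data 1, 2, ..., 10 in the usage line are made up for illustration.

```python
from statistics import median


def walsh_averages(data):
    """All n(n+1)/2 averages (x_i + x_j)/2 with i <= j, in increasing order (Steps 1-2)."""
    n = len(data)
    return sorted((data[i] + data[j]) / 2 for i in range(n) for j in range(i, n))


def wilcoxon_ci(data, t):
    """Point estimate and interval end points (Steps 3-5), with k = t + 1.

    t is the tabulated T value for n and the chosen p, supplied by the caller.
    """
    u = walsh_averages(data)
    k = t + 1
    return median(u), u[k - 1], u[-k]   # median, kth smallest, kth largest


print(wilcoxon_ci(list(range(1, 11)), 8))   # (5.5, 3.0, 8.0)
```

With $n = 10$ there are 55 averages, so the point estimate is the 28th ordered value, as in Example 2.9.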
Example 2.9
Determine the 95% confidence interval for the population median by the Wilcoxon Signed-
ranks procedure using the following data:
Solution
All 55 possible pairwise means of the observations are given in Table 2.5.
Table 2.5: All possible pairs of means from the observations
Thus, a point estimate of the population median $M$ is the 28th value of the ordered averages
in Table 2.5, which is 32. From the Wilcoxon signed-ranks test table, we find $t = 8$ for $n = 10$, so that
$k = t + 1 = 9$. Therefore the 9th value in the ordered array in Table 2.5 is the lower limit $M_L$,
and the 9th value from the largest locates the upper limit $M_U$.
Therefore the 95% confidence interval for $M$ is $(M_L, M_U)$.
Large Sample Approximation
With samples larger than 30, we cannot use the Wilcoxon signed-ranks table to determine k.
A large sample approximation of k is however given by (see Wayne, 1978)
$$k = \frac{n(n+1)}{4} - z_{\alpha/2}\sqrt{\frac{n(n+1)(2n+1)}{24}}.$$
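The large-sample $k$ can be computed as below (hypothetical helper name). The text does not say how to round a non-integer $k$ here, so this sketch returns the raw value and leaves the rounding to the reader.

```python
from math import sqrt
from statistics import NormalDist


def wilcoxon_ci_k_large(n, conf=0.95):
    """Large-sample k = n(n+1)/4 - z_{alpha/2} * sqrt(n(n+1)(2n+1)/24)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # upper alpha/2 point
    return n * (n + 1) / 4 - z * sqrt(n * (n + 1) * (2 * n + 1) / 24)


print(round(wilcoxon_ci_k_large(40), 1))   # 264.2
```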
Exercise 2(a)
1. The median age of the onset of diabetes is thought to be 45 years. The ages at onset of a
random sample of 16 people with diabetes are:
Perform the
(a) sign test, (b) Wilcoxon signed-ranks test,
to determine if there is any evidence to conclude that the median age of the onset of
diabetes differs significantly from 45 years. Take α = 0.05.
2. Recent studies of the private practices of physicians who saw no Medicaid patients
suggested that the median length of each patient visit was 22 minutes. It is believed that
the median visit length in practices with a large Medicaid load is shorter than 22 minutes.
A random sample of 20 visits in practices with a large Medicaid load yielded, in order,
the following visit lengths:
(a) Use the large sample approximation of the sign test to determine if there is
sufficient evidence to conclude, at the 1% level of significance, that the median visit
length in practices with a large Medicaid load is shorter than 22 minutes.
(b) Based on the sign test, construct a 95% confidence interval for the median visit length
in practices with a large Medicaid load.
3. The following are the blood glucose levels of 12 patients who attend St. Thomas Hospital:
Perform the Wilcoxon signed ranks test to determine if we can conclude on the basis of
these data that the median glucose level in the population is greater than 96 mg/dl. Take
α = 0.05.
4. From a random sample of 14 students from Accra Catholic Senior High School, the body
masses of 9 students were found to be less than 38 kg whilst those of 4 students exceeded
38 kg, with the remaining student recording exactly 38 kg. Can we conclude, based on a
sign test, that the median body mass of students from the school is less than 38 kg?
5. In a sample of 25 adolescents who served as the subjects in an immunologic study, one
variable of interest was the diameter of skin test reaction to an antigen. The sample
observations, in mm erythema, were as follows:
Use the large sample approximation of the Wilcoxon signed ranks test to determine if
we can conclude from these data that the population median is less than 30 mm.
Take α = 0.05.
6. Barrett (1991) reported data on eight cases of umbilical cord prolapse. The maternal ages
were 25, 28, 17, 26, 27, 18, 25, and 30.
(a) Perform the Wilcoxon signed ranks test to determine if there is enough evidence,
based on the data, that the median age of the population from which the sample may
be presumed to have been drawn is greater than 20 years. Take α = 0.01.
(b) Based on the Wilcoxon signed ranks test, construct a 99% confidence interval for the
population median.
7. Out of a random sample of 100 recorded deaths in a certain country during the past year,
68 of them were older than 65 years whilst the remaining 32 were younger than 65 years. Perform
a sign test to determine if we can conclude that the median life span in the country is
greater than 65 years. Use α = 0.05.
8. Recent studies of the private practices of physicians who saw no Medicaid patients
suggested that the median length of each patient visit was 22 minutes. It is believed that
the median visit length in practices with a large Medicaid load is shorter than 22 minutes.
A random sample of 20 visits in practices with a large Medicaid load yielded, in order, the
following visit lengths:
Based on the large sample approximation of the sign test, is there sufficient evidence to
conclude that the median visit length in practices with a large Medicaid load is shorter
than 22 minutes?
9. To determine whether the median life span of a certain species of animal is greater than 5
years, a random sample of 25 observations was made. The life spans, in years, are as
follows:
At the 0.05 level of significance, use the large sample approximation of the sign test to
determine if the median life span is greater than 5 years.
10. A physician states that the median number of times he sees each of his patients during the
year is five. In order to evaluate the validity of this statement, he randomly selects ten of
his patients and determines the number of office visits each of them made during the past
year. He obtains the following values for the ten patients in his sample: 9, 10, 8, 4, 8, 3,
0, 10, 15, 9. Do the data support his contention that the median number of times he sees a
patient is five?
11. Moore and Ogletree (1973) investigated the readiness of pupils at the beginning of the
first grade. They compared scores on a readiness test of pupils who had attended a Head
Start program for a full year with the scores of those who had not. The readiness test scores
of 10 pupils who did not attend a Head Start program are as follows: 33, 19, 40, 35, 51,
41, 27, 55, 39, 21. Can we conclude, based on the Wilcoxon signed ranks test, that the
median score of the population represented by this sample is less than 45.3? Take
α = 0.05.
12. Abu-Ayyash (1972) found that the median education of heads of households living in
mobile homes in a certain area was 11.6 years. Suppose that a similar survey conducted
in another area revealed the educational levels of heads of households as shown in the
following data.
Based on the sign test, can we conclude that the median educational level of the
population represented by this sample is less than 11.6 years? Take α = 0.05.
13. Lenzer et al. (1973) reported the endurance score of animals during a 48-hour session of
discrimination responding. The median score for an animal with electrodes implanted in
the hypothalamus was 97.5. Suppose that the experiment was duplicated in another
laboratory, except that electrodes were implanted in the forebrain in 12 animals. Assume
that investigators observed the endurance scores shown in the following table.
Use the one-sample sign test to see whether the investigators may conclude at the 0.05
level of significance that the median endurance score of animals with electrodes implanted
in the forebrain is less than 97.5.
14. Iwamoto (1971) found that the mean weight of a sample of a particular species of adult
female monkey from a certain locality was 8.41 kg. Suppose that a sample of adult females
of the same species from another locality yielded the weights as shown in the following
table. By using the one-sample sign test, can we conclude, at the 0.05 level of significance,
that the median weight of the population from which this second sample was drawn is
greater than 8.41 kg?
2.4 The binomial test
Inferences concerning proportions are required in many areas. The population proportion is a
parameter of frequent interest in research and decision-making activities. The politician is
interested in knowing what proportion of voters will vote for him in the next election. All
manufacturing firms are concerned about the proportion of defective items when a shipment
is made. A market analyst may wish to know the proportion of families in a certain area who
have central air conditioning. A sociologist may want to know the proportion of heads of
household in a certain area who are women. Many questions of interest to the health worker
relate to the population proportion. What proportion of patients who receive a particular
treatment recover? What proportion of a population has a certain disease?
When it is impossible or impractical to survey the total population, researchers base
decisions regarding population proportions on inferences made by analyzing samples drawn
from the population. As usual, inference may take the form of interval estimation or hypothesis
testing.
Sometimes, we want to draw inferences concerning the total number, the proportion or
percentage of units in the population that possess some characteristic or attribute or fall into
some defined class. A random sample of size $n$ is drawn from a population. Suppose we wish
to estimate the proportion $p$ of units in the population that belong to some definite class.
Testing hypotheses about population proportions is carried out in much the same way as
for the median when the assumptions necessary for the test are satisfied.
2.4.1 Assumptions
1. The data consist of a sample of the outcomes of n repetitions of some process. Each
outcome consists of either a 'success' or a 'failure'. The proportion of the sample having
the characteristic of interest, $\hat p = S/n$, is an estimate of the population proportion $p$, where $S$
is the number of successes (the total number of sampling units with the particular
characteristic of interest).
2. The n trials are independent.
3. The probability of a success, $p$, remains constant from trial to trial.
2.4.2 Hypotheses
One-sided and two-sided tests may be made, depending on the question being asked. In other
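words, we can test $H_0: p = p_0$ against one of the alternatives $H_1: p < p_0$, $H_1: p > p_0$ or
$H_1: p \ne p_0$.
(a) One-sided test
Here, we test $H_0: p = p_0$ against $H_1: p < p_0$.
Test statistic
Since we are interested in the number of successes $S$, our test statistic is $S$. When $H_0$ is true,
$S$ has the binomial distribution with parameters $n$ and $p_0$.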
Decision rule
Sufficiently small values of $S$ lead to the rejection of $H_0$. Let $s_o$ denote the observed
value of $S$. We reject $H_0$ at the α level of significance if the
$$p\text{-value} = P(S \le s_o \mid n, p_0) \le \alpha.$$
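(b) One-sided test
Here, we test $H_0: p = p_0$ against $H_1: p > p_0$.
Test statistic
The test statistic is again $S$. When $H_0$ is true, $S$ has the binomial distribution with
parameters $n$ and $p_0$.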
Decision rule
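Sufficiently large values of $S$ lead to the rejection of $H_0$. We reject $H_0$ at the α level of
significance if the
$$p\text{-value} = P(S \ge s_o \mid n, p_0) \le \alpha,$$
where $s_o$ is the observed value of $S$.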
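(c) Two-sided test
Here, we test $H_0: p = p_0$ against $H_1: p \ne p_0$.
Test statistic
The test statistic is again $S$. When $H_0$ is true, $S$ has the binomial distribution with
parameters $n$ and $p_0$.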
Decision rule
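Sufficiently large or sufficiently small values of $S$ lead to the rejection of $H_0$. The hypothesized
proportion is $p_0$, whilst the observed sample proportion is $\hat p = s_o/n$, where $s_o$ is the observed
value of $S$. The p-value of the test is defined by
$$p\text{-value} = \begin{cases} 2P(S \le s_o \mid n, p_0), & \text{if } \hat p < p_0, \\ 2P(S \ge s_o \mid n, p_0), & \text{if } \hat p > p_0. \end{cases}$$
We reject $H_0$ at the α level of significance if the p-value $\le \alpha$.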
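The three p-value definitions can be collected into one function. This is an illustrative sketch with hypothetical names; the two-sided branch doubles the tail on the side of the observed $\hat p$, following the definition above, and caps the result at 1 (the cap is an added safeguard, not part of the text).

```python
from math import comb


def binom_pmf(i, n, p):
    """P(S = i) for S ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def binomial_test(s, n, p0, alternative="two-sided"):
    """Exact binomial test p-value for H0: p = p0, given s successes in n trials.

    alternative: "less" (H1: p < p0), "greater" (H1: p > p0) or "two-sided".
    """
    lower = sum(binom_pmf(i, n, p0) for i in range(s + 1))      # P(S <= s)
    upper = sum(binom_pmf(i, n, p0) for i in range(s, n + 1))   # P(S >= s)
    if alternative == "less":
        return lower
    if alternative == "greater":
        return upper
    # two-sided: double the tail on the side of p_hat = s/n
    tail = lower if s / n <= p0 else upper
    return min(1.0, 2 * tail)


print(round(binomial_test(2, 12, 0.4, "less"), 4))   # 0.0834
```

The printed value reproduces the 0.0834 of Example 2.10; `binomial_test(6, 15, 0.2, "greater")` gives $P(S \ge 6 \mid 15, 0.2)$, the p-value needed in Example 2.11.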
Example 2.10
In a survey of injection drug users in a large city, Coates et al. (1991) found that 2 out of 12
were HIV positive. We wish to know if we can conclude, at the 10% level of significance, that
fewer than 40% of the injection drug users in the sampled population are HIV positive.
Solution
The parameter of interest is p, the proportion of injection drug users in the sampled population
who are HIV positive. We wish to test $H_0: p = 0.4$ against $H_1: p < 0.4$ at the 0.10 level of
significance. The test statistic is $S$, the number of injection drug users in the
sample who are HIV positive. When $H_0$ is true, $S$ has the binomial distribution with
parameters $n = 12$ and $p = 0.4$. Let $s_o$ denote the observed value of the test statistic. We reject
$H_0$ at the 0.10 level of significance if the
$$p\text{-value} = P(S \le s_o \mid 12, 0.4) \le 0.10.$$
Here $s_o = 2$, so
$$p\text{-value} = P(S \le 2 \mid 12, 0.4) = 0.0834.$$
Since the p-value, 0.0834, is less than 0.10, we reject $H_0$ at the 10% level of significance and
conclude that fewer than 40% of the injection drug users in the sampled population are HIV positive.
Example 2.11
A researcher found anterior sub-capsular vacuoles in the eyes of 6 out of 15 diabetic patients.
Using the binomial test, can we conclude that the population proportion with the condition of
interest is greater than 0.2? Use α = 0.05.
Solution
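The parameter of interest is $p$, the proportion of diabetic patients in the population with
anterior sub-capsular vacuoles in the eyes. We wish to test $H_0: p = 0.2$ against $H_1: p > 0.2$ at
the 0.05 level of significance.
The test statistic is $S$, the number of diabetic patients in the sample with anterior sub-capsular
vacuoles in the eyes. When $H_0$ is true, $S$ has the binomial distribution with parameters $n = 15$
and $p = 0.2$. Let $s_o$ denote the observed value of the test statistic. We reject $H_0$ at the 0.05
level of significance if the
$$p\text{-value} = P(S \ge s_o \mid 15, 0.2) \le 0.05.$$
Here $s_o = 6$, so
$$p\text{-value} = P(S \ge 6 \mid 15, 0.2) = 1 - P(S \le 5 \mid 15, 0.2) = 1 - 0.9389 = 0.0611.$$
Since the p-value, 0.0611, is greater than 0.05, we fail to reject $H_0$ at the 0.05 level of
significance. There is not enough evidence to conclude that the population proportion with the
condition of interest is greater than 0.2.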
2.4.3 Large sample approximation
1. If $S$ is a binomial random variable with parameters $n$ and $p$, then the expectation and
variance of $S$ are given by $E(S) = np$ and $V(S) = np(1 - p)$.
2. Thus, when the null hypothesis $H_0: p = p_0$ is true and $n$ is large,
$$Z = \frac{S - np_0}{\sqrt{np_0(1 - p_0)}}$$
follows an approximate standard normal distribution, N(0, 1).
3. The normal approximation to the binomial distribution is good if $np_0$ and $n(1 - p_0)$ are
both greater than 5.
4. Note that the sign test discussed earlier is a special case of the binomial test, in which $p_0 = 0.5$.
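The large-sample statistic and the rule of thumb for its validity can be sketched as follows. The function name is hypothetical, and raising an error when $np_0$ or $n(1 - p_0)$ is not greater than 5 is a design choice of this sketch, not something the text prescribes.

```python
from math import sqrt


def binomial_z(s, n, p0):
    """Large-sample statistic Z = (S - n p0) / sqrt(n p0 (1 - p0))."""
    if n * p0 <= 5 or n * (1 - p0) <= 5:
        raise ValueError("normal approximation not recommended: need n*p0 and n*(1-p0) > 5")
    return (s - n * p0) / sqrt(n * p0 * (1 - p0))


print(round(binomial_z(70, 100, 0.6), 4))   # 2.0412 (data of Example 2.12)
```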
Example 2.12
A commonly prescribed drug for relieving nervous tension is believed to be only 60%
effective. Experimental results with a new drug administered to a random sample of 100 adults
who were suffering from nervous tension show that 70 received relief. Is this sufficient
evidence to conclude that the new drug is superior to the one commonly prescribed? Use
α = 0.05.
Solution
The parameter of interest is p, the proportion of adults in the population who received relief
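from nervous tension. We wish to test $H_0: p = 0.6$ against $H_1: p > 0.6$
at the α = 0.05 level of significance. The test statistic is
$$Z = \frac{S - np_0}{\sqrt{np_0(1 - p_0)}}.$$
Here $np_0 = 100(0.6) = 60$ and $n(1 - p_0) = 100(0.4) = 40$ are greater than 5, and so $Z$ is
approximately N(0, 1) when $H_0$ is true. We reject $H_0$ if $z$, the computed value of $Z$, is greater than
$z_{0.05} = 1.645$. Now
$$z = \frac{70 - 100(0.6)}{\sqrt{100(0.6)(0.4)}} = 2.0412.$$
Since 2.0412 > 1.645, we reject $H_0$ at the 0.05 level of significance. We conclude that the new
drug is superior to the one commonly prescribed.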
2.4.4 Large sample confidence interval for p
If $\hat p$ is the proportion of observations in a random sample of size $n$ that belong to a class of
interest, then an approximate $100(1 - \alpha)\%$ confidence interval for the proportion $p$ of the
population that belongs to this class is (see Ofosu & Hesse, 2011)
$$\left(\hat p - z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}},\ \ \hat p + z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}}\right).$$
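The interval above can be computed as follows. The sketch (hypothetical name `proportion_ci`) uses the exact normal quantile from `statistics.NormalDist` rather than a rounded table value, so the end points may differ from hand calculations in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(s, n, conf=0.90):
    """Approximate 100*conf % confidence interval for p from s successes in n trials."""
    p_hat = s / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # upper alpha/2 point
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = proportion_ci(6, 500, 0.90)   # data of Example 2.13
print(f"{low:.4f} {high:.4f}")            # 0.0040 0.0200
```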
Example 2.13
In a certain university, the proportion of students who have diabetes mellitus is p. Of the 500
students selected at random from the university, 6 had diabetes mellitus.
(a) Find a point estimate of p. (b) Construct a 90% confidence interval for p.
Solution
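(a) A point estimate of $p$ is given by $\hat p = 6/500 = 0.012$.
(b) The quantities $n\hat p = 6$ and $n(1 - \hat p) = 494$ are of sufficient magnitude to justify the
use of the formula for constructing a confidence interval for $p$. To construct a 90%
confidence interval, we put $100(1 - \alpha) = 90$. This gives α = 0.10. From the standard normal
table, we find that $z_{0.05} = 1.645$. Hence a 90% confidence interval for $p$ is
$$\left(0.012 - 1.645\sqrt{\frac{0.012(0.988)}{500}},\ \ 0.012 + 1.645\sqrt{\frac{0.012(0.988)}{500}}\right) = (0.0040,\ 0.0200).$$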
Exercise 2(b)
1. A researcher found that 66% of a sample of 14 infants had completed the hepatitis B
vaccine series. Can we conclude on the basis of these data that, in the sampled population,
more than 60% have completed the series? Use α = 0.01.
2. A health survey of 12 male inmates 50 years of age and older residing in a state's
correctional facilities was made. It was found that 22% of the respondents reported a history
of venereal disease. On the basis of these findings, can we conclude that in the sampled
population, more than 15% have a history of venereal disease? Use α = 0.05.
3. The fraction of defective integrated circuits produced in a photolithography process is
being studied. A random sample of 300 circuits is tested, revealing 13 defectives. Use the
data to test $H_0: p = 0.05$ against $H_1: p \neq 0.05$. Use α = 0.05.
4. A commonly prescribed drug for relieving nervous tension is believed to be only 70%
effective. Experimental results with a new drug administered to a random sample of 10
adults who were suffering from nervous tension show that 8 received relief. Is this
sufficient evidence to conclude that the new drug is superior to the one commonly
prescribed? Use α = 0.05.
5. Suppose that, in the past, 40% of all adults favoured capital punishment. Do we have reason
to believe that the proportion of adults favouring capital punishment today has increased
if, in a random sample of 15 adults, 8 favour capital punishment? Use α = 0.05.
2.5 The one-sample runs test for randomness
In many situations we want to know whether a set of observations constitutes a random sample
from an infinite population. Tests for randomness are of major importance because the
assumption of randomness underlies statistical inference (see Ofosu & Hesse, 2011). Such tests
are also important in time series analysis. The runs test procedure is used to examine whether
or not a sequence of sample values is random.
Consider, for example, the following sequence of sample values
Each observation after the first is denoted by a '+' sign if it is greater than the previous observation and by a
'–' sign if it is less than the previous observation, as shown in the following table.
A run is a sequence of signs of the same kind bounded by signs of the other kind (or by the start or end of the sequence). In this case, we
doubt the sequence's randomness, since there are only two runs.
If the order of occurrence were
we would doubt the sequence's randomness because there are too many runs (10 in this
instance).
Too few runs indicate that the sequence is not random (it shows persistence), whilst too many
runs also indicate that the sequence is not random (it zigzags). Let us now consider the
one-sample runs test. This procedure helps us to decide whether a sequence of sample values is
the result of a random process.
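Counting runs is mechanical once the signs are written down. A minimal Python sketch, assuming the signs are supplied as a string of '+' and '-' characters (ASCII '-' stands in for the text's '–'); the function name `count_runs` and both sign strings are hypothetical illustrations, not the sequences from the text.

```python
def count_runs(signs):
    """Count runs in a string of '+' and '-' signs.

    A run is a maximal block of identical signs, so every change of sign
    starts a new run.
    """
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


print(count_runs("+++++-----"))  # 2 runs: too few, suggests persistence
print(count_runs("+-+-+-+-+-"))  # 10 runs: too many, suggests zigzagging
```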
Assumptions
The data available for analysis consist of a sequence of sample values, recorded in the order
of their occurrence.
Hypotheses
We wish to test
$H_0$: The sequence of sample values is random, against
$H_1$: The sequence of sample values is not random.
Test Statistic
The test statistic is R, the total number of runs.
Decision Rule
Since the alternative hypothesis does not specify a direction (it covers both too few and too many runs), a two-sided test is appropriate. The
critical value
for the test is obtained from Table A.5, in the Appendix, for a given sample
size n and at a desired level of significance α. If
Tied Values
If an observation is equal to its preceding observation, denote it by zero. While counting the
number of runs, ignore it and reduce the value of n accordingly.
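The sign coding and the tie rule can be sketched together. This is a minimal Python illustration, not the book's procedure verbatim: the function name `runs_up_and_down` and the sample values are hypothetical. Each tie is coded 0, skipped when runs are counted, and subtracted from n.

```python
def runs_up_and_down(values):
    """Return (n, r) for the runs up-and-down test on a sequence of observations.

    Each observation after the first is coded '+' if it exceeds its predecessor,
    '-' if it is smaller, and 0 if it is equal. Zeros are ignored when counting
    runs, and n is reduced by one for each of them.
    """
    signs = []
    ties = 0
    for prev, cur in zip(values, values[1:]):
        if cur > prev:
            signs.append("+")
        elif cur < prev:
            signs.append("-")
        else:
            ties += 1  # tied with the preceding observation: coded 0 and skipped
    n = len(values) - ties
    if not signs:
        return n, 0
    # A new run starts every time the sign changes.
    r = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return n, r


# Hypothetical data with one tie (7, 7): the signs are + 0 + - +
print(runs_up_and_down([5, 7, 7, 9, 4, 6]))  # (5, 3)
```

Dropping the 0 lets the two '+' signs on either side of it merge into a single run, which is what "ignore it" amounts to in practice.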
Large Sample Sizes
If
then the test statistic can be approximated by
We reject $H_0$ at the level of significance α if
where z is the computed value of Z.
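The large-sample formula for Z is not reproduced in this copy of the text. The sketch below assumes the standard null moments for the number of runs up and down among n distinct observations, E(R) = (2n − 1)/3 and Var(R) = (16n − 29)/90, and applies no continuity correction; the book's own formula, and its threshold for a "large" sample, may differ. The function names and the values n = 40, r = 20 are hypothetical.

```python
import math
from statistics import NormalDist


def runs_up_down_z(n, r):
    """Normal approximation for R, the number of runs up and down.

    Assumes E(R) = (2n - 1)/3 and Var(R) = (16n - 29)/90 under H0
    (no continuity correction).
    """
    mean = (2 * n - 1) / 3
    var = (16 * n - 29) / 90
    return (r - mean) / math.sqrt(var)


def two_sided_reject(z, alpha=0.05):
    """Reject H0 (randomness) if |z| exceeds z_{alpha/2}."""
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


# Hypothetical case: 40 observations showing only 20 runs.
z = runs_up_down_z(40, 20)
print(round(z, 4), two_sided_reject(z))
```

A two-sided rule on |z| matches the two-sided alternative: both too few runs (large negative z) and too many runs (large positive z) lead to rejection.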
Example 2.15
The following are the blood glucose levels of 12 patients who attend St. Thomas Hospital:
Test, at the 0.05 level of significance, whether the sequence is random.
Solution
We wish to test
$H_0$: The sequence is random, against
$H_1$: The sequence is not random.
The test statistic is R, the number of runs.
We reject $H_0$
at the 0.05 level of significance if
where r is
the observed value of R and
is the critical value. It can be seen that:
Here n = 11 and the number of runs is r = 7. From the table of critical values for the runs up and
down test,
(see Table A.5, in the Appendix)
Note: Since two consecutive observations are equal (both are 110), we use n = 11 instead
of n = 12.
Since r = 7 does not fall in the rejection region, we do not reject $H_0$ at the 0.05 level of
significance and therefore conclude that the sequence can be regarded as random.
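Table A.5 is not reproduced here, but the null distribution of R for n = 11 can be approximated by simulation. Under $H_0$ every ordering of the 11 retained observations is equally likely, so shuffling 11 distinct values and counting runs up and down mimics the sampling distribution of R. This is a hedged Monte Carlo sketch, not the book's table method; the function names, the number of replications and the seed are arbitrary choices, and doubling the smaller tail probability is one common convention for a two-sided p-value with a discrete statistic.

```python
import random


def runs_up_down(seq):
    """Number of runs up and down in a sequence with no ties."""
    ups = [b > a for a, b in zip(seq, seq[1:])]  # True for '+', False for '-'
    return 1 + sum(1 for s, t in zip(ups, ups[1:]) if s != t)


def simulated_p_value(n, r, reps=20000, seed=1):
    """Monte Carlo two-sided p-value for observing R = r among n distinct values."""
    rng = random.Random(seed)
    values = list(range(n))
    counts = []
    for _ in range(reps):
        rng.shuffle(values)  # under H0 every ordering is equally likely
        counts.append(runs_up_down(values))
    p_low = sum(c <= r for c in counts) / reps   # estimate of P(R <= r)
    p_high = sum(c >= r for c in counts) / reps  # estimate of P(R >= r)
    p_value = min(1.0, 2 * min(p_low, p_high))
    mean_runs = sum(counts) / reps
    return p_value, mean_runs


# Example 2.15: n = 11 observations after removing the tie, r = 7 runs.
p, mean_r = simulated_p_value(11, 7)
print(f"estimated p-value = {p:.3f}, simulated mean of R = {mean_r:.2f}")
```

The simulated mean should come out close to 7, which is also what the standard result E(R) = (2n − 1)/3 gives for n = 11, so the observed r = 7 lies at the centre of the null distribution and a large p-value is consistent with not rejecting $H_0$.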
Exercise 2(c)
1. The following data show the average daily temperatures recorded at Accra, Ghana, for 15
consecutive days during June 2017.
Test, at the 0.05 level of significance, whether the pattern of temperatures
is random.
2. The following data show the inflation rate in Ghana from 2006 to 2017. Test, at the 0.05
level of significance, whether the pattern of yearly inflation is random.
References
Abu-Ayyash, A. Y. (1972). The mobile home: A neglected phenomenon in geographic research. Geog. Bull., 5, 28–30.
Barrett, J. M. (1991). Funic reduction for the management of umbilical cord prolapse. American Journal of Obstetrics and Gynecology, 165, 654–657.
Coates, R., Millson, M., & Myers, T. (1991). The benefits of HIV antibody testing of saliva in field research. Canadian Journal of Public Health, 82, 397–398.
Iwamoto, M. (1971). Morphological studies of Macaca fuscata: VI. Somatometry. Primates, 12, 151–174.
Lenzer, I. I., & White, C. A. (1973). Statistical effects in continuous reinforcement and successive sensory discrimination situations. Physiol. Psychol., 1, 77–82.
Moore, R. C., & Ogletree, E. J. (1973). A comparison of the readiness and intelligence of first grade children with and without a full year of Head Start training. Education, 93, 266–270.
Ofosu, J. B., & Hesse, C. A. (2011). Elementary Statistical Methods. EPP Books Services, Accra.
Wayne, W. D. (1978). Applied Nonparametric Statistics. Houghton Mifflin Company, London.
Source: https://www.researchgate.net/publication/322677728_INTRODUCTION_TO_NONPARAMETRIC_STATISTICAL_METHODS