
INTRODUCTION TO NONPARAMETRIC STATISTICAL METHODS

C. A. HESSE, BSc, MPhil, PhD.

Senior Lecturer of Statistics,

Methodist University College Ghana.

J. B. OFOSU, BSc, PhD, FSS.

Professor of Statistics,

Methodist University College Ghana.

E. N. NORTEY, BSc, MPhil, PhD.

Senior Lecturer of Statistics,

University of Ghana.

Akrong Publications Ltd.

No part of this publication may be reproduced, in part or in whole, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher.

Published and Printed by

AKRONG PUBLICATIONS LIMITED

P. O. BOX M. 31

ACCRA, GHANA

(0244 648 757, 0264 648 757)

ISBN: 978-9988-2-6059-0

Published, 2017

akrongh@yahoo.com.

PREFACE

A statistical method is called nonparametric if it makes no assumption about the form of the population distribution or about the sample size. This is in contrast with most parametric methods in elementary statistics, which assume that the data are quantitative, that the population has a normal distribution and that the sample size is sufficiently large. In general, nonparametric tests are less powerful than their parametric counterparts when the parametric assumptions hold. However, because nonparametric methods make fewer assumptions, they are more flexible, more robust and applicable to non-quantitative data.

This book is designed, first, to help students acquire the basic skills needed to solve real-life problems in which the data satisfy only minimal assumptions and, secondly, to broaden their reading list and provide them with a "one-stop shop" textbook on nonparametric statistics.

Our Approach

This book is an introduction to the basic ideas and techniques of nonparametric statistical methods and is intended to prepare students of the sciences, as well as of the humanities, for a better understanding of some underlying explanations of real-life situations. Researchers will find the text useful since it provides a step-by-step presentation of procedures, more practical data sets, and new problems from real-life situations. The book continues to emphasize the importance of nonparametric methods as a significant branch of modern statistics and equips readers with the conceptual and technical skills necessary to select and apply the appropriate procedures for any given situation.

Written by leading statisticians, Introduction to Nonparametric Statistical Methods provides readers with crucial nonparametric techniques in a variety of settings, emphasizing the assumptions underlying the methods. The book provides an extensive array of examples that clearly illustrate how to use nonparametric approaches for one- and two-sample location and dispersion problems, dichotomous data, one-way analysis of variance, rank tests, goodness-of-fit tests and tests of randomness.

A wide range of topics is covered in this text, although the treatment is limited to the elementary level. Each section contains solved, partly solved and unsolved assignments to make the student or reader familiar with the methods introduced.

C. A. Hesse

J. B. Ofosu

E. N. Nortey

July, 2017

CONTENTS

1. Preliminaries
1.1 Introduction
1.2 Parametric and nonparametric methods
1.3 Parametric versus nonparametric methods
1.4 Classes of nonparametric methods
1.5 When to use nonparametric procedures
1.6 Advantages of nonparametric statistics
1.7 Disadvantages of nonparametric tests
1.8 The scope of this book
1.9 Format and organization

2. One-Sample Nonparametric Methods
2.1 Introduction
2.2 The one-sample sign test
2.2.1 Assumptions
2.2.2 Hypotheses
2.2.3 Large sample approximation
2.2.4 Confidence interval for the median based on the sign test
2.3 The Wilcoxon signed-ranks test
2.3.1 Assumptions
2.3.2 Hypotheses
2.3.3 Test statistic
2.3.4 Carrying out the Wilcoxon signed-ranks test
2.3.5 Large sample approximation
2.3.6 Confidence interval for the median based on the Wilcoxon signed-ranks test
2.4 The binomial test
2.4.1 Assumptions
2.4.2 Hypotheses
2.4.3 Large sample approximation
2.4.4 Large sample confidence interval for p
2.5 The one-sample runs test for randomness

3. Procedures That Utilize Data from Two Independent Samples
3.1 Introduction
3.2 The median test
3.2.1 Assumptions
3.2.2 Hypotheses
3.2.3 Large sample approximation
3.3 The Mann-Whitney (Wilcoxon rank-sum) test
3.3.1 Assumptions
3.3.2 Hypotheses
3.3.3 Large sample approximation
3.3.4 Confidence interval for the difference between two population medians
3.4 The Wald-Wolfowitz two-sample runs test
3.5 The two-sample runs test for randomness

4. Procedures Using Data from Two Related Samples
4.1 Introduction
4.2 The sign test for two related samples
4.2.1 Introduction
4.2.2 Assumptions
4.2.3 Test procedure
4.2.4 Hypotheses
4.2.5 Confidence interval for the difference of the medians of two populations, based on the sign test
4.3 Wilcoxon matched-pairs signed-ranks test
4.3.1 Introduction
4.3.2 Assumptions
4.3.3 Test procedure
4.3.4 Hypotheses
4.3.5 Large sample approximation
4.3.7 Confidence interval for the median of population differences between pairs of measurements based on the Wilcoxon matched-pairs signed-ranks test
4.4 A test for two related samples when the data consist of frequencies (the McNemar test)

5. Chi-Square Test of Homogeneity and Independence
5.1 Introduction
5.2 The chi-square test of homogeneity
5.3 The chi-square test of independence

6. Procedures Using Data from Three or More Independent Samples
6.1 Introduction
6.2 Extension of the median test
6.3 The Kruskal-Wallis one-way analysis of variance by ranks
6.4 The Jonckheere-Terpstra test for ordered alternatives

7. Procedures Using Data from Three or More Related Samples
7.1 Introduction
7.2 Data from a randomized complete block design
7.3 Friedman two-way analysis of variance by ranks
7.4 Page's test for ordered alternatives

8. Goodness-of-Fit Tests
8.1 Introduction
8.2 The chi-square goodness-of-fit test
8.3 Kolmogorov-Smirnov goodness-of-fit test
8.3.1 The Kolmogorov-Smirnov goodness-of-fit test for a single sample
8.3.2 The Kolmogorov-Smirnov two-sample test

9. Rank Correlation
9.1 Introduction
9.2 Spearman's rank correlation coefficient
9.3 Kendall's rank correlation coefficient

Answers to Exercises

Appendix

1.1 Introduction

Typical introductory courses in hypothesis testing and confidence intervals examine primarily parametric statistical procedures. A main feature of these procedures is the assumption that we are working with random samples from normal populations. They are known as parametric methods because they are based on a particular parametric family of distributions, in this case the normal. For example, given a set of independent observations from a normal distribution, we often want to infer something about the unknown parameters. The t-test is then usually used to decide whether a hypothesized value $\mu_0$ for the population mean should be rejected. More usefully, we may construct a confidence interval for the 'true' population mean.
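As a point of reference for the nonparametric methods that follow, the one-sample t statistic can be sketched in a few lines of Python. This is a minimal illustration, assuming the data are a random sample from a normal population; the function name `one_sample_t` and the sample values are hypothetical. A p-value or confidence interval would also need quantiles of the t-distribution, which the Python standard library does not provide, so they are left out.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """One-sample t statistic: (sample mean - mu0) / (s / sqrt(n))."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical measurements and hypothesized mean mu0 = 5.0
sample = [4.8, 5.1, 5.3, 4.9, 5.4]
t = one_sample_t(sample, 5.0)
print(round(t, 3))  # approximately 0.877
```

The statistic is only meaningful as a t-test when the normality assumption holds, which is exactly the assumption the nonparametric procedures in this book relax.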

Parametric inference is sometimes inappropriate or even impossible, because it may be unreasonable to assume that the samples come from any specified family of distributions. For example, we may not have the examination marks of individual candidates but know only the numbers of candidates who obtained each of the ordered grades A, B+, B, B–, C+, C, D and F. Given such grade distributions for two different courses, we may want to know whether they indicate a difference in performance between the two courses. In this case it is inappropriate to use the traditional (parametric) method of analysis.

In this book we describe procedures called nonparametric and distribution-free methods. Nonparametric methods provide an alternative set of statistical methods that require few or no assumptions to be made about the data. These methods are most often used to analyse data that do not meet the distributional requirements of parametric methods. In particular, skewed data are frequently analysed by nonparametric methods, although a data transformation can sometimes make the data suitable for parametric analysis. These procedures have considerable appeal. One of their advantages is that the data need not be quantitative but can be categorical (such as yes or no) or ranks.

Generally, if both parametric and nonparametric methods are applicable to a particular problem, we should use the more efficient parametric method.

Introducing Nonparametric Methods

1.2 Parametric and nonparametric methods

The word statistics has several meanings. It is used to describe a collection of data and also to designate operations that may be performed on primary data. The scientific discipline called statistical inference uses observed data, in this context called a sample, to make inferences about a larger collection of data called a population. We associate distributions with populations. For example, if the random variable $X$ that describes a population has a normal distribution, then we say that the population is normal.

Parametric methods are often those for which we know that the population is normal, or for which the sampling distribution of the statistic can be approximated by a normal distribution by invoking the central limit theorem. Ultimately, the classification of a method as parametric depends on the assumptions made about the population. Examples of parametric methods include tests of a hypothesis about a population mean under two different conditions:

1. when sampling is from a normally distributed population with known variance;

2. when sampling is from a normally distributed population with unknown variance.

Nonparametric methods, however, are not based on these assumptions and thus do not require the population's distribution to be specified by particular parameters.

1.3 Parametric versus nonparametric methods

The analysis of data often begins by considering whether the normal distribution is an appropriate model for the population. If it is reasonable, or if a normal approximation is deemed adequate, the analysis is carried out using normal-theory methods. If the normal distribution is not appropriate, it is common to consider transforming the data. For instance, a simple transformation of the data may yield values that are normally distributed, so that normal-theory methods can be applied to the transformed data.
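The effect of a transformation can be checked numerically. The sketch below assumes a logarithmic transformation, one common choice for positive, right-skewed data, and uses hypothetical values that double at each step. The helper `sample_skewness` is a hypothetical name; it computes the moment coefficient of skewness, the mean of the cubed standardized values.

```python
import math
import statistics

def sample_skewness(values):
    """Moment coefficient of skewness: mean of cubed z-scores (population form)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

# Hypothetical right-skewed data: each value doubles the previous one.
raw = [1, 2, 4, 8, 16, 32]
logged = [math.log(v) for v in raw]  # assumed log transformation

print(sample_skewness(raw))     # positive: long right tail
print(sample_skewness(logged))  # essentially zero: the log values are evenly spaced
```

A skewness near zero after transformation is only a rough indication of symmetry; it does not by itself establish normality.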

If neither of these approaches seems reasonable, there are two ways to proceed. It may be possible to identify the type of distribution that is appropriate (say, exponential) and then use methods that apply specifically to that distribution. However, there may not be sufficient data to ascertain the form of the distribution, or the data may come from a distribution for which methods are not readily available. In such situations one wants to avoid making untenable assumptions, and this is where nonparametric methods come into play.


Nonparametric methods require minimal assumptions about the form of the population distribution. For instance, it might be assumed that the data come from a population with a continuous distribution, but no other assumptions are made. Or it might be assumed that the population distribution depends on location and scale parameters, while the functional form of the distribution, normal or otherwise, is not specified. By contrast, parametric methods require that the form of the population distribution be completely specified except for a finite number of parameters. For instance, the familiar one-sample t-test for means assumes that the observations are selected from a population with a normal distribution, and that the only unknown values are the population mean and standard deviation. The simplicity of nonparametric methods, their widespread availability in statistical packages, and their desirable statistical properties make them attractive additions to the data analyst's tool kit.

1.4 Classes of nonparametric methods

Nonparametric methods may be classified according to their function, such as two-sample tests, tests for trend, and so on. This is broadly how this book is organized. However, methods may also be classified according to the statistical ideas on which they are based. Here, we consider the ideas that underlie the methods discussed in this book.

The typical introductory course in statistics examines primarily parametric statistical procedures. Recall that these procedures include tests based on Student's t-distribution, analysis of variance, correlation analysis and regression analysis. A characteristic of these procedures is that the appropriateness of their use for inference depends on certain assumptions. Inferential procedures in analysis of variance, for example, assume that the samples have been drawn from normally distributed populations with equal variances. Since populations do not always meet the assumptions underlying parametric tests, we frequently need inferential procedures whose validity does not depend on rigid assumptions. Nonparametric statistical procedures fill this need in many instances, since they are valid under very general assumptions. As we shall discuss more fully later, nonparametric procedures also satisfy other needs of the researcher.

By convention, two types of statistical procedures are treated as nonparametric: (1) truly nonparametric procedures and (2) distribution-free procedures. Strictly speaking, nonparametric procedures are not concerned with population parameters. For example, in this book we shall discuss tests for randomness, where we are concerned with some characteristic other than the value of a population parameter. The validity of distribution-free procedures does not depend on the functional form of the population from which the sample has been drawn. It is customary to refer to both types of procedure as nonparametric. Kendall and Sundrum (1953) discussed the differences between the terms nonparametric and distribution-free.

1.5 When to use nonparametric procedures

The following are some situations in which the use of a nonparametric procedure is appropriate.

1. The hypothesis to be tested does not involve a population parameter.

2. The data have been measured on a scale weaker than that required for the parametric procedure that would otherwise be employed. For example, the data may consist of count data or rank data, thereby precluding the use of some otherwise appropriate parametric procedure.

3. The assumptions necessary for the valid use of a parametric procedure are not met. In many instances, the design of a research project may suggest a certain parametric procedure. Examination of the data, however, may reveal that one or more assumptions underlying the test are grossly violated. In that case, a nonparametric procedure is frequently the only alternative.

4. Results are needed in a hurry and calculations must be done by hand.

The literature on nonparametric statistics is extensive. A bibliography by Savage (1962) contained some 3 000 entries. An up-to-date bibliography would undoubtedly contain many times that number.

1.6 Advantages of nonparametric statistics

The following are some of the advantages of the available nonparametric statistical procedures.

1. Make fewer assumptions.
Nonparametric statistical procedures generally do not require rigid parametric assumptions about the populations from which the data are taken.

2. Wider scope.
Since fewer assumptions are made about the population being studied, nonparametric statistics are usually wider in scope than parametric statistics, which assume a particular distribution.

3. Need not involve population parameters.
Parametric tests involve specific probability distributions (e.g., the normal distribution), and the tests involve estimating the key parameters of that distribution (e.g., the mean or a difference in means) from the sample data. Nonparametric tests, however, need not involve population parameters.

4. The chance of their being improperly used is small.
Since most nonparametric procedures depend on a minimal set of assumptions, the chance of their being improperly used is small.

5. Applicable even when data are measured on a weak measurement scale.
For interval or ratio data, you may use a parametric test, depending on the shape of the distribution. Nonparametric tests can be performed even when the data are nominal or ordinal.

6. Easy to understand.
Researchers with minimal preparation in mathematics and statistics usually find nonparametric procedures easy to understand.

7. Computations can be performed quickly and easily.
Nonparametric tests can usually be performed quickly and easily without automated instruments (calculators and computers). They are designed for small data sets, including counts, classifications and ratings.

1.7 Disadvantages of nonparametric tests

Nonparametric procedures are not without disadvantages. The following are some of the more important ones.

1. May waste information.
The researcher may waste information by using a nonparametric procedure when a parametric procedure is more appropriate. If the assumptions of the parametric methods are met, it is generally more efficient to use them.

2. Difficult to compute by hand for large samples.
For large sample sizes, the data manipulations become laborious unless computer software is available.

3. Tables not widely available.
Special tables of critical values are often needed for the test statistic, and these values cannot always be generated by computer software. On the other hand, the critical values for parametric tests are readily available and generally easy to incorporate in computer programs.

1.8 The scope of this book

The emphasis in this book is on the application of nonparametric statistical methods. Wherever available, the examples and exercises use real data, gleaned primarily from the results of research published in various journals. We hope that the use of real situations and real data will make the book more interesting to you. We have included problems for each of the statistical techniques described, and we cover a wide variety of techniques. The techniques we discuss are those most likely to prove helpful to the researcher and most likely to appear in the research literature. This text covers not only hypothesis testing but also interval estimation.

1.9 Format and organization

In presenting these statistical procedures, we have adopted a format designed to make the book easy to use. Each hypothesis-testing procedure is broken down into four components: (1) assumptions, (2) hypotheses, (3) test statistic and (4) decision rule. Thus, for a given test, you can quickly determine the assumptions on which the test is based, the hypotheses that are appropriate, how to compute the test statistic, and how to decide whether to reject the null hypothesis. We first discuss these topics in general and then use an example to illustrate the application of the test.

Where appropriate for a given test, we discuss ties, the large-sample approximation and the power efficiency. For each procedure, we cite references that you may consult if you are interested in learning more about the procedure or in further pursuing a related topic. Finally, we provide exercises for each procedure. These exercises serve two purposes: they illustrate appropriate uses of a test, and they give you a chance to check whether you have mastered the computational techniques and learnt how to set up the hypotheses and apply the appropriate decision rule.

In the remaining chapters, we cite two types of reference: those cited in the body of the text, which refer you to the statistical literature, and those cited in the examples and exercises, which refer you to the research literature.

References

Armitage, P. (1971). Statistical Methods in Medical Research. Oxford and Edinburgh: Blackwell Scientific Publications.

Colton, T. (1974). Statistics in Medicine. Boston: Little, Brown.

Dunn, O. J. (1964). Basic Statistics: A Primer for the Biomedical Sciences. New York: Wiley.

Kendall, M. G. and Sundrum (1953). Distribution-free methods and order properties. Review of the International Statistical Institute, 21, 124–134.

Remington, R. D. and Schork, M. A. (1970). Statistics with Applications to the Biological and Health Sciences. Englewood Cliffs, N.J.: Prentice-Hall.

Savage, I. R. (1962). Bibliography on Nonparametric Statistics. Harvard University Press.

2.1 Introduction

In classical parametric tests (which assume that the population from which the sample data have been drawn is normally distributed), the parameter of interest is the population mean. In this chapter, we shall be concerned with the nonparametric analogs of the one-sample z and t tests. These are nonparametric procedures, utilizing data consisting of a single set of observations, that are appropriate when the location parameter of interest is the median rather than the mean.

Several nonparametric procedures are available for making inferences about the median. Two that are useful when the conditions for the parametric z and t tests are not met are the one-sample sign test and the Wilcoxon signed-ranks test.

Recall that the median of a set of data is defined as the middle value when the data are arranged in order of magnitude. For continuous distributions, we define the median as the point $M$ for which the probability that a value selected at random from the distribution is less than $M$ and the probability that it is greater than $M$ are both equal to one half:

$$P(X < M) = P(X > M) = 0.5.$$

When the population from which the sample has been drawn is symmetric, any conclusions about the median are applicable to the mean, since in symmetric distributions the mean (when it exists) and the median coincide.
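The following minimal sketch, using hypothetical data, illustrates these points: for a symmetric sample the mean and median agree, a single large value pulls the mean but not the median, and for a continuous distribution the cumulative probability at the median is 0.5 (shown for a normal distribution via the standard library's `statistics.NormalDist`).

```python
import statistics

symmetric = [2, 4, 6, 8, 10]  # hypothetical data, symmetric about 6
skewed = [2, 4, 6, 8, 50]     # hypothetical data with one large value in the right tail

for label, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(label, "mean:", statistics.mean(data), "median:", statistics.median(data))

# For a continuous distribution, P(X < M) = 0.5 at the median M.
# A normal distribution is symmetric, so its median equals its mean (6 here).
dist = statistics.NormalDist(mu=6, sigma=2)
print(dist.cdf(6))  # 0.5
```

For the skewed sample the mean is 14 while the median stays at 6, which is why conclusions about the median carry over to the mean only under symmetry.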

In this chapter, we shall also discuss procedures for making inferences about a population proportion and for testing for randomness and the presence of trend.

Wherever possible, we shall observe the following format in presenting the hypothesis-testing procedures.

1. Assumptions

We list the assumptions necessary for the validity of the test and describe the data on which the calculations are based.

2. Parameter of interest

From the problem context, we identify the parameter of interest.

One-Sample Nonparametric Methods

3. Hypotheses

We state the null hypothesis $H_0$ and the alternative hypothesis $H_1$.

4. Test statistic

We write down a formula or directions for computing the relevant test statistic. When we give a formula, we describe the method for evaluating it.

5. Significance level

We choose a significance level $\alpha$.

6. Decision rule

We determine the critical region. The Appendix gives appropriate tables for the distribution of the test statistic. From these tables, we can determine the critical values of the test statistic corresponding to the chosen $\alpha$.

7. Value of the test statistic

We compute the value of the test statistic from the sample data.

8. Decision

If the computed value of the test statistic is as extreme as or more extreme than a critical value, we reject $H_0$ and conclude that $H_1$ is true. If we cannot reject $H_0$, we conclude that there is not enough evidence to declare it false.
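Step 6 relies on tables in the Appendix. When the null distribution of the test statistic is binomial with parameters $n$ and $\tfrac{1}{2}$, as it is for the sign test in Section 2.2, the lower-tail critical value can also be computed directly. The sketch below is illustrative only; the function name `lower_critical_value` is hypothetical. It returns the largest $c$ with $P(S \le c) \le \alpha$, so that $H_0$ is rejected when the observed $S$ is at most $c$.

```python
from math import comb

def lower_critical_value(n, alpha):
    """Largest c with P(S <= c) <= alpha for S ~ Binomial(n, 1/2).

    Returns None when even S = 0 is not significant at level alpha,
    i.e. the sample is too small for a test at that level.
    """
    cumulative = 0.0
    critical = None
    for c in range(n + 1):
        cumulative += comb(n, c) / 2 ** n  # add P(S = c)
        if cumulative > alpha:
            break
        critical = c
    return critical

print(lower_critical_value(10, 0.05))  # 1, since P(S <= 1) = 11/1024 but P(S <= 2) = 56/1024
```

Because the binomial distribution is discrete, the actual size of the test, $P(S \le c)$, is usually somewhat smaller than the nominal $\alpha$.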

2.2 The one-sample sign test

The sign test is perhaps the oldest of all nonparametric procedures. Let $x_1, x_2, \ldots, x_n$ be an observed random sample of size $n$ from a population with median $M$. The sign test utilizes only the signs of the differences $x_i - M_0$ between the observed values and the hypothesized median $M_0$. Thus, the data are converted into a series of plus (+) and minus (−) signs.
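The conversion of the data into signs can be sketched as follows. This is a minimal illustration with hypothetical data; the function name `sign_counts` is hypothetical, and observations equal to the hypothesized median are simply dropped, which is an assumption of this sketch (zero differences are taken up under "Problem with zero differences").

```python
def sign_counts(data, m0):
    """Count the + and - signs of the differences x_i - m0.

    Observations equal to m0 (zero differences) are dropped,
    an assumption of this sketch. Returns (n_plus, n_minus, n_used).
    """
    n_plus = sum(1 for x in data if x > m0)
    n_minus = sum(1 for x in data if x < m0)
    return n_plus, n_minus, n_plus + n_minus

# Hypothetical sample, hypothesized median m0 = 10
sample = [12, 9, 15, 10, 11, 8, 14, 13]
print(sign_counts(sample, 10))  # (5, 2, 7)
```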

2.2.1 Assumptions

1. The sample available for analysis is a random sample of independent measurements from a population with an unknown median $M$.

2. The variable of interest is measured on at least an ordinal scale.

3. The variable of interest is continuous.


2.2.2 Hypotheses

The hypothesis to be tested concerns the value of the population median. To test the hypothesis $H_0: M = M_0$, where $M_0$ is a specified median value, against a corresponding one-sided or two-sided alternative, we use the sign test. The test statistic S depends on the alternative hypothesis, as follows.

(a) One-sided test

For a one-sided test, the alternative hypothesis is either $H_1: M < M_0$ or $H_1: M > M_0$.

(i) If $H_1: M < M_0$, then the test statistic is defined by

$S = S_+$ = Number of + signs when the differences $X_i - M_0$ are computed, $i = 1, 2, \ldots, n$.

If the alternative hypothesis is true, then we should expect the differences $X_i - M_0$ to yield significantly fewer positive (+) signs than negative (−) signs. Thus, a small number of (+) signs leads to the rejection of $H_0$. If $H_0$ is true, we expect the number of (−) signs to be equal to that of the (+) signs, since a + sign and a − sign are equally likely:

$$P(X_i > M_0) = P(X_i < M_0) = 0.5.$$

Hence, when $H_0$ is true, S has the binomial distribution with parameters $n$ and $p = 0.5$.

Decision rule

The p-value of the test is defined by

$$p = P(S \le s \mid H_0 \text{ is true}),$$

where $s$ is the observed value of the test statistic.


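(ii) For a one-sided test of $H_0: M = M_0$ against $H_1: M > M_0$, the test statistic is defined by

$S = S_-$ = Number of observations less than $M_0$ = Number of − signs when the differences $X_i - M_0$ are computed, $i = 1, 2, \ldots, n$.

If the alternative hypothesis is true, then we should expect the differences $X_i - M_0$ to yield fewer negative (−) signs than would be expected if the null hypothesis were true. Likewise, when $H_0$ is true, S has the binomial distribution with parameters $n$ and $p = 0.5$.

Decision rule

The p-value of the test is defined by

$$p = P(S \le s \mid H_0 \text{ is true}),$$

where $s$ is the observed value of the test statistic.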

(b) Two-sided test

If we wish to test $H_0: M = M_0$ against $H_1: M \ne M_0$, then the test statistic is defined by

$$S = \min(S_-, S_+),$$

where $S_-$ is the number of − signs and $S_+$ is the number of + signs when the differences $X_i - M_0$ are computed.

We should reject the null hypothesis if we have too few negative (–) signs or too few positive (+) signs. When $H_0$ is true, each of $S_-$ and $S_+$ has the binomial distribution with parameters $n$ and $p = 0.5$.

Decision Rule

The p-value of the test is defined by

$$p = 2P(S \le s \mid H_0 \text{ is true}),$$

where $s$ is the observed value of the test statistic.

Problem with zero differences

• We assume that the variable of interest is continuous. Therefore, in theory, no zero differences should occur when we compute $X_i - M_0$.

• In practice, however, zero differences do occur. The usual procedure is to discard observations leading to zero differences and reduce n accordingly. In that case, the hypothesis may be restated in probability terms. For example, in the two-sided case the null hypothesis is

$$H_0: P(X > M_0) = P(X < M_0) = 0.5.$$
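The three decision rules above can be collected into one small routine. The following is a minimal sketch in Python using only the standard library; the names `binom_cdf` and `sign_test` are illustrative choices, not functions from any package. It discards zero differences, as described above, and uses the exact binomial distribution with p = 0.5.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float = 0.5) -> float:
    """P(S <= k) for S ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def sign_test(data, m0, alternative="two-sided"):
    """One-sample sign test of H0: M = m0.

    alternative: "less" (H1: M < m0), "greater" (H1: M > m0) or "two-sided".
    Returns (s, n, p_value), where n counts only the nonzero differences.
    """
    diffs = [x - m0 for x in data if x != m0]   # discard zero differences
    n = len(diffs)
    s_plus = sum(1 for d in diffs if d > 0)     # number of + signs
    s_minus = n - s_plus                        # number of - signs
    if alternative == "less":
        s = s_plus                              # few + signs discredit H0
        p_value = binom_cdf(s, n)
    elif alternative == "greater":
        s = s_minus                             # few - signs discredit H0
        p_value = binom_cdf(s, n)
    elif alternative == "two-sided":
        s = min(s_plus, s_minus)
        p_value = min(1.0, 2 * binom_cdf(s, n))  # cap at 1 when s = n/2
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return s, n, p_value


# Hypothetical data: one value equal to m0, nine below it, one above it.
data = [2.0] * 9 + [5.0, 3.5]
print(sign_test(data, 3.5, "two-sided"))   # (1, 10, 0.021484375)
```

The hypothetical data were chosen only to mirror the counts of Example 2.1 (n = 10 after discarding one zero difference, s = 1); the exact two-sided p-value 0.0215 agrees with the tabulated 2(0.0107) = 0.0214 up to rounding.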

Example 2.1

Appearance transit times for 11 patients with significantly occluded right coronary arteries are given below:

Can we conclude, at the 0.05 level of significance, that the median appearance transit time in the population from which the data were drawn is different from 3.50 seconds?

Solution

The parameter of interest is $M$, the median appearance transit time in the population. We wish to test the hypothesis $H_0: M = 3.50$ against $H_1: M \ne 3.50$ at the 0.05 level of significance. Since this is a two-sided test, the test statistic is $S = \min(S_-, S_+)$, where $S_-$ is the number of observations less than 3.50 and $S_+$ is the number of observations greater than 3.50. When $H_0$ is true, $S_-$ and $S_+$ each have the binomial distribution with parameters $n = 10$ and $p = 0.5$.

Note: We discard one observation which has the same value as the hypothesized median, leaving us with a usable sample size of 10.

Let $s$ be the observed value of the test statistic. We reject $H_0$ at the 0.05 level of significance when

$$p = 2P(S \le s \mid 10, 0.5) \le 0.05.$$

The observed value of the test statistic is therefore given by $s = 1$.

Since this is a two-sided test, the p-value of the test is given by

$$p = 2P(S \le 1 \mid 10, 0.5) = 2(0.0107) = 0.0214.$$

Since the p-value of the test, 0.0214, is less than 0.05, we reject $H_0$ at the 0.05 level of significance and conclude that the population median is not 3.50.

Example 2.2

The following data are IQs of arrested drug abusers who are aged 16 years or older. Is there any evidence that the median IQ of drug abusers in the population is greater than 107? Use α = 0.05.

Solution

The parameter of interest is $M$, the median IQ of drug abusers in the population. We wish to test the hypothesis $H_0: M = 107$ against $H_1: M > 107$ at the 0.05 level of significance. The test statistic is $S = S_-$, where $S_-$ is the number of observations less than 107. When $H_0$ is true, S has the binomial distribution with parameters $n = 14$ and $p = 0.5$.

Note: We discard one observation which has the same value as the hypothesized median, leaving us with a usable sample size of 14.

Let $s$ be the observed value of the test statistic. We reject $H_0$ at the 0.05 level of significance when $p \le 0.05$, where the p-value of the test is given by

$$p = P(S \le s \mid 14, 0.5).$$

The following table gives the signs of $X_i - 107$:

The observed value of the test statistic is $s = 6$.

Since this is a one-sided test, the p-value of the test is given by

$$p = P(S \le 6 \mid 14, 0.5) = 0.3953.$$

Since the p-value of the test, 0.3953, is greater than 0.05, we fail to reject $H_0$ at the 0.05 level of significance. Hence, there is not enough evidence to conclude that the median IQ of the subjects in the population is greater than 107.
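Because $P(S \le s)$ for p = 0.5 is a count of sign patterns divided by $2^n$, the p-value of Example 2.2 can be checked exactly with Python's `fractions` module. A minimal sketch; `sign_tail` is an illustrative name.

```python
from fractions import Fraction
from math import comb


def sign_tail(s: int, n: int) -> Fraction:
    """Exact P(S <= s) when S ~ Binomial(n, 1/2)."""
    return Fraction(sum(comb(n, i) for i in range(s + 1)), 2**n)


p = sign_tail(6, 14)          # Example 2.2: s = 6 minus signs, n = 14
print(p, round(float(p), 4))  # 1619/4096 0.3953
```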

2.2.3 Large sample approximation

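If the sample size is larger than 15, we can use the normal approximation to the binomial distribution with a continuity correction. Thus, if n is large and $H_0$ is true, then it can be shown that S is approximately normally distributed with mean $\tfrac{1}{2}n$ and variance $\tfrac{1}{4}n$. Thus, for the sign test, when $H_0$ is true and n > 15, we can use the test statistic

$$Z = \frac{S - \tfrac{1}{2}n}{\tfrac{1}{2}\sqrt{n}} = \frac{2S - n}{\sqrt{n}} \qquad (2.1)$$

When $H_0$ is true, Z is approximately N(0, 1). For the large sample approximation, it is common to use a continuity correction, by replacing S by $S + 0.5$ in the definition of Z (recall that small values of S lead to the rejection of $H_0$). Equation (2.1) then becomes

$$Z = \frac{S + 0.5 - \tfrac{1}{2}n}{\tfrac{1}{2}\sqrt{n}} \qquad (2.2)$$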


Example 2.3

The following data give the ages, in years, of a random sample of 20 students from Besease Senior High School. It is believed that the median age of students in this school is smaller than 22 years. Based on these data, is there sufficient evidence to conclude that the median age of students from Besease Senior High School is smaller than 22 years?

Solution

The parameter of interest is $M$, the median age of students from Besease Senior High School. We are interested in testing the null hypothesis $H_0: M = 22$ against $H_1: M < 22$ at the 0.05 level of significance. The test statistic is

$S = S_+$ = number of observations greater than 22 = number of + signs when the differences $X_i - 22$ are computed, $i = 1, 2, \ldots, 20$.

When $H_0$ is true, S has the binomial distribution with parameters $n = 20$ and $p = 0.5$. Since n > 15, we use the normal approximation to the binomial distribution with a continuity correction. The test statistic then becomes

$$Z = \frac{S + 0.5 - 0.5(20)}{0.5\sqrt{20}}.$$

When $H_0$ is true, Z is approximately N(0, 1). Let $z_o$ denote the observed value of the test statistic Z. We reject $H_0$ at the 0.05 level of significance when

$$z_o \le -z_{0.05} = -1.645.$$

The following table gives the signs of $X_i - 22$:

Thus, the observed value of the statistic S is 5. This gives

$$z_o = \frac{5 + 0.5 - 0.5(20)}{0.5\sqrt{20}} = -2.0125.$$

Since $-2.0125 < -1.645$, we reject $H_0$ at the 0.05 level of significance and conclude that the median age of students of Besease Senior High School is less than 22 years.
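Equations (2.1) and (2.2) can be evaluated directly. The sketch below is one possible Python version: `sign_test_z` is an illustrative name, `statistics.NormalDist` from the standard library supplies the critical value, and the exact binomial tail that the approximation replaces is computed alongside, for comparison only.

```python
import math
from statistics import NormalDist


def sign_test_z(s: int, n: int, continuity: bool = True) -> float:
    """Large-sample sign-test statistic: equation (2.2), or (2.1) without correction."""
    s_adj = s + 0.5 if continuity else s   # small S leads to rejection, so add 0.5
    return (s_adj - 0.5 * n) / (0.5 * math.sqrt(n))


z_o = sign_test_z(5, 20)                  # Example 2.3: s = 5, n = 20
z_crit = -NormalDist().inv_cdf(0.95)      # -z_0.05, about -1.645
print(round(z_o, 4), z_o <= z_crit)       # -2.0125 True
exact = sum(math.comb(20, i) for i in range(6)) / 2**20
print(round(exact, 4))                    # exact binomial P(S <= 5): 0.0207
```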

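2.2.4 Confidence interval for the median based on the sign test

• The $100(1-\alpha)\%$ confidence interval for the population median $M$ consists of those values of $M_0$ for which we would not reject a two-sided null hypothesis $H_0: M = M_0$ at the $\alpha$ level of significance.

• We designate the lower limit of our confidence interval by $M_L$ and the upper limit by $M_U$.

• We determine k, the largest number of positive or negative signs for which $P(S \le k) \le \alpha/2$, where S has the binomial distribution with parameters n and 0.5 (i.e. the largest value of S that would lead us to reject a two-sided $H_0$ at the $\alpha$ level of significance).

• When the data values are arranged in order of magnitude, the $(k+1)$th observation is the lower limit $M_L$. To locate $M_U$, the upper limit of the confidence interval, we count the ordered sample values backwards from the largest. The $(k+1)$th observation from the largest value locates the $M_U$ value.

• The confidence coefficient of the interval $(M_L, M_U)$ is $1 - 2P(S \le k)$.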

Example 2.4

Construct a 95% confidence interval for the median of the population from which the following sample data have been drawn, using the sign test.

Solution

• The point estimate of the population median is the sample median, which is the mean of the two middle values in the ordered array. Thus, since n = 16, the sample median is the mean of the 8th and 9th ordered values.

• For n = 16 and p = 0.5, we consult a table of the binomial distribution and find that $P(S \le 3) = 0.0105$ and $P(S \le 4) = 0.0383$.

• Since $100[1 - 2(0.0105)] = 97.9$, which is larger than 95, and $100[1 - 2(0.0383)] = 92.34$, which is smaller than 95, we cannot obtain an exact 95% confidence interval for the median.

• This method of constructing confidence intervals for the median does not usually yield intervals with exactly the usual coefficients of 0.90, 0.95 and 0.99.

• In practice, we choose between a wider interval with higher confidence and a narrower interval with lower confidence.

• Suppose we choose $k = 4$. Then the 5th value in the ordered array is $M_L = 1.99$ and the 12th (i.e. 16 – 4) value in the ordered array is $M_U = 4.01$.

• The confidence coefficient is therefore $100[1 - 2(0.0383)] = 92.34$. We say that we are 92.34% confident that the population median is between 1.99 and 4.01.
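The order-statistic rule above can be sketched in a few lines of Python. The function names are illustrative, and the sample is hypothetical (the values 1 to 16), used only because Example 2.4 also has n = 16. Computed exactly, $P(S \le 3) = 0.0106$ and $P(S \le 4) = 0.0384$ to four decimals, so the coefficients come out as 97.87% and 92.32%, slightly different from the values obtained with the tabulated 0.0105 and 0.0383.

```python
from math import comb


def binom_half_cdf(k: int, n: int) -> float:
    """P(S <= k) for S ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k + 1)) / 2**n


def largest_k(n: int, alpha: float) -> int:
    """Largest k with P(S <= k) <= alpha/2, i.e. confidence at least 1 - alpha."""
    if binom_half_cdf(0, n) > alpha / 2:
        raise ValueError("sample too small for this confidence level")
    k = 0
    while binom_half_cdf(k + 1, n) <= alpha / 2:
        k += 1
    return k


def sign_ci(data, k: int):
    """(k+1)th smallest and (k+1)th largest values, with confidence 1 - 2P(S <= k)."""
    x = sorted(data)
    n = len(x)
    return x[k], x[n - k - 1], 1 - 2 * binom_half_cdf(k, n)


sample = list(range(1, 17))                  # hypothetical sample, n = 16
print(largest_k(16, 0.05))                   # 3 (a 97.87% interval)
lo, hi, conf = sign_ci(sample, 4)            # k = 4, as chosen in Example 2.4
print(lo, hi, round(100 * conf, 2))          # 5 12 92.32
```

Choosing k = 4 instead of the "safe" k = 3 is exactly the trade-off described above: a narrower interval in exchange for a confidence coefficient below 95%.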

Large Sample Approximation

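We find k such that

$$P\left(\frac{S - \tfrac{1}{2}n}{\tfrac{1}{2}\sqrt{n}} \le \frac{k - \tfrac{1}{2}n}{\tfrac{1}{2}\sqrt{n}}\right) = \frac{\alpha}{2},$$

so that $(k - \tfrac{1}{2}n)/(\tfrac{1}{2}\sqrt{n}) = -z_{\alpha/2}$. Making k the subject of the above equation, we obtain

$$k = \tfrac{1}{2}n - \tfrac{1}{2}z_{\alpha/2}\sqrt{n} = \tfrac{1}{2}\left(n - z_{\alpha/2}\sqrt{n}\right).$$

If the resulting value is not an integer, we use the closest integer.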

Example 2.5

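Refer to Example 2.3. Construct a 95% confidence interval for $M$.

Solution

Here n = 20 and $z_{0.025} = 1.96$, so

$$k = \tfrac{1}{2}\left(20 - 1.96\sqrt{20}\right) = 5.6 \approx 6.$$

Therefore the 6th observation in the ordered array is $M_L$, and the 6th observation from the largest value, that is, the 15th observation in the ordered array, is $M_U$. Hence the 95% confidence interval for $M$ is $M_L < M < M_U$, running from the 6th to the 15th ordered observation.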
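The large-sample formula for k is a one-liner. In this sketch, `sign_ci_k_large` is an illustrative name, and $z_{\alpha/2}$ is taken from `statistics.NormalDist` rather than from a table.

```python
import math
from statistics import NormalDist


def sign_ci_k_large(n: int, conf: float = 0.95) -> int:
    """k = (n - z_{alpha/2} * sqrt(n)) / 2, rounded to the closest integer."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)     # z_{alpha/2}, about 1.96 for 95%
    return round(0.5 * (n - z * math.sqrt(n)))


n = 20
k = sign_ci_k_large(n)            # Example 2.5
print(k, n - k + 1)               # 6 15: positions of M_L and M_U
```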

2.3 The Wilcoxon signed-ranks test

As we have seen, the sign test utilizes only the signs of the differences between observed values and the hypothesized median. For testing $H_0: M = M_0$, there is another procedure that uses the magnitudes of the differences when these are available. The Wilcoxon signed-ranks procedure makes use of this additional information by ranking the differences between the sample measurements and the hypothesized median. Because it uses more information than the sign test, the Wilcoxon signed-ranks test is more powerful when the sampled population is symmetric. However, the sign test is preferred when the sampled population is not symmetric.

2.3.1 Assumptions

1. The sample available for analysis is a random sample of size n from a population with an unknown median $M$.

2. The variable of interest is measured on a continuous scale.

3. The sampled population is symmetric.

4. The scale of measurement is at least interval.

5. The observations are independent.

2.3.2 Hypotheses

The parameter of interest is $M$, the population median. To test the hypothesis $H_0: M = M_0$, where $M_0$ is the hypothesized median, against a corresponding one-sided or two-sided alternative, we can also use the Wilcoxon signed-ranks test.

2.3.3 Test statistic

To obtain the test statistic, we use the following procedure.

1. Subtract the hypothesized median $M_0$ from each observation; that is, for each observation $X_i$, compute the difference $X_i - M_0$.

2. If any observation $X_i$ is equal to the hypothesized median, eliminate it from the calculations and reduce the sample size accordingly.

3. Rank the absolute differences $|X_i - M_0|$ from the smallest to the largest, without regard to their signs. If two or more $|X_i - M_0|$ are tied, assign each tied value the mean of the rank positions of the tied differences.

4. Assign to each rank the sign of the difference to which it belongs.

5. Obtain the sum of the ranks with positive signs; call it $W_+$.

6. Obtain the sum of the ranks with negative signs; call it $W_-$.

7. For a given sample, we do not expect
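Steps 1 to 6 translate directly into code. The sketch below is a minimal Python version, assuming the data are given as a list of numbers; `signed_rank_sums` is an illustrative name. Ties are detected by exact equality, so decimal data may need rounding first to avoid floating-point artefacts.

```python
def signed_rank_sums(data, m0):
    """Return (n, W_plus, W_minus) for the Wilcoxon signed-ranks test of H0: M = m0."""
    diffs = [x - m0 for x in data if x != m0]              # steps 1-2: drop zeros
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))  # step 3: rank |diffs|
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i                                    # extend j over a run of ties
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1              # mean of rank positions i+1..j+1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)   # steps 4-5
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)  # step 6
    return n, w_plus, w_minus


# Hypothetical data with one zero difference and a three-way tie.
print(signed_rank_sums([12, 15, 9, 20, 15, 7, 18], 12))   # (6, 15.0, 6.0)
```

A quick consistency check is that $W_+ + W_- = n(n+1)/2$; here 15 + 6 = 21 = 6(7)/2.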

2.3.4 Carrying out the Wilcoxon signed ranks test

When the null hypothesis, $H_0: M = M_0$, is true, we do not expect a great difference between $W_+$ and $W_-$. Consequently, a sufficiently small value of $W_+$ or a sufficiently small value of $W_-$ leads to the rejection of $H_0$.

(a) One-sided test: To test $H_0: M = M_0$ against $H_1: M < M_0$ at the α level of significance.

Test statistic

A sufficiently small value of $W_+$ leads to the rejection of the null hypothesis $H_0$. The test statistic therefore is $W = W_+$.

Decision rule

We reject $H_0$ if the observed value of W is less than or equal to the tabulated W value for n and a preselected value of α.

(b) One-sided test: To test $H_0: M = M_0$ against $H_1: M > M_0$ at the α level of significance.

Test statistic

For a sufficiently small $W_-$, we reject $H_0$. The test statistic therefore is $W = W_-$, since a small value causes us to reject the null hypothesis.

Decision rule

We reject $H_0$ if the observed value of W is less than or equal to the tabulated W value for n and a preselected value of α.

(c) Two-sided test: To test $H_0: M = M_0$ against $H_1: M \ne M_0$ at the α level of significance.

Test statistic

The test statistic is $W = \min(W_+, W_-)$, since a small value of either $W_+$ or $W_-$ causes us to reject the null hypothesis.

Decision rule

We reject $H_0$ if the observed value of W is less than or equal to the tabulated W value for n and a preselected value of α.

The distribution of W

1. The smallest value W can take is zero (0) and the largest value that W can take is the sum of the integers from 1 to n, that is, $1 + 2 + \cdots + n = n(n+1)/2$. W is therefore a discrete random variable whose support ranges between 0 and $n(n+1)/2$.

2. It can be shown that, when $H_0$ is true, the probability mass function of the discrete random variable W is given by

$$P(W = w) = f(w) = \frac{c(w)}{2^n}, \qquad w = 0, 1, \ldots, \frac{n(n+1)}{2},$$

where c(w) = the number of possible ways to assign a + sign or a − sign to the first n integers so that the sum of the ranks with + signs (or − signs) is equal to w.
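The counts c(w) can be generated without listing all $2^n$ sign assignments, by adding one rank at a time (a standard counting recursion, not a method described in the text). The sketch below uses illustrative function names and assumes no ties. For example, it gives 17 for n = 12 at the one-sided 0.05 level, and 21 for n = 14 and 8 for n = 10 at the one-sided 0.025 level; compare the values 21 and t = 8 used in Examples 2.7 and 2.9.

```python
from fractions import Fraction


def wilcoxon_null_counts(n: int) -> list:
    """c(w) for w = 0, 1, ..., n(n+1)/2 (no ties): number of ways to give
    + signs to a subset of the ranks 1..n whose sum is w."""
    max_w = n * (n + 1) // 2
    c = [1] + [0] * max_w            # with no ranks yet, only w = 0 is possible
    for r in range(1, n + 1):        # rank r is either '+' (adds r) or '-'
        for w in range(max_w, r - 1, -1):
            c[w] += c[w - r]
    return c


def wilcoxon_pmf(w: int, n: int) -> Fraction:
    """P(W = w) = c(w) / 2**n under H0."""
    return Fraction(wilcoxon_null_counts(n)[w], 2**n)


def critical_value(n: int, alpha: float):
    """Largest w with P(W <= w) <= alpha, or None if no such w exists."""
    c, cum, crit = wilcoxon_null_counts(n), 0, None
    for w, cw in enumerate(c):
        cum += cw
        if cum / 2**n > alpha:
            break
        crit = w
    return crit


print(critical_value(12, 0.05), critical_value(14, 0.025), critical_value(10, 0.025))
# 17 21 8
```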


Example 2.6

The following are the systolic blood pressures (mmHg) of 13 patients undergoing a drug therapy for hypertension:

Can we conclude on the basis of these data that the median systolic blood pressure is less than 165 mmHg? Take α = 0.05.

Solution

Table 2.1: Computation of test statistic

The parameter of interest is $M$, the median systolic blood pressure of the population. We wish to test the hypothesis $H_0: M = 165$ against $H_1: M < 165$ at the 0.05 level of significance. Using the Wilcoxon signed rank test, the test statistic is $W = W_+$, where $W_+$ is the sum of the ranks with positive signs.

We reject $H_0$ at the 0.05 level of significance if $w \le 17$, where w is the observed value of the test statistic.

From Table 2.1, $w_+ = 27.5$. The value of the test statistic is therefore $w = 27.5$.

Since 27.5 > 17, we fail to reject $H_0$. Hence, there is not enough evidence to conclude that the median systolic blood pressure of the subjects in the population is less than 165 mmHg.

Example 2.7

Refer to Example 2.2. Use the Wilcoxon signed-ranks test to determine if there is any evidence that the median IQ of drug abusers in the population is different from 107. Use α = 0.05.

Solution

Table 2.2: Computation of test statistic

Let $M$ denote the median IQ of drug abusers who are aged 16 years or older. We wish to test the hypothesis $H_0: M = 107$ against $H_1: M \ne 107$ at the 0.05 level of significance. The test statistic is $W = \min(W_-, W_+)$, where $W_-$ and $W_+$ are the sums of the ranks with negative and positive signs, respectively.

We reject $H_0$ at the 0.05 level of significance if $w \le 21$, where w is the observed value of the test statistic.

From Table 2.2, the smaller of the two rank sums is 40.5, so the value of the test statistic is $w = 40.5$.

Since 40.5 > 21, we fail to reject $H_0$ and conclude that the median IQ of the subjects in the population may be 107.

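2.3.5 Large sample approximation

Theorem 2.1

When $H_0$ is true, the mean and variance of W are

$$E(W) = \frac{n(n+1)}{4}, \qquad V(W) = \frac{n(n+1)(2n+1)}{24}.$$

Proof

When $H_0$ is true, W can be defined as

$$W = \sum_{i=1}^{n} W_i,$$

where $W_i = i$ if the difference with rank i has a + sign and $W_i = 0$ otherwise. When the null hypothesis is true, $P(W_i = i) = \tfrac{1}{2}$ and $P(W_i = 0) = \tfrac{1}{2}$, and the $W_i$ are independent. Hence

$$E(W) = \sum_{i=1}^{n} E(W_i) = \sum_{i=1}^{n}\left[i\left(\tfrac{1}{2}\right) + 0\left(\tfrac{1}{2}\right)\right] = \frac{1}{2}\sum_{i=1}^{n} i = \frac{1}{2}\cdot\frac{n(n+1)}{2} = \frac{n(n+1)}{4}.$$

Also,

$$V(W_i) = E(W_i^2) - [E(W_i)]^2 = i^2\left(\tfrac{1}{2}\right) + 0^2\left(\tfrac{1}{2}\right) - \left(\tfrac{1}{2}i\right)^2 = \tfrac{1}{4}i^2,$$

so that

$$V(W) = \sum_{i=1}^{n} V(W_i) = \frac{1}{4}\sum_{i=1}^{n} i^2 = \frac{1}{4}\cdot\frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}.$$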

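Theorem 2.2

When the null hypothesis is true, for large n,

$$Z = \frac{W - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$

follows an approximate standard normal distribution N(0, 1).

Proof

If W is a random variable with mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$, then by the central limit theorem,

$$Z = \frac{W - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}$$

is approximately N(0, 1).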
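The mean and variance in Theorem 2.1, which standardize W in Theorem 2.2, can be checked against the exact null distribution of W for small n. This sketch recomputes the counts c(w) with the same one-rank-at-a-time recursion (an illustrative helper, not from the text) and uses exact fractions, so the comparison involves no rounding.

```python
from fractions import Fraction


def null_counts(n):
    """c(w), w = 0..n(n+1)/2: number of sign assignments giving W = w."""
    max_w = n * (n + 1) // 2
    c = [1] + [0] * max_w
    for r in range(1, n + 1):
        for w in range(max_w, r - 1, -1):
            c[w] += c[w - r]
    return c


def exact_moments(n):
    """Exact E(W) and V(W) computed from the null distribution."""
    c, total = null_counts(n), 2**n
    mean = Fraction(sum(w * cw for w, cw in enumerate(c)), total)
    second = Fraction(sum(w * w * cw for w, cw in enumerate(c)), total)
    return mean, second - mean**2


for n in range(1, 16):
    mean, var = exact_moments(n)
    assert mean == Fraction(n * (n + 1), 4)
    assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)
print("Theorem 2.1 agrees with the exact distribution for n = 1..15")
```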

Adjustment for Ties

• We can incorporate an adjustment for ties among nonzero differences in the large sample approximation in the following way.

• Let t be the number of absolute differences tied for a particular nonzero rank. Then the correction factor is

$$\frac{\sum (t^3 - t)}{48},$$

where the sum is taken over all groups of tied absolute differences.

• We can subtract this quantity from the expression in the denominator under the square root sign.

• Thus the adjusted statistic for a large sample approximation is

$$Z = \frac{W - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24} - \dfrac{\sum (t^3 - t)}{48}}}.$$

When the null hypothesis is true, for large n, this statistic follows an approximate standard normal distribution N(0, 1).

• We illustrate the calculation of an adjustment for ties in the following data:

Table 2.3: Computation of correction factor

Example 2.8

The following data show the life span, in years, of a random sample of 21 recorded deaths in a certain country. In past years, the median life span in the country was known to be 50 years. Can we conclude from these data that the median life span in the country has improved? Use α = 0.05.

Solution

Table 2.4: Computation of test statistic

Let $M$ denote the median life span in the population. We wish to test the hypothesis $H_0: M = 50$ against $H_1: M > 50$ at the 0.05 level of significance. Using the large sample approximation with the adjustment for ties, the test statistic is

$$Z = \frac{W - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24} - \dfrac{\sum (t^3 - t)}{48}}},$$

where $W = W_-$ is the sum of the ranks with negative signs. When $H_0$ is true, Z is approximately N(0, 1).

Reject $H_0$ at the 0.05 level of significance if $z_o \le -z_{0.05} = -1.645$, where $z_o$ is the computed value of Z. From Table 2.4, w = 23. The value of the test statistic, including the adjustment for ties, is

$$z_o = \frac{23 - \dfrac{21(22)}{4}}{\sqrt{\dfrac{21(22)(43)}{24} - \dfrac{70 - 10}{48}}} = -3.2175.$$

Since $-3.2175 < -1.645$, we reject $H_0$ at the 0.05 level of significance. We therefore conclude that the median life span in the country has improved significantly.
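The arithmetic of the tie-adjusted statistic can be checked with a few lines of Python. This sketch assumes n = 21, w = 23 and a tie correction of (70 − 10)/48, the values that reproduce the reported $z_o = -3.2175$. The tie pattern passed to `tie_correction` is hypothetical and was chosen only to give $\sum t^3 = 70$ and $\sum t = 10$. Both function names are illustrative.

```python
import math
from collections import Counter


def tie_correction(abs_diffs):
    """sum(t^3 - t) / 48 over groups of tied nonzero absolute differences."""
    return sum(t**3 - t for t in Counter(abs_diffs).values()) / 48


def wilcoxon_z(w, n, correction=0.0):
    """Large-sample Z for the signed-ranks statistic W, optionally tie-adjusted."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - correction
    return (w - mean) / math.sqrt(var)


# A hypothetical tie pattern with two pairs and two triples: sum t^3 = 70, sum t = 10.
corr = tie_correction([1, 1, 2, 2, 3, 3, 3, 4, 4, 4])
print(corr)                                  # 1.25
print(round(wilcoxon_z(23, 21, corr), 4))    # -3.2175
```

Untied values form groups with t = 1 and contribute $1^3 - 1 = 0$, so they can be passed in without filtering.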

2.3.6 Confidence Interval for the Median, based on the Wilcoxon Signed-Ranks Test

Arithmetic Procedure

Step 1: Find the means

$$u_{ij} = \frac{X_i + X_j}{2}, \qquad 1 \le i \le j \le n,$$

of all possible pairs of observations $X_i$ and $X_j$ from the sample (each observation is also paired with itself). There are $n(n+1)/2$ such averages, distributed symmetrically about the median.

Step 2: Arrange the $u_{ij}$ in increasing order of magnitude.

Step 3: The median of the $u_{ij}$ is a point estimate of the population median.

Step 4: Find, from the Wilcoxon Signed Ranks Test table, the value t corresponding to the sample size n and the appropriate value of p as determined by the desired confidence level. When the confidence coefficient is $1 - \alpha$, we take $p = \alpha/2$. If the exact value of p cannot be found in the Wilcoxon signed ranks test table, we choose the closest neighbouring value.

Step 5: The end points of the confidence interval are the kth smallest and kth largest values of $u_{ij}$, where $k = t + 1$ and t is the value in the column labelled T corresponding to n and the value of p selected (see Wayne, 1978).

Example 2.9

Determine the 95% confidence interval for the population median by the Wilcoxon signed-ranks procedure using the following data:

Solution

All the 55 possible pairs of means from the observations are given in Table 2.5.

Table 2.5: All possible pairs of means from the observations

Thus, a point estimate of the population median $M$ is the 28th observation of the ordered data in Table 2.5. This is 32. From the Wilcoxon signed ranks test table, with n = 10 and p close to 0.025, we find t = 8, so that k = t + 1 = 9. Therefore the 9th observation in the ordered array in Table 2.5 is the lower limit $M_L$, and the 9th observation from the largest value (the 47th in the ordered array) locates the upper limit $M_U$. Therefore the 95% confidence interval for $M$ is $M_L < M < M_U$.
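Steps 1 to 5 can be sketched with `itertools.combinations_with_replacement`, which yields every pair $(X_i, X_j)$ with $i \le j$ by position, so the n(n+1)/2 averages include each observation paired with itself. The function names are illustrative, and the data are hypothetical (the values 1 to 10), used because Example 2.9 also has n = 10.

```python
from itertools import combinations_with_replacement
from statistics import median


def walsh_averages(data):
    """All n(n+1)/2 averages (X_i + X_j)/2 with i <= j, in increasing order."""
    return sorted((a + b) / 2 for a, b in combinations_with_replacement(data, 2))


def wilcoxon_ci(data, k):
    """Point estimate and the kth smallest / kth largest averages."""
    u = walsh_averages(data)
    return median(u), u[k - 1], u[-k]


sample = list(range(1, 11))                 # hypothetical data, n = 10
print(len(walsh_averages(sample)))          # 55
print(wilcoxon_ci(sample, 9))               # (5.5, 3.0, 8.0), with k = t + 1 = 9
```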

Large Sample Approximation

For samples larger than 30, the Wilcoxon signed-ranks table cannot be used to determine k. A large sample approximation of k is, however, given by (see Wayne, 1978)

$$k = \frac{n(n+1)}{4} - z_{\alpha/2}\sqrt{\frac{n(n+1)(2n+1)}{24}}.$$

Exercise 2(a)

1. The median age of the onset of diabetes is thought to be 45 years. The ages at onset of a

random sample of 16 people with diabetes are:

Perform the

(a) sign test, (b) Wilcoxon signed-ranks test,

to determine if there is any evidence to conclude that the median age of the onset of

diabetes differs significantly from 45 years. Take α = 0.05.

2. Recent studies of the private practices of physicians who saw no Medicaid patients

suggested that the median length of each patient visit was 22 minutes. It is believed that

the median visit length in practices with a large Medicaid load is shorter than 22 minutes.

A random sample of 20 visits in practices with a large Medicaid load yielded, in order,

the following visit lengths:

(a) Use the large sample approximation of the sign test to determine if there is sufficient evidence to conclude, at the 1% level of significance, that the average visit length in practices with a large Medicaid load is shorter than 22 minutes.

(b) Based on the sign test, construct a 95% confidence interval for the median visit length

in practices with a large Medicaid load.

3. The following are the blood glucose levels of 12 patients who attend St. Thomas Hospital:


Perform the Wilcoxon signed ranks test to determine if we can conclude on the basis of

these data that the average glucose level in the population is greater than 96 mg/dl? Take

α = 0.05.

4. From a random sample of 14 students from Accra Catholic Senior High School, the body

masses of 9 students were found to be less than 38 kg whilst those of 4 students exceeded

38 kg with the remaining students recording exactly 38 kg. Can we conclude, based on a

sign test, that the average body mass of students from the school is less than 38 kg?

5. In a sample of 25 adolescents who served as the subjects in an immunologic study, one

variable of interest was the diameter of skin test reaction to an antigen. The sample

observations, in mm erythema, were as follows:

Use the large sample approximation of the Wilcoxon signed ranks test to determine if

we can conclude from these data that the population average is less than 30 mm.

Take α = 0.05.

6. Barrett (1991) reported data on eight cases of umbilical cord prolapse. The maternal ages

were 25, 28, 17, 26, 27, 18, 25, and 30.

(a) Perform the Wilcoxon signed ranks test to determine if there is enough evidence,

based on the data, that the average age of the population from which the sample may

be presumed to have been drawn is greater than 20 years. Take α = 0.01.

(b) Based on the Wilcoxon signed ranks test, construct a 99% confidence interval for the

population median.

7. Out of a random sample of 100 recorded deaths in a certain country during the past year, 68 were aged more than 65 years whilst the remaining 32 were aged below 65 years. Perform a sign test to determine if we can conclude that the average life span in the country is greater than 65 years. Use α = 0.05.

8. Recent studies of the private practices of physicians who saw no Medicaid patients

suggested that the median length of each patient visit was 22 minutes. It is believed that

the median visit length in practices with a large Medicaid load is shorter than 22 minutes.

A random sample of 20 visits in practices with a large Medicaid load yielded, in order, the

following visit lengths:


Based on the large sample approximation of the sign test, is there sufficient evidence to

conclude that the average visit length in practices with a large Medicaid load is shorter

than 22 minutes?

9. To determine whether the median life span of a certain species of animal is greater than 5 years, a random sample of 25 observations was made. The life spans, in years, are as follows:

At the 0.05 level of significance, use the large sample approximation of the sign test to determine if the average life span is greater than 5 years.

10. A physician states that the median number of times he sees each of his patients during the

year is five. In order to evaluate the validity of this statement, he randomly selects ten of

his patients and determines the number of office visits each of them made during the past

year. He obtains the following values for the ten patients in his sample: 9, 10, 8, 4, 8, 3,

0, 10, 15, 9. Do the data support his contention that the median number of times he sees a

patient is five?

11. Moore and Ogletree (1973) investigated the readiness of pupils at the beginning of the

first grade. They compared scores on a readiness test of pupils who had attended a head

start program for a full year with the scores of those who had not. The readiness test scores

of 10 pupils who did not attend a Head Start program are as follows: 33, 19, 40, 35, 51,

41, 27, 55, 39, 21. Can we conclude, based on the Wilcoxon signed ranks test, that the

median score of the population represented by this sample is less than 45.3? Take α = 0.05.

12. Abu-Ayyash (1972) found that the median education of heads of households living in

mobile homes in a certain area was 11.6 years. Suppose that a similar survey conducted

in another area revealed the educational levels of heads of households as shown in the

following data.

Based on the sign test, can we conclude that the average educational level of the population represented by this sample is less than 11.6 years? Take α = 0.05.

13. Lenzer et al. (1973) reported the endurance score of animals during a 48-hour session of

discrimination responding. The median score for an animal with electrodes implanted in


the hypothalamus was 97.5. Suppose that the experiment was duplicated in another

laboratory, except that electrodes were implanted in the forebrain in 12 animals. Assume

that investigators observed the endurance score shown in the following table.

Use the one-sample sign test to see whether the investigators may conclude at the 0.05

level of significance that the median endurance score of animals with electrodes implanted

in the forebrain is less than 97.5.

14. Iwamoto (1971) found that the mean weight of a sample of a particular species of adult

female monkey from a certain locality was 8.41 kg. Suppose that a sample of adult females

of the same species from another locality yielded the weights as shown in the following

table. By using the one-sample sign test, can we conclude, at the 0.05 level of significance,

that the median weight of the population from which this second sample was drawn is

greater than 8.41 kg?

2.4 The binomial test

Inferences concerning proportions are required in many areas. The population proportion is a

parameter of frequent interest in research and decision-making activities. The politician is

interested in knowing what proportion of voters will vote for him in the next election. All

manufacturing firms are concerned about the proportion of defective items when a shipment

is made. A market analyst may wish to know the proportion of families in a certain area who

have central air conditioning. A sociologist may want to know the proportion of heads of

household in a certain area who are women. Many questions of interest to the health worker

relate to the population proportion. What proportion of patients who receive a particular

treatment recover? What proportion of a population has a certain disease?

When it is impossible or impractical to survey the total population, researchers base decisions regarding population proportions on inferences made by analyzing samples drawn from the population. As usual, inference may take the form of interval estimation or hypothesis testing.

Sometimes, we want to draw inferences concerning the total number, the proportion or percentage of units in the population that possess some characteristic or attribute or fall into some defined class. A random sample of size n is drawn from a population. Suppose we wish to estimate the proportion, p, of units in the population that belong to some definite class.

Testing hypotheses about population proportions is carried out in much the same way as for the median, when the assumptions necessary for the test are satisfied.

2.4.1 Assumptions

1. The data consist of a sample of the outcomes of n repetitions of some process. Each outcome consists of either a 'success' or a 'failure'. The proportion of the sample having a characteristic of interest is $\hat{p} = S/n$, an estimate of the population proportion p, where S is the number of successes (the total number of sampling units with a particular characteristic of interest).

2. The n trials are independent.

3. The probability of a success, p, remains constant from trial to trial.

2.4.2 Hypotheses

One-sided and two-sided tests may be made, depending on the question being asked. In other words, we can test $H_0: p = p_0$ against one of the alternatives $H_1: p < p_0$, $H_1: p > p_0$ or $H_1: p \ne p_0$.

(a) One-sided test: To test $H_0: p = p_0$ against $H_1: p < p_0$.

Test statistic

Since we are interested in the number of successes S, our test statistic is S. When $H_0$ is true, S has the binomial distribution with parameters $n$ and $p_0$.

Decision rule

Sufficiently small values of S lead to the rejection of $H_0$. Let $s$ denote the observed value of S. We reject $H_0$ at the α level of significance if the

$$p\text{-value} = P(S \le s \mid n, p_0) \le \alpha.$$


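(b) One-sided test: To test $H_0: p = p_0$ against $H_1: p > p_0$.

Test statistic

The test statistic therefore is S. When $H_0$ is true, S has the binomial distribution with parameters $n$ and $p_0$.

Decision rule

For sufficiently large values of S, we reject $H_0$ at the α level of significance if the $p\text{-value} = P(S \ge s \mid n, p_0) \le \alpha$, where $s$ is the observed value of S.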

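(c) Two-sided test: Here, we test $H_0: p = p_0$ against $H_1: p \ne p_0$.

Test statistic

The test statistic therefore is S. When $H_0$ is true, S has the binomial distribution with parameters $n$ and $p_0$.

Decision rule

For sufficiently large or sufficiently small values of S, we reject $H_0$. The hypothesized proportion is $p_0$, whilst the observed sample proportion is $\hat{p} = s/n$, where $s$ is the observed value of S. The p-value of the test is defined by

$$p\text{-value} = \begin{cases} 2P(S \le s \mid n, p_0), & \text{if } \hat{p} \le p_0, \\ 2P(S \ge s \mid n, p_0), & \text{if } \hat{p} > p_0. \end{cases}$$

We reject $H_0$ at the α level of significance if the p-value is less than or equal to α.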
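The three decision rules for the binomial test can be written as one function. This is a minimal sketch with the illustrative names `binom_pmf` and `binomial_test`; the two-sided case uses the doubling rule stated above, capped at 1. (Some software defines the two-sided p-value differently, for example by summing all outcomes no more probable than the one observed, so results need not match other packages exactly.)

```python
from math import comb


def binom_pmf(k, n, p):
    """P(S = k) for S ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test(s, n, p0, alternative):
    """Exact binomial test of H0: p = p0 following the decision rules above.

    alternative: "less", "greater" or "two-sided"; returns the p-value.
    """
    lower = sum(binom_pmf(k, n, p0) for k in range(0, s + 1))   # P(S <= s)
    upper = sum(binom_pmf(k, n, p0) for k in range(s, n + 1))   # P(S >= s)
    if alternative == "less":
        return lower
    if alternative == "greater":
        return upper
    if alternative == "two-sided":
        # doubling rule: double the tail on the side of p0 where p-hat lies
        return min(1.0, 2 * (lower if s / n <= p0 else upper))
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


print(round(binomial_test(2, 12, 0.4, "less"), 4))   # Example 2.10: 0.0834
```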

Example 2.10

In a survey of injection drug users in a large city, Coates et al. (1991) found that 2 out of 12

were HIV positive. We wish to know if we can conclude, at the 10% level of significance, that

fewer than 40% of the injection drug users in the sampled population are HIV positive.

Solution

The parameter of interest is p, the proportion of injection drug users in the sampled population who are HIV positive. We wish to test $H_0: p = 0.4$ against $H_1: p < 0.4$ at the 0.1 level of significance. The test statistic is S, the number of injection drug users in the sample who are HIV positive. When $H_0$ is true, S has the binomial distribution with parameters $n = 12$ and $p = 0.4$. Let $s$ denote the observed value of the test statistic. We reject $H_0$ at the 0.1 level of significance if the

$$p\text{-value} = P(S \le s \mid 12, 0.4) \le 0.1.$$

Here $s = 2$, so

$$p\text{-value} = P(S \le 2 \mid 12, 0.4) = 0.0834.$$

Since the p-value, 0.0834, is less than 0.1, we reject $H_0$ at the 10% level of significance and conclude that fewer than 40% of the injection drug users in the sampled population are HIV positive.

Example 2.11

A researcher found anterior sub-capsular vacuoles in the eyes of 6 out of 15 diabetic patients.

Using the binomial test, can we conclude that the population proportion with the condition of interest is greater than 0.2? Use α = 0.05.

Solution

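The parameter of interest is p, the proportion of diabetic patients in the population with anterior sub-capsular vacuoles in the eyes. We wish to test $H_0: p = 0.2$ against $H_1: p > 0.2$ at the 0.05 level of significance.

The test statistic is S, the number of diabetic patients in the sample with anterior sub-capsular vacuoles in the eyes. When $H_0$ is true, S has the binomial distribution with parameters $n = 15$ and $p = 0.2$. Let $s$ denote the observed value of the test statistic. We reject $H_0$ at the 0.05 level of significance if the

$$p\text{-value} = P(S \ge s \mid 15, 0.2) \le 0.05.$$

Here $s = 6$. Since $P(S \ge 6) = 1 - P(S \le 5)$,

$$p\text{-value} = P(S \ge 6 \mid 15, 0.2) = 1 - P(S \le 5 \mid 15, 0.2) = 1 - 0.9389 = 0.0611.$$

Note that $1 - P(S \le 6 \mid 15, 0.2) = 1 - 0.9819 = 0.0181$ is $P(S \ge 7)$; it is not the p-value here, because the observed value s = 6 must be included in the tail.

Since the p-value, 0.0611, is greater than 0.05, we fail to reject $H_0$ at the 0.05 level of significance. There is not enough evidence to conclude that the population proportion with the condition is greater than 0.2.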
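The distinction between $P(S \ge 6)$ and $1 - P(S \le 6)$ in Example 2.11 is easy to check numerically. A minimal sketch using only `math.comb`; the helper name `cdf` is illustrative.

```python
from math import comb

n, p0, s = 15, 0.2, 6                   # Example 2.11


def cdf(k):
    """P(S <= k) for S ~ Binomial(15, 0.2)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k + 1))


p_value = 1 - cdf(s - 1)                # P(S >= 6) = 1 - P(S <= 5)
too_small = 1 - cdf(s)                  # this is P(S >= 7), not P(S >= 6)
print(round(cdf(5), 4), round(p_value, 4))     # 0.9389 0.0611
print(round(cdf(6), 4), round(too_small, 4))   # 0.9819 0.0181
```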


2.4.3 Large sample approximation

1. If S is a binomial random variable with parameters n and $p_0$, then the expectation and variance of S are given by

$$E(S) = np_0, \qquad V(S) = np_0(1 - p_0).$$

2. Thus, when the null hypothesis is true and n is large,

$$Z = \frac{S - np_0}{\sqrt{np_0(1 - p_0)}}$$

follows an approximate standard normal distribution, N(0, 1).

3. The normal approximation to the binomial distribution is good if $np_0 > 5$ and $n(1 - p_0) > 5$.

4. Note that the sign test discussed earlier is a special case of the binomial test, in which $p_0 = 0.5$.

Example 2.12

A commonly prescribed drug for relieving nervous tension is believed to be only 60%

effective. Experimental results with a new drug administered to a random sample of 100 adults

who were suffering from nervous tension show that 70 received relief. Is this sufficient

evidence to conclude that the new drug is superior to the one commonly prescribed? Use

α = 0.05.

Solution

The parameter of interest is p, the proportion of adults in the population who receive relief from nervous tension when given the new drug. We wish to test $H_0: p = 0.6$ against $H_1: p > 0.6$ at the α = 0.05 level of significance. The test statistic is

$$Z = \frac{S - np_0}{\sqrt{np_0(1 - p_0)}}.$$

Here $np_0 = 100(0.6) = 60$ and $n(1 - p_0) = 100(0.4) = 40$ are greater than 5, and so Z is approximately N(0, 1) when $H_0$ is true. We reject $H_0$ if z, the computed Z value, is greater than $z_{0.05} = 1.645$. Now

$$z = \frac{70 - 100(0.6)}{\sqrt{100(0.6)(0.4)}} = 2.0412.$$

Since 2.0412 > 1.645, we reject $H_0$ at the 0.05 level of significance. We conclude that the new drug is superior to the one commonly prescribed.
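The large-sample statistic of Example 2.12 can be sketched as follows. `proportion_z` is an illustrative name; it refuses to compute Z when the rule of thumb in item 3 above ($np_0 > 5$ and $n(1 - p_0) > 5$) fails.

```python
import math
from statistics import NormalDist


def proportion_z(s, n, p0):
    """Z = (S - n p0) / sqrt(n p0 (1 - p0)); requires n p0 > 5 and n (1 - p0) > 5."""
    if n * p0 <= 5 or n * (1 - p0) <= 5:
        raise ValueError("normal approximation not recommended for these n, p0")
    return (s - n * p0) / math.sqrt(n * p0 * (1 - p0))


z = proportion_z(70, 100, 0.6)                        # Example 2.12
print(round(z, 4), z > NormalDist().inv_cdf(0.95))    # 2.0412 True
```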


2.4.4 Large sample confidence interval for p

If $\hat p$ is the proportion of observations in a random sample of size n that belong to a class of interest, then an approximate 100(1 – α)% confidence interval for the proportion p of the population that belongs to this class is (see Ofosu & Hesse, 2011)

$$\hat p - z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}} < p < \hat p + z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}},$$

where $z_{\alpha/2}$ is the value that cuts off an area of α/2 in the upper tail of the standard normal distribution.
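The interval above can be written as a short function. This is a sketch with an illustrative name (`proportion_ci`); it takes $z_{\alpha/2}$ from `statistics.NormalDist`, so it uses 1.6449 where the table gives 1.645. Applied to the figures of Example 2.13 (6 of 500 students, 90% confidence), it gives approximately 0.0040 < p < 0.0200.

```python
import math
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Large-sample interval p_hat -/+ z_{alpha/2} * sqrt(p_hat*(1 - p_hat)/n)."""
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lower, upper = proportion_ci(6, 500, confidence=0.90)
print(f"({lower:.4f}, {upper:.4f})")  # (0.0040, 0.0200)
```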

Example 2.13

In a certain university, the proportion of students who have diabetes mellitus is p. Of the 500

students selected at random from the university, 6 had diabetes mellitus.

(a) Find a point estimate of p. (b) Construct a 90% confidence interval for p.

Solution

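(a) A point estimate of p is given by $\hat p = 6/500 = 0.012$.

(b) Since $n\hat p = 6$ and $n(1-\hat p) = 494$ both exceed 5, they are of sufficient magnitude to justify the use of the formula for constructing a confidence interval for p. To construct a 90% confidence interval, we put 100(1 – α) = 90. This gives α = 0.10. From the standard normal table, we find that $z_{\alpha/2} = z_{0.05} = 1.645$. Hence a 90% confidence interval for p is

$$0.012 - 1.645\sqrt{\frac{0.012 \times 0.988}{500}} < p < 0.012 + 1.645\sqrt{\frac{0.012 \times 0.988}{500}},$$

that is, 0.0040 < p < 0.0200.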

Exercise 2(b)

1. A researcher found that 66% of a sample of 14 infants had completed the hepatitis B

vaccine series. Can we conclude on the basis of these data that, in the sampled population,

more than 60% have completed the series? Use α = 0.01.

2. A health survey was made of 12 male inmates, 50 years of age and older, residing in a state's

correctional facilities. It found that 22% of the respondents reported a history

of venereal disease. On the basis of these findings, can we conclude that in the sampled

population, more than 15% have a history of venereal disease? Use α = 0.05.


3. The fraction of defective integrated circuits produced in a photolithography process is

being studied. A random sample of 300 circuits is tested, revealing 13 defectives. Use the

data to test $H_0: p = 0.05$ against $H_1: p \ne 0.05$. Use α = 0.05.

4. A commonly prescribed drug for relieving nervous tension is believed to be only 70%

effective. Experimental results with a new drug administered to a random sample of 10

adults who were suffering from nervous tension show that 8 received relief. Is this

sufficient evidence to conclude that the new drug is superior to the one commonly

prescribed? Use α = 0.05.

5. Suppose that, in the past, 40% of all adults favoured capital punishment. Do we have reason

to believe that the proportion of adults favouring capital punishment today has increased

if, in a random sample of 15 adults, 8 favour capital punishment? Use α = 0.05.

2.5 The one-sample runs test for randomness

In many situations we want to know whether a set of observations constitutes a random sample from an infinite population. Tests for randomness are of major importance because the assumption of randomness underlies statistical inference (see Ofosu & Hesse, 2011). Tests for randomness are also important in time series analysis. The runs test is used to examine whether or not a sequence of sample values is random.

Consider, for example, the following sequence of sample values

Each observation is denoted by a '+' sign if it is greater than the previous observation and by a '–' sign if it is less than the previous observation, as shown in the following table.

A run is a sequence of signs of the same kind bounded by signs of the other kind. In this case, we doubt the sequence's randomness, since there are only two runs.
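Counting runs in a sequence of signs amounts to counting maximal blocks of identical symbols. This sketch uses `itertools.groupby`, which groups consecutive equal items. The two sign strings are hypothetical patterns chosen to show a persistent and a zigzag sequence; they are not the book's tabulated data, which are not reproduced here.

```python
from itertools import groupby

def count_runs(signs: str) -> int:
    """Number of maximal blocks of identical symbols, e.g. '++--+' has 3 runs."""
    return sum(1 for _ in groupby(signs))

print(count_runs("++++----"))    # 2  (persistent pattern: too few runs)
print(count_runs("+-+-+-+-+-"))  # 10 (zigzag pattern: too many runs)
```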

If the order of occurrence were


we would doubt the sequence's randomness because there are too many runs (10 in this

instance).

Too few runs suggest that the sequence is not random because it shows persistence (long stretches of rising or falling values), whilst too many runs also suggest that it is not random because it zigzags. Let us now consider the one-sample runs test, a procedure that helps us decide whether a sequence of sample values is the result of a random process.

Assumptions

The data available for analysis consist of a sequence of sample values, recorded in the order

of their occurrence.

Hypotheses

We wish to test

$H_0$: The sequence of sample values is random, against

$H_1$: The sequence of sample values is not random.

Test Statistic

The test statistic is R, the total number of runs.

Decision Rule

Since the alternative hypothesis does not specify a direction, a two-sided test is appropriate. The critical value for the test is obtained from Table A.5 in the Appendix, for a given sample size n and a desired level of significance α. If

Tied Values

If an observation is equal to its preceding observation, denote it by zero. While counting the

number of runs, ignore it and reduce the value of n accordingly.
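The sign coding and the tie rule can be combined into one pass over the raw observations. In this sketch (the function name is illustrative), each tie is skipped when forming signs and n is reduced by one per tie, matching the rule above and the way Example 2.15 moves from n = 12 to n = 11. The data in the usage line are hypothetical.

```python
def runs_up_and_down(values: list[float]) -> tuple[int, int]:
    """Return (n, r): the effective sample size and the number of runs up and down.

    Each observation after the first gets '+' if it is larger than its
    predecessor and '-' if smaller; an equal pair (a tie, coded 0) is ignored
    and n is reduced by one for every such tie.
    """
    signs = []
    ties = 0
    for prev, cur in zip(values, values[1:]):
        if cur > prev:
            signs.append("+")
        elif cur < prev:
            signs.append("-")
        else:
            ties += 1
    # A new run starts at the first sign and whenever the sign changes.
    runs = sum(1 for i, sign in enumerate(signs) if i == 0 or sign != signs[i - 1])
    return len(values) - ties, runs

# Hypothetical data (not from the book) with one tie between the two 7s.
print(runs_up_and_down([5, 7, 7, 6, 8, 9, 4]))  # (6, 4)
```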

Large Sample Sizes

When n is large, the test statistic R can be standardized and approximated by a standard normal variable Z. We reject $H_0$ at the level of significance α if $|z| > z_{\alpha/2}$, where z is the computed value of Z.
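The standardizing formula is not recoverable from this text. The sketch below assumes the standard moments for the number of runs up and down in n observations, $E(R) = (2n-1)/3$ and $\operatorname{Var}(R) = (16n-29)/90$, and applies the two-sided rule above. It does not enforce any minimum n, because the book's threshold for "large" is not stated here. The values n = 40, r = 20 and r = 26 are hypothetical.

```python
import math
from statistics import NormalDist

def runs_up_down_z(r: int, n: int) -> float:
    """Z for the runs up and down test, ASSUMING E(R) = (2n - 1)/3 and
    Var(R) = (16n - 29)/90 for n observations."""
    mean = (2 * n - 1) / 3
    var = (16 * n - 29) / 90
    return (r - mean) / math.sqrt(var)

def reject_randomness(r: int, n: int, alpha: float = 0.05) -> bool:
    """Two-sided rule: reject H0 (randomness) if |z| > z_{alpha/2}."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(runs_up_down_z(r, n)) > z_crit

# Hypothetical: 40 observations with only 20 runs suggests persistence.
print(round(runs_up_down_z(20, 40), 4))  # -2.4307
print(reject_randomness(20, 40))         # True
print(reject_randomness(26, 40))         # False
```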

Example 2.15

The following are the blood glucose levels of 12 patients who attend St. Thomas Hospital:

Test, at the 0.05 level of significance, whether the sequence is random.

Solution

$H_0$: The sequence is random, against

$H_1$: The sequence is not random.

The test statistic is R, the number of runs.

We reject $H_0$ at the 0.05 level of significance if r, the observed value of R, lies in the rejection region determined by the critical value from Table A.5 in the Appendix. It can be seen that:

Here n = 11 and the number of runs is r = 7. The critical value is read from the table of critical values for the runs up and down test (see Table A.5 in the Appendix).

Note: Since two consecutive observations are the same (both equal to 110), we use n = 11 instead of n = 12.

Since r = 7 does not lie in the rejection region, we do not reject $H_0$ at the 0.05 level of significance and therefore conclude that the data are consistent with a random sequence.

Exercise 2(c)

1. The following data show the average daily temperatures recorded at Accra, Ghana, for 15

consecutive days during June 2017.


Test, at the 0.05 level of significance, whether we can conclude that the temperature pattern is random.

2. The following data show the inflation rate in Ghana from 2006 to 2017. Test, at the 0.05 level of significance, whether we can conclude that the pattern of yearly inflation is random.

References

Abu-Ayyash, A. Y. (1972). The mobile home: A neglected phenomenon in geographic research. Geog. Bull., 5, 28–30.

Barrett, J. M. (1991). Funic reduction for the management of umbilical cord prolapse. American Journal of Obstetrics and Gynecology, 165, 654–657.

Coates, R., Millson, M., & Myers, T. (1991). The benefits of HIV antibody testing of saliva in field research. Canadian Journal of Public Health, 82, 397–398.

Daniel, W. W. (1978). Applied Nonparametric Statistics. Houghton Mifflin Company, London.

Iwamoto, M. (1971). Morphological studies of Macaca fuscata: VI. Somatometry. Primates, 12, 151–174.

Lenzer, I. I., & White, C. A. (1973). Statistical effects in continuous reinforcement and successive sensory discrimination situations. Physiolog. Psychol., 1, 77–82.

Moore, R. C., & Ogletree, E. J. (1973). A comparison of the readiness and intelligence of first grade children with and without a full year of Head Start training. Education, 93, 266–270.

Ofosu, J. B., & Hesse, C. A. (2011). Elementary Statistical Methods. EPP Books Services, Accra.
