R-code 5

From Biocourse


Z values of the standard normal distribution for given values of alpha (two-sided):

> alpha = c( 0.2, 0.1, 0.05, 0.001)

> zstar = qnorm(1 - alpha/2)

> zstar

[1] 1.281552 1.644854 1.959964 3.290527

 


 

Alpha recovered from the Z values of the standard normal distribution:

> 2 * ( 1 - pnorm(zstar) )

[1] 0.200 0.100 0.050 0.001



Confidence intervals

> m = 50; 
> n=20; 
> p = .5;   # toss 20 coins 50 times

> phat = rbinom(m,n,p)/n          # divide by n for proportions
> SE = sqrt(phat*(1-phat)/n)     # compute SE
> alpha = 0.10;
> zstar = qnorm(1-alpha/2)

> matplot(rbind(phat - zstar*SE, phat + zstar*SE), rbind(1:m,1:m),type="l",lty=1)
> abline(v=p)                             # draw line for p=0.5

 

 image : r5-1.jpeg
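
Each horizontal segment in the plot is one 90% interval, so roughly nine out of ten segments should cross the vertical line at p = 0.5. A minimal sketch that counts them instead of reading them off the plot; the seed and the names lower, upper and covered are illustrative choices, not part of the original code.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
m <- 50; n <- 20; p <- 0.5           # same setup as above
phat  <- rbinom(m, n, p) / n
SE    <- sqrt(phat * (1 - phat) / n)
zstar <- qnorm(1 - 0.10 / 2)
lower <- phat - zstar * SE
upper <- phat + zstar * SE
covered <- lower <= p & p <= upper   # TRUE when the interval contains p
sum(covered)                         # number of intervals covering p
mean(covered)                        # observed coverage proportion
```

With only 50 simulated intervals the observed proportion will vary from run to run around the nominal level.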


 

Proportion Test

## A coin is tossed 100 times and lands heads 42 times. H0: P(heads) = 0.5, H1: P(heads) != 0.5

## Case 1: by default a 95% confidence level is used

> prop.test(42, 100)

        1-sample proportions test with continuity correction

data:  42 out of 100, null probability 0.5
X-squared = 2.25, df = 1, p-value = 0.1336
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.3233236 0.5228954
sample estimates:
   p
0.42 


## Case 2: confidence level set to 90%

> prop.test(42,100,conf.level=0.90)

        1-sample proportions test with continuity correction

data:  42 out of 100, null probability 0.5
X-squared = 2.25, df = 1, p-value = 0.1336
alternative hypothesis: true p is not equal to 0.5
90 percent confidence interval:
 0.3372368 0.5072341
sample estimates:
   p
0.42


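## In both cases the p-value is 0.1336 (conf.level changes only the interval, not the test), so the null hypothesis cannot be rejected at the 5% level.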


> prop.test( 42, 100, p=0.5)

        1-sample proportions test with continuity correction

data:  42 out of 100, null probability 0.5
X-squared = 2.25, df = 1, p-value = 0.1336
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.3233236 0.5228954
sample estimates:
   p
0.42 

## Tests H0: p = 0.5 against H1: p != 0.5.

## With a p-value of 0.1336 the null hypothesis is not rejected.
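
The X-squared = 2.25 reported by prop.test can be reproduced by hand. The sketch below assumes the usual one-sample chi-square statistic with Yates' continuity correction; the names expected, chisq and pval are illustrative.

```r
x <- 42; n <- 100; p0 <- 0.5
expected <- n * p0                              # expected number of heads under H0
# Yates' continuity correction subtracts 0.5 from |observed - expected|
chisq <- (abs(x - expected) - 0.5)^2 / (n * p0 * (1 - p0))
pval  <- pchisq(chisq, df = 1, lower.tail = FALSE)
chisq                                           # 2.25
round(pval, 4)                                  # 0.1336
```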


> prop.test( 420, 1000, p=0.5)

         1-sample proportions test with continuity correction

data:  420 out of 1000, null probability 0.5
X-squared = 25.281, df = 1, p-value = 4.956e-07
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.3892796 0.4513427
sample estimates:
   p
0.42 

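## Tests H0: p = 0.5 against H1: p != 0.5.

## With a p-value of 4.956e-07 the null hypothesis is rejected.

## The sample proportion is the same as in the previous example (0.42), but the larger sample size gives a much smaller p-value.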
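
The reason is that the standard error of the sample proportion shrinks like 1/sqrt(n), so the same gap of 0.08 from 0.5 is worth many more standard errors at n = 1000. A short sketch using the two samples from above; se0 and z are illustrative names, and z is computed without the continuity correction.

```r
n     <- c(100, 1000)                 # sample sizes from the two examples
heads <- c(42, 420)                   # observed heads, proportion 0.42 in both
se0   <- sqrt(0.5 * 0.5 / n)          # standard error of phat under H0: p = 0.5
z     <- (heads / n - 0.5) / se0      # distance from 0.5 in standard errors
pvals <- mapply(function(x, size) prop.test(x, size, p = 0.5)$p.value, heads, n)
data.frame(n, heads, se0, z, p.value = pvals)
```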


Z-test

> x = c(175, 176, 173, 175, 174, 173, 173, 176, 173, 179)

## Define a function simple.z.test that returns the z-based confidence interval for the mean when sigma is known.

> simple.z.test = function(x,sigma,conf.level=0.95) {
 n = length(x);xbar=mean(x)
 alpha = 1 - conf.level
 zstar = qnorm(1-alpha/2)
 SE = sigma/sqrt(n)
 xbar + c(-zstar*SE,zstar*SE)
}

## now try it
> simple.z.test(x,1.5)
[1] 173.7703 175.6297
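
simple.z.test returns only the interval. A matching p-value can be sketched as follows; simple.z.pvalue is a hypothetical companion function, and the null values 175 and 176 are chosen here only for illustration (one inside and one outside the interval above).

```r
x <- c(175, 176, 173, 175, 174, 173, 173, 176, 173, 179)

# hypothetical helper: two-sided z-test p-value for H0: mu = mu0, sigma known
simple.z.pvalue <- function(x, sigma, mu0) {
  z <- (mean(x) - mu0) / (sigma / sqrt(length(x)))
  2 * pnorm(-abs(z))
}

p175 <- simple.z.pvalue(x, 1.5, mu0 = 175)   # 175 lies inside the 95% interval
p176 <- simple.z.pvalue(x, 1.5, mu0 = 176)   # 176 lies outside it
c(p175, p176)
```

A null value inside the 95% interval gives a p-value above 0.05, and one outside gives a p-value below 0.05.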

 


  T-test

> t.test(x)

        One Sample t-test

data:  x
t = 283.8161, df = 9, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 173.3076 176.0924
sample estimates:
mean of x
    174.7 

## With no options, the test is H0: mu = 0 against H1: mu != 0.

## The p-value is < 2.2e-16, so the null hypothesis is rejected.
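
The t statistic and the interval in this output can be rebuilt from the sample mean and standard deviation. A minimal sketch that mirrors simple.z.test but uses the sample standard deviation and a t quantile; the names tval, tstar and ci are illustrative.

```r
x <- c(175, 176, 173, 175, 174, 173, 173, 176, 173, 179)
n     <- length(x)
SE    <- sd(x) / sqrt(n)              # estimated standard error of the mean
tval  <- mean(x) / SE                 # test statistic for H0: mu = 0
tstar <- qt(0.975, df = n - 1)        # t quantile replaces the z quantile
ci    <- mean(x) + c(-1, 1) * tstar * SE
tval
ci
```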


 ## Test for the mean from summary statistics, H0: mu = 25; H1: mu < 25 (pt() below returns the lower-tail, one-sided p-value)

> xbar = 22; s=1.5; n=10

> t = (xbar - 25) / ( s/sqrt(n) )

> t

[1] -6.324555

> pt(t, df=n-1)
[1] 6.846828e-05
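
pt(t, df=n-1) returns the lower-tail probability, which is the p-value for the one-sided alternative mu < 25. For a two-sided alternative the tail area is doubled. A small sketch with the same summary statistics; tstat, p.lower and p.two are illustrative names.

```r
xbar <- 22; s <- 1.5; n <- 10
tstat   <- (xbar - 25) / (s / sqrt(n))
p.lower <- pt(tstat, df = n - 1)               # one-sided, H1: mu < 25
p.two   <- 2 * pt(-abs(tstat), df = n - 1)     # two-sided, H1: mu != 25
c(lower = p.lower, two.sided = p.two)
```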


 ## Normal distribution and t distributions for various degrees of freedom

> xvals=seq(-4,4,.01)
> plot(xvals,dnorm(xvals),type="l")
> for(i in c(2,5,10,20,50)) points(xvals,dt(xvals,df=i),type="l",lty=i)

> title("normal density and t density for various d.f.s")

> legend(2.2,0.35,legend=c("normal","2 d.f","5 d.f","10 d.f", "20 d.f", "50 d.f"),lty=c(1,2,5,10,20,50))


image : r5-2.jpeg
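
The heavier tails of the t distribution can also be seen in its quantiles, which approach the normal quantile as the degrees of freedom grow. A short sketch using the same degrees of freedom as the plot; the column names are illustrative.

```r
dfs <- c(2, 5, 10, 20, 50)                  # same degrees of freedom as in the plot
tq  <- qt(0.975, df = dfs)                  # 97.5% quantiles of the t distributions
zq  <- qnorm(0.975)                         # 97.5% quantile of the standard normal
data.frame(df = dfs, t.quantile = tq, excess.over.z = tq - zq)
```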


 


  ## Comparing random samples from the normal (z) and t distributions

> x = rnorm(100)

> y = rt(100,9)

> boxplot(x, y)

image: r5-3.jpeg

> qqnorm(x); qqline(x)

image: r5-4.jpeg

> qqnorm(y); qqline(y)

image: r5-5.jpeg

 


 
## Nonparametric test (Wilcoxon signed rank test)

> x = c(110, 12, 2.5, 98, 1017, 540, 54, 4.3, 150, 432)

> wilcox.test(x,conf.int=TRUE)

        Wilcoxon signed rank test

data:  x
V = 55, p-value = 0.001953
alternative hypothesis: true mu is not equal to 0
95 percent confidence interval:
  33.0 514.5
sample estimates:
(pseudo)median
           150 

## With no options, the test is H0: median = 0 against H1: median != 0


## Median test, H0: median = 5; H1: median > 5 (because of alt="greater" below)

> x = c(12.8,3.5,2.9,9.4,8.7,.7,.2,2.8,1.9,2.8,3.1,15.8)

> stem(x)

 The decimal point is 1 digit(s) to the right of the |

  0 | 01233334
  0 | 99
  1 | 3
  1 | 6

> wilcox.test(x,mu=5,alt="greater")

      Wilcoxon signed rank test with continuity correction

data:  x
V = 39, p-value = 0.5156
alternative hypothesis: true mu is greater than 5 

## The p-value is 0.5156, so the null hypothesis cannot be rejected.
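
The statistic V is the sum of the ranks of the positive differences x - mu, where the ranks are taken over the absolute differences. A minimal sketch reproducing both V values above; signed.rank.V is a hypothetical helper, and x1 and x2 are the two data vectors from this section.

```r
# hypothetical helper: signed rank statistic V for H0: median = mu
signed.rank.V <- function(x, mu = 0) {
  d <- x - mu
  d <- d[d != 0]                      # zero differences are dropped
  sum(rank(abs(d))[d > 0])            # sum of the ranks of the positive differences
}

x1 <- c(110, 12, 2.5, 98, 1017, 540, 54, 4.3, 150, 432)
x2 <- c(12.8, 3.5, 2.9, 9.4, 8.7, .7, .2, 2.8, 1.9, 2.8, 3.1, 15.8)
signed.rank.V(x1)            # 55: all ten values are positive, so V = 1 + 2 + ... + 10
signed.rank.V(x2, mu = 5)    # 39
```

The second data set contains a tie (2.8 appears twice), which is why R reports the test "with continuity correction": with ties it falls back to a normal approximation instead of the exact distribution.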



## Two-sample tests


## Two-sample proportion test

> prop.test(c(45,56),c(45+35,56+47))

         2-sample test for equality of proportions with continuity correction

data:  c(45, 56) out of c(45 + 35, 56 + 47)
X-squared = 0.0108, df = 1, p-value = 0.9172
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.1374478  0.1750692
sample estimates:
   prop 1    prop 2
0.5625000 0.5436893

## Null hypothesis H0: p1 = p2; H1: p1 != p2

## The p-value is 0.9172, so the null hypothesis is not rejected.
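
The two-sample proportion test is the chi-square test on the corresponding 2x2 table of successes and failures. The sketch below checks this with chisq.test; the table layout and the dimnames (group, outcome) are illustrative, taking 35 and 47 as the failure counts implied by the totals 45+35 and 56+47.

```r
counts <- matrix(c(45, 35,
                   56, 47), nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("1", "2"),
                                 outcome = c("success", "failure")))
pt2 <- prop.test(c(45, 56), c(45 + 35, 56 + 47))
cs  <- chisq.test(counts)              # Yates' correction is on by default for 2x2 tables
c(prop.test = unname(pt2$statistic), chisq.test = unname(cs$statistic))
c(prop.test = pt2$p.value, chisq.test = cs$p.value)
```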

 


  ## Two-sample t-test

> x = c(15, 10, 13, 7, 9, 8, 21, 9, 14, 8)

> y = c(15, 14, 12, 8, 14, 7, 16, 10, 15, 12)

> t.test(x,y,alt="less",var.equal=TRUE)

        Two Sample t-test

data:  x and y
t = -0.5331, df = 18, p-value = 0.3002
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
     -Inf 2.027436
sample estimates:
mean of x mean of y
     11.4      12.3

## Null hypothesis H0: mean(x) = mean(y); H1: mean(x) < mean(y)

## The p-value is 0.3002, so the null hypothesis is not rejected.
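
With var.equal=TRUE the two sample variances are pooled into one estimate. A minimal sketch of that computation, reproducing t, df and the p-value above; sp2, SE, tstat and pval are illustrative names.

```r
x <- c(15, 10, 13, 7, 9, 8, 21, 9, 14, 8)
y <- c(15, 14, 12, 8, 14, 7, 16, 10, 15, 12)
nx <- length(x); ny <- length(y)
# pooled variance: weighted average of the two sample variances
sp2   <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
SE    <- sqrt(sp2 * (1 / nx + 1 / ny))
tstat <- (mean(x) - mean(y)) / SE
pval  <- pt(tstat, df = nx + ny - 2)        # lower tail, because alt = "less"
c(t = tstat, df = nx + ny - 2, p.value = pval)
```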


> t.test(x,y,alt="less")

        Welch Two Sample t-test

data:  x and y
t = -0.5331, df = 16.245, p-value = 0.3006
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
     -Inf 2.044664
sample estimates:
mean of x mean of y
     11.4      12.3

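## Null hypothesis H0: mean(x) = mean(y); H1: mean(x) < mean(y)

## The p-value is 0.3006, so the null hypothesis is not rejected.

## The difference from the previous example is that the two variances are not assumed to be equal (Welch test).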
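
The fractional df = 16.245 comes from the Welch-Satterthwaite approximation. A short sketch of that formula; vx, vy and df.welch are illustrative names. Because both samples have 10 observations, the t statistic is the same as in the pooled test and only the degrees of freedom change.

```r
x <- c(15, 10, 13, 7, 9, 8, 21, 9, 14, 8)
y <- c(15, 14, 12, 8, 14, 7, 16, 10, 15, 12)
vx <- var(x) / length(x)                    # squared standard error of mean(x)
vy <- var(y) / length(y)                    # squared standard error of mean(y)
tstat <- (mean(x) - mean(y)) / sqrt(vx + vy)
# Welch-Satterthwaite approximation to the degrees of freedom
df.welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
pval <- pt(tstat, df = df.welch)            # lower tail, because alt = "less"
c(t = tstat, df = df.welch, p.value = pval)
```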



 ## Paired t-test

> x = c(3, 0, 5, 2, 5, 5, 5, 4, 4, 5)

> y = c(2, 1, 4, 1, 4, 3, 3, 2, 3, 5)

> t.test(x,y,paired=TRUE)

        Paired t-test

data:  x and y
t = 3.3541, df = 9, p-value = 0.008468
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.3255550 1.6744450
sample estimates:
mean of the differences
                      1

## x and y are paired observations.

## The null hypothesis is H0: mean(d) = 0; H1: mean(d) != 0.

## where di = xi - yi

## The p-value is 0.008468, so the null hypothesis is rejected.
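
A paired t-test is a one-sample t-test on the differences di = xi - yi. The sketch below checks this; d, paired and onesample are illustrative names.

```r
x <- c(3, 0, 5, 2, 5, 5, 5, 4, 4, 5)
y <- c(2, 1, 4, 1, 4, 3, 3, 2, 3, 5)
d <- x - y                                  # di = xi - yi
paired    <- t.test(x, y, paired = TRUE)
onesample <- t.test(d)                      # H0: mean difference = 0
c(paired = unname(paired$statistic), one.sample = unname(onesample$statistic))
cor(x, y)                                   # correlation between the paired values
```

The positive correlation between x and y makes the differences much less variable than the raw samples, which is why the paired analysis detects the shift.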

## If the pairing is ignored and the two samples are treated as independent, the result is as follows.

 > t.test(x,y)

        Welch Two Sample t-test

data:  x and y
t = 1.478, df = 16.999, p-value = 0.1577
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4274951  2.4274951
sample estimates:
mean of x mean of y
      3.8       2.8

## The p-value is 0.1577, so the null hypothesis is not rejected.