4.8.15

Recoding multiple variables in a data frame

I had a dataset with a number of variables on knowledge. The response options for the variables were coded as 1, 2 and 3. I wanted to recode them to 1 or 0 depending on whether the answers were right or wrong. However, it would be inefficient to recode every single variable in R, although it could be just a matter of copy/paste/edit task. I was thinking of looping through the relevant columns, assigning new values in each iteration, but looping would make the code difficult to digest and ugly to look at. I simply could not stand ugly codes

I had gone through a number of blogs and websites to find the most elegant way to solve this problem, but to no avail. The best solution I could find was at http://rprogramming.net/recode-data-in-r/, but I didn't like apply+function as it would be an overkill for medical personnel and researchers (of whom I deal with on daily basis) without programming basics. I need an extremely simple solution so that people won't shy away from R. I could not get the problem off my mind until I finally found a simple solution on my own :-) Here I share the codes:
library("car")

# create a data frame
data = read.table(header=T, text="
Q1  Q2  Q3  Q4  Q5
2   2   2   2   1
2   1   2   1   2
2   2   2   2   2
3   3   3   1   1
2   2   2   2   1
                  ")

# Let say for Q1, Q3 and Q5, the correct option is 3, thus recoded as 1, the rest as 0

data.recode = data
mat = as.matrix(data.recode[c(1,3,5)])
mat = recode(mat, "3=1; else=0")
data.recode[c(1,3,5)] = as.data.frame(mat)

# Let say for Q2 and Q4, the correct option is 2, thus recoded as 1, the rest as 0

mat = as.matrix(data.recode[c(2,4)])
mat = recode(mat, "2=1; else=0")
data.recode[c(2,4)] = as.data.frame(mat)

# Compare the original data with recoded ones

data
data.recode
Outputs:
> data
  Q1 Q2 Q3 Q4 Q5
1  2  2  2  2  1
2  2  1  2  1  2
3  2  2  2  2  2
4  3  3  3  1  1
5  2  2  2  2  1
> data.recode
  Q1 Q2 Q3 Q4 Q5
1  0  1  0  1  0
2  0  0  0  0  0
3  0  1  0  1  0
4  1  0  1  0  0
5  0  1  0  1  0

7.2.15

Hows many cows? in R

Our cows problem as posted before here can also be solved in R (obviously). Compare our results with the simulation results.
################################
# We are looking at 0.7 cutoff
# point for sensitivity
################################
# Using R
# Iterate over values
###############################

n = numeric(1)
n.low = 4 # lower limit of n
n.high = 7 # upper limit of n
pr = numeric(0)
p = 0.25 # prevalence of dss
n_ = numeric(0)
pr_ = numeric(0)

for(i in n.low:n.high) {
  n = i
  print(n)
  pr = 1 - pbinom(0, n, p)
  print(pr)
  n_[i] = n
  pr_[i] = pr
}
The results are as follows:
## [1] 4
## [1] 0.6835937
## [1] 5
## [1] 0.7626953
## [1] 6
## [1] 0.8220215
## [1] 7
## [1] 0.8665161
# sample size vs sensitivity
cbind(n=n_[n.low:n.high], Sensitivity=pr_[n.low:n.high])
##      n Sensitivity
## [1,] 4   0.6835937
## [2,] 5   0.7626953
## [3,] 6   0.8220215
## [4,] 7   0.8665161
# only 0.7 & above
cbind(n=n_[n.low:n.high], Sensitivity=pr_[n.low:n.high])[pr_[n.low:n.high] > 0.7, ]
##      n Sensitivity
## [1,] 5   0.7626953
## [2,] 6   0.8220215
## [3,] 7   0.8665161

How many cows? Simulation in R

In continuation of our cows problem as posted before here, we can also solved the problem by simulation approach. I would like to thank Mike Meredith for his contribution and guidance in producing the codes below. The code can be obtained here.
################################
# Try with different sized herds
################################
# We are looking at 0.7 cutoff
# point for sensitivity
################################
# Simulation approach
###############################

B = 10000
p = 0.25
n = numeric(1) # sample size
nCow = numeric(10) # number of cows per farm per iteration
nCow_ = numeric(10) # nCow, for tabulation with cbind
sensitivity = numeric(0)
infected = numeric(B) # real no of infected cows per farm
detected = numeric(B) # no of infected cows in sample of size n
n.low = 4 # lower limit of sample size
n.high = 7 # upper limit of sample size
sen_ = numeric(0) # mean sensitivity for all sizes of farm
n_ = numeric(0) # sample size
cutpt = 0.7 # Acceptable cutoff point

for(x in n.low:n.high) {
  n = x
  print("+++++++")
  print(n)
  for(c in 1:50) {
    nCow = 10*c
    for(i in 1:B) {
      cows = rbinom(nCow, 1, p)
      infected[i] = sum(cows)
      s = sample(cows, n)
      detected[i] = sum(s)
    }
    nCow_[c] = nCow
    detected.if.present = detected[infected > 0]
    sensitivity[c] = mean(detected.if.present > 0)
  }
  temp_s = mean(sensitivity)
  temp_c = cbind(sensitivity=sensitivity, ncow_=nCow_)
  print(temp_c)
  print(n)
  print(temp_s)
  sen_[x] = temp_s
  n_[x] = n
}
The results are as follows:
## [1] "+++++++"
## [1] 4
##       sensitivity ncow_
##  [1,]   0.7217031    10
##  [2,]   0.6771314    20
##  [3,]   0.6893379    30
##  [4,]   0.6805000    40
##  [5,]   0.6850000    50
##  [6,]   0.6796000    60
##  [7,]   0.6784000    70
##  [8,]   0.6769000    80
##  [9,]   0.6878000    90
## [10,]   0.6863000   100
## [11,]   0.6904000   110
## [12,]   0.6852000   120
## [13,]   0.6882000   130
## [14,]   0.6873000   140
## [15,]   0.6846000   150
## [16,]   0.6837000   160
## [17,]   0.6842000   170
## [18,]   0.6875000   180
## [19,]   0.6872000   190
## [20,]   0.6856000   200
## [21,]   0.6859000   210
## [22,]   0.6796000   220
## [23,]   0.6905000   230
## [24,]   0.6801000   240
## [25,]   0.6783000   250
## [26,]   0.6769000   260
## [27,]   0.6793000   270
## [28,]   0.6892000   280
## [29,]   0.6802000   290
## [30,]   0.6831000   300
## [31,]   0.6772000   310
## [32,]   0.6891000   320
## [33,]   0.6845000   330
## [34,]   0.6778000   340
## [35,]   0.6789000   350
## [36,]   0.6863000   360
## [37,]   0.6862000   370
## [38,]   0.6796000   380
## [39,]   0.6806000   390
## [40,]   0.6878000   400
## [41,]   0.6836000   410
## [42,]   0.6827000   420
## [43,]   0.6761000   430
## [44,]   0.6823000   440
## [45,]   0.6847000   450
## [46,]   0.6860000   460
## [47,]   0.6818000   470
## [48,]   0.6789000   480
## [49,]   0.6813000   490
## [50,]   0.6776000   500
## [1] 4
## [1] 0.6838534
## [1] "+++++++"
## [1] 5
##       sensitivity ncow_
##  [1,]   0.8058705    10
##  [2,]   0.7680433    20
##  [3,]   0.7726545    30
##  [4,]   0.7627000    40
##  [5,]   0.7625000    50
##  [6,]   0.7658000    60
##  [7,]   0.7642000    70
##  [8,]   0.7585000    80
##  [9,]   0.7631000    90
## [10,]   0.7666000   100
## [11,]   0.7630000   110
## [12,]   0.7558000   120
## [13,]   0.7634000   130
## [14,]   0.7595000   140
## [15,]   0.7623000   150
## [16,]   0.7590000   160
## [17,]   0.7623000   170
## [18,]   0.7619000   180
## [19,]   0.7614000   190
## [20,]   0.7583000   200
## [21,]   0.7568000   210
## [22,]   0.7626000   220
## [23,]   0.7618000   230
## [24,]   0.7653000   240
## [25,]   0.7633000   250
## [26,]   0.7595000   260
## [27,]   0.7697000   270
## [28,]   0.7675000   280
## [29,]   0.7623000   290
## [30,]   0.7653000   300
## [31,]   0.7592000   310
## [32,]   0.7574000   320
## [33,]   0.7668000   330
## [34,]   0.7678000   340
## [35,]   0.7595000   350
## [36,]   0.7654000   360
## [37,]   0.7666000   370
## [38,]   0.7665000   380
## [39,]   0.7648000   390
## [40,]   0.7551000   400
## [41,]   0.7713000   410
## [42,]   0.7730000   420
## [43,]   0.7653000   430
## [44,]   0.7577000   440
## [45,]   0.7627000   450
## [46,]   0.7577000   460
## [47,]   0.7735000   470
## [48,]   0.7675000   480
## [49,]   0.7593000   490
## [50,]   0.7601000   500
## [1] 5
## [1] 0.7641634
## [1] "+++++++"
## [1] 6
##       sensitivity ncow_
##  [1,]   0.8661619    10
##  [2,]   0.8254589    20
##  [3,]   0.8242000    30
##  [4,]   0.8219822    40
##  [5,]   0.8192000    50
##  [6,]   0.8272000    60
##  [7,]   0.8175000    70
##  [8,]   0.8182000    80
##  [9,]   0.8251000    90
## [10,]   0.8166000   100
## [11,]   0.8256000   110
## [12,]   0.8178000   120
## [13,]   0.8246000   130
## [14,]   0.8280000   140
## [15,]   0.8172000   150
## [16,]   0.8160000   160
## [17,]   0.8235000   170
## [18,]   0.8167000   180
## [19,]   0.8270000   190
## [20,]   0.8242000   200
## [21,]   0.8208000   210
## [22,]   0.8242000   220
## [23,]   0.8173000   230
## [24,]   0.8203000   240
## [25,]   0.8234000   250
## [26,]   0.8339000   260
## [27,]   0.8180000   270
## [28,]   0.8237000   280
## [29,]   0.8179000   290
## [30,]   0.8208000   300
## [31,]   0.8206000   310
## [32,]   0.8231000   320
## [33,]   0.8174000   330
## [34,]   0.8227000   340
## [35,]   0.8188000   350
## [36,]   0.8241000   360
## [37,]   0.8225000   370
## [38,]   0.8271000   380
## [39,]   0.8225000   390
## [40,]   0.8243000   400
## [41,]   0.8232000   410
## [42,]   0.8219000   420
## [43,]   0.8199000   430
## [44,]   0.8138000   440
## [45,]   0.8287000   450
## [46,]   0.8236000   460
## [47,]   0.8193000   470
## [48,]   0.8209000   480
## [49,]   0.8304000   490
## [50,]   0.8227000   500
## [1] 6
## [1] 0.8230001
## [1] "+++++++"
## [1] 7
##       sensitivity ncow_
##  [1,]   0.9164819    10
##  [2,]   0.8687719    20
##  [3,]   0.8698870    30
##  [4,]   0.8692000    40
##  [5,]   0.8674000    50
##  [6,]   0.8717000    60
##  [7,]   0.8579000    70
##  [8,]   0.8659000    80
##  [9,]   0.8610000    90
## [10,]   0.8633000   100
## [11,]   0.8686000   110
## [12,]   0.8620000   120
## [13,]   0.8652000   130
## [14,]   0.8633000   140
## [15,]   0.8657000   150
## [16,]   0.8637000   160
## [17,]   0.8642000   170
## [18,]   0.8618000   180
## [19,]   0.8628000   190
## [20,]   0.8692000   200
## [21,]   0.8644000   210
## [22,]   0.8641000   220
## [23,]   0.8671000   230
## [24,]   0.8625000   240
## [25,]   0.8625000   250
## [26,]   0.8696000   260
## [27,]   0.8689000   270
## [28,]   0.8565000   280
## [29,]   0.8703000   290
## [30,]   0.8606000   300
## [31,]   0.8695000   310
## [32,]   0.8664000   320
## [33,]   0.8696000   330
## [34,]   0.8642000   340
## [35,]   0.8723000   350
## [36,]   0.8670000   360
## [37,]   0.8700000   370
## [38,]   0.8662000   380
## [39,]   0.8667000   390
## [40,]   0.8642000   400
## [41,]   0.8627000   410
## [42,]   0.8686000   420
## [43,]   0.8669000   430
## [44,]   0.8715000   440
## [45,]   0.8695000   450
## [46,]   0.8706000   460
## [47,]   0.8658000   470
## [48,]   0.8619000   480
## [49,]   0.8660000   490
## [50,]   0.8668000   500
## [1] 7
## [1] 0.8670188
# sample size vs sensitivity
cbind(n=n_[n.low:n.high], Sensitivity=sen_[n.low:n.high])
##      n Sensitivity
## [1,] 4   0.6838534
## [2,] 5   0.7641634
## [3,] 6   0.8230001
## [4,] 7   0.8670188
# only > cutoff point
cbind(n=n_[n.low:n.high], Sensitivity=sen_[n.low:n.high])[sen_[n.low:n.high] > cutpt, ]
##      n Sensitivity
## [1,] 5   0.7641634
## [2,] 6   0.8230001
## [3,] 7   0.8670188