4.8.15

Recoding multiple variables in a data frame

I had a dataset with a number of variables on knowledge. The response options for the variables were coded as 1, 2 and 3. I wanted to recode them to 1 or 0 depending on whether the answers were right or wrong. However, it would be inefficient to recode every single variable in R, although it could be just a matter of copy/paste/edit task. I was thinking of looping through the relevant columns, assigning new values in each iteration, but looping would make the code difficult to digest and ugly to look at. I simply could not stand ugly codes

I had gone through a number of blogs and websites to find the most elegant way to solve this problem, but to no avail. The best solution I could find was at http://rprogramming.net/recode-data-in-r/, but I didn't like apply+function as it would be an overkill for medical personnel and researchers (of whom I deal with on daily basis) without programming basics. I need an extremely simple solution so that people won't shy away from R. I could not get the problem off my mind until I finally found a simple solution on my own :-) Here I share the codes:
library("car")

# create a data frame
data = read.table(header=T, text="
Q1  Q2  Q3  Q4  Q5
2   2   2   2   1
2   1   2   1   2
2   2   2   2   2
3   3   3   1   1
2   2   2   2   1
                  ")

# Let say for Q1, Q3 and Q5, the correct option is 3, thus recoded as 1, the rest as 0

data.recode = data
mat = as.matrix(data.recode[c(1,3,5)])
mat = recode(mat, "3=1; else=0")
data.recode[c(1,3,5)] = as.data.frame(mat)

# Let say for Q2 and Q4, the correct option is 2, thus recoded as 1, the rest as 0

mat = as.matrix(data.recode[c(2,4)])
mat = recode(mat, "2=1; else=0")
data.recode[c(2,4)] = as.data.frame(mat)

# Compare the original data with recoded ones

data
data.recode
Outputs:
> data
  Q1 Q2 Q3 Q4 Q5
1  2  2  2  2  1
2  2  1  2  1  2
3  2  2  2  2  2
4  3  3  3  1  1
5  2  2  2  2  1
> data.recode
  Q1 Q2 Q3 Q4 Q5
1  0  1  0  1  0
2  0  0  0  0  0
3  0  1  0  1  0
4  1  0  1  0  0
5  0  1  0  1  0

7.2.15

Hows many cows? in R

Our cows problem as posted before here can also be solved in R (obviously). Compare our results with the simulation results.
################################
# We are looking at 0.7 cutoff
# point for sensitivity
################################
# Using R
# Iterate over values
###############################

n = numeric(1)
n.low = 4 # lower limit of n
n.high = 7 # upper limit of n
pr = numeric(0)
p = 0.25 # prevalence of dss
n_ = numeric(0)
pr_ = numeric(0)

for(i in n.low:n.high) {
  n = i
  print(n)
  pr = 1 - pbinom(0, n, p)
  print(pr)
  n_[i] = n
  pr_[i] = pr
}
The results are as follows:
## [1] 4
## [1] 0.6835937
## [1] 5
## [1] 0.7626953
## [1] 6
## [1] 0.8220215
## [1] 7
## [1] 0.8665161
# sample size vs sensitivity
cbind(n=n_[n.low:n.high], Sensitivity=pr_[n.low:n.high])
##      n Sensitivity
## [1,] 4   0.6835937
## [2,] 5   0.7626953
## [3,] 6   0.8220215
## [4,] 7   0.8665161
# only 0.7 & above
cbind(n=n_[n.low:n.high], Sensitivity=pr_[n.low:n.high])[pr_[n.low:n.high] > 0.7, ]
##      n Sensitivity
## [1,] 5   0.7626953
## [2,] 6   0.8220215
## [3,] 7   0.8665161

How many cows? Simulation in R

In continuation of our cows problem as posted before here, we can also solved the problem by simulation approach. I would like to thank Mike Meredith for his contribution and guidance in producing the codes below. The code can be obtained here.
################################
# Try with different sized herds
################################
# We are looking at 0.7 cutoff
# point for sensitivity
################################
# Simulation approach
###############################

B = 10000
p = 0.25
n = numeric(1) # sample size
nCow = numeric(10) # number of cows per farm per iteration
nCow_ = numeric(10) # nCow, for tabulation with cbind
sensitivity = numeric(0)
infected = numeric(B) # real no of infected cows per farm
detected = numeric(B) # no of infected cows in sample of size n
n.low = 4 # lower limit of sample size
n.high = 7 # upper limit of sample size
sen_ = numeric(0) # mean sensitivity for all sizes of farm
n_ = numeric(0) # sample size
cutpt = 0.7 # Acceptable cutoff point

for(x in n.low:n.high) {
  n = x
  print("+++++++")
  print(n)
  for(c in 1:50) {
    nCow = 10*c
    for(i in 1:B) {
      cows = rbinom(nCow, 1, p)
      infected[i] = sum(cows)
      s = sample(cows, n)
      detected[i] = sum(s)
    }
    nCow_[c] = nCow
    detected.if.present = detected[infected > 0]
    sensitivity[c] = mean(detected.if.present > 0)
  }
  temp_s = mean(sensitivity)
  temp_c = cbind(sensitivity=sensitivity, ncow_=nCow_)
  print(temp_c)
  print(n)
  print(temp_s)
  sen_[x] = temp_s
  n_[x] = n
}
The results are as follows:
## [1] "+++++++"
## [1] 4
##       sensitivity ncow_
##  [1,]   0.7217031    10
##  [2,]   0.6771314    20
##  [3,]   0.6893379    30
##  [4,]   0.6805000    40
##  [5,]   0.6850000    50
##  [6,]   0.6796000    60
##  [7,]   0.6784000    70
##  [8,]   0.6769000    80
##  [9,]   0.6878000    90
## [10,]   0.6863000   100
## [11,]   0.6904000   110
## [12,]   0.6852000   120
## [13,]   0.6882000   130
## [14,]   0.6873000   140
## [15,]   0.6846000   150
## [16,]   0.6837000   160
## [17,]   0.6842000   170
## [18,]   0.6875000   180
## [19,]   0.6872000   190
## [20,]   0.6856000   200
## [21,]   0.6859000   210
## [22,]   0.6796000   220
## [23,]   0.6905000   230
## [24,]   0.6801000   240
## [25,]   0.6783000   250
## [26,]   0.6769000   260
## [27,]   0.6793000   270
## [28,]   0.6892000   280
## [29,]   0.6802000   290
## [30,]   0.6831000   300
## [31,]   0.6772000   310
## [32,]   0.6891000   320
## [33,]   0.6845000   330
## [34,]   0.6778000   340
## [35,]   0.6789000   350
## [36,]   0.6863000   360
## [37,]   0.6862000   370
## [38,]   0.6796000   380
## [39,]   0.6806000   390
## [40,]   0.6878000   400
## [41,]   0.6836000   410
## [42,]   0.6827000   420
## [43,]   0.6761000   430
## [44,]   0.6823000   440
## [45,]   0.6847000   450
## [46,]   0.6860000   460
## [47,]   0.6818000   470
## [48,]   0.6789000   480
## [49,]   0.6813000   490
## [50,]   0.6776000   500
## [1] 4
## [1] 0.6838534
## [1] "+++++++"
## [1] 5
##       sensitivity ncow_
##  [1,]   0.8058705    10
##  [2,]   0.7680433    20
##  [3,]   0.7726545    30
##  [4,]   0.7627000    40
##  [5,]   0.7625000    50
##  [6,]   0.7658000    60
##  [7,]   0.7642000    70
##  [8,]   0.7585000    80
##  [9,]   0.7631000    90
## [10,]   0.7666000   100
## [11,]   0.7630000   110
## [12,]   0.7558000   120
## [13,]   0.7634000   130
## [14,]   0.7595000   140
## [15,]   0.7623000   150
## [16,]   0.7590000   160
## [17,]   0.7623000   170
## [18,]   0.7619000   180
## [19,]   0.7614000   190
## [20,]   0.7583000   200
## [21,]   0.7568000   210
## [22,]   0.7626000   220
## [23,]   0.7618000   230
## [24,]   0.7653000   240
## [25,]   0.7633000   250
## [26,]   0.7595000   260
## [27,]   0.7697000   270
## [28,]   0.7675000   280
## [29,]   0.7623000   290
## [30,]   0.7653000   300
## [31,]   0.7592000   310
## [32,]   0.7574000   320
## [33,]   0.7668000   330
## [34,]   0.7678000   340
## [35,]   0.7595000   350
## [36,]   0.7654000   360
## [37,]   0.7666000   370
## [38,]   0.7665000   380
## [39,]   0.7648000   390
## [40,]   0.7551000   400
## [41,]   0.7713000   410
## [42,]   0.7730000   420
## [43,]   0.7653000   430
## [44,]   0.7577000   440
## [45,]   0.7627000   450
## [46,]   0.7577000   460
## [47,]   0.7735000   470
## [48,]   0.7675000   480
## [49,]   0.7593000   490
## [50,]   0.7601000   500
## [1] 5
## [1] 0.7641634
## [1] "+++++++"
## [1] 6
##       sensitivity ncow_
##  [1,]   0.8661619    10
##  [2,]   0.8254589    20
##  [3,]   0.8242000    30
##  [4,]   0.8219822    40
##  [5,]   0.8192000    50
##  [6,]   0.8272000    60
##  [7,]   0.8175000    70
##  [8,]   0.8182000    80
##  [9,]   0.8251000    90
## [10,]   0.8166000   100
## [11,]   0.8256000   110
## [12,]   0.8178000   120
## [13,]   0.8246000   130
## [14,]   0.8280000   140
## [15,]   0.8172000   150
## [16,]   0.8160000   160
## [17,]   0.8235000   170
## [18,]   0.8167000   180
## [19,]   0.8270000   190
## [20,]   0.8242000   200
## [21,]   0.8208000   210
## [22,]   0.8242000   220
## [23,]   0.8173000   230
## [24,]   0.8203000   240
## [25,]   0.8234000   250
## [26,]   0.8339000   260
## [27,]   0.8180000   270
## [28,]   0.8237000   280
## [29,]   0.8179000   290
## [30,]   0.8208000   300
## [31,]   0.8206000   310
## [32,]   0.8231000   320
## [33,]   0.8174000   330
## [34,]   0.8227000   340
## [35,]   0.8188000   350
## [36,]   0.8241000   360
## [37,]   0.8225000   370
## [38,]   0.8271000   380
## [39,]   0.8225000   390
## [40,]   0.8243000   400
## [41,]   0.8232000   410
## [42,]   0.8219000   420
## [43,]   0.8199000   430
## [44,]   0.8138000   440
## [45,]   0.8287000   450
## [46,]   0.8236000   460
## [47,]   0.8193000   470
## [48,]   0.8209000   480
## [49,]   0.8304000   490
## [50,]   0.8227000   500
## [1] 6
## [1] 0.8230001
## [1] "+++++++"
## [1] 7
##       sensitivity ncow_
##  [1,]   0.9164819    10
##  [2,]   0.8687719    20
##  [3,]   0.8698870    30
##  [4,]   0.8692000    40
##  [5,]   0.8674000    50
##  [6,]   0.8717000    60
##  [7,]   0.8579000    70
##  [8,]   0.8659000    80
##  [9,]   0.8610000    90
## [10,]   0.8633000   100
## [11,]   0.8686000   110
## [12,]   0.8620000   120
## [13,]   0.8652000   130
## [14,]   0.8633000   140
## [15,]   0.8657000   150
## [16,]   0.8637000   160
## [17,]   0.8642000   170
## [18,]   0.8618000   180
## [19,]   0.8628000   190
## [20,]   0.8692000   200
## [21,]   0.8644000   210
## [22,]   0.8641000   220
## [23,]   0.8671000   230
## [24,]   0.8625000   240
## [25,]   0.8625000   250
## [26,]   0.8696000   260
## [27,]   0.8689000   270
## [28,]   0.8565000   280
## [29,]   0.8703000   290
## [30,]   0.8606000   300
## [31,]   0.8695000   310
## [32,]   0.8664000   320
## [33,]   0.8696000   330
## [34,]   0.8642000   340
## [35,]   0.8723000   350
## [36,]   0.8670000   360
## [37,]   0.8700000   370
## [38,]   0.8662000   380
## [39,]   0.8667000   390
## [40,]   0.8642000   400
## [41,]   0.8627000   410
## [42,]   0.8686000   420
## [43,]   0.8669000   430
## [44,]   0.8715000   440
## [45,]   0.8695000   450
## [46,]   0.8706000   460
## [47,]   0.8658000   470
## [48,]   0.8619000   480
## [49,]   0.8660000   490
## [50,]   0.8668000   500
## [1] 7
## [1] 0.8670188
# sample size vs sensitivity
cbind(n=n_[n.low:n.high], Sensitivity=sen_[n.low:n.high])
##      n Sensitivity
## [1,] 4   0.6838534
## [2,] 5   0.7641634
## [3,] 6   0.8230001
## [4,] 7   0.8670188
# only > cutoff point
cbind(n=n_[n.low:n.high], Sensitivity=sen_[n.low:n.high])[sen_[n.low:n.high] > cutpt, ]
##      n Sensitivity
## [1,] 5   0.7641634
## [2,] 6   0.8230001
## [3,] 7   0.8670188

8.12.14

Setting up hot corner + expose effect in XFCE on Linux Mint

Setting up hot corner + expose effect in XFCE was already discussed somewhere else here http://forum.xfce.org/viewtopic.php?id=8105, so I could not claim I'm first in that respect.

However I could not find step-by-step solution for Linux Mint, thus I'm sharing the steps here.

1. Install brightside (via Synatic Package Manager or apt-get).

2. Install skippy-xd (https://code.google.com/p/skippy-xd/). You'll need to compile from source for this one.

- Download the source (https://github.com/richardgv/skippy-xd)
- Install required development packages (https://code.google.com/p/skippy-xd/wiki/Installation)

sudo apt-get install libimlib2-dev libfontconfig1-dev libfreetype6-dev libx11-dev libxext-dev libxft-dev libxrender-dev zlib1g-dev libxinerama-dev libxcomposite-dev libxdamage-dev libxfixes-dev libxmu-dev

- Extract the source (unzip).
- cd into the source folder, then

make

followed by

sudo make install

which would install skippy-xd into /usr/bin

3. Run brightside-properties, and set Custom action... > On entering region to skippy-xd.

4. To wrap things up, add brightside to your list of Application Autostart.

7.12.14

How many cows are required per farm so as the probability of getting at least one diseased cow is 70%?

Today I stumbled upon a problem during a student presentation.

How many cows are required per farm so as the probability of getting at least one diseased cow is 70% (acceptable level to us, using area under ROC cutoff point), given that the disease prevalence is 25%?

My solution for this problem is by using binomial distribution. The following is the formula for the distribution,
In our context,

x, number of success
n, sample size
p, prevalence

and to solve our problem

p(1 or more) = 1 - p(0)

Using spreadsheet, we can find the value of n iteratively (I prefer LibreOffice Calc). Just key in the following function (or just put up the formula above)

=1 - BINOMDIST(n, x, p, 1)

thus in our context

=1 - BINOMDIST(n, 0, 0.25, 1)

after playing around with n, I found

for n = 4, p(1 or more) = 1 - 0.316 = 0.684

for n = 5, p(1 or more) = 1 - 0.237 = 0.763

I'd go for 5 cows...

4.12.14

Installing STATA 11 on Linux Mint

I could not manage to run the provided install script to install STATA 11/SE on my Linux Mint LMDE 201403 system, so I went through the script itself and did everything manually.

The steps were as follows:

1. Copy unix/linux.64 contents to local disk, let say to ~/Downloads/stata11. The files are

ls

ado.taz  base.taz  bins.taz  install  setrwxp  utilities.taz

2. Create installation folder /usr/local/stata11

sudo /usr/local/stata11

3. Create symlink to the folder

sudo ln -s /usr/local/stata11 /usr/local/stata

4. Copy all files to /usr/local/stata11

sudo cp ~/Downloads/stata11/* /usr/local/stata11

5. Uncompress all .taz files

cd /usr/local/stata11
sudo uncompress *.taz

would result in:

ls

ado.tar  base.tar  bins.tar  install  setrwxp  utilities.tar

6. Untar all .tar files

tar -xof ado.tar
tar -xof base.tar
tar -xof bins.tar
tar -xof utilities.tar

would result in:

ls

ado     stata  stata.lic  stata_pdf utilities  xstata-se
auto.dta     stata11.png  stata-mp   stata-se xstata
isstata.110  stata_br  stata.msg  stinit xstata-mp

in addition to all our installation files listed above.

7. Remove all .tar files

rm *.tar

8. Set permission using setrwxp script

sudo ./setrwxp

9. Once done, just remove install and setrwxp from the folder

sudo rm install setrwxp

10. Start license authorization

sudo ./stinit

11. Once done with license authorization, try running STATA 11/SE GUI version.

./xstata-se

You will get a number of errors. The following are the errors with solution:

./xstata-se: error while loading shared libraries: libtiff.so.3: cannot open shared object file: No such file or directory

Check whether you have libtiff4. If not install using Synaptic Package Manager or apt-get. Then make symlink to libtiff.so.4

sudo ln -s x86_64-linux-gnu/libtiff.so.4 libtiff.so.3

Followed by

./xstata-se: error while loading shared libraries: libgtksourceview-1.0.so.0: cannot open shared object file: No such file or directory

Install libgtksourceview2.0-0 using Synaptic Package Manager or apt-get

sudo ln -s libgtksourceview-2.0.so.0 libgtksourceview-1.0.so.0

12. Run STATA again

./xstata-se

13. To be able to run directly from Terminal without going to /usr/local/stata11 folder, set your path in ~/.profile

gedit ~/.profile

and edit this line

PATH="$HOME/bin:$PATH"

to

PATH="$HOME/bin:$PATH:/usr/local/stata"

Log out and log in again to apply the change.

8.10.14

Convert variable name list to comma separated variable names

This trick is quite useful for SPSS. Let say I have 50 variables that I want to sum up the values by using SUM(var1,var2, ...) in Transform > Compute variable..., converting the list:
var1
var2
...
var50
to
var1,var2,...,var50
then
SUM(var1,var2,...var50)
is quite tedious.

I found this nice command line from http://www.shellhacks.com/ (in Linux, of course). Save your list in a file, say temp_file, then
echo $(awk 'NR > 1{print line", "}{line=$0;}END{print $0" "}' temp_file) > new_file
Your comma separated variable names are ready in new_file. Thanks, penguin.