As a Fellow for the Program for Advanced Research in the Social Sciences, I have the opportunity to teach students, faculty, and staff at Duke how to develop research designs, choose quantitative methods, and implement those methods with statistical software.
Recently, a student asked me for help calculating average scale scores from multiple survey items. This was a good opportunity to teach the student that there are multiple approaches to any programming problem and that each approach involves different trade-offs in computational cost, verbosity, generality, and the opportunity for making mistakes, so I put together a short gist I thought I’d share.
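To give a flavor of those trade-offs, here is a minimal sketch rather than the gist itself. The data frame `survey` and the column names `item1` through `item3` are made up for illustration, and the three approaches shown are only some of the possibilities.

```r
# Hypothetical survey data: three items on a 1-5 scale, with one missing response
survey <- data.frame(
  item1 = c(4, 2, 5),
  item2 = c(3, 2, NA),
  item3 = c(5, 1, 4)
)
items <- c("item1", "item2", "item3")

# Approach 1: explicit arithmetic.
# Easy to read, but every item is typed by hand (room for typos),
# and any missing item makes the whole score NA.
score_manual <- (survey$item1 + survey$item2 + survey$item3) / 3

# Approach 2: rowMeans().
# Vectorized, works for any number of items, and na.rm = TRUE
# averages over the items each respondent actually answered.
score_rowmeans <- rowMeans(survey[, items], na.rm = TRUE)

# Approach 3: apply() over rows.
# More general (any summary function can be swapped in),
# but typically slower than rowMeans() on large data.
score_apply <- apply(survey[, items], 1, mean, na.rm = TRUE)

# Compare the results side by side
data.frame(
  manual   = score_manual,
  rowmeans = as.numeric(score_rowmeans),
  apply    = as.numeric(score_apply)
)
```

Note that the third respondent gets `NA` under the first approach but a score of 4.5 under the other two, because `na.rm = TRUE` averages only the two answered items. Whether that is the right choice depends on the scale and the research question.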
RStudio is a popular, well-supported IDE for R programmers. While a number of text editors with steep learning curves and direct interaction with the command line may offer more power and flexibility, RStudio facilitates completion of common tasks with minimal investment.
One reason to use RStudio is the ease with which researchers can embrace literate programming to create dynamic documents. Dynamic documents are attractive because they promise reductions in human error and time costs for researchers.
Data analysts often wish to examine subsets of data or otherwise manipulate data using indicators of missingness. Luckily, R offers several ways of designating a value as missing. Unluckily, the way these values interact with popular functions is not always intuitive, which can produce unintended results.
I wrote a demonstration of this a while back. It showcases behaviors of missing values that many R programmers likely expect, along with some surprising results.
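As a small taste, the following is a minimal sketch with a made-up vector `x`, not the full demonstration. It uses only base R and pairs a few behaviors most people expect with a few that tend to surprise.

```r
# A toy vector with one missing value (hypothetical data)
x <- c(1, NA, 3)

# Expected: is.na() flags the missing element
is.na(x)                # FALSE  TRUE FALSE

# Surprising: comparing against NA with == never returns TRUE
x == NA                 # NA NA NA

# Expected: summaries propagate NA unless told to drop it
mean(x)                 # NA
mean(x, na.rm = TRUE)   # 2

# Surprising: logical subsetting keeps an NA where the condition is NA
x[x > 1]                # NA  3
x[which(x > 1)]         # 3

# Surprising: NA does not always "win" in logical operations
NA & FALSE              # FALSE
NA | TRUE               # TRUE

# NA, NaN, and NULL are three different things
is.na(NaN)              # TRUE
is.nan(NA)              # FALSE
length(NA)              # 1
length(NULL)            # 0
```

The subsetting case is the one that most often bites in practice: filtering with a condition that evaluates to `NA` quietly carries missing entries into the result, which is why wrapping the condition in `which()` or adding an explicit `!is.na(x)` check is a common defensive habit.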