Creating dummy variables in R

Randy Zwitch has a blog entry on creating dummy variables from factor levels:

example <- as.data.frame(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))
names(example) <- "strcol"

# For every unique value in the string column, create a new 1/0 column
# This is what factors do "under the hood" automatically when passed to a function requiring numeric data
for (level in unique(example$strcol)) {
  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
}
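
The comment in the loop says factors are expanded into 1/0 columns automatically. As a rough sketch of that mechanism (not part of the original post), base R's model.matrix can build the same indicator columns in one call. The explicit factor() conversion and the "- 1" term that drops the intercept are choices made for this illustration.

```r
# Same toy data as above; the column is converted to a factor explicitly
example <- data.frame(
  strcol = factor(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))
)

# "~ strcol - 1" drops the intercept so that every level gets its own 0/1 column
dummies <- model.matrix(~ strcol - 1, data = example)

# One column per level, named after the variable plus the level (e.g. "strcolA")
colnames(dummies)
```

Each row of dummies contains exactly one 1, marking the level of that observation.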

Factor levels often contain special characters that are awkward in column names. In that case you can strip them with gsub and a regular expression:

example <- as.data.frame(c("AÆ", "AÆ", "B", "FÅ", "C", "G", "C", "D", "E", "FÅ"))
names(example) <- "strcol"

# For every unique value in the string column, create a new 1/0 column
# gsub removes every character that is not a letter, digit or underscore from the column name
for (level in unique(example$strcol)) {
  example[gsub("[^a-zA-Z0-9_]", "", paste("dummy", level, sep = "_"), fixed = FALSE)] <- ifelse(example$strcol == level, 1, 0)
}
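
One thing to watch for when stripping characters is that two different levels can collapse to the same column name, in which case the later dummy column would overwrite the earlier one. The sketch below uses hypothetical levels ("AÆ" next to a plain "A") that are not in the example above, and suggests make.unique as one possible way to keep the names apart.

```r
# Hypothetical levels: "AÆ" and "A" differ only by a special character
levels_raw <- c("AÆ", "A", "B", "FÅ")

# Build the column names first, then strip everything except letters, digits and "_"
col_names <- gsub("[^a-zA-Z0-9_]", "", paste("dummy", levels_raw, sep = "_"))

# TRUE here means at least two levels now share a column name
has_collision <- anyDuplicated(col_names) > 0

# make.unique() appends a numeric suffix to repeated names so they stay distinct
safe_names <- make.unique(col_names, sep = "_")
```

Whether a suffix is acceptable, or whether the colliding levels should be recoded by hand, depends on the data.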

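You may also use levels instead of unique, in conjunction with subsetting, e.g. levels(example$strcol)[-1], to create dummy variables for every level except the first. The first level then serves as the reference level and is absorbed into the baseline/intercept of your regression model. Note that levels only works on a factor, and since R 4.0.0 as.data.frame no longer converts strings to factors by default, so convert the column with factor() first. Model formulas can be generated with the paste function: paste("somevar ~", paste(names(dataframe), collapse = "+"))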
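
A minimal sketch of the reference-level approach and the pasted formula, under a few assumptions: the column is converted to a factor explicitly, somevar is a made-up response added only for illustration, and the predictors are picked out by their "dummy_" prefix so the response does not end up on the right-hand side.

```r
example <- data.frame(
  strcol = factor(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))
)

# Skip the first level ("A"): it becomes the reference absorbed by the intercept
for (level in levels(example$strcol)[-1]) {
  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
}

# Hypothetical response variable, for illustration only
example$somevar <- seq_len(nrow(example))

# Collect the dummy columns and paste them into a model formula
predictors <- grep("^dummy_", names(example), value = TRUE)
model_formula <- as.formula(paste("somevar ~", paste(predictors, collapse = "+")))
```

The resulting model_formula can be passed to a modelling function such as lm together with data = example.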
