
Preparing data. A basic example using R-base

The data editing and formatting phase is one of the most important steps in fitting a structural equation model (SEM), whether it is covariance-based as in SPSS AMOS and LISREL or a partial least squares model as in SmartPLS and WarpPLS. Often you receive survey data with a plenitude of different data types: numeric, integer, ordinal, factor, indicator, text comments. One of the first steps is to decide what to include in the analysis. Should we transform data and analyze marginal distributions before running a CB-SEM, study relations between ordinal variables with LISREL, or just clean up the data and do a robust exploratory PLS-SEM? Either way, we scrap unused variables, perform tasks such as imputation or calculation of indicators of factor levels beforehand, and split the data into several sheets or files with appropriate keys or identifiers.

Then comes the formatting issue. Lots of survey data come as SPSS sav files or in the SAS sas7bdat file format, and if you are analyzing clinical data you might encounter Stata's dta format. You can export to other formats from these commercial packages or use a program like Stat/Transfer. Ordinary ASCII-based formats may do when importing csv files straight from Excel or OpenOffice fails. Using a simple file format also enables you to run several types of covariance-based models or partial least squares regressions on the same file. Then why not use simple txt files?
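The preparation steps described above (scrapping unused variables, imputing, splitting into keyed tables, saving as txt) can be sketched in base R roughly as follows. This is a minimal illustration, not a recommended workflow: the data frame, its column names and the file names are invented, and mean imputation is used only because it is short to write.

```r
# Hypothetical survey data; all column names are invented for illustration.
survey <- data.frame(
  id      = 1:5,
  age     = c(34, NA, 51, 28, 45),
  q1      = c(4, 5, NA, 3, 4),   # Likert-type item
  q2      = c(2, 3, 3, NA, 5),   # Likert-type item
  comment = c("ok", "", "too long", "", "fine"),
  stringsAsFactors = FALSE
)

# 1. Scrap variables that will not enter the model (here: free-text comments).
survey <- survey[, setdiff(names(survey), "comment")]

# 2. Impute missing item scores with the item mean.
#    (An assumption for brevity; other methods may suit your data better.)
items <- c("q1", "q2")
for (v in items) {
  item_mean <- mean(survey[[v]], na.rm = TRUE)
  survey[[v]][is.na(survey[[v]])] <- item_mean
}

# 3. Split into two tables that share the key 'id'.
background <- survey[, c("id", "age")]
indicators <- survey[, c("id", items)]

# 4. Save each table as a tab-delimited txt file (written to a temp folder here).
out_dir <- tempdir()
background_path <- file.path(out_dir, "background.txt")
indicators_path <- file.path(out_dir, "indicators.txt")
write.table(background, background_path, sep = "\t", row.names = FALSE)
write.table(indicators, indicators_path, sep = "\t", row.names = FALSE)
```

Keeping the identifier in every table means the pieces can be joined again later with merge(background, indicators, by = "id").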

Here is an example of R-base code that converts an Excel csv file to a txt file, ready for analysis in a software package such as LISREL.

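#Print current directory
getwd()
#Change to another directory through interactive dialog window (choose.dir() is Windows only)
setwd(choose.dir())
dirPath <- getwd()
#Choose the CSV file
filePath <- file.choose()
filePath
#Check options for read.csv (US/European comma format etc); read.csv2 reads semicolon-separated files with decimal commas
CSVdata <- read.csv(filePath)
#View data
View(CSVdata)
#Save data in current directory (assumed file name: same as the CSV file, with a .txt extension)
txtPath <- file.path(dirPath, sub("\\.csv$", ".txt", basename(filePath), ignore.case = TRUE))
write.table(CSVdata, txtPath, sep="\t", eol="\r\n", row.names=FALSE)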
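For batch work, the same conversion can be done without dialog windows. The sketch below wraps it in a small helper; csv_to_txt() and its european argument are hypothetical names introduced here, not part of base R. It assumes that a "European" file uses semicolons as separators and commas as decimal marks, which is what read.csv2 expects.

```r
# Hypothetical helper (not part of base R): convert a CSV file to a
# tab-delimited txt file without any interactive dialogs.
csv_to_txt <- function(csv_path, txt_path, european = FALSE, eol = "\r\n") {
  # read.csv2 expects ';' separators and ',' decimals; read.csv expects ',' and '.'
  reader <- if (european) read.csv2 else read.csv
  dat <- reader(csv_path, stringsAsFactors = FALSE)
  write.table(dat, txt_path, sep = "\t", eol = eol, row.names = FALSE)
  invisible(dat)
}

# Usage: a small European-style CSV written to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id;score", "1;3,5", "2;4,0"), csv_path)
txt_path <- sub("\\.csv$", ".txt", csv_path)

dat <- csv_to_txt(csv_path, txt_path, european = TRUE)

# Read the result back to confirm the numbers survived the conversion.
check <- read.delim(txt_path)
```

Reading the txt file back with read.delim is a cheap way to catch a wrong separator or decimal setting before the file goes into the SEM software.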
