The data editing and formatting phase is one of the most important steps in fitting a structural equation model (SEM), whether it is covariance-based, as in SPSS AMOS and LISREL, or a partial least squares model, as in SmartPLS and WarpPLS. Survey data often arrive with a plethora of data types: numeric, integer, ordinal, factor, indicator and free-text comments. One of the first steps is to decide what to include in the analysis. Should we transform the data and study the marginal distributions before running a CB-SEM, model relations between ordinal variables in LISREL, or just clean up the data and run a robust exploratory PLS-SEM? Either way, we typically drop unused variables, carry out tasks such as imputation or computing indicators for factor levels beforehand, and split the data into several sheets or files linked by appropriate keys or identifiers.
Then comes the formatting issue. A lot of survey data come as SPSS .sav files or in the SAS .sas7bdat format, and if you're analyzing clinical data you may encounter Stata's .dta format. You can export to other formats from these commercial packages or use a conversion program such as Stat/Transfer.
Plain ASCII formats are a good fallback when importing CSV files straight from Excel or OpenOffice fails. A simple file format also lets you run several types of covariance-based models or partial least squares regressions on the same data. So why not use simple txt files?
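A minimal base-R sketch of those preparation steps (dropping a variable, imputing, splitting into keyed files) follows. It uses a small invented data frame; the column names, the mean imputation (chosen only for brevity, not as a recommendation) and the output file names `items.txt` and `comments.txt` are all assumptions for illustration.

```r
# Hypothetical survey data (invented for illustration): an ID, two
# Likert items with missing values, a comment and an unused timestamp.
survey <- data.frame(
  id        = 1:5,
  q1        = c(4, 5, NA, 3, 4),
  q2        = c(2, 3, 3, NA, 5),
  comment   = c("ok", "", "slow", "fine", ""),
  timestamp = c("t1", "t2", "t3", "t4", "t5"),
  stringsAsFactors = FALSE
)

# Drop a variable that will not enter the model
survey$timestamp <- NULL

# Crude mean imputation for the item columns (illustration only;
# a real analysis would normally use a more principled method)
items <- c("q1", "q2")
for (v in items) {
  survey[[v]][is.na(survey[[v]])] <- mean(survey[[v]], na.rm = TRUE)
}

# Split into a numeric file for the SEM software and a comment file,
# both keyed by the respondent ID
numericPart <- survey[, c("id", items)]
commentPart <- survey[, c("id", "comment")]

outDir <- tempdir()
write.table(numericPart, file.path(outDir, "items.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(commentPart, file.path(outDir, "comments.txt"),
            sep = "\t", row.names = FALSE)

# The shared key lets the two parts be joined again later
rejoined <- merge(numericPart, commentPart, by = "id")
```

Writing the numeric part with `quote = FALSE` keeps that file free of quote characters, while the comment file keeps the default quoting in case free text contains tabs.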
Here is an example in base R that converts a CSV file saved from Excel into a tab-delimited txt file, ready for analysis in a software package such as LISREL.
#Print current directory
getwd()
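#Change to another directory through an interactive dialog window (choose.dir() is only available on Windows)
setwd(choose.dir())
dirPath <- getwd()
#Choose the CSV file
filePath <- file.choose()
filePath
#Check options for read.csv: use read.csv2 for European files (";" separator, "," decimal mark)
CSVdata <- read.csv(filePath)
#View data
View(CSVdata)
#Save data in current directory; naming the output after the CSV file with a .txt extension is an assumption
txtPath <- file.path(dirPath, paste0(tools::file_path_sans_ext(basename(filePath)), ".txt"))
write.table(CSVdata, txtPath, sep="\t", eol="\r\n", row.names=FALSE)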
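To check that nothing was lost in the conversion, one option is to read the txt file back with `read.delim()` and compare it with the data read from the CSV. The sketch below is self-contained: instead of the file dialogs it writes a small invented data frame to a temporary CSV, so the data and the temporary paths are assumptions.

```r
# Hypothetical example data standing in for an Excel export
original <- data.frame(id = 1:3, x1 = c(1.5, 2.0, 3.25), x2 = c(4L, 5L, 6L))

csvPath <- tempfile(fileext = ".csv")
write.csv(original, csvPath, row.names = FALSE)

# Same conversion as above: read the CSV, write tab-delimited text
CSVdata <- read.csv(csvPath)
txtPath <- sub("\\.csv$", ".txt", csvPath)
write.table(CSVdata, txtPath, sep = "\t", eol = "\r\n", row.names = FALSE)

# Read the txt file back and compare it with the CSV contents
TXTdata <- read.delim(txtPath)
all.equal(CSVdata, TXTdata)
```

`all.equal()` returns `TRUE` when the two data frames match, or a character vector describing the differences otherwise.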