
Dummy variables in SAS

Creating indicator (dummy) variables from a factor variable is cumbersome in the SAS programming language. The macro below automates the task by creating one 0/1 variable per factor level.

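The macro below assumes an integer factor variable (non-negative integer levels 0, 1, 2, 3, …) with no missing values and no format attached. Level 0 gets no dummy variable of its own, so it serves as the reference category. One way to ensure these assumptions hold is to add a few lines to an appropriate DATA step (note that this recodes missing values to level 0):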

if missing(factor) then factor=0; 
format factor;
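Before calling the macro, it can help to count the observations that would violate these assumptions. The macro CHECK_FACTOR below is a hypothetical helper, not part of the original code. It counts missing, negative and non-integer values, writes the count to the log and stores it in the global macro variable NBAD (also a name chosen here for illustration).

```sas
/* Hypothetical helper, not part of the original post: counts the */
/* observations of the factor that are missing, negative or       */
/* non-integer, writes the count to the log and stores it in the  */
/* global macro variable NBAD.                                    */
%macro check_factor(factor, dataset);
  data _null_;
    set &dataset end=last;
    /* sum statement: NBAD is retained and starts at 0 */
    if missing(&factor) or &factor < 0 or &factor ne int(&factor)
      then nbad + 1;
    if last then do;
      put "NOTE: &factor has " nbad "observation(s) violating the assumptions.";
      call symputx('nbad', nbad, 'G');
    end;
  run;
%mend check_factor;
```

For instance, `%check_factor(con, testdata);` should report 0 for the example data further down, since CON holds only positive integers.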

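To convert a character factor variable to an integer factor variable, create a format that maps each level to an integer code (starting at 1, because level 0 receives no dummy variable) and use it to construct a new numeric factor variable: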

proc format library=WORK;
value $factor
'firstlevel'='1'
'secondlevel'='2'
...
'finallevel'='n'
;
run;

data newdataset;
set olddataset;
newfactor=input(put(factor,$factor.),8.);
run;
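Typing one format line per level gets tedious for factors with many levels. A common alternative is to build the same format from the data with the CNTLIN= option of PROC FORMAT. The sketch below assumes the character variable is called FACTOR, as in the steps above; the sample data, the intermediate data sets LEVELS and FMT, and the choice to skip blank values are assumptions made for illustration. Levels are numbered 1, 2, 3, … in sorted (alphabetical) order, so no level is coded 0, and blank values stay missing in NEWFACTOR, where the missing-to-0 recoding shown earlier can handle them.

```sas
/* Hypothetical sample data with a character factor */
data olddataset;
  input factor $;
  datalines;
high
low
medium
low
;
run;

/* One row per non-blank level, in sorted order */
proc sort data=olddataset(keep=factor where=(not missing(factor)))
          out=levels nodupkey;
  by factor;
run;

/* Control data set for PROC FORMAT: level text -> code 1, 2, 3, ... */
data fmt;
  set levels;
  retain fmtname '$factor' type 'C';
  length label $ 8;
  start = factor;
  label = cats(_n_);
  keep fmtname type start label;
run;

proc format library=WORK cntlin=fmt;
run;

/* Same conversion step as above, now using the generated format */
data newdataset;
  set olddataset;
  newfactor = input(put(factor, $factor.), 8.);
run;
```

With this sample data, high, low and medium become 1, 2 and 3, and the converted variable can then be passed to the macro, e.g. `%indicator(newfactor,newdataset);`.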

You can now define the macro and run it.

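/* Macro to generate dummy variables. It is called from inside the */
/* INDICATOR macro, so it can read the macro variables FACTOR, N   */
/* and M1, M2, ... created there. For each level from 1 to N that  */
/* occurs in the data (its M macro variable is not 0), it writes   */
/* an IF/ELSE pair setting the dummy variable to 1 or 0.           */
/* Level 0 gets no dummy variable.                                 */

%MACRO GETCON; 
  %LOCAL I;
  %DO I = 1 %TO &N; 
    %IF &&M&I NE 0 %THEN %DO; 
      IF &factor = &&M&I THEN &factor&I = 1; 
      ELSE &factor&I = 0; 
    %END; 
  %END; 
%MEND GETCON; 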

%macro indicator(factor,dataset);
/* Keep one observation per factor level, sorted in ascending order. */
PROC SORT DATA=&dataset OUT=UNIQUE NODUPKEY; 
  BY &factor; 
RUN; 
/* Assign the largest factor level (the last observation of the sorted data set UNIQUE) to the macro variable N. */ 
DATA _NULL_; 
  SET UNIQUE END=LAST; 
  IF LAST THEN CALL SYMPUT('N', PUT(&factor, 8.)); 
RUN;
/* Assign the initial value 0 to all macro variables. */  
DATA _NULL_; 
  DO I = 1 TO &N; 
    CALL SYMPUT('M'||LEFT(PUT(I, 8.)), '0'); 
  END; 
RUN; 
/* For each level present in the data, store its value in the corresponding macro variable, e.g. M7 for level 7. */ 
DATA _NULL_; 
  SET UNIQUE; 
  CALL SYMPUT('M'||LEFT(PUT(&factor, 8.)), PUT(&factor, 8.));  
RUN; 
/* Create dummy variables; note that this step overwrites the input data set. */ 
DATA &dataset; 
  SET &dataset; 
  %GETCON 
RUN; 
%mend;
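To make the macro logic concrete: for the example data further down, UNIQUE holds the eight levels 1, 7, 34, 43, 57, 115, 487 and 506, so N resolves to 506 and only M1, M7, M34, M43, M57, M115, M487 and M506 are nonzero. The IF/ELSE pairs that GETCON writes for `%indicator(con,testdata)` should then be equivalent to the hand-written step below. This is an illustrative sketch: the data set name EXPANDED is hypothetical, and the values are read inline with `@@` so the step runs on its own. Adding `OPTIONS MPRINT;` before the macro call is one way to compare it with the statements the macro actually generates.

```sas
/* Hypothetical hand-written equivalent of the dummy variables   */
/* that the INDICATOR macro creates for the example CON data.    */
DATA EXPANDED;
  INPUT CON @@;   /* @@ reads several values from each data line */
  IF CON = 1   THEN CON1   = 1; ELSE CON1   = 0;
  IF CON = 7   THEN CON7   = 1; ELSE CON7   = 0;
  IF CON = 34  THEN CON34  = 1; ELSE CON34  = 0;
  IF CON = 43  THEN CON43  = 1; ELSE CON43  = 0;
  IF CON = 57  THEN CON57  = 1; ELSE CON57  = 0;
  IF CON = 115 THEN CON115 = 1; ELSE CON115 = 0;
  IF CON = 487 THEN CON487 = 1; ELSE CON487 = 0;
  IF CON = 506 THEN CON506 = 1; ELSE CON506 = 0;
  DATALINES;
1 7 34 115 7 1 487 34 506 57 7 43
;
RUN;
```

Each pair could also be written more compactly as, e.g., `CON7 = (CON = 7);`, because a SAS comparison evaluates to 1 or 0.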

*Example data;
DATA TESTDATA; 
  INPUT CON; 
  CARDS; 
    1 
    7 
   34 
  115 
    7 
    1 
  487 
   34 
  506 
   57 
    7 
   43 
  ; 
RUN; 

*Test run;
%indicator(con,testdata);

The macro above is based on an example in SUGI paper 052-29.
