Monday, February 27, 2012

sas7bndx: Story of SAS Indexes

Last night I ran code to extract companies from the CRSP Daily Stock file. It ran all night and was going to take 8 more hours today. This is unacceptable. The fix was relatively simple, here's how it came about.

First I got the file original CRSP datafile (2 hour download):

libname mydata = "D:\SAS";
%let wrds=wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=WRDS;
signon username=&username password=&password;
  rsubmit;
    libname crspa '/wrds/crsp/sasdata/a_stock';
    PROC DOWNLOAD DATA= crspa.dsf
      OUT=mydata.crsp_dsf;
      WHERE date > '01JAN1995'd;
    RUN;
  endrsubmit;
signoff;

But extracting permnos from that file took 1 minute for each PROC SQL statement. Everyone said something was wrong, no one had ideas what. So I poked around a WRDS ssh session and noticed all these .sas7bndx files. NDX sounds like index I cleverly thought to myself, maybe SAS has an index file to make looking up permnos faster! Lo and behold, it does. So then I ran this code:

PROC DATASETS LIBRARY=mydata;
  MODIFY crsp_dsf;
  INDEX CREATE permno;
RUN;

And now my PROC SQL queries execute in milliseconds. Day seized.

No comments:

Post a Comment