Sunday, November 18, 2012

Persistant Default Library

I've been switching back and forth between libraries I define and the work library, mostly because I don't like putting "mylib." in front of every library name. But today I learned that if you define the USER library, it will use that as a permanent work library. So for example the following code:
libname USER "D:/SAS/project1";

DATA example_database;

SET other_database_in_D_SAS_project1;

RUN;
Will create example_database.sas7db in the D:/SAS/project1 folder, and when you reboot SAS and run the libname user command again (or better yet, put it in your autoexec.sas file), all your work files will be there waiting. It's a time saver if you are working on a project and have to shutdown SAS.

Friday, March 16, 2012

Data Step Array (Macro) Variables

I want some thing simple. I have a data file with many fields with sequential names, and I want to reorganize them. It's all dead simple regex logic: var12 becomes var2 in row 1, var22 becomes var2 in row 2. Two minutes in python. In SAS...

So here's my method of making an Array Macro Variable (nothing native to the best of my knowledge) on which I will then use numbered indexes to massage the table later:

data _null_;
    i = 1;
    DO name = "bear","pig","velociraptor";
        ii = left(put(i,2.));
        call symput('variable_name'||ii,name);
        i+1;
        put name;
    END;

The output of that is the following

bear
pic
velo

Oh yeah, that happened. SAS guessed the length of name for the loop at 4 characters, then truncated velociraptor.

The solution was to use the length name $12:

data _null_;
    i = 1;
    length name $12;
    DO name = "bear","pig","velociraptor";
        ii = left(put(i,2.));
        call symput('variable_name'||ii,name);
        i+1;
        put name;
    END;

Monday, March 12, 2012

Cleaning SDC Downloaded Data

I wrote some python code to simply clean up SDC downloaded fixed width data. It uses easygui, a python library to give it a little GUI interface, and I'll be the first to admit it's pretty crappy. But it works and made my life easier, so I'll share.

Google Doc Link

Sunday, March 11, 2012

Compustat Codes and Field (Variable) Names

Maybe this is common knowledge, but this page made life infinitely easier to replicate old papers:

http://www.crsp.chicagobooth.edu/documentation/product/ccm/cross/annual_data.html

Now all I need is free time to turn that into a tool for automatic SQL query generation.

Also while I'm at it I may as well plug a program I use constantly. WinSplit Revolution allows you to resize windows with key commands (I use control alt numpad). So setting up my desktop to look like below takes a few keyboard presses. If you use multiple monitors it's a life saver. Because sharing is caring.


Friday, March 9, 2012

Searching Google Scholar

I've been doing non-SAS research, and got tired of messing around with Google Scholar searches, so I made the following chrome search tag:
http://scholar.google.com/scholar?&num=100&as_subj=bus&as_q=%s
That searches within Business and Finance journals only which is quite convenient in my opinion. To add to Chrome (similar in firefox), go to the URL in the following screen shot and enter the URL quoted above where it says. I put the letter s for the keyword, so now when I want to do a google scholar search I type CNTRL-L, then type s and hit space. Anything typed after the space will be the search term.

Tuesday, February 28, 2012

Remove Overlapping Windows

A little bit of SAS code to remove dates that overlap by some amount (defined in a Macro Variable). Example from an event study with dates from SDC.

First off sort the data, removing any duplicates. In this case duplicates is determined just by the variables sorted on, so permco, permno, and date. This removes the problem of having more than one event on the same day.

Create a temporary table with lag and difference fields. The lag field is more for debugging as the difference field is all we care about. If the event is the first event for the company, leave the fields blank.

Create the output table and only keep events where the event date is either blank (first event) or if the time between events is greater than the window.

------ Code ------

mydata: the library where my crsp and cleaned sdc files are
sdc_permno: database with SDC data and permno's instead of cusips (already linked SDC and CRSP)
sdc_final: the output sdc data file with the event dates (non overlapping) and permnos
&beta_pre_length: The macro variable with the length of my pre window for calculating the betas in the market return model
&study_start_date: The macro bariable with the start date

/* SDC_FINAL: Clean up data and remove all SEOs with another SEO within window_prev before. */
PROC SORT DATA=mydata.sdc_permno OUT=tmp_sdc_permno NODUPKEY;
    BY permco permno fdate;
RUN;
DATA tmp_sdc_lag;
    SET tmp_sdc_permno;
    BY permco permno fdate idate shares shares_p shares_s;
    lag_date = LAG(fdate);
    dif_date = DIF(fdate);
    IF FIRST.permco THEN DO;
        lag_date = .;
        dif_date = .;
    END;
RUN;
PROC SQL;
    CREATE TABLE mydata.sdc_final AS
    SELECT monotonic() as count, permno, fdate, shares, shares_p, shares_s
    FROM tmp_sdc_lag
    WHERE fdate > &study_start_date and (dif_date EQ . OR dif_date > &beta_pre_len)
    ;
    DROP TABLE tmp_sdc_lag;
    DROP TABLE tmp_sdc_permno;
QUIT;

Monday, February 27, 2012

sas7bndx: Story of SAS Indexes

Last night I ran code to extract companies from the CRSP Daily Stock file. It ran all night and was going to take 8 more hours today. This is unacceptable. The fix was relatively simple, here's how it came about.

First I got the file original CRSP datafile (2 hour download):

libname mydata = "D:\SAS";
%let wrds=wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=WRDS;
signon username=&username password=&password;
  rsubmit;
    libname crspa '/wrds/crsp/sasdata/a_stock';
    PROC DOWNLOAD DATA= crspa.dsf
      OUT=mydata.crsp_dsf;
      WHERE date > '01JAN1995'd;
    RUN;
  endrsubmit;
signoff;

But extracting permnos from that file took 1 minute for each PROC SQL statement. Everyone said something was wrong, no one had ideas what. So I poked around a WRDS ssh session and noticed all these .sas7bndx files. NDX sounds like index I cleverly thought to myself, maybe SAS has an index file to make looking up permnos faster! Lo and behold, it does. So then I ran this code:

PROC DATASETS LIBRARY=mydata;
  MODIFY crsp_dsf;
  INDEX CREATE permno;
RUN;

And now my PROC SQL queries execute in milliseconds. Day seized.

PROC SQL Dates and Macro Variables

Here's a little quiz: What is the name of a 'column' in a database?
a) column
b) field
c) anything but
d) variable
If you answered D, go to wikipedia and find 'variable' on the SQL page. Go, I'll wait. So after wasting hours searching for variables, I learn that SAS calls columns variables, and the things that represent numbers for flexibility (in algebra you call them variables) are MACRO variables. Good luck googling help for macro variables and avoiding all the pages that discuss both database variables and macros.

I digress. Here's what I learned. The following code:
PROC SQL NOPRINT;
    SELECT permno,fdate format=mmddyy10.
    INTO :permnovar, :eventdate
    FROM libstore.sdc_events
    WHERE count = &i;

    SELECT *
    FROM libstore.crsp_dsf
    WHERE date = &eventdate;
QUIT;
Will return null. Oh the data is in there, if you replace &eventdate; by '01JAN2001'd you will get data. But SAS can't pull a date from one table and use it in the WHERE query in another table. MACRO VARIABLES just don't work that way. Duh.

So the solution? Everything is text. That's my new mantra. Obviously the solution is to use the following code:
PROC SQL NOPRINT;
    SELECT permno,fdate format=date9.
    INTO :permnovar, :eventdate
    FROM libstore.sdc_events
    WHERE count = &i;
QUIT;

%let eventdate_cleaned = "&eventdate"d;

PROC SQL NOPRINT;
    SELECT *
    FROM libstore.crsp_dsf
    WHERE date = &eventdate_cleaned;
QUIT;
The other thing to note is that mmddyy10 is NOT the format to use. Even though it comes out of SDC like that, and CRSP uses something else, PROC SQL uses date9 in its comparison. In fact '01/01/2000'd throws an error in PROC SQL. Don't ask, just accept it.

So the moral of this story is make everything a string. Always. Also use date9 in PROC SQL. And '&eventdate' doesn't work, only double quotes evaluates the macro variable. Ugh, macro variable angers me.