Tuesday, February 16, 2016

Fama French Industries

I'm back in Python and need to map SIC codes to the Fama-French 12 industries (FF12). So I wrote a little script that downloads the industry definitions from Ken French's website and builds a pandas DataFrame keyed on SIC code, ready for merging. Thought I would share:


Edit: An alternative is to use pandas_datareader.famafrench


import re
from io import BytesIO
from zipfile import ZipFile

import pandas as pd
import requests

FF_IND_NUMS = [5, 10, 12, 17, 30, 38, 48, 49]


def download_ffind_zip(ind_num):
    """Download SiccodesNN.zip and return the text file inside it as a string."""
    zip_url = ('http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Siccodes{}.zip'
               .format(ind_num))
    data = requests.get(zip_url)
    archive = ZipFile(BytesIO(data.content))
    return archive.open('Siccodes{}.txt'.format(ind_num)).read().decode()


def get_ffind_df(ind_num):
    """Return a DataFrame with one row per SIC code (0-9999) and its FF industry."""
    if ind_num not in FF_IND_NUMS:
        raise ValueError('Industry number must be one of {} not {}.'
                         .format(FF_IND_NUMS, ind_num))
    # Industry header line: number, short name, long description.
    re_nameline = re.compile(r'^\s*(?P<ff{0}>\d\d?)\s+(?P<ff{0}_name>[a-z]+)\s+(?P<detail>.+)\s*$'
                             .format(ind_num), re.I|re.M)
    # SIC range line: "from-to", optionally followed by notes.
    re_rangeline = re.compile(r'^\s*(?P<sicfrom>\d{3,4})-(?P<sicto>\d{3,4})(?P<notes>\s+.+)?\s*$', re.I|re.M)
    data = download_ffind_zip(ind_num)
    # init to 'other': use the file's own 'Other' industry if it has one
    try:
        current_ind = [m.groupdict() for m in re_nameline.finditer(data)
                       if m.group('ff{0}_name'.format(ind_num)).lower() == 'other'][0]
    except IndexError:
        current_ind = {'ff{0}'.format(ind_num): ind_num,
                       'ff{0}_name'.format(ind_num): 'Other',
                       'detail': ''}
    # Every SIC code starts out as 'Other' and is overwritten by the ranges below.
    vals = {i: current_ind for i in range(10000)}
    for line in data.split('\n'):
        match = re_nameline.search(line.strip())
        if match:
            # A new industry starts; the range lines that follow belong to it.
            current_ind = match.groupdict()
            continue
        match = re_rangeline.search(line.strip())
        if not match:
            continue
        match = match.groupdict()
        sicfrom, sicto = int(match['sicfrom']), int(match['sicto'])
        for i in range(sicfrom, sicto+1):
            vals[i] = current_ind
    df = pd.DataFrame.from_dict(vals, orient='index')
    df.index.name = 'sic'
    # The industry number is parsed as a string, so convert it to int.
    df['ff{0}'.format(ind_num)] = df['ff{0}'.format(ind_num)].astype(int)
    return df.reset_index()
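
To see what the parsing does without hitting the network, here is a minimal stdlib-only sketch of the same idea. The SAMPLE text is a made-up excerpt in the layout the regexes expect (it is not the real file contents), and parse_definitions is a hypothetical helper name, not part of the script above.

```python
import re

# Hypothetical excerpt in the layout the regexes expect; NOT the real file contents.
SAMPLE = """\
 1 NoDur  Consumer NonDurables
          0100-0999
          2000-2399
 2 Durbl  Consumer Durables
          2500-2519 Household furniture
12 Other  Other
"""

# Industry header line: number, short name, description.
NAME_RE = re.compile(r'^\s*(?P<num>\d\d?)\s+(?P<name>[a-z]+)\s+(?P<detail>.+)\s*$', re.I)
# SIC range line: "from-to", optionally followed by notes.
RANGE_RE = re.compile(r'^\s*(?P<sicfrom>\d{3,4})-(?P<sicto>\d{3,4})(?P<notes>\s+.+)?\s*$')


def parse_definitions(text, default=(12, 'Other')):
    """Map each SIC code 0..9999 to an (industry number, short name) pair."""
    mapping = {sic: default for sic in range(10000)}
    current = default
    for line in text.splitlines():
        name = NAME_RE.search(line)
        if name:
            # Header line: later range lines belong to this industry.
            current = (int(name.group('num')), name.group('name'))
            continue
        rng = RANGE_RE.search(line)
        if rng:
            lo, hi = int(rng.group('sicfrom')), int(rng.group('sicto'))
            for sic in range(lo, hi + 1):
                mapping[sic] = current
    return mapping


sic_to_ff = parse_definitions(SAMPLE)
print(sic_to_ff[2011])  # falls in 2000-2399
print(sic_to_ff[2510])  # falls in 2500-2519
print(sic_to_ff[6020])  # not in any range, so it keeps the default
```

With the real mapping, attaching FF12 to your own data should then just be a left merge on sic, something like firms.merge(get_ffind_df(12), on='sic', how='left'), where firms is a hypothetical DataFrame with an integer sic column.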

Monday, February 1, 2016

SAS on XUbuntu

For a long time I could only run SAS in -nodms mode on the latest Xubuntu, my desktop's OS. Today I finally figured out how to get the full interface working, and wanted to share in case anyone else has had this problem.

First off, I'm running Xubuntu Wily (15.10) and SAS 9.4. The installation didn't work in graphical mode: when I sudo su sas and then run ./sasdm.sh, it complains: Can't connect to X11 window server using ':0' as the value of the DISPLAY variable. Whatever, ./sasdm.sh -console works.

The first problem when launching SAS was that it complained about the SASHELP Portable Registry being corrupted. It turns out the file didn't exist at all. So I had to copy regstry.sas7bitm from a working installation of SAS 9.3 (yeah, it worked across versions somehow) to my local sascfg directory (/opt/SASHome/SASFoundation/9.4/nls/en/sascfg/). Once that was there, I started getting errors about missing libraries. The first was libXp.so.6, which is no longer in the Wily repos and must be downloaded from the Vivid repo here:

http://packages.ubuntu.com/vivid/amd64/libxp6/download

The second was libjpeg.so.62, which can be installed with sudo apt-get install libjpeg62-dev libjpeg62. Once that was done, SAS finally loaded in DMS mode. It now also runs in X11 mode forwarded over ssh.
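
If you want to check up front whether those two libraries are visible to the dynamic loader, a quick sketch along these lines might help. The sonames come from the error messages above; check_shared_libs is a hypothetical helper, and it only tells you what the Python process can load (I'm assuming a 64-bit Python alongside 64-bit SAS), which may not match exactly what SAS sees.

```python
import ctypes

# Sonames taken from the SAS error messages described above.
REQUIRED_LIBS = ['libXp.so.6', 'libjpeg.so.62']


def check_shared_libs(names):
    """Return {soname: bool}, True when the dynamic loader can open the library."""
    status = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be loaded
            status[name] = True
        except OSError:
            status[name] = False
    return status


for name, ok in check_shared_libs(REQUIRED_LIBS).items():
    print('{}: {}'.format(name, 'found' if ok else 'MISSING'))
```

Anything reported as MISSING is a candidate for the package installs described above.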