A Python script to manage regulatory data of U.S. Bank Holding Companies
The Chicago Fed offers great data on bank holding companies; click here for the full page, and here for the download page. The data set is large and deep: the number of variables is stunning (more than 3,000). The format is a flat text file in which a caret (^) separates the data fields.
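To make the caret-delimited layout concrete, here is a minimal sketch that parses such a file with Python's standard csv module. The sample text, the column names and the values are made up for illustration; they are assumptions, not an extract from an actual download.

```python
import csv
import io

# Made-up sample in the caret-delimited layout. Column names and values are
# illustrative assumptions, not taken from an actual download.
SAMPLE = (
    "RSSD9001^RSSD9999^BHCK2170\n"
    "1234567^20140630^1500000\n"
    "7654321^20140630^\n"
)


def read_caret_file(handle):
    """Return the rows of a caret-delimited file as a list of dicts."""
    reader = csv.DictReader(handle, delimiter="^")
    return list(reader)


rows = read_caret_file(io.StringIO(SAMPLE))
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

For a real file you would pass an open file handle instead of the io.StringIO object. Empty fields come back as empty strings, so missing values need to be handled before any numeric work.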
How to manage this data efficiently? The Chicago Fed offers a SAS script to manage the data, but SAS is costly software written by consultants who approach software as if it were bouillabaisse.
Alternatively, one can download the data and then import it into Stata, but the processing is slow and cumbersome.
Python, however, offers a fine solution. I wrote a script that merges the BHC data into one data set.
You can choose to output the data to CSV or Stata format. You can also send the data to your MySQL server.
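The idea of merging several files into one data set and writing it out as CSV can be sketched as follows. This is not the script from the repository, only a rough illustration using the standard library; the function name merge_caret_files, the column names and the sample values are all hypothetical.

```python
import csv
import io


def merge_caret_files(handles, out_handle):
    """Stack several caret-delimited files into one CSV, keeping every column."""
    rows = []
    columns = {}  # a dict keeps the columns in first-seen order
    for handle in handles:
        reader = csv.DictReader(handle, delimiter="^")
        for name in reader.fieldnames or []:
            columns.setdefault(name, None)
        rows.extend(reader)
    # Columns absent from a given file are written as empty fields.
    writer = csv.DictWriter(out_handle, fieldnames=list(columns), restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Two made-up quarterly files with partly different columns (assumed names).
q1 = "RSSD9001^RSSD9999^BHCK2170\n1234567^20140331^1400000\n"
q2 = "RSSD9001^RSSD9999^BHCK2948\n1234567^20140630^1300000\n"

out = io.StringIO()
n = merge_caret_files([io.StringIO(q1), io.StringIO(q2)], out)
merged = out.getvalue()
print(n, "rows merged")
print(merged)
```

This sketch holds all rows in memory, which may not suit the full data set. For Stata output, one option is to load the merged CSV into a pandas DataFrame and call its to_stata method.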
You should install Python and add the full SciPy stack. If you are not familiar with Python, read Python for Kids and Python for Data Analysis. If you want to download the bank holding company data quickly, you need curl.
Note that this all works like a charm under Linux (Mint 17). Linux simply works better if you want to use large data sets and free software.
If you want to give it a go, visit my Git page for more information: github.com/blucap/BankHoldingCompanyData