Introduction

This project was created by three Olin College students for the Spring 2016 Data Science course taught by Paul Ruvolo. We decided to analyze Congress over time because of the important role it plays in legislative functions and because we wanted to understand how it has changed. We want our data to be accessible to everyone because statistics indicate that fewer than one in five Americans approve of Congress. We hope that more information can lead to more interest in Congress and more informed decisions when voting for Congresspeople.

Data

All of our data came from GovTrack, a site that provides free analytics and data on the US Congress. We downloaded over 100GB of data from their free servers in the form of individual JSON files, each pertaining to a single vote within a session of one branch of Congress. We used Python to parse the data and compile it into a single CSV file for each branch in each session of Congress.
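As a rough illustration of that parsing step, the sketch below flattens a few vote records into one CSV with a row per vote and a column per legislator. It uses only the Python standard library rather than our actual processing code, and the JSON layout it assumes (a "votes" object mapping each choice to a list of voter records with an "id" field) is an assumption for illustration, as are the sample IDs and dates.

```python
import csv
import io
import json


def vote_to_row(vote):
    """Flatten one vote record into a single CSV row (a dict)."""
    row = {"date": vote.get("date"), "result": vote.get("result")}
    # Assumed layout: "votes" maps each choice to a list of voter records.
    for choice, voters in vote.get("votes", {}).items():
        for voter in voters:
            row[voter["id"]] = choice
    return row


def rows_to_csv(rows):
    """Write rows as CSV text; legislators absent from a vote get an empty cell."""
    fixed = ["date", "result"]
    ids = sorted({key for row in rows for key in row} - set(fixed))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fixed + ids, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Two made-up vote files, given here as JSON strings instead of files on disk.
raw_votes = [
    '{"date": "2013-01-03", "result": "Passed",'
    ' "votes": {"Yea": [{"id": "A000001"}], "Nay": [{"id": "B000002"}]}}',
    '{"date": "2013-01-04", "result": "Failed",'
    ' "votes": {"Nay": [{"id": "A000001"}]}}',
]

rows = [vote_to_row(json.loads(text)) for text in raw_votes]
csv_text = rows_to_csv(rows)
print(csv_text)
```

In a real run, the JSON strings would be read from the downloaded files, and the fixed columns would include the other fields described in the column list.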

All the data we used for this project can be found in our GitHub repo. It is licensed free and open source under the GNU General Public License, and we encourage visitors to use the data as they see fit to continue the spirit of the project.

The data is stored in comma-separated value (CSV) format. We have 226 CSV files, one for each branch of Congress for each of its 113 sessions. The 114th session of Congress was still in progress when we created this project, so we excluded it because its data was incomplete. The files are named [session][branch].csv, where [session] is the session number, 1-113, and [branch] is the branch of Congress, either "house" or "senate". Additionally, we created a legislators.csv file, which contains a row for each legislator and columns of information about them.
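For anyone scripting against the data, the naming scheme can be sketched as follows. This assumes the session number is written without zero padding and with no separator before the branch name, which is our reading of the pattern above; the function name is hypothetical.

```python
def vote_file_names(first_session=1, last_session=113):
    """List the per-session, per-branch CSV file names in order."""
    return [
        f"{session}{branch}.csv"
        for session in range(first_session, last_session + 1)
        for branch in ("house", "senate")
    ]


names = vote_file_names()
print(len(names), names[0], names[-1])
```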

Within the CSV files, each row pertains to a vote taken during that session. We kept only the votes we considered important and relevant to this project: votes on bills, amendments, or passages. Other types of votes, such as nominations, were excluded because they were not relevant to our goals.

The columns in the CSV files are as follows:

Legislator IDs: Each of these columns is named for a unique ID identifying a legislator, which cross-references the "vote_id" column of legislators.csv. The contents of the column are that legislator's vote: either "Yea", "Nay", "Not Voting", or NaN if no data exists.
date: The date of the vote
isAmendment: A boolean value indicating whether the vote is on an amendment
requires: The votes required to pass; most are simple majority, some are two-thirds
result: Text indicating whether the vote passed
title: Title of the bill in numerical format
Subjects: These columns are all subjects of bills present in that session, and the contents are boolean values indicating whether a bill pertained to a subject. This data only exists in the 93rd session and later.
billTitle: A short title, if it exists
committee: The bill's committee of origin, if applicable, else NaN
officialTitle: The official title of the bill
sponsor: The bill's sponsor, if known
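As a minimal sketch of how these columns can be read, the example below tallies one legislator's votes from a small, made-up CSV that follows the layout above. The legislator IDs, dates, and cell values are invented for illustration, and treating an empty cell as the missing-data marker is an assumption about how NaN appears in the files.

```python
import csv
import io
from collections import Counter

# Made-up sample with two legislator ID columns.
SAMPLE = """date,isAmendment,requires,result,title,A000001,B000002
1973-01-10,False,simple majority,Passed,hr1,Yea,Nay
1973-01-11,True,two-thirds,Failed,hr2,Yea,
1973-01-12,False,simple majority,Passed,hr3,Not Voting,Yea
"""

MISSING = ("", "NaN", "nan")  # assumed spellings of "no data"


def tally_votes(csv_text, legislator_id):
    """Count how often a legislator cast each kind of vote."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        row[legislator_id] for row in reader if row[legislator_id] not in MISSING
    )


tally_a = tally_votes(SAMPLE, "A000001")
tally_b = tally_votes(SAMPLE, "B000002")
print(tally_a, tally_b)
```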

Code

The data processing for this project was done in Python, primarily using the pandas package. The frontend visualization was done using d3 with JavaScript.

We also wish to include some notes about how the visualizations were created:

Party Evolution

This graphic shows the changing political landscape of Congress through political party membership. The seats of the combined House and Senate are displayed in an arc mimicking the seating at the State of the Union address. The seats are grouped and colored according to party, and mousing over a seat shows the viewer specific information about that legislator. All non-major parties are colored grey because there are so many of them. We defined a non-major party for a session of Congress as one whose membership does not exceed 5% of the total seats in that session.
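The 5% rule can be sketched in a few lines. This is an illustration of the definition rather than the code behind the graphic; the function name and the seat counts are made up.

```python
from collections import Counter


def major_parties(seat_parties, threshold=0.05):
    """Return the parties holding more than `threshold` of all seats."""
    counts = Counter(seat_parties)
    total = len(seat_parties)
    return {party for party, seats in counts.items() if seats / total > threshold}


# Made-up session with 100 seats, one party label per seat.
seats = ["D"] * 52 + ["R"] * 42 + ["Independent"] * 5 + ["Green"] * 1
major = major_parties(seats)
print(sorted(major))
```

A party holding exactly 5% of the seats does not exceed the threshold, so under this definition it is treated as non-major and colored grey.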

Partisanship and Party Agreement

This visual shows how party agreement and unification change over time. Each circle represents a party, and its fill shows how unified the party was during that session of Congress. We measured party unification by considering how every member of the party voted on all votes in that session. The unification is the percentage of votes in which 90% or more of the party voted the same way. Additionally, the overlap of the circles represents agreement between parties. We measured this by creating the "average" member of each party, who votes with the majority of their party on each vote. The party agreement is the number of times the average members of the two parties would vote the same way. This graphic focuses on the two largest parties in any given session, but includes the unification metric of all other major parties present in that session, where the definition of a major party is given above.
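The two metrics can be sketched as follows. This is an illustration under stated assumptions, not the code behind the graphic: it counts only "Yea" and "Nay" when computing shares, treats a tied party as having no majority position, and reports unification as a fraction rather than a percentage. The function names and vote data are made up.

```python
def majority_position(votes):
    """Return (majority choice or None, share of cast votes it received)."""
    yea = votes.count("Yea")
    nay = votes.count("Nay")
    cast = yea + nay  # assumption: "Not Voting" and missing data are ignored
    if cast == 0:
        return None, 0.0
    share = max(yea, nay) / cast
    if yea == nay:
        return None, share  # assumption: a tie means no majority position
    return ("Yea" if yea > nay else "Nay"), share


def unification(roll_calls, threshold=0.9):
    """Fraction of roll calls where at least `threshold` of the party agreed."""
    unified = sum(1 for votes in roll_calls if majority_position(votes)[1] >= threshold)
    return unified / len(roll_calls)


def agreement(roll_calls_a, roll_calls_b):
    """Number of roll calls where both parties' average members vote alike."""
    count = 0
    for votes_a, votes_b in zip(roll_calls_a, roll_calls_b):
        choice_a, _ = majority_position(votes_a)
        choice_b, _ = majority_position(votes_b)
        if choice_a is not None and choice_a == choice_b:
            count += 1
    return count


# Made-up session: four roll calls, ten members per party.
party_a = [
    ["Yea"] * 10,
    ["Yea"] * 9 + ["Nay"] * 1,
    ["Yea"] * 6 + ["Nay"] * 4,
    ["Nay"] * 10,
]
party_b = [
    ["Nay"] * 10,
    ["Yea"] * 8 + ["Nay"] * 2,
    ["Yea"] * 10,
    ["Nay"] * 7 + ["Yea"] * 3,
]

unification_a = unification(party_a)
unification_b = unification(party_b)
shared = agreement(party_a, party_b)
print(unification_a, unification_b, shared)
```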

Anne LoVerso

Anne is a junior at Olin College, studying engineering with a concentration in computing. She considers herself one with the data. Although it appears that we carefully processed our data in Python, it really appeared spontaneously after Anne concentrated really hard for a few minutes thinking about Congress.

Pratool Gadtaula

Pratool is a junior at Olin College studying Electrical and Computer Engineering. He takes pride in the cleanliness of his glasses. When they are clean, he feels that he is ready to take on the world and its challenges. This is evidenced by the hard work on this project.

Zoher Ghadyali

Zoher is a junior at Olin College, pursuing an Electrical & Computer Engineering degree. With an interest in graphic design, data visualization, and government, he hopes you enjoy our project and vote to abolish Congress and establish anarchy. Stay woke.