Both datasets used on this website come from the City of Chicago Data Portal, which is linked below. This public data portal hosts thousands of datasets about education, the environment, health and safety, infrastructure, and more in the city of Chicago. Because Chicago is the nearest big city to Notre Dame, we thought it would be interesting to analyze data on two very important issues: public health and education. Both datasets started as JSON text, which we parsed and organized into lists and eventually into DataFrames.
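As a rough illustration of the JSON-to-lists-to-DataFrame step described above, here is a minimal sketch using Python's `json` module and pandas. The sample text, the field names (`school_name`, `community_area`, `safety_score`), and the helper `json_text_to_dataframe` are all hypothetical stand-ins, and the sketch assumes the JSON is a list of objects; the portal's actual export may be laid out differently.

```python
import json

import pandas as pd

# Hypothetical sample standing in for the portal's JSON text.
# Field names and values are made up for illustration only.
raw_text = """
[
  {"school_name": "Example School A", "community_area": "1", "safety_score": "55"},
  {"school_name": "Example School B", "community_area": "2", "safety_score": "72"}
]
"""


def json_text_to_dataframe(text):
    """Parse JSON text into lists of values, then build a DataFrame."""
    records = json.loads(text)  # a list of dicts, one per row
    columns = list(records[0].keys())  # column names taken from the first record
    # Organize each record into a plain list, in column order.
    rows = [[record.get(col) for col in columns] for record in records]
    return pd.DataFrame(rows, columns=columns)


df = json_text_to_dataframe(raw_text)

# Values arrive as strings in this sample, so convert numeric columns explicitly.
df["safety_score"] = pd.to_numeric(df["safety_score"])
print(df.shape)
```

Going through an intermediate list of rows mirrors the workflow described in the text; `pd.DataFrame(records)` would also accept the list of dicts directly.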
On the City of Chicago Data Portal, we found a rich dataset covering all public schools in the Chicago area for the 2011-2012 school year. This dataset contained 566 rows and 79 columns, where each row was a different school and each column was a different characteristic of the school. In addition to test scores and other measures of student performance, it contained data about parent involvement, safety, student misconduct, and location.
The second dataset that we decided to analyze was about public health statistics in Chicago from 2005-2011. This dataset contained 77 rows and 29 columns, where each row was a different community area in Chicago and each column was a different public health statistic for that area. The dataset also provided statistics about poverty, housing, and education in those areas, which made it a great source to pair with the public education dataset.
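One possible way to pair the two datasets is to summarize the school data by community area and then join it to the public health data on that key. The sketch below is only an illustration under assumptions: the column names (`community_area`, `safety_score`, `below_poverty_pct`) and all values are hypothetical, and it is not necessarily how the analysis on this website was done.

```python
import pandas as pd

# Hypothetical miniature versions of the two datasets.
# Column names and values are made up for illustration only.
schools = pd.DataFrame(
    {
        "school_name": ["School A", "School B", "School C"],
        "community_area": [1, 1, 2],
        "safety_score": [50.0, 70.0, 40.0],
    }
)
health = pd.DataFrame(
    {
        "community_area": [1, 2],
        "below_poverty_pct": [20.0, 30.0],
    }
)

# The school data has one row per school, while the health data has one row
# per community area, so first average the school metric within each area.
area_schools = schools.groupby("community_area", as_index=False)["safety_score"].mean()

# Join the per-area school summary to the health statistics.
paired = area_schools.merge(health, on="community_area", how="inner")
print(paired)
```

Aggregating before the join keeps one row per community area, so each area's health statistics are not repeated once per school.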