Analyzing Chicago Crime Data

Analyzing Chicago Crime Data

December 28, 2019 ( last updated : July 12, 2017 )
DataScience Data Analysis SQL

https://github.com/SEUNGJO/IBMSQL


Information

Using this Python notebook you will:
1.Understand 3 Chicago datasets
2.Load the 3 datasets into 3 tables in a Db2 database
3.Execute SQL queries
Understand the datasets
To complete the assignment problems in this notebook you will be using three datasets that are available on the city of Chicago's Data Portal:
1.Socioeconomic Indicators in Chicago
2.Chicago Public Schools
3.Chicago Crime Data
1. Socioeconomic Indicators in Chicago
This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.
For this assignment you will use a snapshot of this dataset which can be downloaded from: https://ibm.box.com/shared/static/05c3415cbfbtfnr2fx4atenb2sd361ze.csv
A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2
2. Chicago Public Schools
This dataset shows all school level performance data used to create CPS School Report Cards for the 2011-2012 school year. This dataset is provided by the city of Chicago's Data Portal.
For this assignment you will use a snapshot of this dataset which can be downloaded from: https://ibm.box.com/shared/static/f9gjvj1gjmxxzycdhplzt01qtz0s7ew7.csv
A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t
3. Chicago Crime Data
This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days.
This dataset is quite large - over 1.5GB in size with over 6.5 million rows. For the purposes of this assignment we will use a much smaller sample of this dataset which can be downloaded from: https://ibm.box.com/shared/static/svflyugsr9zbqy5bmowgswqemfpm1x7f.csv
A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2

Connect to the database

Let us first load the SQL extension and establish a connection with the database
%load_ext sql
# Remember the connection string is of the format:
# %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name
# Enter the connection string for your Db2 on Cloud database instance below
%sql ibm_db_sa://szd30383:rw2r69jsqnnzgd-g@dashdb-txn-sbox-yp-dal09-03.services.dal.bluemix.net:50000/BLUDB

Find the total number of crimes recorded in the CRIME table

# Rows in Crime table
%sql SELECT COUNT(*) FROM CHICAGO_CRIME_DATA

Retrieve first 10 rows from the CRIME table

%sql select * from CHICAGO_CRIME_DATA limit 10

How many crimes involve an arrest?

%sql select count (*) from CHICAGO_CRIME_DATA where arrest

Which unique types of crimes have been recorded at GAS STATION locations?

%sql select distinct(primary_type) from CHICAGO_CRIME_DATA where location_description = 'GAS STATION'

In the CENUS_DATA table list all Community Areas whose names start with the letter ‘B’.

%sql select community_area_number, community_area_name FROM CENSUS_DATA WHERE community_area_name LIKE 'B%'

Which schools in Community Areas 10 to 15 are healthy school certified?

%%sql
select name_of_school
from CHICAGO_PUBLIC_SCHOOLS
where community_area_number BETWEEN 10 and 15
and healthy_school_certified = 'Yes'

What is the average school Safety Score?

%sql select avg(safety_score) from CHICAGO_PUBLIC_SCHOOLS

List the top 5 Community Areas by average College Enrollment [number of students]

%%sql
select community_area_name
FROM CHICAGO_PUBLIC_SCHOOLS
GROUP BY community_area_name
ORDER BY SUM(college_enrollment) / COUNT(name_of_school) DESC
LIMIT 5

Use a sub-query to determine which Community Area has the least value for school Safety Score?

%%sql
select community_area_name
FROM CHICAGO_PUBLIC_SCHOOLS
where safety_score = (select min(safety_score) from CHICAGO_PUBLIC_SCHOOLS )

[Without using an explicit JOIN operator] Find the Per Capita Income of the Community Area which has a school Safety Score of 1.

%%sql
select D.per_capita_income
from CENSUS_DATA AS D, CHICAGO_PUBLIC_SCHOOLS AS S
where D.community_area_number = S.community_area_number
AND S.safety_score = 1

Originally published December 28, 2019
Latest update July 12, 2017

Related posts :