Police Stop Data

Team Members: Sam Wheelock, Nelson Tan, Johan Boll, Kevin Pham, Jack Venberg

Sources: The data used to create this visualization comes from Stanford's open policing data.

Design Decisions

We wanted to answer the question, "How do race, age, and sex affect an individual's likelihood of being searched after a police stop in the USA?" We chose bar charts to show the relationship between each of these externally identifying factors and the search rate, that is, how many times individuals in a group were searched out of how many times they were stopped. For race and sex we used a categorical encoding because their values are distinct and unordered. For age we used an ordinal encoding so that we could bin the ages and show them in a bar chart, consistent with the categorical charts. We judged ten-year age ranges (10-19, 20-29, 30-39, etc.) to be a good granularity for comparing age groups. We included every city and state patrol that had data on time of stop, subject race, subject sex, subject age, and whether a search was conducted. The cities and state patrols we chose were: San Diego, Nashville, Oakland, San Francisco, Hartford, Connecticut State Patrol, New Orleans, Montana State Patrol, Charlotte, Durham, Fayetteville, Greensboro, Raleigh, North Carolina State Patrol, Winston-Salem, Philadelphia, San Antonio, Burlington, Vermont State Patrol, and Washington State Patrol.
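The search rate and ten-year age bins described above can be sketched in Python as follows. This is only an illustration under assumptions, not the code used in the project: the field names (`subject_age`, `subject_sex`, `search_conducted`), the helper names, and the sample rows are all hypothetical.

```python
from collections import defaultdict

def age_bin(age):
    """Return a ten-year label such as '20-29' for an integer age."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def search_rate_by(stops, key):
    """Map each group to (number of searches) / (number of stops)."""
    stopped = defaultdict(int)
    searched = defaultdict(int)
    for stop in stops:
        group = key(stop)
        stopped[group] += 1
        if stop["search_conducted"]:
            searched[group] += 1
    return {group: searched[group] / stopped[group] for group in stopped}

# Hypothetical sample rows; field names are assumptions.
stops = [
    {"subject_age": 23, "subject_sex": "male", "search_conducted": True},
    {"subject_age": 27, "subject_sex": "female", "search_conducted": False},
    {"subject_age": 34, "subject_sex": "male", "search_conducted": False},
    {"subject_age": 38, "subject_sex": "male", "search_conducted": True},
    {"subject_age": 31, "subject_sex": "female", "search_conducted": False},
]

rates_by_age = search_rate_by(stops, lambda s: age_bin(s["subject_age"]))
rates_by_sex = search_rate_by(stops, lambda s: s["subject_sex"])
```

Passing a different `key` function gives the rate for race or sex with the same helper, which mirrors how the three bar charts share one measure.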

To support exploration of the data, we wanted to see whether time of day affected who was being searched. We implemented this by brushing over the hours of the day and linking the selection to the aforementioned bar charts. The chart used for brushing also doubles as a chart showing the percentage of all stopped individuals who were searched. This allows the search rate for a single identifying factor to be compared against the average search rate.
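The effect of a brush can be thought of as a filter on the hour of the stop, followed by recomputing both the overall rate and the per-group rates. The sketch below shows that logic in plain Python rather than Vega-Lite; the `hour` field, the function name, and the half-open range `[start_hour, end_hour)` are assumptions made for illustration.

```python
def brushed_rates(stops, start_hour, end_hour, key):
    """Return (overall_rate, rates_by_group) for stops whose hour falls
    in the half-open range [start_hour, end_hour)."""
    selected = [s for s in stops if start_hour <= s["hour"] < end_hour]
    if not selected:
        # An empty brush has no defined rate.
        return None, {}
    overall = sum(int(s["search_conducted"]) for s in selected) / len(selected)
    counts = {}  # group -> (stops, searches)
    for s in selected:
        group = key(s)
        n_stops, n_searches = counts.get(group, (0, 0))
        counts[group] = (n_stops + 1, n_searches + int(s["search_conducted"]))
    rates = {g: searches / n for g, (n, searches) in counts.items()}
    return overall, rates

# Hypothetical sample rows; field names are assumptions.
stops = [
    {"hour": 1, "subject_sex": "male", "search_conducted": True},
    {"hour": 2, "subject_sex": "female", "search_conducted": False},
    {"hour": 14, "subject_sex": "male", "search_conducted": False},
    {"hour": 23, "subject_sex": "male", "search_conducted": True},
]

# Brush over midnight to 6 a.m. and compare groups against the overall rate.
overall, by_sex = brushed_rates(stops, 0, 6, lambda s: s["subject_sex"])
```

Returning the overall rate alongside the per-group rates keeps the comparison against the average tied to the same brushed subset.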

Development Process

Early parts of the project were split into smaller tasks, such as research and data wrangling. For the majority of development, however, we collaborated heavily on each part of the project. Ideation was done as a group to decide what our end goal was for the visualization and what story we wanted to tell. Most of the code was written as a group as well. Because we did not have much experience with D3, the Vega-Lite API, or Vega-Lite, group programming allowed us to leverage each other's understanding of these tools.

We spent approximately 10 hours on this project, consisting primarily of group Zoom calls where we collaborated on the visualization. The aspect of development that took the most time was choosing a tool for creating our graphic and learning to use it effectively. We initially experimented with D3 and then the Vega-Lite API, but eventually found Vega-Lite to be the best tool for our project. We also spent a few hours wrangling our data, which included deciding which cities to use and aggregating the data into a usable format. This preprocessing took a lot of time but eventually paid off with faster and smoother interactions in our visualization.
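One way such preprocessing can speed up interaction is to collapse the per-stop rows into counts per combination of hour, race, sex, and age bin, so the visualization only has to sum a small table. The following is a minimal sketch of that idea, not the project's actual pipeline; the input field names, the `HH:MM:SS` time format, and the output column names are assumptions.

```python
from collections import defaultdict

def aggregate(stops):
    """Collapse per-stop rows into one row per (hour, race, sex, age bin)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [stops, searches]
    for s in stops:
        hour = int(s["time"].split(":")[0])  # assumes "HH:MM:SS"
        low = (s["subject_age"] // 10) * 10
        key = (hour, s["subject_race"], s["subject_sex"], f"{low}-{low + 9}")
        counts[key][0] += 1
        counts[key][1] += int(s["search_conducted"])
    return [
        {"hour": h, "race": r, "sex": x, "age_bin": a, "stops": n, "searches": k}
        for (h, r, x, a), (n, k) in sorted(counts.items())
    ]

# Hypothetical sample rows; field names and values are assumptions.
raw = [
    {"time": "08:15:00", "subject_race": "white", "subject_sex": "male",
     "subject_age": 25, "search_conducted": True},
    {"time": "08:40:00", "subject_race": "white", "subject_sex": "male",
     "subject_age": 29, "search_conducted": False},
    {"time": "21:05:00", "subject_race": "black", "subject_sex": "female",
     "subject_age": 41, "search_conducted": False},
]

table = aggregate(raw)
```

Because the aggregated rows keep both `stops` and `searches`, a search rate for any brushed range or group can still be recovered by summing the two columns and dividing.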