Wrangling Data

Kristi Yang
Sep 16, 2020

From reading Sandhya Kambhampati’s piece “Cleaner, Smarter Spreadsheets Start with Structure,” I was surprised to have been surprised by her claim that structuring data is more important than memorizing formulas, because it seems so obvious. More important than learning =SUM(B3+B4) is learning how to organize your information in a readable way. Her reminder that data journalism is still journalism, and therefore needs to get information to people, highlights the fact that your data needs to be clean, clear, and easy to understand.

In the same vein, Steve Doig’s “Basic Steps in Working with Data” discusses the importance of treating the numbers as a live source and of having a list of questions for the data, just as one would have for an interview. I think this humanization of numbers is essentially a reminder that data journalism is still journalism and should be carried out as such.

This week, I decided to look at New York State’s Criminal Justice Statistics, focusing on crime and victimization, specifically hate crime incidents. The dataset I looked at, titled “Hate Crime Incidents,” covered a five-year span (2014–2018) of reported incidents broken down by county. Because the dataset was quite large, I extracted only the data for the five New York City boroughs: Bronx County, Kings County, New York County, Queens County, and Richmond County. I cleaned up the data and represented my findings in a column chart with the years on the x-axis and the number of hate crimes on the y-axis. I had tried the opposite layout, but after experimenting with different graph layouts in Excel, this one seemed to be the best fit. I used a different color to represent each county and kept the rest of the graph black.
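I did this filtering in Excel. For anyone who would rather script the same cleanup step, here is a minimal Python sketch. It is not the workflow I used. It assumes the data has been exported as rows with hypothetical columns named "County", "Year", and "Incidents", and that county names appear without the word "County." The sketch keeps only the five NYC counties and sums incidents per year and county. That nested layout mirrors the chart: one group per year on the x-axis and one colored series per county.

```python
from collections import defaultdict

# The five counties that make up New York City's boroughs.
# Assumption: the dataset spells them this way, without "County".
NYC_COUNTIES = {"Bronx", "Kings", "New York", "Queens", "Richmond"}


def nyc_totals_by_year(rows):
    """Sum reported incidents per (year, county) for the five NYC counties.

    `rows` is any iterable of dicts, e.g. from csv.DictReader. The keys
    "County", "Year" and "Incidents" are hypothetical column names.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        county = row["County"].strip()
        if county not in NYC_COUNTIES:
            continue  # drop every county outside New York City
        totals[int(row["Year"])][county] += int(row["Incidents"])
    # Plain nested dict {year: {county: total}}, with years in ascending order.
    return {year: dict(totals[year]) for year in sorted(totals)}
```

With a CSV export, you could pass `csv.DictReader(f)` for an open file `f` straight into `nyc_totals_by_year`.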

From this graph, I can see that reported hate crime in Manhattan (New York County) has generally increased over the five-year period. Brooklyn (Kings County) has generally had the highest count of the five boroughs, which may be because it is also the most populous. At the other end of the spectrum, the number of reported hate crimes has remained lowest in Staten Island (Richmond County). Likewise, it is the least populous borough, which may explain its low count. Though I expected these two findings, I also expected reported hate crime to be higher in Queens, since it is the most diverse of the boroughs, with many ethnic enclaves, which I believed would correlate with hate crime incidence.
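One way to test the idea that population size explains the Brooklyn and Staten Island counts is to turn raw counts into rates per 100,000 residents. The sketch below only shows that arithmetic. It is hypothetical: the population figures are not part of the hate crime dataset and would have to come from a separate source, such as census estimates.

```python
def rate_per_100k(incidents, population):
    """Reported incidents per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so integer inputs lose no precision early.
    return incidents * 100_000 / population


def rates_by_county(incidents, populations):
    """Map each county to its rate; both arguments are {county: number} dicts."""
    return {county: rate_per_100k(n, populations[county])
            for county, n in incidents.items()}
```

If the borough ranking changes once counts are divided by population, that would suggest population explains part of the pattern in the chart.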
