Discovering possible associations between childhood obesity and the built environment in the neighborhoods around children's homes.
This is a follow-up project built upon the Food Environment and Obesity project. After examining the associations between food and weight outcomes, it's only natural to wonder how other elements of the environment could affect children's health and weight.
Parks are essentially polygons. As with point-to-point Euclidean distance, we can use `nncross` from the `spatstat` package or `gDistance` from the `rgeos` package to get the shortest distance from a point to the nearest polygon. While `nncross` is faster at finding the shortest distance, `gDistance` is more versatile because it can compute the distance between each point and every polygon in the data. To further compute street network distance, I use the Network Analyst extension in ArcGIS.
I want to examine how close parks are to students' homes, to understand how the distance between a home and the nearest park could be associated with weight outcomes: are students more or less likely to be overweight or obese, or is there no association? To use `nncross`, I first convert the set of x-y coordinates into a `ppp` object. With the `maptools` package, I can read parks from shapefiles and convert them into `psp` objects.
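The geometric idea behind the nearest-park measure can be sketched in plain Python. This is only an illustration of "distance from a point to the closest polygon edge", not the `spatstat` or `rgeos` implementation; the function names, the `parks` dictionary, and the coordinates are hypothetical, and coordinates are assumed to be in a projected (planar) system.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (all (x, y) tuples)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment (a == b): fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def polygon_edges(ring):
    """Yield consecutive edges of a ring of vertices, closing it at the end."""
    n = len(ring)
    for i in range(n):
        yield ring[i], ring[(i + 1) % n]

def nearest_park_distance(home, parks):
    """Return (park_id, distance) for the park whose boundary is closest to home.

    `parks` is a hypothetical dict mapping park_id -> list of (x, y) vertices.
    """
    best_id, best_d = None, math.inf
    for park_id, ring in parks.items():
        d = min(point_segment_distance(home, a, b) for a, b in polygon_edges(ring))
        if d < best_d:
            best_id, best_d = park_id, d
    return best_id, best_d
```

Note that this sketch measures distance to the park boundary, so a home located inside a park would still get a positive distance to the nearest edge.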
While it is important to know how far away the nearest park is, it is also critical to see the bigger picture by counting the number of parks you can reach within certain distances. What if you are a high school student and the nearest park is a playground that your 7-year-old sibling likes better? You probably won't hang out with your friends at the playground; you may prefer another place with a basketball court, even though it is a 5-minute longer walk. To count the number of park polygons within certain distances, I use `gDistance` to first compute the distances between every student and every park in the city, aggregate the results by park type (e.g. playgrounds, neighborhood parks, flagship parks like Central Park), and run a simple `apply` to count them. Of course, all of this assumes students can enter a park at any point along the polygon's edge, which is not realistic, but it is operationally the best approximation I can produce.
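The counting step can be sketched as follows, assuming the student-by-park distances have already been computed. This is a minimal Python illustration, not the R `apply` call used in the project; `count_parks_within`, the input layout, the park type labels, and the thresholds are all hypothetical.

```python
from collections import defaultdict

def count_parks_within(distances, park_types, thresholds):
    """Count parks per student, by park type and distance threshold.

    distances:  dict student_id -> list of distances, one per park
    park_types: list of type labels, aligned with each distance list
    thresholds: iterable of distance cutoffs, in the same units as distances
    Returns {student_id: {(park_type, threshold): count}}; zero counts are omitted.
    """
    counts = {}
    for student_id, row in distances.items():
        per_student = defaultdict(int)
        for d, ptype in zip(row, park_types):
            for limit in thresholds:
                if d <= limit:
                    per_student[(ptype, limit)] += 1
        counts[student_id] = dict(per_student)
    return counts
```

For example, with `park_types = ["playground", "playground", "neighborhood"]` and a student whose distances are `[100, 500, 300]`, thresholds of 250 and 800 would give one playground within 250, two playgrounds within 800, and one neighborhood park within 800.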
Euclidean distance is a close enough proxy for access, but in urban areas like New York City, where most people reach public facilities on foot, I'd like my measurements to reflect reality more accurately. Using the Network Analyst module in ArcGIS, I can compute point-to-point walking/driving distance for how people access the food environment around their homes. Unfortunately, ArcGIS (as of v10.5) cannot yet do the same for point to polygon. As an alternative, I use `arcpy.Merge_management` to combine the street network and parks (as lines) into a new layer of "streets", and use `arcpy.Intersect_analysis` to pinpoint all the intersections between streets and park edges. These points arguably represent park entrances, so I now have a set of points to replace the park polygons. The point-to-polygon problem becomes the familiar point-to-point computation, and `arcpy.na.FindClosestFacilities` can take over from here. For the street network distance calculation, I use Jupyter Notebook after connecting Anaconda to ArcGIS 10.5. A note on how to operationalize computing closest distance with large datasets is detailed in a separate .md file. (ArcGIS is still 32-bit software as of v10.5 and cannot access more than 2 GB of RAM per analysis, so when I ran 1000 students against ~80 park points, it ran out of memory.)
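To make the "park entrance" idea concrete, here is a plain-Python sketch of the underlying geometry: finding the points where street segments cross park edges. It is an illustration only and does not call `arcpy`; the function names and the simplified inputs (streets as pairs of endpoints, a park as a list of vertices) are assumptions made for the example.

```python
def segment_intersection(p1, p2, q1, q2, eps=1e-12):
    """Return the (x, y) point where segment p1-p2 crosses segment q1-q2, or None.

    Parallel and collinear segments are treated as non-crossing.
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    denom = rx * sy - ry * sx  # 2D cross product of the two direction vectors
    if abs(denom) < eps:
        return None
    qpx, qpy = q1[0] - p1[0], q1[1] - p1[1]
    t = (qpx * sy - qpy * sx) / denom  # position along p1-p2
    u = (qpx * ry - qpy * rx) / denom  # position along q1-q2
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None

def park_entrances(streets, park_ring):
    """Collect candidate entrance points where streets meet a park boundary.

    streets:   list of ((x1, y1), (x2, y2)) street segments (hypothetical format)
    park_ring: list of (x, y) vertices describing the park polygon
    """
    entrances = []
    n = len(park_ring)
    for s1, s2 in streets:
        for i in range(n):
            a, b = park_ring[i], park_ring[(i + 1) % n]
            pt = segment_intersection(s1, s2, a, b)
            if pt is not None:
                entrances.append(pt)
    return entrances
```

The resulting points can then stand in for the park polygon in a point-to-point nearest-facility computation. A street that only runs alongside a park edge without crossing it produces no entrance in this sketch, which is one way the approximation can miss real access points.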
There are two data sources involved:
- students' home addresses geocoded into x-y coordinates. Again, this is confidential data that I cannot publish on this site, but in essence, any set of geographic points will do;
- parks: The New York City Department of Parks and Recreation surveys all city parks every four years and releases aerial images of the Open Space (Parks) data on the NYC Open Data portal. The most recent survey was conducted in 2012 and released in 2014. Archived data and details of the survey methods are available here.