Los Angeles County Commute Origin-Destination Map
I worked on this from September 1 to 7, 2016. This map shows commuting origins (homes) and destinations (jobs) in Los Angeles County. Commutes are counted per census tract. There is also a fuller screen version.
The brighter the red, the more commuter destinations (jobs, roughly) per unit of area. The brighter the green, the more commuter origins (homes, roughly) per unit of area. Red and green make yellow, so bright yellow means lots of both origins and destinations.
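The color rule above can be sketched in a few lines of Python. This is only an illustration, not the map's actual code: the function name `tract_color` and the linear brightness ramp are assumptions, and the 3000 commutes per km^2 cap is the one described in the simplifications further down.

```python
DENSITY_CAP = 3000.0  # commutes per km^2; densities above this look the same


def tract_color(origins, destinations, area_km2, cap=DENSITY_CAP):
    """Return an (r, g, b) tuple of 0-255 ints for one tract.

    Red encodes destination (job) density and green encodes origin (home)
    density. The linear ramp up to `cap` is an assumption for illustration.
    """
    if area_km2 <= 0:
        return (0, 0, 0)  # no usable area: draw the tract black

    def channel(count):
        density = count / area_km2
        return round(255 * min(density, cap) / cap)

    return (channel(destinations), channel(origins), 0)
```

For example, a 1 km^2 tract with 3000 origins and 3000 destinations comes out as `(255, 255, 0)`, which is bright yellow, and doubling either count changes nothing because of the cap.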
Originally I wanted to make a transit desert map. The idea was that I would find all the commutes in LA County, ask Google how long each one took on transit, then map which areas had the worst transit commutes. However, there are about 1,000,000 origin-destination pairs, which is too many to ask Google about.
I thought that maybe I could compute the transit commute durations myself. I had hoped to run a graph distance algorithm to get the travel times between different locations. I looked at LA Metro's GTFS (General Transit Feed Specification) files and loaded all ~11,000 transit stop locations into a searchable graph. That was pretty fun and successful. However, when I tried to load all of the ~3,000,000 individual stop times (every time a bus or train stops somewhere in the system over the course of a day), my computer crashed. Maybe I'll try to make the graph structure more lightweight and try it again in the future...
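For reference, here is a minimal sketch of the kind of graph distance algorithm I had in mind, using Dijkstra's algorithm. It is a simplification under stated assumptions: the edge weights are fixed travel times in minutes, whereas real transit routing depends on the schedule, and the stop names and the `shortest_travel_times` function are hypothetical.

```python
import heapq


def shortest_travel_times(graph, source):
    """Dijkstra's algorithm over a dict of {stop: [(neighbor, minutes), ...]}.

    Returns the smallest total travel time from `source` to every
    reachable stop. Assumes all edge weights are non-negative.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        elapsed, stop = heapq.heappop(queue)
        if elapsed > best.get(stop, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, minutes in graph.get(stop, []):
            candidate = elapsed + minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


# Toy graph with made-up stops and travel times in minutes.
toy_graph = {
    "A": [("B", 5.0), ("C", 12.0)],
    "B": [("C", 4.0)],
    "C": [],
}
times = shortest_travel_times(toy_graph, "A")
```

On the toy graph, going from A to C via B (9 minutes) beats the direct edge (12 minutes). A plain adjacency dict like this is also fairly light on memory, which is the direction I would try for the full stop-time data.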
I ended up settling for a map of just the sum totals of origins and destinations per tract, which was pretty low-hanging fruit.
- The LA Metro site has nice stuff, including schedules, real-time data, GIS data, etc.
- The US Census Site has all kinds of information, including:
- I used QGIS, a free GIS program, to convert the shape files to geojson files, which are much easier for me to inspect and parse.
Assumptions / Simplifications
- I used to dump non-contiguous tracts and cross-county trips (e.g. from LA County to Orange County), but I have since fixed that.
- I opened the *.shp file with QGIS and dumped it to *.geojson so it'd be a little easier to parse.
- I got the commuter files from the census site. I used grep to filter out the origin-destination pairs with at least one end in LA County (06037), then used Python to sum the trips up per tract.
- I used Python to load the shapes, areas, and commuter info into a dictionary. I then used the json Python library to dump that dictionary to a file.
- I capped the density coloring at 3000 commutes per square kilometer. That is, the colors don't get any more intense if the density rises above 3000 commutes per km^2. It would be nice to come up with some kind of responsive coloring scheme based on the maximum density in the screen or data set.
- The file containing the tract information is large (~10 MB). It would be good to load it dynamically with AJAX or something similar to reduce load times.
- I'd like to prepare files for all counties and load them on the fly based on which states and counties you have zoomed in on.
- Obviously, I'd like to take transit routes into account, as noted above. I'll have to make my graph more lightweight so that I can actually load them all. Or, alternatively, I'll have to figure out how to run graph algorithms on graphs that don't fit into memory.
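The filter-and-sum step described in the list above (keep pairs with at least one end in LA County, then total the trips per tract) could look roughly like the sketch below. This is not my original script. The column names `work`, `home`, and `count` are hypothetical, and the code assumes each location is a 15-digit census block code whose first 11 digits identify the tract. Here the county filter is done in Python rather than with grep.

```python
import csv
import io
import json
from collections import defaultdict

LA_COUNTY = "06037"  # state 06 (California) + county 037 (Los Angeles)


def sum_commutes_per_tract(rows, county=LA_COUNTY):
    """Total commute origins and destinations per tract inside `county`.

    `rows` is an iterable of dicts with hypothetical keys "work", "home",
    and "count". An end outside the county is ignored, but the end that
    lies inside the county is still counted.
    """
    totals = defaultdict(lambda: {"origins": 0, "destinations": 0})
    for row in rows:
        home, work, count = row["home"], row["work"], int(row["count"])
        if home.startswith(county):
            totals[home[:11]]["origins"] += count
        if work.startswith(county):
            totals[work[:11]]["destinations"] += count
    return dict(totals)


# Made-up sample rows standing in for the real commuter file.
sample = io.StringIO(
    "work,home,count\n"
    "060371011101000,060372060201000,3\n"
    "060590626101000,060372060201000,2\n"
    "060371011101000,060371011101001,1\n"
    "060590626101000,060590626101001,7\n"
)
totals = sum_commutes_per_tract(csv.DictReader(sample))

# Dump the per-tract totals as JSON, as in the dictionary-dump step above.
print(json.dumps(totals, indent=2, sort_keys=True))
```

The last sample row has both ends in Orange County (06059), so it contributes nothing. The second row crosses the county line, so only its LA County home end is counted.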
The map turned out more or less how I expected it to. It really helps show where the jobs and homes are. Some highlights:
- There is a large concentration of both jobs and homes in the so-called Wilshire-Santa Monica Corridor, as both experience and authority would tell me.
- You can really see the industrial zones to the southeast of downtown Los Angeles and down south of LAX. Interestingly, Hancock Park, which has a reputation for being a residential neighborhood, is very red because there are a lot of businesses, but really not that many residents. The houses just take up a lot of area.
- The mountains really come out strongly in black.