Recent Posts

Tutorial: Building a hover-enabled map using TileMill

50 minute read

This is the first of two tutorials on adding hover interactions to interactive maps using free, open-source tools.

Today I wanted to break down the step-by-step process of how I created this interactive choropleth map of Georgia population change from 2010 to 2011. While today's tutorial doesn't cover how to add hover states or moving tooltips to your maps, such as those you see in this iteration of the same map made using CartoDB+Leaflet, I'll cover those more advanced features in a later post (they require a few more tools and a bit more programming). For now, we're going to stick to the basics of simple hover interactions. Here's what you'll need to follow along: a free Google account (for Google spreadsheets), QGIS and TileMill.

1. Gathering and preparing the data

The first thing you'll need to do is locate your data and pare it down to a format that can be easily visualized. For this example, we're using population estimates from the U.S. Census Bureau, which, luckily for us, has already done most of the heavy lifting analysis-wise. Download the data for Georgia (or whatever other state you wish to visualize) as a comma-separated value (.csv) file from this page. Now you'll want to turn that .csv file into a Google spreadsheet. You can do this from the Google Drive dashboard by selecting Create>Spreadsheet, then choosing File>Import once your new spreadsheet opens. Locate and upload the .csv file, select "Replace current spreadsheet" and set "Comma" as the separator character. Voilà: your data should appear for each county. Here's what the correct options will look like:

Given that each county has a different base population, the only standard way to compare en masse how many residents each county gained or lost is to calculate the percent change (see here to learn how to calculate it in Excel or Google Spreadsheets). In this example, however, the Census Bureau has already calculated the percent change for us, making our job that much easier. Delete all the unnecessary columns from the spreadsheet, leaving only the county name in the first column, the 2010 and 2011 population totals in the second and third columns, and the "Percent" value (the percent change between the two years) in the fourth column. Also delete any extra rows at the top or bottom of the spreadsheet, so that the first row contains the column titles and the counties run from the second row to the last. Highlight the column containing the county names and select "Data>Sort sheet by column, A-Z." This will put the entries in alphabetical order.
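For reference, the arithmetic behind that "Percent" column is just (new minus old) divided by old, times 100. A quick sketch in JavaScript, with made-up population figures:

```javascript
// Percent change between two population counts.
function percentChange(pop2010, pop2011) {
  return ((pop2011 - pop2010) / pop2010) * 100;
}

// Example: a county that grew from 50,000 to 51,200 residents.
console.log(percentChange(50000, 51200).toFixed(1)); // "2.4"
```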

If you happen to get a weird period ('.') preceding each county name as I did, you can get rid of it pretty easily by performing the following steps:

  1. Insert a new column to the right and, assuming your first county name begins in cell A2, enter the following formula into B2: =MID(A2,2,LEN(A2)). This function deletes the first character of the county name –– the unwanted period –– automatically.
  2. Copy and paste the new B2 cell into the rest of the rows to apply the same formula throughout the entire spreadsheet.
  3. Copy the new period-free column you just created and paste it as unformatted text into another new column to the right by selecting "Paste special>Paste values only." This pastes the data as plain values, so the new cells won't depend on a formula that references the incorrectly formatted column.
  4. Delete the first two columns so that the first column now becomes the county names only, free of the preceding periods.
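The same cleanup the MID() formula performs can be sketched in a few lines of JavaScript (the county names below are just examples):

```javascript
// Strip a stray leading period from a county name, mirroring the
// =MID(A2,2,LEN(A2)) spreadsheet formula described above.
function stripLeadingPeriod(name) {
  return name.startsWith('.') ? name.slice(1) : name;
}

const counties = ['.Appling County', '.Atkinson County', 'Bacon County'];
console.log(counties.map(stripLeadingPeriod));
// -> ['Appling County', 'Atkinson County', 'Bacon County']
```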

We're almost finished preparing the data. All that's left is to add a column containing the official county code for each county, giving us a common attribute we can use to merge the spreadsheet with the geometric data later. Download this .csv from the Census Bureau, which contains the county codes for all 50 states. Open it as a Google spreadsheet and delete every row except those for the state you're visualizing. With only those rows remaining, highlight the "ctyname" column and select "Data>Sort values from A-Z." This should put the spreadsheet in exactly the same alphabetical order as the spreadsheet with the population totals. Copy the "county" column containing the numeric county codes and paste it into a new column in the population spreadsheet. You should now have the correct county code for each county in a new column. Title that column simply "COUNTY" (all-caps). For an idea of what things should look like at this point, check out my Google spreadsheet here, or see the following screenshot:

One last thing to keep in mind: If any of your county codes are fewer than three digits long, which some probably will be, pad them with leading zeros to force them to three digits (e.g. '001'). That way they'll match the three-digit codes in the geometric data later in the process.
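If you'd rather script the padding than type it by hand, it's a one-liner; a sketch in JavaScript (the function name is mine, not part of any tool mentioned here):

```javascript
// Left-pad county codes to three digits so they match the
// county codes in the shapefile ('1' -> '001').
function padCountyCode(code) {
  return String(code).padStart(3, '0');
}

console.log(padCountyCode(1));   // "001"
console.log(padCountyCode(59));  // "059"
console.log(padCountyCode(121)); // "121"
```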

Now that we have our population data ready, let's download the corresponding geometric county polygons from the Census Bureau here. For this tutorial we'll be using the shapefile (.shp) format, so make sure to select that option on the download page. I used Georgia in this example; feel free to choose whatever other state you desire, so long as you follow the same instructions. Unzip the archive to your computer. You should see three files in the new folder: a .dbf, a .shp and a .shx. We'll only open the .shp file directly, but keep all three together in the same folder, since the .dbf and .shx hold the attribute and index data the .shp depends on.

At this point, you should have two different files: a Google spreadsheet of data formatted something like this, and a shapefile of corresponding county polygons. The next step will be to take the population data and bind it to the shapefile so that the two match up.

2. Binding the data to the shapefile using QGIS

Fire up QGIS. Select the "Add Vector Layer" option from the top of the window and locate the .shp file you downloaded earlier. Open it in QGIS and you should see a nice outline of your state that looks something like this:

Now go back to your newly created Google spreadsheet and export the data as a .csv by selecting "File>Download as." After downloading the .csv, rename it to something simple like "georgia.csv." The next thing we need to do is import the .csv into QGIS. But before we can do that, we need to create a new .csvt file with the same name and in the same directory as the .csv; it tells QGIS what type of data each column holds (string, integer, real, etc.). For this example, your .csvt file will look something like this, with a data type defined for each column of the .csv:
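A .csvt is a single line of quoted type names, one per column. Assuming your columns run county name, 2010 total, 2011 total, percent change and county code, a matching georgia.csvt would contain just:

```
"String","Integer","Integer","Real","String"
```

The COUNTY column is declared "String" rather than "Integer" so its leading zeros aren't stripped.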

If you're having trouble with this part, download my .csvt here. Should you need to adjust the data types, just open the file in a plain-text editor such as TextEdit and change them. The two main things to check are that the COUNTY column is defined as a string (so the leading zeros survive) and the PERCENT column is defined as a real number.

Once your .csvt file is in the same directory as your .csv, go back to QGIS and add a new vector layer just as you did before with the shapefile. This time, locate the .csv file and open it. QGIS should automatically detect the .csvt file in the background and assign the appropriate data type to each column in the .csv. To make sure this worked correctly, control-click the new .csv vector layer and select "Properties>Fields" to check that each field has the appropriate data type.

Now you can get down to the business of binding the data from the .csv to the shapefile. Select the shapefile layer and go to "Properties>Joins." Add a new vector join, setting the .csv as the join layer and both the join field and the target field to COUNTY, like this:

Applying the join should merge your spreadsheet and shapefile, binding the population data to the polygons using the shared "COUNTY" attribute that contains the matching county codes. To make sure everything worked correctly, control-click the shapefile vector layer in QGIS and select "Open attribute table." You should see the population data attached as columns at the end of the attribute table.
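Conceptually, the vector join is just a keyed lookup: match each shapefile feature to the .csv row with the same COUNTY code and merge their attributes. A small JavaScript sketch of the idea, using hypothetical rows (QGIS does the real work against the shapefile's attribute table):

```javascript
// Hypothetical shapefile features and spreadsheet rows sharing a COUNTY key.
const features = [
  { COUNTY: '001', geometry: '...' },
  { COUNTY: '003', geometry: '...' },
];
const csvRows = [
  { COUNTY: '001', PERCENT: 2.4 },
  { COUNTY: '003', PERCENT: -1.1 },
];

// Index the csv rows by COUNTY, then merge each feature with its match.
const byCounty = new Map(csvRows.map(r => [r.COUNTY, r]));
const joined = features.map(f => ({ ...f, ...byCounty.get(f.COUNTY) }));

console.log(joined[0].PERCENT); // 2.4
```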

Once you've confirmed that the data is attached to the polygon vector layer, export the new shapefile by control-clicking the shapefile vector layer, selecting "Save as," and saving the layer in the ESRI Shapefile format. Now you're ready to compress the shapefile package into a .zip and import it into TileMill, where you can style it, add interactivity and more.

3. Styling the map in TileMill

Open TileMill and create a new project. Under the layers panel, add a new layer and locate the .zip of the ESRI shapefile you just exported from QGIS. Upload the package and you should immediately see the polygons for your state. Now you'll need to style the map in the style.mss panel using the Carto language. Because the numbers at hand represent either a positive or a negative percent change, it makes sense to create a choropleth map where red represents negative values and green represents positive values. You might try ColorBrewer or 0to255 to find the right color ramp for your data. For this example, I used the following style parameters:


#georgia {
  line-color: #fff;
  line-width: 0.5;
  /* Illustrative diverging ramp, not the post's exact values:
     darker red for bigger losses, darker green for bigger gains. */
  [Percent <= 0] { polygon-fill: #f4a582; }
  [Percent <= -2] { polygon-fill: #ca0020; }
  [Percent > 0] { polygon-fill: #a6d96a; }
  [Percent > 2] { polygon-fill: #1a9641; }
}

“Newspapers are the new startups”

1 minute read

"Newspapers are the new startups . . . we're starting to see a lot of great changes as technologies improve and cultures change."
-John Levitt, Director of Sales and Marketing

Levitt's comment is one of the most insightful takes on the publishing industry I've heard in a while. It's going to take a lot of restructuring and a ground-up approach, but I'm excited to be a part of it as we embrace the start-up culture in Savannah.

SavSwap: Tackling the online classified ads market

11 minute read


Innovative, quality journalism takes money to produce. One of the largest revenue streams for news organizations has traditionally been the classified ad market, a stream that has all but dried up in today's era of Craigslist and eBay. As an online editor, developer, manager and digital strategist for Savannah Morning News, a midsized news organization owned by Morris Publishing Group, I've sat through countless digital strategy meetings about how we as a company can win back a sliver of the online classified ad market, if for no other reason than audience growth, with the long-term goal of driving revenue to support our company's journalistic efforts.

After brainstorming the issue with our V.P. of Audience Steve Yelvington, I identified a few key competitive advantages news organizations may still possess in the classified market:

  • Brand trust/recognition - Local news organizations still command considerable trust and boosterism in the markets they serve, adding an extra level of accountability to the classified ad process.
  • More secure social and physical verification - Unlike the major national competitors, we have the opportunity to verify users' identities using social, email and physical address verification, further weeding out spammers and scammers.
  • Mobile-centric technologies - The massive scale of national competitors has so far prevented them from implementing a more seamless mobile user experience using HTML5/responsive design strategies.
  • Print marketing - Local news organizations can still leverage their considerable print marketing footprint to add extra value to the online classified ad experience, including setting up a secure drop-off and pick-up point for transactions so that users don't have to go into the homes of strangers.
  • Social marketing - Strategic use of social channels, tied into publishers' existing social networks and targeted at local buyers only. For example, each ad with an approved photo would be fed to Instagram, Pinterest and Twitter accounts, serving as a second, highly visual storefront.

The following weekend, I built SavSwap, a prototype for the sort of product I believe could harness local news organizations' competitive advantages while at the same time creating a cleaner, more visual, simpler and more secure user experience than the national competitors have to offer. While still very much in beta state, SavSwap is now being proposed as a model for all Morris properties to adopt, and is slated for a local launch in the Savannah market this quarter.

Launch beta version of project

A few of the key features that make SavSwap stand apart from other classified ad attempts include:

  • A fully responsive, device-agnostic design that makes ad listing, browsing and submission easy and free from wherever you are, no matter what device you use.
  • Social media and email authentication of all users.
  • Premium listing options for paid members, including "Membership" badges and higher page prominence.
  • An inherently visual experience, with infinite scroll as well as traditional taxonomy and keyword search.
  • Geolocation.
  • Confidential on-site messaging system, with the option for users to display further contact information if they so desire.

Features that we plan to implement include:

  • Options for users to pay a small additional fee for their listing to appear in the print edition of Savannah Morning News.
  • A native iOS and Android mobile app (an Xcode template has already been built and is currently being prepared for submission to the iTunes and Google Play stores).

For a brief presentation outlining the SavSwap model, see my slideshare presentation here. To see SavSwap in its current development state, see here.

Visualizing 2012 census estimates using CartoDB and Leaflet

17 minute read

I've been tinkering around with some new mapping tools lately, and figured I'd put them to good use by displaying the 2011-2012 population estimates released last week by the U.S. Census Bureau. The inherently geographical nature of the census makes it a data set just begging to be mapped.

Rather than the de facto Google Maps JavaScript API V3, I decided to go with CartoDB and Leaflet to see what I could produce.

As I mentioned in a recent post, CartoDB offers an excellent Fusion Tables-esque interface, although it allows for far less front-end customization and requires more under-the-hood programming. Nonetheless, CartoDB can make pretty maps right out of the box, which you can then fully customize using the CartoDB API and basic SQL statements. There's one caveat, however: the service only allows you to upload five tables for free. That could be a dealbreaker for cash-strapped news organizations and freelance data journalists.

Anyhow, I downloaded a .zip shapefile package of all 159 Georgia counties from the U.S. Census Bureau, then brought the package into CartoDB using the service's default upload interface. Using Excel, I calculated the percent change from the most recent population estimates to last year's estimates. I then added the resulting values as a column in my CartoDB table, which you can see here.

After playing a bit with the API, I was able to format a diverging choropleth map from my table with the following style parameters, using 0to255 to ensure an equidistant color scheme:

#statewidepop {
  line-color: #fff;
  line-width: 0.3;
  polygon-opacity: 1;
  /* Fills below are an illustrative green-to-red ramp; substitute
     your own 0to255 values. Later rules override earlier ones. */
}
#statewidepop [percent_change<=5.5] { polygon-fill: #006d2c; }
#statewidepop [percent_change<=4] { polygon-fill: #238b45; }
#statewidepop [percent_change<=3] { polygon-fill: #41ab5d; }
#statewidepop [percent_change<=2.25] { polygon-fill: #74c476; }
#statewidepop [percent_change<=1.5] { polygon-fill: #a1d99b; }
#statewidepop [percent_change<=0.75] { polygon-fill: #c7e9c0; }
#statewidepop [percent_change<=0.3] { polygon-fill: #e5f5e0; }
#statewidepop [percent_change<=0] { polygon-fill: #fee5d9; }
#statewidepop [percent_change<=-0.5] { polygon-fill: #fcbba1; }
#statewidepop [percent_change<=-1] { polygon-fill: #fc9272; }
#statewidepop [percent_change<=-2] { polygon-fill: #fb6a4a; }
#statewidepop [percent_change<=-3] { polygon-fill: #ef3b2c; }
#statewidepop [percent_change<=-4] { polygon-fill: #cb181d; }
#tl_2009_13_county[percent_change<=-5] { polygon-fill: #99000d; }

Check out the resulting map:

The map above shows the percent change in population from July 2010 to July 2011 in all 159 Georgia counties, as estimated by the U.S. Census Bureau. The darker the green, the higher the positive percent change. The darker the red, the higher the negative percent change. Click on a county to see its percent change.

Pretty nice, huh? But what if I want to customize the style of the pop-up windows or perform more advanced functions like creating custom image markers or switching between layers? That's where Leaflet, an open-source JavaScript library, comes in handy.

Using the Leaflet library

The map above displays the estimated percent change in population of various midstate counties between July 2011 and July 2012. The greener the county, the higher its percent increase. The deeper red the county, the higher its percent decrease. Click on a county to see more precise totals, or select a group from the dropdown in the top right corner for a breakdown of the population changes by race.

To get a wider range of flexibility, I called up a segment of the statewide data – the counties within the Macon/Warner Robins metropolitan area – using the Leaflet JavaScript library. Leaflet allows you to reference layers from CartoDB or Google Maps from within its API, making integration a breeze. All you have to do is reference a few lines of code and your CartoDB data will appear as a layer on your Leaflet map automagically. But even on its own, Leaflet is pretty robust, especially for being so lightweight.

In the map above, I took the county shapefile package from earlier, converted it to GeoJSON using QGIS, then, following these parameters, called up the GeoJSON data for the selected counties using the Leaflet script. For the underlying map tiles, I created a custom style using CloudMade, then referenced it using my API key and the following line of script:

// YOUR-API-KEY and STYLE-ID stand in for your own CloudMade values
var cloudmade = new L.TileLayer('http://{s}.tile.cloudmade.com/YOUR-API-KEY/STYLE-ID/256/{z}/{x}/{y}.png',
    { maxZoom: 18 });

Because I also wanted to show a breakdown of the data by race for the same geography, I added in a custom control menu that allows the user to switch between layers for easy comparison. In addition, I styled the popup to my liking, with green and red values to connote increase and decrease.

From there, I was able to add in the additional data sets of population change by race. For each demographic group, I created a corresponding layer group. Each layer group contained the data as well as the appropriate styles and colors. See the source code below:

var totalLayer = new L.LayerGroup();

What we can learn about charts from The WSJ Guide to Information Graphics

12 minute read

Although geared primarily toward the production of static graphics for print publications, Dona M. Wong's The Wall Street Journal Guide to Information Graphics (2010) provides a wealth of salient and time-honored tips and guidelines that any student of data visualization would be well-advised to follow. At the heart of Wong's book is the notion that data integrity trumps all else: no matter how aesthetically pleasing or visually powerful an information graphic may be, if it doesn't communicate clear and accurate data to the reader/user, it doesn't do its job.

In the first two chapters of The WSJ Guide, Wong, a former student of data-viz extraordinaire Edward Tufte, addresses the topic of charting. From a theoretical standpoint, Wong lays out four principal steps in the charting process:

  1. Research: Find your data source, and ensure that it's timely, authoritative and free of bias.
  2. Edit:  Figure out what the data says (essentially, determine what your story is), and conceive of how best to boil that data down in a way that's simple enough for your intended audience to understand without skewing its meaning.
  3. Plot: Determine the appropriate chart type for your data (e.g. bar, column, line, pie, stacked bar, etc.), choose the right settings (scale, increments, axes, etc.), label the chart (e.g. legends and source lines) and pick the best color and typography combinations to accentuate your key message.
  4. Review: When you're done, ask yourself the following questions: Does the data match up with what external sources say? Are there any outliers? Does the chart make sense? What would the average user/reader think upon first seeing the chart?

Regarding the finer points of charting, Wong does an excellent job of pointing out the various dos and don'ts of the presentation process. She sets forth clear guidelines about when to use each type of chart. For example, when dealing with change over time, Wong says to always use a line chart instead of a bar chart, as bar charts should ideally be reserved for comparing several different series of data. Also, Wong asserts, pie charts usually aren't as good a choice for displaying complex data as bar or line charts, primarily because they make it harder to discern discrete differences in size (later, she flat-out dismisses the donut pie chart for the same reason). A few of her other tips I found particularly relevant included: (a) avoiding high-contrast color schemes that draw attention away from the data, (b) shying away from highly detailed icons so as to avoid visual overload, (c) never, under any circumstances, adding cloying shadows or 3D effects and (d) never relying on zebra patterns, dotted lines or other fancy methods of labeling. "A chart is not a piece of fine art," Wong says.

Most importantly, Wong sets forth some general principles to help designers avoid creating misleading charts. For example, when creating a bubble chart, always scale the bubbles by area, not radius. Also, never plot two different data series on noncomparable scales, and when creating bar charts, always start at the zero baseline. Other steps to ensuring data integrity include putting numbers into their appropriate context (comparing apples to apples), holding off on rounding until the end of the data analysis process and avoiding charting predictive numbers alongside actual ones. As Wong so eloquently puts it, "Unlike a misspelled word in a story, one wrong number discredits a whole chart."
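Wong's bubble rule is worth making concrete: since the eye reads a bubble's size as its area, the radius should scale with the square root of the value, not the value itself. A JavaScript sketch (the helper name is mine, not Wong's):

```javascript
// Size a bubble by area: a value twice as large should cover twice
// the area, so radius grows with sqrt(value), not value.
function radiusByArea(value, scale = 1) {
  return Math.sqrt(value / Math.PI) * scale;
}

// Doubling the value multiplies the area by 2 but the radius
// only by sqrt(2) ~ 1.414.
const r1 = radiusByArea(100);
const r2 = radiusByArea(200);
console.log((r2 / r1).toFixed(3)); // "1.414"
```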

Although the new addition of interactivity to chart design adds another layer of complexity to the visualization process that Wong doesn't address here, most of the guidelines she sets forth hold true in both static and dynamic mediums. Yet it would be interesting to hear what she has to say about the vexing question of when and when not to add static labels to interactive charts...

Making the case for hover interactions in maps

19 minute read

In keeping with my recent spate of mapping nerdiness, I decided to take an interactive map I produced last month displaying statewide annual population changes a step further by adding mouseover/hover capabilities. Here's the hover-y, nicely colored choropleth map I came up with. But before I get into the nitty-gritty of how I created the map –– which I'll explain step-by-step in a later post –– let me exercise a bit of self-indulgence by defending my growing belief in the need for hover capabilities when visualizing geographic data.

Not too long ago, I was an avid believer in the no-frills, less-interactivity-is-more approach to mapping geographic data, espoused by the brilliant Brian Boyer (@brianboyer), News Applications Editor for NPR and a former member of the News Apps Team at The Chicago Tribune. Boyer's argument for the need to keep maps simple –– like they used to be back in the days of ink and paper –– certainly has its merits. After all, the process of bringing a physical map closer to one's eyes to get a better view is a natural, timeless user interaction, and maps like this one, which Boyer produced during his time at The Tribune, are far more intuitive in communicating information upon first glance than many of the infoWindow-laden Google maps being produced by news organizations these days, many of them simply for the sake of being called 'interactive' (for those of you fortunate enough not to be mapping nerds, infoWindow is just Google-speak for the clickable popup boxes you see in Google maps).

But Boyer's minimalist mapping aesthetic only really works when you have one or two pieces of textual data you want to display for each geographic area. What if you have multiple pieces of information you want to display for each polygon, such as in this snazzy map from The Texas Tribune? Or, less likely but equally problematic, what if you need to bind non-textual data to your geographic polygons, such as images or Google Charts? In cases such as these, you're going to need to provide some sort of interaction that allows the user to expand and collapse the data for each area individually, or you'll just end up with a chronic case of visual overload.

Not to mention, on a more abstract level, studies have repeatedly shown that users tend to spend more time on applications that provide direct feedback based upon their actions, even if that feedback sometimes makes their ability to consume information at first glance less efficient (see Donald Norman's 2005 book Emotional Design: Why We Love (Or Hate) Everyday Things, in which Norman asserts that the feeling of emotional satisfaction and empowerment users receive from triggering an action not only puts them in a clearer state of mind, but also makes them more engaged in the information at hand). So, if we're trying to communicate geographic data to users as effectively as possible, it only makes sense that we'd want to have a certain degree of user interaction –– both for the sake of preventing visual overload and for making users feel more engaged. Such is the logic behind clickable infoWindows.

Still, clickable popups leave us with another problem: Users have to make the conscious and deliberate effort to click a polygon to see the data for that geographic area. Requiring clicks may sound like a trivial task to the designer or journalist-programmer, but for the short-attention span user, it can be an awful lot to ask for. To be fair, however, click-triggered popups may not be much of a problem for maps with only a few dozen polygons. But for maps with hundreds of small polygons –– say, census tracts or zip codes –– it can be very tedious to click the right polygon without first having to zoom into the map so far that you lose sight of the broader context.

That leads me back to a conversation I had a couple of months ago with a friend of mine from Columbia's J-School, Michael Keller (@mhkeller), who's now working as the Senior Data Reporter at The Daily Beast. Michael insisted to me after a Hacks/Hackers event that providing hover interaction for maps is almost always a good thing, because hovers require less work on the part of the user. I'll admit that I was dubious at the time, thinking of hovers as often unwanted, accidental triggers that can be distracting from the map and data at hand. But lately I've come around to his way of seeing things. If implemented correctly (i.e. no flashy interactions that cover up other parts of the map), hovers are almost always a good idea. For example, this recent map of New York Stop-and-Frisk data that Keller produced for The New York World using CartoDB and Leaflet is so detailed that it couldn't possibly have worked without infoWindows, and would have been unwieldy if it relied on click-triggered interaction. By including floating mouseover capabilities, the map allows the user to scan quickly through the choropleth map to see individual Stop-and-Frisk data from each block, without having to attempt to click through minute geographic areas.*

I'm certainly not advocating interaction for interaction's sake (although such a case could be made, given the dynamics of the Web). But I am saying that hovers give more immediate visual feedback than click-triggered events, especially in maps. Hovers help draw users into the data without requiring them to seek it out consciously –– almost like a catchy lede would in a print narrative. So for the time being, I'm pro-hover.

*Keller later messaged me letting me know that some examples which better illustrate the power of hovers include this map and this map, both of which use hover functionality to help highlight the effects of proposed redistricting efforts in the New York State Senate. What you'll also notice about Keller's maps is that they include hover states, which I also think is a necessity, especially for maps that include lots of small polygons.

Building a responsive site in less than 20 minutes

6 minute read

An ever-so-sleek responsive portfolio site I designed for a friend in less than 20 minutes using Skeleton as a foundation.

With all this talk lately of the new era of responsive design, I realized today that I've yet to create anything that's actually responsive. Given that I've only pondered using it in the implementation of complex, database-driven news sites, the task of tweaking every level of CSS to fit perfectly into a responsive grid system has so far seemed too daunting to tackle.

But I got curious this afternoon and stumbled upon a cool new library called Skeleton, which bills itself as "a beautiful boilerplate for responsive, mobile-friendly development." Essentially, Skeleton is a collection of CSS and JS files that makes the mystery of responsive design seem a little less elusive. Upon uploading the package to my server, I was pleased to find a neatly coded, easy-to-understand responsive site that I could play around with and tweak to my own liking. I ended up adjusting the grid size and performing some minor customizations to the underlying Skeleton structural CSS, but other than that, the development kit was, as promised, "style agnostic."

Since I didn't have any specific projects I needed to be working on today, I decided to give Skeleton a whirl by designing a new online portfolio as a favor for my friend Daniel Medina. I'm not kidding – within 10 minutes I'd coded a fully satisfactory responsive portfolio site that looks beautiful on my iPhone and tablet.

So, if you're feeling experimental, try Skeleton out. It's exciting to see how quickly this technology is gaining steam, and to see the actual workings of responsive CSS in action firsthand. My next goal is to integrate a responsive layout into a more powerful, database-driven site, perhaps one designed in Drupal.

Raw data as oxymoron

1 minute read

"Raw data is both an oxymoron and a bad idea; to the contrary, data should be cooked with care."
- Geoffrey Bowker

Bowker was spot on in his comments made last week at Columbia Journalism School. I can't tell you how many times I've had to make order out of chaos from "raw data," i.e. unintelligible, inaccurate spreadsheets.