Fusion Cooking with Foraged Data

Turning a website into a dataset can be easier than you think. This recipe explains how to take tables from a website, turn them into datasets, and then turn combine the tables with Google Fusion Tables. It finishes by explaining how to use the combined table to create a mash-up map. If you already have all the data in one table, you can skip to Steps 4 and 5.

You will need

  • Data held in a table on someone’s website
  • A Google Account
  • Google Chrome (you can use other web-browsers, but Chrome includes some useful features that help you)

Step by step

Step 1: Find your Data

We’re using information about the locations and opening times of Bristol libraries. Right now the information is only available as tables of data on different pages of the Bristol City Council website, but it’s not available as raw data. However, we might be able to turn it into raw data the way it is held on the web page is structured using a HTML table.

First we have to check how the information is structured. To do that, we’ll look at the source-code of the page, and explore whether it is contained within < table >, < tr > (table row) and < td > (table cell) tags in a repeating standard pattern. The Google Chrome browser ( ) provides a useful shortcut to find something in the source code. Highlight some of the information, then right-click and choose ‘Inspect Element’ to see a representation of the source code. Can you see ‘td’ and ‘table’ elements?

If you can’t, your page might need some more advanced scraping approaches, such as using Scraperwiki. Otherwise, you now need to search your page source code for other ‘table’ elements. Count them, and count where in the list the table with the information you need comes: for example, is it the third table to be found in the source code. (Only count < table >, don’t count the closing < / table > tags).

Step 2: The magic ImportHTML

We’re going to use a special Google spreadsheets ‘Import’ formula to extract data from the web page. You will need to create a new spreadsheet at (you will need to use your Google login, or create one if you don’t have one).

In the first cell of the spreadsheet (A1), enter

=ImportHTML(“http://webpagethedataison.com/page.html”,n)

where the first part inside the brackets is the address of the page with the data on (in quotes), and n is the number you found earlier (e.g. this is the nth table in the page).

When you hit enter, Google Spreadsheets should go and look-up your page and fetch back the spreadsheet data, filling in values across the spreadsheet.

For the Bristol libraries data, we created two Google Spreadsheets, one with the formula:

=ImportHTML(“http://www.bristol.gov.uk/item/venuelist/?VenueTypeId=1&XSL=venueaddress”,”table”,2)

and the other with:

=ImportHTML(“http://www.bristol.gov.uk/item/venuelist/?VenueTypeId=1&XSL=venueopening”,”table”,2)

Note on the above: Copying and pasting these links into Google spreadsheet did not work for me. Eventually found out that if you delete the double quotes and just retype them in situ it works !!! see link for explanation

Step 3: Clean Carefully

Sometimes the data you have imported may not all appear in the right cells. This happens when the original web page uses merged cells in it’s table (spanning columns or rows). Right now your formula is a live-link to the original web page. If you try and change any of the values it has imported it will quickly reset them to match the web page.

If you don’t need to edit anything, then leaving the live-link in place is useful. Otherwise, you will need to copy all the imported data, and paste it back into the same place using the ‘Paste Values Only’ option from the ‘Edit’ menu.

You can now tidy up any elements that need tidying up.

To save your spreadsheet - click where it says ‘Untitled Spreadsheet’ and give it a title. Make the title really descriptive, including details of the source and date you took the data.

For example, I named the Bristol library spreadsheets:

Bristol Library Locations (Created 25th Jan 2011) and Bristol Library Opening Times (Created 25th Jan 2011)

Step 4: Fusion time

Now we’re going to use one of my favorite open data cook book kitchen gadgets: Google Fusion Tables. It makes combining data and visualizing it really easy. The important thing about using Google Fusion Tables to combine two datasets is that they must have one column in common - something they can be fused together on.

First go to and log-in with your Google account. Then choose ‘New Table’ and then ‘Import Table’. You can import the first Google Spreadsheet that you created earlier. Again, give it a clear name and describe where it comes from. Repeat for each table you created in the earlier steps. (If you only have one spreadsheet which contains all the information you need, you can skip to step 5).

Once both your spreadsheets are imported to Fusion Tables, open one of them and go to the ‘Merge’ menu. Pick the column in the table you have open that you are merging on, and then click in the ‘Enter URL or select below…’ box to be able to choose your second spreadsheet, and pick the column in that spreadsheet that you are merging against. Give your merged table a descriptive name and click ‘Merge’.

In the Bristol libraries example we merged the two tables by name - giving us a new table with both the addresses and opening times of libraries in one place.

Step 5: Season with some visualisation

Head back to the front page of your Fusion Tables account (just click the logo), and open up your new merged table.

From the ‘Visualize’ menu, see what visualizations are available to you. Fusion Tables try to automatically detect whether or not there are addresses, numerical and date columns in your data. If it’s not guessed right, you can tell it what sort of data columns contain from the ‘Edit’ → ‘Modify Columns’ menu.

We’re going to focus on mapping the data in this recipe. Choose ‘Map’ from the Visualize menu (if you have different address columns in your data you might have to choose which one to use), and watch as Fusion Tables looks for where to put your data on the map.

Zoom in to the area where the markers are. Click the markers to see the default display that Fusion Tables provides of your data.

Step 6: Adjust as required

With each of the visualizations you will find some customization options, and the option to ‘Get Embeddable Link’ which provides you with a snippet of HTML code that you can drop into websites and blogs to show your visualization elsewhere, like below (Note: you might need to change the ‘Share’ settings from the button at the top-right of the screen to allow other people to access your data - making it ‘public’ or ‘unlisted’ under Visibility settings).

You can customize how the mapping information windows that pop-up when you click on a marker appear by choosing ‘Configure info Window’, and you can change the markers using ‘Configure styles’.

Health Warnings

Google Fusion tables automatically updates merged tables when the values in the fusion table tables that go into them are changed. But, when you import a table into Fusion tables from Google Spreadsheets you are creating a copy, so changes you make in Google Spreadsheets (or changes it picks up from the website your data originally came from) will not be automatically cascaded to your combined table or visualisation.

If you need something that takes data form a website and keeps itself up to date a tool like Scraper Wiki may be more appropriate.

Examples and variations

Got an example of this recipe to share? Edit this page and add in a link!

scraping, , fusion tables, mapping
recipe/fusion_cooking_with_foraged_data.txt · Last modified: 2011/05/14 18:32 by John Brennan
You are here: startrecipefusion_cooking_with_foraged_data
CC Attribution-Share Alike 3.0 Unported Recent changes RSS feed Valid XHTML 1.0