Sometimes you need to combine raw data from different files before you can start working with it. You might also need to convert the data from one format to another. Whilst mixing up lots of different files might require some programming, or setting up a database to mix your data into, you can combine and convert a few files easily using Yahoo Pipes.
Yahoo Pipes provides a drag-and-drop interface for working with data. You can ‘plumb together’ various modules which fetch, filter and transform your data. You can create your own pipes by going to the Yahoo Pipes website and logging in with a Yahoo account (create one if you don’t already have one).
Look for the ‘Create Pipe’ button to get started. If you have already created a pipe and want to edit it, look for the ‘My Pipes’ link.
(Note: When this recipe was written, Pipes was in the process of being upgraded from a Version 1 to a Version 2 ‘engine’ to power the pipes. By default new pipes were still created using Version 1, but if you save your pipe early on and then re-load it, the option to upgrade to Version 2 is displayed in the header. Upgrading seemed to make pipes using XML more reliable.)
From the menu on the left of your new pipe select ‘Sources’ and drag and drop a ‘Fetch data’ module onto the canvas grid in the main area of the screen.
Enter the web address of one of the data files you want to include in your mix. This should be a link directly to the XML data file. You can check that Yahoo Pipes has managed to find it by clicking on your ‘Fetch data’ module and checking the ‘Preview’ in the debugger drawer at the bottom of the screen.
Once you are sure that Yahoo Pipes is able to read the data successfully, you can add another Fetch Data module to the canvas to fetch another file that you want to include in your mix.
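For readers who want to see what the ‘Fetch data’ step amounts to outside Pipes, here is a minimal Python sketch. It is not how Pipes works internally. The function names `parse_xml`, `fetch_xml` and `preview` are made up for this illustration, and the URL you pass in is whatever direct link to an XML file you would have typed into the module.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_xml(data):
    """Parse raw XML (bytes or str) and return the root element."""
    return ET.fromstring(data)


def fetch_xml(url, timeout=30.0):
    """Download an XML file from a direct link and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_xml(response.read())


def preview(root, limit=5):
    """List the tag names of the first few child elements,
    a rough stand-in for glancing at the debugger preview."""
    return [child.tag for child in list(root)[:limit]]
```

Calling `preview(fetch_xml(url))` on each file is a quick way to confirm that the link really points at XML before you try to combine anything.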
In our case we’re mixing together Aid Transparency data about Afghanistan using two XML files from DFID and the World Bank.
Now that you have two or more files being fetched by Yahoo Pipes, choose ‘Operators’ from the menu on the left, and drag and drop a ‘Union’ module onto the canvas. You will see this has a number of input blobs at the top, and one output blob at the bottom.
Click and hold the output blob at the bottom of one of your ‘Fetch data’ modules, and drag to an input blob of the ‘Union’ module. You should see a pipe created connecting these modules together. Connect all your ‘Fetch data’ modules to the ‘Union’ inputs, and then connect the ‘Union’ output to the ‘Pipe Output’ at the bottom of the canvas.
If everything has worked, when you click on ‘Pipe Output’ and choose preview in the debugger drawer you will see a few numbers which, when clicked on, expand to show the data from your different files, now brought together in one place.
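Conceptually, a union just places the item lists from each source one after another. The sketch below illustrates that idea in Python; the `union` function and the placeholder dictionaries are invented for the example and stand in for whole fetched files, so treat it as an analogy rather than a description of the ‘Union’ module's internals.

```python
from itertools import chain


def union(*sources):
    """Concatenate several item lists into one, keeping input order."""
    return list(chain.from_iterable(sources))


# Placeholder data: each source contributes a single item (the whole file).
dfid_items = [{"source": "DFID", "activities": ["a1", "a2"]}]
world_bank_items = [{"source": "World Bank", "activities": ["b1"]}]

merged = union(dfid_items, world_bank_items)
for index, item in enumerate(merged):
    # One numbered entry per file, much like the numbers in the preview.
    print(index, item["source"], len(item["activities"]))
```

With one item per source, the result has one numbered entry per file, and the interesting elements are still nested inside each entry.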
If you look at the preview now you will find it contains a numbered entry (0, 1 and so on) for each of the files you are mixing together, and you have to click each number to find the elements within that XML file that you want merged together. In most cases you really want to have all the elements from the different files properly blended, displayed right next to each other, rather than collected under separate headings.
To make that happen you will need to find out the name of the elements you want to mix together from the original XML files. In the example of aid transparency data, when we open one of the XML files in Firefox, or in Google Chrome with an XML viewer extension, we can see that all the elements we want to mix together are within ‘iati-activity’ tags. After adding ‘iati-activity’ into the ‘Path to item list’ section of each of the ‘Fetch data’ modules and previewing the Pipe Output again, we find lots of elements, each containing the details of an activity.
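The effect of setting ‘Path to item list’ can be sketched in Python as pulling out every element with a given tag before merging, so that the items themselves are blended instead of the files. The two sample documents below are invented stand-ins for the DFID and World Bank files, and `items_at` and `blend` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Made-up sample documents, shaped like very small IATI files.
DFID_SAMPLE = """<iati-activities>
  <iati-activity><title>Example activity A</title></iati-activity>
  <iati-activity><title>Example activity B</title></iati-activity>
</iati-activities>"""

WORLD_BANK_SAMPLE = """<iati-activities>
  <iati-activity><title>Example activity C</title></iati-activity>
</iati-activities>"""


def items_at(xml_text, item_tag):
    """Return every element named item_tag found in one document."""
    root = ET.fromstring(xml_text)
    return list(root.iter(item_tag))


def blend(documents, item_tag):
    """Collect the matching elements from all documents into one flat list."""
    blended = []
    for xml_text in documents:
        blended.extend(items_at(xml_text, item_tag))
    return blended


activities = blend([DFID_SAMPLE, WORLD_BANK_SAMPLE], "iati-activity")
titles = [activity.findtext("title") for activity in activities]
print(titles)  # ['Example activity A', 'Example activity B', 'Example activity C']
```

The output is a single flat list of activities, with no trace of which file each one came from unless the elements themselves record it.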
(Note: If the XML files you are mixing together are large, previewing could take a while.)
You can now save and run your pipe. Choose ‘Save’ from the menu at the top and then look for a ‘Run pipe’ link at the very top of the screen. When you run your pipe you will see a list of all the elements that have been mixed together. You also have options to get an RSS feed of these elements (which, if your original data included titles and descriptions, will just include these titles and descriptions), or to get a JSON data file which can be used in third-party tools and programming languages that understand JSON.
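To give a feel for the XML-to-JSON conversion, here is a rough Python sketch that turns merged elements into JSON. The layout it produces (a top-level `items` list) is an assumption chosen for this example and is not claimed to match the JSON that Pipes itself returns. The helper names are also hypothetical, and the conversion deliberately ignores XML attributes and any text mixed in between child elements.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an element to plain Python data.

    Leaf elements become their text; elements with children become a
    dict keyed by child tag. Repeated child tags are gathered in a list.
    Attributes and mixed text are ignored in this simplified sketch.
    """
    children = list(element)
    if not children:
        return (element.text or "").strip()
    grouped = {}
    for child in children:
        grouped.setdefault(child.tag, []).append(element_to_dict(child))
    # Unwrap single values so {"title": ["T"]} becomes {"title": "T"}.
    return {tag: values[0] if len(values) == 1 else values
            for tag, values in grouped.items()}


def to_json(items):
    """Serialise a list of elements under an assumed top-level 'items' key."""
    return json.dumps({"items": [element_to_dict(item) for item in items]},
                      indent=2)
```

Any tool that understands JSON can then read the result, for example with `json.loads` in Python.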
(Note: You can view the source of any pipes you find to get ideas for how to adapt and develop your own pipes).
There are a number of ways you can develop this recipe further:
- A flexible pipe - In the recipe above you need to fix in advance the files you will mix together. Using the ‘User Input’ operators you can create a pipe that accepts user input and can be re-used for merging different XML files. For example, such a pipe can let you specify two XML files and which elements from them should be merged, defaulting to the example we’ve been using above.
- A filtering pipe - John Adams uses features available in Pipes to filter the outputs from a file. Combine this with merging to only get the merged data you are interested in.
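Both ideas, user-supplied inputs and filtering, can be sketched together in Python as a single function whose documents, element name and filter are all parameters. This is only an illustration of the concept under assumed names (`merge_items`, `title_contains`) and made-up sample data; it does not reproduce any particular pipe.

```python
import xml.etree.ElementTree as ET


def merge_items(documents, item_tag, keep=None):
    """Merge the item_tag elements from several XML documents.

    documents: iterable of XML strings (the 'user input' files)
    item_tag:  name of the elements to merge (also user-supplied)
    keep:      optional function deciding whether an element is kept
    """
    merged = []
    for xml_text in documents:
        for item in ET.fromstring(xml_text).iter(item_tag):
            if keep is None or keep(item):
                merged.append(item)
    return merged


def title_contains(word):
    """Build a filter that keeps items whose <title> mentions word."""
    word = word.lower()

    def predicate(item):
        return word in (item.findtext("title") or "").lower()

    return predicate


# Made-up sample inputs for illustration only.
FIRST = ("<iati-activities>"
         "<iati-activity><title>Health programme</title></iati-activity>"
         "<iati-activity><title>Road repair</title></iati-activity>"
         "</iati-activities>")
SECOND = ("<iati-activities>"
          "<iati-activity><title>Rural health clinics</title></iati-activity>"
          "</iati-activities>")

health = merge_items([FIRST, SECOND], "iati-activity", keep=title_contains("health"))
print([item.findtext("title") for item in health])
```

Leaving `keep` unset returns every matching element, which corresponds to the plain merge from the recipe above.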
Based on an idea by John Adams and with thanks to Tony Hirst for endless good ideas, tips and tricks for using Yahoo Pipes.