I am guilty of screen scraping

Posted: October 2nd, 2011 | Filed under: technical | 1 Comment

I love data visualization. I am inspired by the fact that India has a wealth of data that can lead to wonderful visualizations, ones that can help government officials, politicians, journalists, and social workers do their daily jobs better. Even the average citizen should be able to look at a visualization, understand what’s going on, and maybe get inspired to be a more responsible citizen.
I am constantly on the lookout for openly accessible government data to see if I can do something interesting with it. That led me to build this visual guide of elected representatives of India and this blog entry comparing corruption at the higher levels with corruption by average citizens.
There are a lot of websites (usually run by NGOs) that publish details about Indian politicians: education, criminal records, attendance and participation in parliament, etc. This website has data on criminal records that caught my interest. I emailed them asking (hoping) whether they had an API for accessing the data, but I guess that is not high on their priority list. What they did have was the data arranged in HTML tables across lots of serially numbered pages. So I got into screen scraping mode and scraped out information on around 7800 candidates who stood in the Indian General Elections in 2009. Now I have their educational qualifications, criminal records, and total assets and liabilities (some have really obscene numbers) in my database.
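For anyone curious what scraping HTML tables off serially numbered pages can look like, here is a minimal sketch using only Python's standard library. It is not the script I actually ran: the column names, the sample markup and the `url_template` parameter are all made up for illustration, and a real site's pages will almost certainly need extra handling.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_candidates(html):
    """Turn one page's table into a list of dicts keyed by the header row."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def fetch_pages(url_template, first, last):
    """Yield the HTML of pages first..last.

    url_template is hypothetical, e.g. "http://example.org/candidates?page={}".
    """
    for number in range(first, last + 1):
        with urlopen(url_template.format(number)) as response:
            yield response.read().decode("utf-8", errors="replace")


# Made-up sample page; the real site's markup and column names may differ.
SAMPLE_PAGE = """
<table>
  <tr><th>Name</th><th>Party</th><th>Criminal cases</th></tr>
  <tr><td>Candidate A</td><td>Party X</td><td>0</td></tr>
  <tr><td>Candidate B</td><td>Party Y</td><td>2</td></tr>
</table>
"""

candidates = parse_candidates(SAMPLE_PAGE)
```

In practice each page from `fetch_pages` would be passed through `parse_candidates` and the resulting rows written to a database, ideally with a pause between requests so the site is not hammered.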
The goal is now twofold.

  1. Create a simple visualization on the map of India to show the educational qualifications, assets and criminal cases of these candidates, with the ability to filter by party and state, so viewers can see which areas had more educated candidates, which had more candidates with criminal cases, and where the richest candidates came from.
  2. While building the above, ensure that the API for accessing the data is well written, so that it can be exposed to other developers interested in building more cool visualizations.
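To make the two goals a bit more concrete, here is a rough sketch of the kind of data-access layer I have in mind: filter candidates by party and state, then summarize them per state so a map can be colored from the result. The `Candidate` fields, the function names and the sample records are all hypothetical; nothing here reflects the real database schema or real candidates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record layout, chosen only for this sketch.
    name: str
    party: str
    state: str
    education: str
    criminal_cases: int
    total_assets: int  # in rupees


def filter_candidates(candidates, party=None, state=None):
    """Keep candidates matching the given party and/or state (None = any)."""
    return [
        c for c in candidates
        if (party is None or c.party == party)
        and (state is None or c.state == state)
    ]


def summarize_by_state(candidates):
    """Per-state counts and maximum assets, ready to feed a map layer."""
    groups = defaultdict(list)
    for c in candidates:
        groups[c.state].append(c)
    return {
        state: {
            "candidates": len(members),
            "with_criminal_cases": sum(1 for c in members if c.criminal_cases > 0),
            "max_assets": max(c.total_assets for c in members),
        }
        for state, members in groups.items()
    }


# Made-up records for illustration only.
SAMPLE = [
    Candidate("Candidate A", "Party X", "State 1", "Graduate", 0, 1_000_000),
    Candidate("Candidate B", "Party Y", "State 1", "10th Pass", 2, 5_000_000),
    Candidate("Candidate C", "Party X", "State 2", "Post Graduate", 1, 250_000),
]

party_x = filter_candidates(SAMPLE, party="Party X")
summary = summarize_by_state(SAMPLE)
```

Keeping filtering and summarizing as plain functions over plain records means the same code could sit behind a web API later without the visualization knowing anything about the database.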

This should be fun.


One Comment on “I am guilty of screen scraping”

  1. Software for small business said at 2:35 pm on October 11th, 2011:

    Love what you are doing with the blog man!