Public Data

DJ Showcase: (20 January 2014) is back with a fairly good data journalism story once again, though the analysis is by another site, which has created an infographic depicting living costs for expats worldwide. Of the 119 nations featured, India, Nepal and Pakistan have the lowest living costs, while Switzerland, Norway and Venezuela have the highest.

Yes, you read that right. Despite all the talk of inflation, India is still one of the cheapest places on earth. Here’s a link to the story, and below is the link to the infographic.


DJ Showcase: Final Work of Participants in ICFJ Data Journalism Workshop

The three-day ICFJ Data Journalism workshop, held in Delhi from 5 to 7 September 2014, culminated with the participants divided into groups to work on real-life stories using newly learnt techniques of data scraping, cleaning and visualization. Ideas ranged from Narendra Modi’s popularity on Twitter to the changing pattern of media ownership, and from India’s transformation into a cashless economy to the changing definition of the middle class in India.

Here are examples of some of the work that various teams produced at the end of the workshop.

Complaints against police (An infographic on complaints against police and conviction rates)

Cashless in India (A data-based story on how India is turning to electronic transactions)

Class Calculator (A tool to calculate which economic category a consumer belongs to, based on consumption pattern)

Inside Media (An investigation into how ownership patterns have changed in the six most valuable media companies in India)

Terror Statistics (A data-based story on the cost of convictions in terror cases)

There were more such projects. Some of them were:

Online Video Advertising: The Reality

Crime Against Women in India

Narendra Modi vs Other Global Leaders: Popularity on Twitter

Modi Teri Ganga Maili (Ganga Action Plan)

These teams either have not put their work online or, if they have, DataJourno does not have the links.

DJ Showcase: Times of India (03 September 2014)

The Times of India’s regular STATOISTICS column in its print edition is a consistent effort to popularize infographic-based stories. A good infographic, says visualization guru Alberto Cairo, should be beautiful, functional and insightful. Most of the TOI infographics are beautiful and functional, but the “insight”, or the “story”, is often missing.

What is a story? Something that is unusual (“man bites dog”), counter-intuitive or, at the other extreme, something that establishes what people have long believed but had no direct evidence for.

Rarely does a great story come from one source. You may get an idea from one, but then you form a hypothesis and test it, either by getting more information from new sources or by verifying some of the information already obtained.

Data journalism is no different. Once in a while, if you are lucky, you can get a good story from a single dataset. More often, you have to juxtapose a couple of datasets; maybe some investigation is required. The “insight”, or the “USP” of the story, often comes from that. Even basic observations about exceptions and predominant trends are a good starting point.

Look at this infographic.

In almost all food items (and these are not basic food items like rice, wheat, vegetables or dal), urban India outscores rural India. That is not surprising in itself. But there are exceptions. Fish is one item where rural India scores higher. Apple remains primarily an urban fruit, while tropical fruits like guava or mango (the desi fruits) are consumed equally by rural and urban India.
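The kind of scan described above, looking for items where rural India matches or beats urban India, is easy to automate once the figures are in a table. What follows is a minimal sketch, assuming a hypothetical dictionary that maps each item to an (urban, rural) pair. The function name is our own, the 5% tolerance for calling two values "roughly equal" is an arbitrary choice, and the numbers in the example are made up, not taken from the TOI infographic.

```python
def find_exceptions(consumption, tolerance=0.05):
    """Group items by whether rural consumption leads, matches or trails urban.

    consumption: dict mapping item -> (urban_value, rural_value).
    tolerance: relative gap (as a fraction of the larger value) below which
    the two values are treated as roughly equal; 5% is an arbitrary choice.
    """
    groups = {"rural_leads": [], "roughly_equal": [], "urban_leads": []}
    for item, (urban, rural) in consumption.items():
        larger = max(urban, rural)
        if larger == 0 or abs(urban - rural) / larger <= tolerance:
            groups["roughly_equal"].append(item)
        elif rural > urban:
            groups["rural_leads"].append(item)
        else:
            groups["urban_leads"].append(item)
    return groups


# Made-up numbers for illustration only (not the infographic's data).
example = {
    "apple": (3.0, 1.0),
    "fish": (2.0, 2.5),
    "mango": (1.5, 1.5),
    "guava": (1.0, 0.98),
}
print(find_exceptions(example))
```

The "rural_leads" and "roughly_equal" groups are the exceptions worth asking "why?" about; the code only finds them, it cannot explain them.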

A good starting point for a great story is often: why? And this (or any single) dataset won’t answer that. Some of the best data journalism ideas come from single datasets, but great ideas need great execution to make them great stories.


DJ Showcase: LiveMint (11 August 2014)

Were the Afghan elections rigged?

From the headline, it looks like yet another political story from Afghanistan. But take a closer look and you will find that it is one of the finest examples of data journalism. The fact that it appears in an Indian newspaper, and one that primarily covers business and the economy at that, just adds to its charm.

There are several reasons why we like this one so much.

First of all, it sets the expectation straight away. The blurb says what it is: one popular method used to determine whether a data set is doctored is to look at the last digits of the values. The simple sentence does two things: it raises the interest level of those looking forward to a data story; at the same time, it turns away those who are uncomfortable with data but are looking for a spicy story on some new evidence of malpractice being caught on camera. In short, it sets an accurate expectation. That is good journalism.

Second, and this is the main reason it features here, it tests the limits of data journalism. While most data journalism stories are about analyzed results, this one is about the nature of datasets. While most are about data analysis, this one is about statistics and, yes, probability. It looks at election data from India, Iran and Afghanistan to suggest that the elections in Iran and Afghanistan were probably rigged at the counting stage.


In an election, for example, there is no reason that the distribution of the last digits of the vote counts of various candidates should not be uniform—given the large number of votes that each candidate gets, the last digit is essentially random, and there is no reason that the probability of a 1 in the units place is more than that of a 2 in the units place. Thus, in a free and fair election, it is likely that the last digit is distributed uniformly.

Third, it is fair and balanced. This aspect is often ignored by new-age data journalism champions, many of whom understand data very well but are not familiar with the basic principles of good journalism. You have to be fair and balanced, even if that takes a little interest away from the story. You cannot sacrifice these basic journalistic values to make a story more interesting. This story adds the ‘note of caution’ in a very clear and prominent way.

Finally, a note of caution. There are several ways in which an election can be rigged. Speaking broadly, it can be rigged at either the voting or the counting stages. This method of looking at the last digits only gives us an indication of the probability of rigging in the counting stages. Methods such as “ballot stuffing” (reportedly not uncommon in India) cannot be caught with such methods.
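The last-digit idea in the quoted passages can be checked with a standard goodness-of-fit test. Below is a minimal sketch, assuming a chi-square test against a uniform distribution of last digits; the Mint piece may have used a different test, and the function names are our own. The cutoff 16.919 is the textbook 5% critical value of the chi-square distribution with 9 degrees of freedom. The approximation needs enough counts: roughly 50 or more, so that each digit is expected at least five times.

```python
from collections import Counter

# 5% critical value of the chi-square distribution with 9 degrees of freedom
# (10 possible last digits minus 1), from standard statistical tables.
CHI2_CRIT_DF9_5PCT = 16.919


def last_digit_chi_square(vote_counts):
    """Chi-square statistic for the hypothesis 'last digits are uniform over 0-9'."""
    digits = [abs(int(v)) % 10 for v in vote_counts]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one vote count")
    expected = n / 10  # under uniformity, each digit appears n/10 times
    observed = Counter(digits)
    return sum((observed.get(d, 0) - expected) ** 2 / expected for d in range(10))


def looks_suspicious(vote_counts):
    """True if the last digits deviate from uniform at the 5% level."""
    return last_digit_chi_square(vote_counts) > CHI2_CRIT_DF9_5PCT


# Toy inputs: perfectly uniform last digits vs. every count ending in 5.
print(last_digit_chi_square(range(100, 200)), looks_suspicious(range(100, 200)))
print(looks_suspicious([10 * k + 5 for k in range(10, 110)]))
```

A high statistic only says the digits are unlikely to be uniform. In line with the note of caution above, it cannot detect rigging at the voting stage, and on its own it is not proof of rigging at the counting stage either.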

Great work.

The need for data sensitivity in newsrooms

While we discuss data journalism and various advanced tools for data analysis, data scraping and visualization, what often gets overlooked is the need for the average journalist to be sensitive to data.

Two days back, the Indian government released information on revenue generated by major historical monuments in India in the financial year 2013-14 (April 2013 to March 2014). It was a simple list of monuments, with the revenue each generated from ticket sales, camera charges etc. against it. For some strange reason, the press release did not list them in decreasing order of revenue, as one would expect. Nor was the list alphabetical, or arranged by region (such as North, South, East, West). It was not arranged in any particular order.

But the list itself is fairly simple to understand. Most newspapers presented it, quite logically, in decreasing order of revenue. The Times of India carried the list as the lead item in today’s newspaper, but gave it a different twist by highlighting falling visitor numbers in the aftermath of the Delhi rape case and how that has affected visits to monuments, something that is not evident from the data itself.

But while presenting the list of top revenue-generating monuments, the Times of India missed a few. Nothing in the presentation suggested that it was just a representative list rather than the top 10, yet monuments such as Konark, Khajuraho and Elephanta were missing from it.
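A simple check would have caught the omission: sort the full list by revenue and compare the true top 10 with what was published. Here is a minimal sketch, with hypothetical function names and placeholder monument names and figures (not the government's data).

```python
def top_n(revenues, n=10):
    """Names of the n highest-revenue entries, highest first."""
    ranked = sorted(revenues.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def missing_from_published(revenues, published, n=10):
    """Entries that belong in the true top n but are absent from the published list."""
    published_set = set(published)
    return [name for name in top_n(revenues, n) if name not in published_set]


# Placeholder names and figures (INR million), for illustration only.
revenues = {
    "Monument A": 50.0,
    "Monument B": 40.0,
    "Monument C": 30.0,
    "Monument D": 20.0,
}
published = ["Monument A", "Monument C"]
print(missing_from_published(revenues, published, n=3))
```

Any name this returns is one that a "top 10" headline would silently drop.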

Here is the correct list. The figures are all in INR million.


This raises questions about how much we can trust the data presented by newspapers. This is surely data that no journalist would deliberately misrepresent for any vested or ideological reason. The only plausible explanation is discomfort with making sense of data. And this is such a simple dataset.

While it is important to spread data journalism tools and techniques, it is equally important to sensitize news desks to the need to understand simple datasets.


Why the new change in Juvenile Justice Act was much needed…in one chart

The Union Cabinet has cleared the bill to amend the Juvenile Justice Act, which, among other things, will allow courts to treat minors above the age of 16 accused of serious crimes as adults. The government plans to introduce the bill in the current session of Parliament.

Currently, the maximum punishment under the Juvenile Justice Act is three years’ confinement at correctional homes.

There’s enough evidence to suggest that age brackets are an important parameter to consider in dealing with juvenile justice. The chart below shows how juvenile crime (measured by the number of juveniles apprehended) has changed over the years. The data is from the National Crime Records Bureau.

Juveniles Apprehended by Age Groups

Three Age Brackets: Three Stories

It is evident that the stories in the three age brackets are very different.

In the 7-12 years age bracket, the number of juveniles apprehended has gone down significantly: a 63% drop between 2003 and 2013, or an average drop of roughly 9% a year.

In the 13-16 years age bracket, there has been 14% growth over these 10 years, which translates to a compound annual growth rate (CAGR) of about 1%. That is far lower than the growth in the overall crime rate.

It is the 17-18 years age bracket that has seen a steep rise in the number of juveniles apprehended: growth of 60%, or a CAGR of about 5%.
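The annual figures above follow from the total change over the ten years from 2003 to 2013 via the compound annual growth rate, CAGR = (1 + total change)^(1/years) - 1. Here is a small sketch of that arithmetic, using the percentages quoted in the text. The function name is our own.

```python
def cagr_from_total_change(total_change, years):
    """Compound annual growth rate implied by a total fractional change.

    total_change: e.g. 0.60 for +60%, -0.63 for a 63% drop.
    """
    return (1 + total_change) ** (1 / years) - 1


# Total changes between 2003 and 2013, as quoted in the text.
for label, change in [("7-12 years", -0.63), ("13-16 years", 0.14), ("17-18 years", 0.60)]:
    rate = cagr_from_total_change(change, years=10)
    print(f"{label}: {rate:+.1%} a year")
```

This prints about -9.5%, +1.3% and +4.8% a year, in line with the rounded figures quoted above. A CAGR smooths the path between the two end points; it says nothing about year-to-year swings.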

According to some lawyers, many people in India’s villages and small towns do not have proper birth records. Lawyers, who know that this is a sure-shot way of escaping punishment, often use it to their advantage. So, many offenders who are actually 22, 23 or even older escape by claiming they are below 18.

The new changes to the Juvenile Justice Act should help tackle precisely this problem.


Data Journalism in India: Nascent but noticeable

Shyamanuja Das

In a well-discussed (and well-tweeted) article, India’s Media – Missing the Data Journalism Revolution, published recently on the Global Investigative Journalism Network website, journalist and academic Priya Rajasekar argues that the Indian media, by and large, has yet to wake up to the opportunity of data journalism.

Probably the first in-depth analysis of the subject in India, the article is fairly complete in capturing the viewpoints of the entire spectrum of stakeholders: academics, practitioners and other influencers. It even goes on to explore the reasons behind what it calls the Indian media’s “not subscribing to the idea (of data journalism)”.

The basic assumption, that the Indian media has not really taken data journalism seriously, is not way off the mark if one considers only the traditional media. India surely does not have the likes of a Guardian Datablog or an NYT Upshot.

But then, how many traditional media brands, even in developed markets, have such initiatives? India’s The Hindu actually has a dedicated section on data stories, though it’s nowhere near the Guardian and NYT sites.

There are online ventures, though. While not quite India’s Vox or FiveThirtyEight, some of them are making an impact. A few are dedicated to data journalism, while others are news and analysis sites that carry a few good data journalism stories. Some traditional media houses have also started exploring the area in a more focused manner, which, interestingly, is being noticed even by ordinary readers.

Two events in the recent past have helped the cause in a big way.

One, of course, was the General Elections held in April-May 2014. India’s is the largest election on earth, not just in terms of the size of the electorate but also in terms of the number of political parties. India has more than a thousand political parties, of which about 60 are recognized national and state parties. That makes analyzing vote shares and linking them to seats won fairly complex and interesting. With the Election Commission sharing raw data, we saw a lot of good analysis this time. DataJourno carried a round-up of election coverage here.

The other was the release of crime data by the National Crime Records Bureau (NCRB). Though NCRB has been sharing this data for many years now, thanks to the growing awareness of data analysis, almost all newspapers ran multiple stories analyzing the data this time. Crime against women and regional trends in crime dominated the coverage.

Here is a round-up of some of the data journalism initiatives in India. These are among the most noticeable efforts, though the list is not comprehensive. One clarification: quite a few other sites have fairly decent content based on data analysis, but it is not really journalism, for there are no ‘stories’ in most of them. In fact, that is a major source of confusion in data journalism: what is journalism and what is not. But that is a topic in itself, and not one restricted to India, so we will keep it for another day.

Here is the list, with examples wherever possible.

In addition, two other newspapers deserve a mention for their data journalism efforts, though they do not call it by that name: Mint, a business newspaper, and the Times of India, India’s largest-selling English newspaper. Mint was the first newspaper to start visualizing stories, well before the excitement about data journalism began. It also runs a number of data analysis stories, but they are mostly restricted to macroeconomics, which is not of immense interest to lay readers. The Times of India has started a regular section in its print edition called STATOISTICS (TOI is the short form of its name), where it presents interesting data through simple visualization.

The trend is new but is surely catching on. One challenge, though, is that number crunchers who can write some English are posing as data journalists, taking advantage of the absence of real journalists, many of whom are intimidated by numbers. So, instead of becoming the hot new area within journalism, data journalism has ended up as a poor cousin of data science and analytics.

DJ Showcase: (8 July 2014)

Four charts that explain why we don’t need a separate rail budget

This is yet another great example of data journalism in which data and charts (and not particularly beautiful or eye-catching ones at that) are used to argue a point: in this case, that we do not need a separate rail budget. The underlying logic builds on the fact that the Railways is neither a big expenditure head nor the dominant force in its area, surface transport.


How Indian states are doing in terms of HDI…in one chart

With Narendra Modi making governance the central election issue in the 2014 General Elections, the debate, after a long time, focused on development. While the BJP leader and current prime minister highlighted key economic parameters such as investment, industrial growth, power generation and per capita income, his critics pointed to Gujarat’s relatively poor performance on social parameters, measured globally by the United Nations Development Programme (UNDP)’s Human Development Index (HDI). With two of the world’s best-known economists, Jagdish Bhagwati and Amartya Sen, joining the debate on what constitutes good governance, the global community took notice.

As it is, the truth is rarely black and white. While Modi made some tall claims about economic parameters, most notably by highlighting absolute values (which had always been higher for Gujarat, even before Modi) rather than the change during his tenure, his critics, while pointing to HDI figures, did just the reverse. They chose to ignore the fact that Gujarat was still in the upper half of states on HDI; it looks pale only when compared with its ranking on economic parameters.

The chart here shows absolute HDI 2007-08 (latest available) on the x-axis and growth between 1999-2000 and 2007-08 on the y-axis.

HDI in India

Setting aside India’s overall HDI performance, which remains low, a pure comparison among states shows some definite trends. In the chart, the lines show median values, not averages.

  1. The overall news is good, with most low-HDI states such as Odisha, Bihar, UP, Chhattisgarh and Madhya Pradesh registering good growth. Expectedly, Kerala, Maharashtra, Punjab, Haryana and Gujarat show lower growth.
  2. Only one area clearly falls in Quadrant I: the North East. By definition, it is the only star, though Karnataka and Tamil Nadu too have performed well.
  3. Kerala, with an HDI ahead of Russia, Malaysia, Kuwait and Saudi Arabia, has slightly lower growth, but it is still a good performance considering how poorly the other two high-HDI entities have fared in terms of growth. Delhi is actually the only entity to show negative growth, while Goa too shows sluggish growth.
  4. Uttarakhand and Jharkhand, two younger states, clearly outshine everyone else in terms of growth. Uttarakhand is fast moving into Quadrant I.
  5. Rajasthan is the only state that is clearly falling behind. West Bengal, too, is not keeping pace with the rest of the Eastern brigade, such as Bihar, Odisha, NE, Assam and Jharkhand, which are clearly on the growth track now.
  6. Modi’s Gujarat is clearly below the median in growth and falls on the median line of absolute HDI. With a position of 11 among 23 states/UTs (NE is treated as one), it is clearly not a good showing. But it is not as pathetic as many claim.
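The quadrant reading of the chart can be reproduced by splitting each axis at its median. What follows is a minimal sketch, assuming a hypothetical dictionary that maps each state to an (HDI level, HDI growth) pair. Only Quadrant I (high HDI, high growth) is named in the text. The numbering of the other three follows the usual mathematical convention and is our assumption, as is counting a point exactly on a median as "high". The values in the example are placeholders, not the figures behind the chart.

```python
from statistics import median


def quadrants(states):
    """Assign each state a quadrant, splitting both axes at their medians.

    states: dict mapping name -> (hdi_level, hdi_growth).
    A point exactly on a median counts as 'high' (an arbitrary tie-break).
    """
    level_median = median(level for level, _ in states.values())
    growth_median = median(growth for _, growth in states.values())
    labels = {}
    for name, (level, growth) in states.items():
        high_level = level >= level_median
        high_growth = growth >= growth_median
        if high_level and high_growth:
            labels[name] = "I (high HDI, high growth)"
        elif high_growth:
            labels[name] = "II (low HDI, high growth)"
        elif not high_level:
            labels[name] = "III (low HDI, low growth)"
        else:
            labels[name] = "IV (high HDI, low growth)"
    return labels


# Placeholder values for four hypothetical states.
example = {"P": (0.6, 0.2), "Q": (0.4, 0.3), "R": (0.5, 0.1), "S": (0.3, 0.05)}
for name, label in quadrants(example).items():
    print(name, label)
```

Medians, as in the chart, split the states into two equal halves on each axis and are not pulled around by outliers the way averages are.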


So, which are the best countries in the world?

Best…in what sense? Across what parameters? The question sounds too naive, too simplistic.

But it may not be. Going by data (the only thing we swear by), some countries may indeed be better than the rest of us in almost all areas, across all parameters.

DataJourno decided to carefully compare countries across measurable parameters covering different aspects of life, society and economy. But instead of getting into individual data points, we decided to rely on established systems that already exist: global rankings and ratings.

By doing this,

  1. we avoided trying to reinvent the wheel, with our limited knowledge in many of those areas
  2. we avoided getting caught in standardization issues
  3. we saved time
  4. we banked on the credibility and quality of these ratings, which have only become better over the years.

After going through several such lists, we zeroed in on the six global rankings that are the most credible and respected. They measure competitiveness, economic freedom, natural environment, human development, integrity of people, and quality of life. We used the latest edition of each ranking rather than standardizing on one year; standardizing would have forced us to use older editions in many cases just because one list was older.

These lists are

  • Global Competitiveness Ranking by World Economic Forum 2013-14
  • United Nations Development Programme (UNDP) Human Development Index Rankings 2012
  • Transparency International Corruption Perception Index 2013
  • The Economist Intelligence Unit Where To Be Born (earlier called Quality of Life) Index 2013
  • The Heritage Foundation-Wall Street Journal Index of Economic Freedom 2014
  • Yale University Environmental Performance Index 2014

The last two are comparatively new and probably not as established as the other four. But we still decided to include them, as both economic freedom and environment are important parameters, and these two are clearly the best in class for those categories.

We decided to look at the top 20 ranking countries in each of those lists.

And this is what we found.

  • There are just 38 countries in this list of lists, which could have had as many as 120 (six lists of 20 each) if there were no overlap. A combined list of 60-80 would have meant a fair amount of overlap; a combined list of just 38 means very high overlap.
  • If you remove the countries that feature in the top 20 of just one list, 24 countries remain that feature in two or more lists.
  • As many as 12 countries feature in 5 or 6 lists. Of these, five (Singapore, Sweden, Netherlands, Denmark, Germany) feature in all six lists.
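The overlap counts above are straightforward to reproduce once each ranking's top 20 is written down as a list. Here is a minimal sketch with a hypothetical function name and toy lists of placeholder country codes (not the actual rankings).

```python
from collections import Counter


def overlap_summary(top_lists):
    """Summarize how much several top-N lists overlap.

    top_lists: list of lists, each holding one ranking's top-N countries.
    """
    appearances = Counter()
    for ranking in top_lists:
        appearances.update(set(ranking))  # set() ignores accidental duplicates
    return {
        "distinct": len(appearances),
        "max_possible": sum(len(set(r)) for r in top_lists),  # if no overlap at all
        "in_two_or_more": sorted(c for c, k in appearances.items() if k >= 2),
        "in_all": sorted(c for c, k in appearances.items() if k == len(top_lists)),
    }


# Three toy "top 3" lists.
toy = [["A", "B", "C"], ["A", "B", "D"], ["A", "E", "F"]]
print(overlap_summary(toy))
```

With six real top-20 lists, "max_possible" would be 120, and "distinct", "in_two_or_more" and "in_all" correspond to the 38, 24 and 5 reported above.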

So, why can’t we call these five (or, for that matter, these 12) the world’s best countries?

Here is a tabular summary of the country ranking.


In short, some countries are better than others in almost all respects.

But the bigger conclusion is that the parameters are probably more strongly correlated with one another than we think.

Based on this, we created a composite ranking, giving equal weight to all parameters. We used only the ranks, not the scores, as the scores are on different scales and not easily comparable.
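An equal-weight composite of ranks can be sketched by averaging each country's rank across the indices and sorting. This is a minimal sketch, not necessarily DataJourno's exact procedure. The article does not say how countries absent from an index were handled, so here they are simply dropped (an assumption), and ties are broken alphabetically. The function name and data are hypothetical.

```python
def composite_ranking(rankings):
    """Order countries by mean rank across several indices (1 = best).

    rankings: list of dicts, each mapping country -> rank in one index.
    Each index gets equal weight. Assumption: countries missing from any
    index are dropped, since the article does not say how gaps were handled.
    """
    common = set(rankings[0])
    for ranking in rankings[1:]:
        common &= set(ranking)
    mean_rank = {c: sum(r[c] for r in rankings) / len(rankings) for c in common}
    # Lower mean rank is better; break ties alphabetically for a stable order.
    return sorted(mean_rank, key=lambda c: (mean_rank[c], c))


# Toy data: "D" appears in only one index and is therefore dropped.
toy_rankings = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 2, "B": 3, "C": 1, "D": 4},
]
print(composite_ranking(toy_rankings))
```

Working with ranks rather than scores sidesteps the different scales, at the cost of ignoring how far apart two countries are within an index.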

Based on their ranks in all the lists, this is what the overall top 20 looks like.

  1. Singapore (features in all the 6 lists)
  2. Australia (features in 5 of the 6 lists)
  3. Switzerland (features in 5 of the 6 lists)
  4. Sweden (features in all the 6 lists)
  5. Netherlands (features in all the 6 lists)
  6. Norway (features in 5 of the 6 lists)
  7. Denmark (features in all the 6 lists)
  8. Germany (features in all the 6 lists)
  9. Hong Kong (features in 5 of the 6 lists)
  10. New Zealand (features in 5 of the 6 lists)
  11. Finland (features in 5 of the 6 lists)
  12. United States (features in 5 of the 6 lists)
  13. Canada (features in 4 of the 6 lists)
  14. Ireland (features in 4 of the 6 lists)
  15. United Kingdom (features in 4 of the 6 lists)
  16. Luxembourg (features in 3 of the 6 lists)
  17. Austria (features in 4 of the 6 lists)
  18. Japan (features in 3 of the 6 lists)
  19. Belgium (features in 4 of the 6 lists) & Iceland (features in 3 of the 6 lists)

As one can see, richer countries do better, but it is not a question of small versus big. The US, Germany, Japan, UK, Canada and Australia feature in the list, as do Luxembourg, Hong Kong and Singapore.

Here is a visual representation of how balanced the top countries look, across different parameters.



The more regular a graph looks, the more balanced is the country across parameters.