Month: August 2014


Which state is India’s biggest drinking state?

Data journalism is journalism first, and last. The basic principles of good journalism apply to data journalism as well. One of those fundamental principles is to check the credibility of information. It starts with knowing where to get the most authentic information on a particular topic. Yes, even in the days of Google.

This story in The Hindu, by Rukmini S, one of the very, very few practicing data journalists in India, beautifully illustrates that.

The topic is alcohol consumption in different states, and the context is Kerala’s decision to move towards prohibition. Some basic research, the usual euphemism for a Google search, convinced the media that Kerala is indeed the top per capita alcohol consuming state in India. And what can beat that? The top drinking state heading towards prohibition…

But is Kerala really India’s most drinking state? It took a real data journalist to ask that question and bust the myth.

As Kerala takes the first steps towards prohibition, here’s a question: is Kerala really India’s biggest drinker? The media sure seems to think so; here’s the Times of India, saying so today (but giving no source), The Indian Express said it in 2008 but the source study is nowhere on the internet and the Economist said so in 2013, citing a Kerala-based advocacy group director. Various other reports cite Kerala’s 2008 Economic Review but this isn’t available online either.

Anyone with an interest in tracking consumption patterns in India would know that the biggest agency tracking that information is the National Sample Survey Office (NSSO) under the Ministry of Statistics and Programme Implementation, through its various “rounds” of surveys. Rukmini used that data to show that it is not Kerala but Andhra Pradesh which is the biggest drinker.

Despite being a great reminder of what should not be passed off as data journalism, the story fails to excite. A simple and direct headline like “And you thought Kerala is the biggest drinker” could have been far more effective than a textbookish headline like “India’s biggest drinkers.”

Nevertheless, it reassures us that data journalism in India is in good hands.


DJ Showcase: (22 August 2014)

This one chart tells you everything you need to know about the state of Indian cricket

Yet another brilliant data journalism story tells us the problem with Indian cricket in just one chart. Of course, you will appreciate the analysis a little better if you are above 40. The chart shows that the cribbing the 40-pluses do about T20 ruining test cricket may actually be justified.

This chart is the gist of the story, “This one chart tells you everything you need to know about the state of Indian cricket”.

Well, the chart pretty much tells you the entire story, if you know the basics of cricket. We like the story for a number of reasons.

  1. It is an application of data journalism in an area that interests a large section of people in India. The story puts to rest a debate: whether or not the glamorous T20 format is impacting test cricket. What’s more, it is extremely topical in the wake of India’s disastrous test series in England.
  2. You do not have to read the text at all. If you are tuned in to contemporary cricket, you get the story right away from the chart.
  3. It busts the myth that you need a lot of data analysis in the background to produce a good data journalism story. In this case, it is just simple addition, which a class II student can do.

In our showcase series on good data journalism stories, the site that ran this story has clearly come out on top so far, even though it does not claim to be a data journalism site per se. It proves what DataJourno holds as its core philosophy: a good data journalism story must be a good story first. How much data it contains or how great the visualization looks is not what determines the quality of a data journalism story.


Top Domestic Grossers of Bollywood

Almost every major new release these days is setting a new record in terms of collections at the box office. Does that mean that today’s movies are more successful than the movies of yesteryear? Not everyone agrees. Many rightly point to the declining value of the rupee as the major reason behind this phenomenon.

DataJourno calculated the 2013 value of the domestic box office collections of all major Indian hits to come up with the list of the top 15 domestic grossers of all time. All the figures are in INR crores and are net box office collections. The present value calculations are based on available data. This IBN Live report can be a good source for verification. Movies released in 2014 have been excluded from this list. Some of them, like Kick, would surely feature in this list.
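As a rough illustration of what restating an old collection in later-year rupees involves, here is a minimal sketch. It assumes a simple price-index ratio, which is one common approach and not necessarily the method used for the list here. The function name and the index values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def to_target_year_value(nominal_crore, index_release_year, index_target_year):
    """Restate a nominal collection (INR crore) in target-year rupees.

    Assumption: a single price index captures the change in the value
    of the rupee between the release year and the target year.
    """
    if index_release_year <= 0:
        raise ValueError("index for the release year must be positive")
    return nominal_crore * index_target_year / index_release_year


# Hypothetical numbers, for illustration only: a film that collected
# INR 10 crore when the index stood at 50, restated for a year in which
# the same index stands at 200.
adjusted = to_target_year_value(10.0, 50.0, 200.0)
print(f"Adjusted collection: INR {adjusted:.1f} crore")
```

With these made-up inputs the index has quadrupled, so the collection is scaled by four. Real figures would depend entirely on which index is chosen.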

Bollywood Top Grossers of All Times (till 2013)

Top Grossers in Bollywood

The figures in green are the current value of collections, while the figures in red are collections when the movies were actually released.


Some clarifications.

1. It is not comprehensive. We may have missed some movies, especially towards the lower end of the table.

2. Reliable data on the collections of many movies is not available, at least not publicly. This may introduce discrepancies such as small overseas collections being part of the collections for older movies.

But does this do justice to all films? Clearly not. It is just a technical presentation, which says which movies have been commercially most successful. It does not go beyond that. It does not examine why. For example, the fact that the Mother Indias and Mughal-E-Azams did their business from a fraction of the number of screens available to the Dhoom 3s or Krrish 3s does not come out from this chart. Neither does the rise in disposable incomes of average Indians. At the same time, the fact that people today have access to so many other forms of entertainment also does not come out of this chart.

So, take it for what it is. And see if your favorite movie is there in the list.

DJ Showcase: LiveMint (11 August 2014)

Were the Afghan elections rigged?

From the headline, it looks like yet another political story from Afghanistan. But take a closer look and you will find it is one of the finest examples of data journalism. The fact that it is in an Indian newspaper, and one that primarily covers business and economy at that, just adds to its charm.

There are several reasons why we like this one so much.

First of all, it sets the expectation straight away. The blurb says what it is: one popular method used to determine whether a data set is doctored is to look at the last digits of the values. The simple sentence does two things: it raises the interest level of those looking forward to a data story; at the same time, it turns away those who are uncomfortable with data but are looking for a spicy story on some new evidence of malpractice being caught on camera. In short, it sets an accurate expectation. That is good journalism.

Second, and this is primarily why it is here, it actually tests the limits of data journalism. While most data journalism stories are about analyzed results, it is about the nature of data sets. While most are about data analysis, it is about statistics and, yes, probability. It looks at election data in India, Iran and Afghanistan to suggest that elections in Iran and Afghanistan were probably rigged at the counting stage.


In an election, for example, there is no reason that the distribution of the last digits of the vote counts of various candidates should not be uniform—given the large number of votes that each candidate gets, the last digit is essentially random, and there is no reason that the probability of a 1 in the units place is more than that of a 2 in the units place. Thus, in a free and fair election, it is likely that the last digit is distributed uniformly.

Third, and this aspect is often ignored by new-age data journalism champions, many of whom understand data very well but are not familiar with the basic premises of good journalism: you have to be fair and balanced, even if that takes a little interest away from the story. You cannot sacrifice these basic journalism values to make a story more interesting. This story adds the ‘note of caution’ in a very clear and prominent way.

Finally, a note of caution. There are several ways in which an election can be rigged. Speaking broadly, it can be rigged at either the voting or the counting stages. This method of looking at the last digits only gives us an indication of the probability of rigging in the counting stages. Methods such as “ballot stuffing” (reportedly not uncommon in India) cannot be caught with such methods.

Great work.
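The last-digit check described in the quoted passage can be sketched in a few lines. This is only an illustration of the general idea, assuming a chi-square goodness-of-fit test against a uniform distribution of the digits 0 to 9; the story does not say which statistical test it applied. The function names and the sample vote counts are hypothetical.

```python
from collections import Counter

# Critical value of the chi-square distribution with 9 degrees of
# freedom at the 5% significance level.
CHI2_CRITICAL_DF9_5PCT = 16.919


def last_digit_chi_square(vote_counts):
    """Chi-square statistic of the last digits against a uniform 0-9 spread."""
    if not vote_counts:
        raise ValueError("need at least one vote count")
    digits = Counter(abs(int(v)) % 10 for v in vote_counts)
    expected = len(vote_counts) / 10
    return sum((digits.get(d, 0) - expected) ** 2 / expected for d in range(10))


def looks_non_uniform(vote_counts):
    """True if the last digits deviate from uniform at the 5% level."""
    return last_digit_chi_square(vote_counts) > CHI2_CRITICAL_DF9_5PCT


# Made-up data: one set whose last digits are perfectly even,
# one where every count ends in 5.
even_counts = list(range(1000, 2000))
odd_looking_counts = [1005] * 1000

print(looks_non_uniform(even_counts))
print(looks_non_uniform(odd_looking_counts))
```

As the story's own note of caution says, a result like this only hints at tampering at the counting stage, and a high statistic is an indication to look closer, not proof.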

The need for data sensitivity in newsrooms

While we discuss data journalism and various advanced tools of data analysis, data scraping and visualization, what often gets overlooked is the need for sensitivity towards data by any average journalist.

Two days back, the Indian government released information on revenue generated by major historical monuments in India in the financial year 2013-14 (April 2013 to March 2014). It was a simple list of monuments, with the revenue each generated from ticket sales, camera charges etc. listed against it. For some strange reason, the list in the press release was not in decreasing order of revenue, as one would expect. Neither was it alphabetical, nor was it arranged according to regions (such as North, South, East, West). It was not arranged in any particular order.

But the list itself is fairly simple to understand. Most newspapers presented the list, quite logically, in order of decreasing revenue. The Times of India carried the list as the lead item in today’s newspaper but gave it a different twist by choosing to highlight falling visitor numbers in the aftermath of the Delhi rape case and how that has affected visits to monuments, something which is not evident from the data itself.

But while trying to present the list of top revenue generating monuments, the Times of India missed a few. And there was nothing written to suggest that it was just a representative list and not the top 10. So, some monuments such as Konark, Khajuraho and Elephanta were missing from the list.

Here is the correct list. The figures are all in INR million.


This raises the question: how much can we trust the data presented by newspapers? This is surely data that no journalist would deliberately misrepresent for any vested or ideological reason. The only plausible reason is discomfort in making sense of data. And this is such a simple dataset.

While it is important to spread data journalism tools and techniques, it is equally important to sensitize news desks about the need to understand simple datasets.


Why the new change in Juvenile Justice Act was much needed…in one chart

The Union Cabinet has cleared the bill to amend the Juvenile Justice Act, which, among other things, will allow the courts to treat minors above the age of 16 accused of serious crimes as adults. The government plans to introduce the bill in the current session of Parliament.

Currently, the maximum punishment under the Juvenile Justice Act is three years’ confinement at correctional homes.

There’s enough evidence to suggest that the age brackets are an important parameter to consider in dealing with juvenile justice. The chart below shows how juvenile crime (number of juveniles apprehended) has changed over the years. The data is from the National Crime Records Bureau.

Juveniles Apprehended by Age Groups 


Juvenile Justice

Three Age Brackets: Three Stories

It is evident that the stories in the three age brackets are very different.

In the age bracket of 7-12 years, the number of juveniles apprehended has actually gone down significantly. It is a 63% drop between 2003 and 2013. That is an average year-on-year drop of about 9%.

In the age bracket, 13-16 years, there has been a 14% growth in these 10 years, which translates to 1% average annual growth (CAGR). That is far lower than the growth in overall crime rate.

It is the age bracket of 17-18 years which has actually seen a steep rise in number of juveniles being apprehended. The growth is 60% or a CAGR of 5%.
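The annualised figures quoted for the three brackets follow from the standard compound annual growth rate formula. A minimal sketch, assuming the 10-year span from 2003 to 2013 stated above; the function name is illustrative.

```python
def cagr(total_change, years):
    """Compound annual growth rate for a total fractional change.

    total_change is a fraction: 0.60 for 60% growth, -0.63 for a 63% drop.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    if total_change <= -1:
        raise ValueError("total_change must be greater than -1")
    return (1 + total_change) ** (1 / years) - 1


# Total changes over 2003-2013 as stated in the text, per age bracket.
brackets = {"7-12 years": -0.63, "13-16 years": 0.14, "17-18 years": 0.60}

for name, change in brackets.items():
    print(f"{name}: {cagr(change, 10) * 100:.1f}% a year")
```

Rounded to whole percentages, these work out to roughly -9%, 1% and 5% a year, in line with the figures in the text.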

According to some lawyers, many people in India’s villages and small towns do not have proper birth records. Lawyers who know that this is a sure-shot way of escaping punishment often use it to their advantage. So many offenders who are actually 22, 23 or even older escape by claiming they are below 18.

The new changes in the Juvenile Justice Act should be able to tackle precisely this problem.


Data Journalism in India: Nascent but noticeable

Shyamanuja Das

In a well-discussed (and well-tweeted) article published recently on the Global Investigative Journalism Network website, India’s Media – Missing the Data Journalism Revolution, journalist and academician Priya Rajasekar argues that Indian media, by and large, is still to wake up to the opportunity of data journalism.

Probably the first in-depth analysis on the subject in India, the article is fairly complete in terms of capturing the viewpoints of the entire spectrum of stakeholders: the academicians, the practitioners and other influencers. It even goes on to explore the reasons behind what it calls the Indian media’s “not subscribing to the idea (of data journalism)”.

The basic assumption, that Indian media has not really taken data journalism seriously, is not exactly way off the mark if one takes into account only the traditional media. India surely does not have the likes of a Guardian Datablog or an NYT Upshot.

But then, how many of the traditional media brands even in the developed markets have such initiatives? India’s The Hindu actually has a dedicated section on data stories, though it’s nowhere near the Guardian and NYT sites.

There are online ventures, though. Though not quite the Vox and FiveThirtyEight of India, some of them are making an impact. A few are dedicated to data journalism, while others are news and analysis sites but do have a few good data journalism stories. Some traditional media houses have also started exploring the area in a more focused manner, which interestingly, is being noticed by even the common readers.

Two events in the recent past have helped the cause in a big way.

One, of course, was the General Elections held in April – May 2014. India’s are the largest elections on earth, not just in terms of the size of the electorate but also in terms of the number of political parties. India has more than a thousand political parties, out of which about 60 are recognized national and state parties. That makes analyzing vote shares and linking them to seats won fairly complex and interesting. With the Election Commission sharing raw data, we saw a lot of good analysis this time. DataJourno carried a round-up of election coverage here.

The other was the release of crime data by the National Crime Records Bureau (NCRB). Though NCRB has been sharing this data for many years now, thanks to the growing awareness about data analysis, almost all newspapers did multiple stories this time, analyzing the data. Crime against women and regional trends in crime dominated the coverage.

Here is a round-up of some of the data journalism initiatives in India. These are among the most noticeable efforts, though the list is not exactly comprehensive. One clarification: there are quite a few other sites that have fairly decent content based on analyzing data. But they are not really journalistic stories, for there are no ‘stories’ in most of them. In fact, that is a big confusion that exists in data journalism—what is journalism and what is not. But then, that is a topic by itself and is not restricted to India. So, we will keep that for another day.

Here is the list, with examples wherever possible.

In addition, two other newspapers must be mentioned for their data journalism efforts, though they do not call it by that name: Mint, a business newspaper, and The Times of India, India’s largest-selling English newspaper. Mint was the first newspaper to start visualizing stories, much before the excitement about data journalism started. It also does a number of data analysis stories, but they are mostly restricted to macroeconomics, which is not of immense interest to lay readers. The Times of India has started a regular section in its print version called STATOITICS (TOI is a shorter version of its full name), where it presents interesting data through simple visualization.

The trend is new but is surely catching on. One challenge, though, is that number crunchers who can write some English are posing as data journalists, taking advantage of the absence of real journalists, many of whom are intimidated by numbers. So, instead of being the hot new area within journalism, data journalism has ended up becoming a poor cousin of data science and analytics.