DJ Showcase: (22 August 2014)

This one chart tells you everything you need to know about the state of Indian cricket

In yet another brilliant data journalism story, tells us the problem of Indian cricket in just one chart. Of course, you will appreciate the analysis a little better if you are above 40. The chart shows that the 40-pluses’ cribbing about T20 ruining Test cricket may actually be right.

This chart is the gist of the story, “This one chart tells you everything you need to know about the state of Indian cricket”.

Well, the chart pretty much tells you the entire story, if you know the basics of cricket. We like the story for a number of reasons.

  1. It applies data journalism to an area that interests a large section of people in India. The story puts to rest a debate: whether or not the glamorous T20 format is impacting Test cricket. What’s more, it is extremely topical in the wake of India’s disastrous Test series in England.
  2. You do not have to read the text at all. If you are tuned in to contemporary cricket, you get the story right away from the chart.
  3. It busts the myth that you need a lot of data analysis in the background to produce a good data journalism story. In this case, it is just simple addition, which a class II student can do.

In our showcase series on good data journalism stories, has clearly come out on top so far, even though it does not claim to be a data journalism site per se. It proves what DataJourno believes as its core philosophy: a good data journalism story must be a good story first. How much data it contains or how great the visualization looks does not determine the quality of a data journalism story.



Top Domestic Grossers of Bollywood

Almost every major new release these days sets a new record in terms of collections at the box office. Does that mean that today’s movies are more successful than the movies of yesteryear? Not everyone agrees. Many rightly point to the declining value of the rupee as the major reason behind this phenomenon.

DataJourno calculated the 2013 value of the domestic box office collections of all major Indian hits to come up with the list of the top 15 domestic grossers of all time. All the figures are in INR crores and are net box office collections. The present value calculations are based on available data. This IBN Live report can be a good source for verification. Movies released in 2014 have been excluded from this list. Some of them, like Kick, would surely feature in this list.
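For readers who want to try a similar adjustment themselves, here is a minimal sketch of one common way to restate an old amount in a later year's rupees: scale it by the ratio of a price index in the two years. This is an illustration only and not necessarily the method used for the list. The function name and the index values below are hypothetical placeholders, not real data.

```python
def to_base_year_value(amount, index_release_year, index_base_year):
    """Restate a nominal amount in base-year money using a price index ratio.

    amount: nominal collection in the release year (for example, INR crores)
    index_release_year: price index value in the release year (hypothetical input)
    index_base_year: price index value in the base year, here 2013 (hypothetical input)
    """
    if index_release_year <= 0 or index_base_year <= 0:
        raise ValueError("price index values must be positive")
    return amount * index_base_year / index_release_year


# Hypothetical example: a film that collected 10 crores when the index stood at 25,
# restated for a base year in which the index stands at 100.
adjusted = to_base_year_value(10.0, index_release_year=25.0, index_base_year=100.0)
print(f"Adjusted collection: {adjusted:.1f} crores")
```

The choice of index (consumer prices, ticket prices, or something else) changes the result considerably, which is one reason such lists differ from source to source.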

Bollywood Top Grossers of All Times (till 2013)

Top Grossers in Bollywood

The figures in green are the current value of collections, while the figures in red are collections when the movies were actually released.


Some clarifications.

1. It is not comprehensive. We may have missed some movies, especially towards the lower end of the table.

2. Reliable data on collection of many movies are not available, at least not publicly. This may introduce discrepancies such as small overseas collections being part of the collections for older movies.

But does this do justice to all films? Clearly not. It is just a technical presentation, which says which movies have been commercially most successful. It does not go beyond that. It does not examine why. For example, the fact that the Mother Indias and Mughal-E-Azams did their business from a fraction of the number of screens available to the Dhoom 3s or Krrish 3s does not come out from this chart. Neither does the rise in disposable incomes of average Indians. At the same time, the fact that people today have access to so many other forms of entertainment also does not come out of this chart.

So, take it for what it is. And see if your favorite movie is there in the list.

DJ Showcase: LiveMint (11 August 2014)

Were the Afghan elections rigged?

From the headline, it looks like yet another political story from Afghanistan. But take a closer look and you will find it is one of the finest examples of data journalism. The fact that it is in an Indian newspaper, and one that primarily covers business and economy at that, just adds to its charm.

There are several reasons why we like this one so much.

First of all, it sets the expectation straight away. The blurb says what it is: one popular method used to determine whether a data set is doctored is to look at the last digits of the values. The simple sentence does two things: it raises the interest level of those looking forward to a data story; at the same time, it turns away those who are uncomfortable with data but are looking for a spicy story on some new evidence of malpractice being caught on camera. In short, it sets an accurate expectation. That is good journalism.

Second, and this is primarily the reason why it is here, it actually tests the limits of data journalism. While most data journalism stories are about analyzed results, this one is about the nature of data sets. While most are about data analysis, it is about statistics and, yes, probability. It looks at election data in India, Iran and Afghanistan to suggest that elections in Iran and Afghanistan were probably rigged at the counting stage.


In an election, for example, there is no reason that the distribution of the last digits of the vote counts of various candidates should not be uniform—given the large number of votes that each candidate gets, the last digit is essentially random, and there is no reason that the probability of a 1 in the units place is more than that of a 2 in the units place. Thus, in a free and fair election, it is likely that the last digit is distributed uniformly.
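The idea in the passage above can be sketched in a few lines. This is our own illustration, not the code or the exact test used in the Mint story: it counts the last digits of a list of vote counts and compares them with a uniform distribution using a chi-square goodness-of-fit statistic. The function names are ours, and the threshold is the standard 5% critical value for 9 degrees of freedom.

```python
from collections import Counter

# Chi-square critical value for 9 degrees of freedom at the 5% level.
CHI2_CRITICAL_DF9_5PCT = 16.919


def last_digit_chi_square(vote_counts):
    """Chi-square statistic of last digits against a uniform distribution."""
    digits = [abs(int(v)) % 10 for v in vote_counts]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one vote count")
    expected = n / 10  # under uniformity, each digit appears n/10 times
    observed = Counter(digits)
    return sum((observed.get(d, 0) - expected) ** 2 / expected for d in range(10))


def looks_non_uniform(vote_counts):
    """True if the last digits deviate from uniform at the 5% level."""
    return last_digit_chi_square(vote_counts) > CHI2_CRITICAL_DF9_5PCT


# Made-up counts for illustration: every last digit appears equally often.
even_counts = list(range(1000, 1100))
print(looks_non_uniform(even_counts))
```

A test like this needs a reasonably large number of counts to be meaningful, and a high statistic is only an indication of possible tampering, not proof.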

Third, and this aspect is often ignored by new-age data journalism champions, many of whom understand data very well but are not familiar with the basic promises of good journalism: you have to be fair and balanced, even if that takes a little interest away from the story. You cannot sacrifice these basic journalism values to make a story more interesting. This story adds the ‘note of caution’ in a very clear and prominent way.

Finally, a note of caution. There are several ways in which an election can be rigged. Speaking broadly, it can be rigged at either the voting or the counting stages. This method of looking at the last digits only gives us an indication of the probability of rigging in the counting stages. Methods such as “ballot stuffing” (reportedly not uncommon in India) cannot be caught with such methods.

Great work.

The need for data sensitivity in newsrooms

While we discuss data journalism and various advanced tools of data analysis, data scraping and visualization, what often gets overlooked is the need for sensitivity towards data on the part of the average journalist.

Two days back, the Indian government released information on revenue generated by major historical monuments in India in the financial year 2013-14 (April 2013 to March 2014). It was a simple list of monuments, each with the revenue it generated from ticket sales, camera charges, etc. For some strange reason, the sequence in the press release was not in decreasing order of revenue, as one would expect. Neither was it alphabetical, nor was it arranged by region (such as North, South, East, West). It was not arranged in any particular order.

But the list itself is fairly simple to understand. Most newspapers presented the list, quite logically, in order of decreasing revenue. The Times of India carried the list as the lead item in today’s newspaper but gave it a different twist, choosing to highlight falling visitor numbers in the aftermath of the Delhi rape case and how that has affected visits to monuments, something that is not evident from the data itself.

But while trying to present the list of top revenue-generating monuments, the Times of India missed a few. And there was nothing written to suggest that it was just a representative list and not the top 10. So, some monuments such as Konark, Khajuraho and Elephanta were missing from the list.

Here is the correct list. The figures are all in INR million.


This raises the question of how much we can trust the data presented by newspapers. This is surely data that no journalist would deliberately misrepresent for any vested or ideological reason. The only plausible reason is discomfort in making sense of data. And this is such a simple dataset.

While it is important to spread data journalism tools and techniques, it is equally important to sensitize news desks about the need to understand simple datasets.


Why the new change in Juvenile Justice Act was much needed…in one chart

The Union Cabinet has cleared the bill to amend the Juvenile Justice Act, which, among other things, will allow the courts to treat minors above the age of 16 accused of serious crimes as adults. The government plans to introduce the bill in the current session of Parliament.

Currently, the maximum punishment under the Juvenile Justice Act is three years’ confinement at correctional homes.

There’s enough evidence to suggest that the age brackets are an important parameter to consider in dealing with juvenile justice. The chart below shows how juvenile crime (number of juveniles apprehended) has changed over the years. The data is from the National Crime Records Bureau.

Juveniles Apprehended by Age Groups 


Juvenile Justice

Three Age Brackets: Three Stories

It is evident that the stories in the three age brackets are very different.

In the 7-12 years age bracket, the number of juveniles apprehended has actually gone down significantly. It is a 63% drop between 2003 and 2013. That is an average year-on-year drop of about 9%.

In the 13-16 years age bracket, there has been 14% growth in these 10 years, which translates to about 1% average annual growth (CAGR). That is far lower than the growth in the overall crime rate.

It is the 17-18 years age bracket that has actually seen a steep rise in the number of juveniles being apprehended. The growth is 60%, or a CAGR of about 5%.
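For anyone who wants to check the annual rates quoted above, here is a small sketch of the compound annual growth rate arithmetic. It assumes 10 annual periods between 2003 and 2013, and the function name is ours.

```python
def annualized_rate(total_change, years):
    """Compound annual rate implied by a total change over a number of years.

    total_change is a fraction: 0.60 means +60%, -0.63 means a 63% drop.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    if total_change <= -1:
        raise ValueError("total_change must be greater than -100%")
    return (1 + total_change) ** (1 / years) - 1


# Total changes quoted in the text for 2003 to 2013.
for bracket, change in [("7-12 years", -0.63), ("13-16 years", 0.14), ("17-18 years", 0.60)]:
    print(f"{bracket}: {annualized_rate(change, 10):+.1%} per year")
```

The results come out at roughly -9.5%, +1.3% and +4.8% a year, in line with the rounded figures in the text.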

According to some lawyers, many people in India’s villages and small towns do not have proper birth records. Lawyers, who know that this is a sure-shot way of escaping punishment, often use this to their advantage. So, many offenders who are actually 22, 23 or even older escape by claiming they are below 18.

The new changes in the Juvenile Justice Act should be able to tackle precisely this problem.


Data Journalism in India: Nascent but noticeable

Shyamanuja Das

In a well-discussed (and well-tweeted) article, India’s Media – Missing the Data Journalism Revolution, published recently on the Global Investigative Journalism Network website, journalist and academician Priya Rajasekar argues that Indian media, by and large, is still to wake up to the opportunity of data journalism.

Probably the first in-depth analysis of the subject in India, the article is fairly complete in terms of capturing the viewpoints of the entire spectrum of stakeholders: the academicians, the practitioners and other influencers. It even goes on to explore the reasons behind what it calls the Indian media’s “not subscribing to the idea (of data journalism)”.

The basic assumption, that Indian media has not really taken to data journalism seriously, is not exactly way off the mark, if one takes into account only the traditional media. India surely does not have the likes of a Guardian Datablog or an NYT Upshot.

But then, how many of the traditional media brands even in the developed markets have such initiatives? India’s The Hindu actually has a dedicated section on data stories, though it is nowhere near the Guardian and NYT sites.

There are online ventures, though. Though not quite the Vox and FiveThirtyEight of India, some of them are making an impact. A few are dedicated to data journalism, while others are news and analysis sites but do have a few good data journalism stories. Some traditional media houses have also started exploring the area in a more focused manner, which interestingly, is being noticed by even the common readers.

Two events in the recent past have helped the cause in a big way.

One, of course, was the General Elections held in April – May 2014. India’s are the largest elections on earth, not just in terms of the size of the electorate but also in terms of the number of political parties. India has more than a thousand political parties, out of which about 60 are recognized national and state parties. That makes analyzing vote shares and linking them to seats won fairly complex and interesting. With the Election Commission sharing raw data, we saw a lot of good analysis this time. DataJourno carried a round-up of election coverage here.

The other was the release of crime data by the National Crime Records Bureau (NCRB). Though NCRB has been sharing this data for many years now, thanks to the growing awareness about data analysis, almost all newspapers did multiple stories this time, analyzing the data. Crime against women and regional trends in crime dominated the coverage.

Here is a round-up of some of the data journalism initiatives in India. These are among the most noticeable efforts, though the list is not exactly comprehensive. One clarification: there are quite a few other sites that have fairly decent content based on analyzing data. But they are not really journalistic stories, for there are no ‘stories’ in most of them. In fact, that is a big confusion that exists in data journalism—what is journalism and what is not. But then, that is a topic by itself and is not restricted to India. So, we will keep that for another day.

Here is the list, with examples wherever possible.

In addition, two other newspapers must be mentioned for their data journalism efforts, though they do not call it by that name: Mint, a business newspaper, and The Times of India, India’s largest-selling English newspaper. Mint was the first newspaper to start visualizing stories, well before the excitement about data journalism started. It also does a number of data analysis stories, but they are mostly restricted to macroeconomics, which is not of immense interest to lay readers. The Times of India has started a regular section in its print version called STATOITICS (TOI being the short form of its name), where it presents interesting data through simple visualization.

The trend is new but is surely catching on. One challenge, though, is that number crunchers who can write some English are posing as data journalists, taking advantage of the absence of real journalists, many of whom are intimidated by numbers. So, instead of being the hot new area within journalism, data journalism has ended up becoming a poor cousin of data science and analytics.

DJ Showcase: (8 July 2014)

Four charts that explain why we don’t need a separate rail budget

In yet another great example of data journalism where data/charts (and not so beautiful, eye-catching ones at that) have been used to argue a point, shows why we do not need a separate rail budget. The underlying logic builds on the fact that Railways is neither a big expenditure head nor the most dominant force in its area: surface transport.


How Indian states are doing in terms of HDI…in one chart

With Narendra Modi making governance the central election issue in the 2014 General Elections, the debate, after a long time, focused on development. While the BJP leader and current prime minister highlighted key economic parameters such as investment, industrial growth, power generation and per capita income, his critics pointed to relatively poor performance on social parameters, measured globally by the United Nations Development Programme (UNDP)’s human development index (HDI). With two of the world’s most well-known economists, Jagdish Bhagwati and Amartya Sen, joining the debate on what constitutes good governance, the global community took notice.

As it is, truth is rarely black and white. While Modi made some tall claims about economic parameters, most notably by highlighting absolute values (which have always been higher for Gujarat, even before Modi) and not the change during his tenure, his critics, while pointing to HDI figures, did just the reverse. They chose to ignore the fact that Gujarat was still in the upper half when it came to HDI; it is only when one compares this with its ranking on economic parameters that it looks pale.

The chart here shows absolute HDI 2007-08 (latest available) on the x-axis and growth between 1999-2000 and 2007-08 on the y-axis.

HDI in India

Without considering India’s overall performance in HDI, which remains low, a pure comparison among states shows some definite trends. In the chart, the lines show the median values and not the average values.

  1. The overall news is good, with most low-HDI states such as Odisha, Bihar, UP, Chhattisgarh and Madhya Pradesh registering good growth. Expectedly, Kerala, Maharashtra, Punjab, Haryana and Gujarat show lower growth.
  2. Only one area clearly falls in Quadrant I: the North East. It is, by definition, the only star, though Karnataka and Tamil Nadu too have performed well.
  3. Kerala, with an HDI that is ahead of Russia, Malaysia, Kuwait and Saudi Arabia, has slightly lower growth, but it is still a good performance, considering how poorly the other two high-HDI states have fared in terms of growth. Delhi is actually the only entity to show negative growth, while Goa too shows sluggish growth.
  4. Uttarakhand and Jharkhand, two younger states, clearly outshine everyone else in terms of growth. Uttarakhand is fast moving to Quadrant I.
  5. Rajasthan is the only state that is clearly falling behind. West Bengal too is not catching up with the rest of the Eastern brigade such as Bihar, Odisha, NE, Assam and Jharkhand, which are clearly on the growth track now.
  6. Modi’s Gujarat is clearly below the median growth and falls on the median line of absolute HDI. With a position of 11 among 23 states/UTs (NE is treated as one), it is clearly not a good showing. But it is not absolutely pathetic, as many claim.
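The quadrant reading used above can be sketched as follows: draw one median line for absolute HDI and one for growth, then check on which side of each line a state falls. The function name and the five sample entries are hypothetical and are not the actual HDI figures behind the chart.

```python
from statistics import median


def classify_by_median(points):
    """Map each name to (above_hdi_median, above_growth_median).

    points: dict of name -> (hdi, growth). Values exactly on a median line
    are treated as not above it.
    """
    hdi_median = median(hdi for hdi, _ in points.values())
    growth_median = median(growth for _, growth in points.values())
    return {
        name: (hdi > hdi_median, growth > growth_median)
        for name, (hdi, growth) in points.items()
    }


# Hypothetical sample, for illustration only.
sample = {
    "State A": (0.80, 0.30),
    "State B": (0.40, 0.35),
    "State C": (0.70, 0.10),
    "State D": (0.30, 0.05),
    "State E": (0.50, 0.20),
}
for name, (high_hdi, high_growth) in classify_by_median(sample).items():
    print(name, "above median HDI:", high_hdi, "| above median growth:", high_growth)
```

A state that is above both medians corresponds to Quadrant I in the discussion above. Medians are used rather than averages so that a few extreme values do not shift the dividing lines.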


The Contours of A Mobile World

ITU has released its latest ICT indicators database for June 2014. This is updated with 2013 numbers for most indicators.

According to the data, global mobile subscriptions will be just short of the 7 billion mark in 2014, standing at 6.9 billion. The total mobile subscriptions in 2013 stood at 6.6 billion.

That is as many as 93.1 mobile subscriptions per 100 inhabitants. The figure will go up to 95.5 in 2014, according to ITU.
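The density figure is simply subscriptions divided by population, multiplied by 100. As a quick sanity check, the sketch below works backwards from the rounded numbers quoted above to the world population they imply, and also computes the growth in subscriptions. The function names are ours, and the results are only as precise as the rounded inputs.

```python
def implied_population(subscriptions, per_100_inhabitants):
    """Population implied by a subscription count and a density per 100 inhabitants."""
    if per_100_inhabitants <= 0:
        raise ValueError("density must be positive")
    return subscriptions / (per_100_inhabitants / 100)


def percent_growth(old, new):
    """Percentage change from old to new."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100


# Rounded figures quoted in the text.
pop_2013 = implied_population(6.6e9, 93.1)
pop_2014 = implied_population(6.9e9, 95.5)
print(f"Implied population, 2013: {pop_2013 / 1e9:.2f} billion")
print(f"Implied population, 2014: {pop_2014 / 1e9:.2f} billion")
print(f"Subscription growth, 2013 to 2014: {percent_growth(6.6e9, 6.9e9):.1f}%")
```

Both implied populations come out a little above 7 billion, which is consistent with the quoted densities.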

A little analysis reveals interesting insights.

  • Among the ten countries with the largest mobile subscription bases, six are in Asia.
  • The top ten countries account for 58% of the total mobile subscriptions.
  • Chinese regions (China, Hong Kong, Taiwan and Macao) and the Indian subcontinent (India, Pakistan, Bangladesh, Nepal, Sri Lanka, Bhutan and Maldives) together account for 37% of the world’s mobile subscriptions.
  • The world has 32 mobile broadband connections per 100 inhabitants. The developed-developing country divide is significant here. In the developed world, the figure stands at 83.7, while in the developing world, it stands at just 21.1.

Here are some important figures regarding mobile telephony.

Growth in global mobile subscriptions continues, with no signs of either slowing down or accelerating.

The mobile densities across countries vary significantly, though. Here is a world map depicting the number of mobile subscriptions per 100 inhabitants. Countries in dark red have the lowest density and those in dark green have the highest density of mobile subscribers. Just hover over a country to get the absolute value.

In absolute terms, however, two countries are the dominant leaders: China and India. With Indonesia almost catching up with the USA, it is a matter of a few months before the three largest mobile countries are all in Asia.

Some countries have seen significant growth during the last 12 years, when the world changed to a mobile world. Here is a list of countries which recorded the maximum growth between 2001 and 2013.

And this is how the mobile density of world’s largest emerging economies (BRICS) grew.

India’s Supercomputing Stature Continues to Slide

The half-yearly list (June 2014) of the world’s fastest supercomputers, maintained by is out. India, which at one time was home to the 4th fastest supercomputer in the world, continues its downward journey in the ranking.

The country’s fastest supercomputer, an IBM-based system at the Indian Institute of Tropical Meteorology, ranks a lowly 52nd globally. The same system, a year back, was ranked the 36th fastest in the world. With no improvement in its LINPACK performance, which is used to rank these systems, its rank dropped to 44 in the November 2013 list, before dropping further in the latest list.

The chart shows the highest rank an Indian supercomputer obtained in each of the successive lists starting with November 2003, when a supercomputer at Tata Sons occupied the 4th rank globally.


Supercomputers India Rank

The top slot in the recent list, released on Monday, 23rd June 2014, is taken by a Chinese supercomputer. China, which has 76 supercomputers in the list of top 500, has continued to climb the ranking, both in terms of the number of supercomputers in the list and in terms of highest performance/rank. Here is a comparison of how India and China have fared in the list. The parameter is the number of supercomputers in the list of top 500.


Supercomputers India China

Here are the top countries in the list, in terms of the number of sites featuring in it.


Top supercomputing countries