Month: May 2014

Which Lok Sabha is more qualified: this one or the last one?

Some debates never end. Not just the Mumbai versus Delhi types. But even the knowledge versus action types.

Take, for instance, the latest controversy surrounding HRD minister Smriti Irani’s qualifications, or rather the alleged lack of them. She maintains that she should be judged by her work and not her qualifications, and there are quite a few takers for her stance. It seems action is winning this round of the battle over knowledge. Not exactly surprising, given that the new prime minister sees and markets himself as a karma yogi.

But does it mean that the current (16th) Lok Sabha lags behind the previous one when it comes to the qualifications of its MPs?

Does not look so, if you consider that this Lok Sabha has 33 PhDs as compared to just 9 in the last one and even has a slight edge in the number of postgraduates. But when it comes to graduates and above, it is a little behind.

In the 15th Lok Sabha, 4 out of every 5 members were graduates or above. That is 80% of the members. The figure is a little lower, at 76%, for the current Lok Sabha.

This Lok Sabha is also a little heavier at both ends. On one hand, the PhDs and postgraduates together account for 34% of the total members, as compared to 28% in the last Lok Sabha. On the other, the share of members who have not studied beyond class 10 is also higher at 13%, as compared to 10% in the last Lok Sabha. The comparison is not too different even if you add those who passed class 12.

Yes, it is the middle (the graduates) that ruled the last Lok Sabha. As many as 52% of all members were just graduates in the last Lok Sabha; that number is drastically lower at 42% in the current Lok Sabha.
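The arithmetic behind the "middle" can be checked in a few lines. This is only a sketch that uses the percentages quoted above; the dictionary layout and the function name are our own, chosen for illustration.

```python
# Shares of members (in %) as quoted in the text above.
lok_sabha = {
    "15th": {"graduate_and_above": 80, "postgraduate_and_phd": 28},
    "16th": {"graduate_and_above": 76, "postgraduate_and_phd": 34},
}


def just_graduates(shares):
    """Share (in %) of members who are graduates but hold no higher degree."""
    return shares["graduate_and_above"] - shares["postgraduate_and_phd"]


# Compute the "middle" for each Lok Sabha.
middle = {name: just_graduates(shares) for name, shares in lok_sabha.items()}

for name, share in middle.items():
    print(f"{name} Lok Sabha: {share}% just graduates")
```

Subtracting the postgraduate and PhD share from the "graduate and above" share gives the 52% and 42% figures for plain graduates.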


And here is how the major parties stack up when it comes to how qualified their MPs are. The figures represent the percentage of MPs who are graduates and above.


Yes, regional parties like TMC, ADMK and BJD are at the top, while some other regional parties like Shiv Sena and TDP are at the bottom.



Which players have actually taken the most “runs” in IPL?

Who are the most hardworking among the successful batsmen in IPL 2014? Well, it depends on how you define hardworking. One way, which we tried, is to see how much they have actually run between the wickets.

We are sure you will not be surprised. Among those who have been successful (scored 200 or more runs) in IPL 2014, only five have scored more runs by running between the wickets than through boundaries. This is till the end of the league matches; the qualifiers and the eliminator have not been included.
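For readers who want to repeat the exercise, here is a rough sketch of the split. It assumes that every run not scored off a four or a six was run between the wickets, which ignores rare cases such as an all-run four. The function names and the sample figures are hypothetical; they are not actual IPL 2014 statistics.

```python
def split_runs(total_runs, fours, sixes):
    """Return (boundary_runs, running_runs) for one batsman."""
    boundary_runs = 4 * fours + 6 * sixes
    running_runs = total_runs - boundary_runs
    return boundary_runs, running_runs


def runs_more_than_hits(total_runs, fours, sixes, min_runs=200):
    """True if the batsman qualifies (min_runs or more) and has scored
    more by running than through boundaries."""
    boundary_runs, running_runs = split_runs(total_runs, fours, sixes)
    return total_runs >= min_runs and running_runs > boundary_runs


# Hypothetical scorecards: (total runs, fours, sixes). Not real data.
sample = {
    "Batsman A": (400, 30, 10),
    "Batsman B": (300, 45, 15),
    "Batsman C": (150, 5, 1),
}

runners = [name for name, card in sample.items() if runs_more_than_hits(*card)]
print(runners)
```

The 70% boundary cut-off used for the big hitters is the same calculation with the comparison changed to `boundary_runs / total_runs > 0.7`.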

So, here they are: Manish Pandey, du Plessis, Rayudu, Rahane, and Gambhir. And you thought Indians are lazy?



For the sake of completeness, who are the most prominent boundary hitters? Maxwell, Dwayne Smith and Sehwag. No surprise here. These are the only three batsmen who have scored more than 200 runs with more than 70% of those runs coming from boundaries.



India has the third highest obese population….

…but it is mostly because of our large population; the obesity rate is still not very high.

India has the third highest obese population in the world, according to a study by an international consortium of researchers led by the Institute for Health Metrics and Evaluation (IHME) at the University of Washington. Obesity is defined as having a BMI equal to or greater than 30.

But do not worry. Any absolute number would look big when it comes to India, because of India’s large population. In terms of percentage of obese people, India is almost at the bottom of the table, among large nations.

The study finds that more than 50% of the world’s 671 million obese live in 10 countries (ranked beginning with the countries with the most obese people): US, China, India, Russia, Brazil, Mexico, Egypt, Germany, Pakistan, and Indonesia. As you can see in the bar charts below, India has the lowest obesity rate among all of them as well as some other major countries that we have included, both among adults and children.


Childhood obesity

The study further finds that between 1980 and 2013, the prevalence of overweight/obese children and adolescents increased by nearly 50%. In 2013, more than 22% of girls and nearly 24% of boys living in developed countries were found to be overweight or obese. Rates are also on the rise among children and adolescents in the developing world, where nearly 13% of boys and more than 13% of girls are overweight or obese.

While India’s good showing may be because of the ‘average’ effect (the country has a lot of malnourished children), it still looks impressive, as many of the other developing countries such as Egypt, South Africa and Sudan have higher obesity rates. Here is an infographic from the researchers.

Unfortunately, we have not found the story in any of the major Indian publications at the time of writing this.


India has second-highest number of shadow entrepreneurs in the world: ToI

“India has the second highest number of shadow entrepreneurs in the world. For every business that is legally registered in India, there are 127 shadow businesses that are not,” says a report in Times of India, quoting a study of 68 countries led by Professor Erkko Autio and Dr Kun Fu from Imperial College Business School, UK.

Shadow entrepreneurs are individuals who manage a business that sells legitimate goods and services but who do not register their businesses. This means that they do not pay tax and operate in a shadow economy, where business activities are performed outside the reach of government authorities.

In short, it is a problem of law enforcement.

Indonesia has 131 unregistered businesses for every legally registered business, while India has 127. The UK has just one unregistered business for every 30 legally registered businesses.


Chart: DataJourno
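The ratios quoted above can also be read as shares of all businesses. Here is a minimal sketch of that conversion; the function name is ours, and the only inputs are the three ratios reported in the story.

```python
def unregistered_share(unregistered, registered):
    """Fraction of all businesses that are unregistered."""
    return unregistered / (unregistered + registered)


# (unregistered, registered) pairs as quoted in the text above.
ratios = {
    "Indonesia": (131, 1),
    "India": (127, 1),
    "UK": (1, 30),
}

for country, (unregistered, registered) in ratios.items():
    share = unregistered_share(unregistered, registered)
    print(f"{country}: {share:.1%} of businesses are unregistered")
```

Seen this way, a little over 99% of businesses in both Indonesia and India are unregistered, against roughly 3% in the UK.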

DJ Showcase (May 2014)

NOTA may have affected outcome in 19 constituencies

This is a simple but very effective example of data journalism, in the new Indian news site,

In a story, NOTA may have affected outcome in 19 constituencies, the site has analyzed the voting data from the General Elections 2014 to come up with a list of 19 constituencies across India where the results may have been affected by the NOTA (None of the above) option, introduced for the first time in these General Elections.

The authors, Santosh Sunderesan and Nikita Saxena, have analyzed the data to come up with interesting insights, such as

  • most of these are rural constituencies
  • all of these are reserved constituencies
  • they all, with one exception, recorded high voter turnout.

Since many of these are also Naxalite-hit areas, the question to ask is: did the ultra-left militants have a role to play in this high NOTA preference?

Elections 2014: The Showcase for Data Journalism

If data journalism were a religion, national elections in a democracy would be its biggest festival. Never ever does mainstream media get so obsessed with data, data analysis and “deriving insights from data”, as it does during the elections season.

In India, the effort stands out even more, as otherwise there is very little data that journalists care about. Sure, the heavy economic parameters such as GDP growth or growth in industrial production are regular features in business media, but few normal people even glance through them; not even those who otherwise consume regular business and corporate news. The only category of people who have some fascination for numbers are the ardent cricket fans who follow cricket statistics.

What adds both color and complexity to the elections data in India is the country’s diversity: in its demographics, in its issues and, not to forget, in the number of national and regional parties.

Color, quite literally. Just contrast, in your mind, two visualizations: a pie with two colors representing Democrats and Republicans, and another with at least 7-8 colors, representing BJP, Congress, AIADMK, TMC, BJD, Shiv Sena, TDP, TRS…you can drag the list on if you like.

With so many parties fighting it out, the vote share that each gets and how that translates into seats is recommended study for those wanting to understand paradoxes. Add to that the many ways you can analyze the data: party-wise, region-wise, state-wise, constituency-wise, reserved constituency-wise, in terms of which party is ruling the state…and so on.

Last week was that once-in-five-years festival. The election results came on 16 May.

Not surprisingly, every TV channel, every newspaper and every online news portal worth its name had a dedicated data analysis section on its website. Here are links to a few among the major media brands in India.

  • The Economic Times: The top business newspaper in India, and one of the largest in Asia, has an old-fashioned tabular representation of the results. Of course, with so many other elements and ads, it is also extremely cluttered.


  • Firstpost: A sleek site, great for those getting deeper into their own analysis. Scores on presentation, breadth of coverage, comprehensiveness, with fairly good ease of use. The only negative: it is a little intimidating for many.


  • The Hindu: Often hailed as the only general newspaper that is actively into data journalism, The Hindu’s election data page disappoints, despite having functionality. It is not at all intuitive; all that you are greeted with is a map of India.


  • The Hindustan Times: It has sleekness of design, all the functionality, and a fairly clean page. Yet, it is not intuitive to use. Still, it is the best among all major general newspapers.


  • IBN Live: Another cluttered site, with a tabular representation of overall results, in sharp contrast to the group site, Firstpost.



  • India Today: A simple, clean website. Almost a clone of the NDTV site, with the same way of visualization. But it is not as good in terms of presentation or even functionality.


  • Mint: Clearly well ahead of any other media outlet in India when it comes to data journalism, Mint disappointed, and that too after heavily advertising its election coverage.



  • NDTV: By far the simplest presentation, strictly focused on election results and nothing else. Scores on ease of use as well as presentation.
  • The Times of India: The Times of India clearly decided to play up the news and pushed the data to the bottom of the page. It had a simple, easy to understand format but with no functionality to drill down further.



Apart from the result coverage, there have been some interesting analyses of results by some of these media brands. Especially noteworthy is this analysis in Mint which creates a sweep index.

This chart, which calculates what each party’s seat share would have been under proportional representation, based on its vote share, has drawn a lot of criticism, both from supporters of the Bharatiya Janata Party and from supporters of regional parties such as the Biju Janata Dal, which swept their own states but whose share of the national vote is small. On the other hand, parties such as the BSP, which drew a blank but have a fairly high vote share spread across the country, tend to gain from this. Many have used this as an example to attack data journalism per se.

But it must be noted that it is not data journalism that is at fault here. The analysis is at odds with the reality of India’s diversity and its federal structure, which makes room for regional parties. A state like Odisha, in this method, would have one or two representatives in the Lok Sabha.
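To see why the proportional method produces such results, here is a rough sketch of the calculation. It assumes the simplest possible rule: each party gets seats in proportion to its national vote share, rounded to the nearest whole seat, out of the 543 elected seats in the Lok Sabha. The party names and vote shares are hypothetical placeholders, not actual 2014 figures, and the exact method used in the chart discussed above may well differ.

```python
LOK_SABHA_SEATS = 543  # elected seats in the Lok Sabha


def proportional_seats(vote_share_pct, total_seats=LOK_SABHA_SEATS):
    """Seats a party would get if seats simply mirrored its national
    vote share (given in %), rounded to the nearest whole seat."""
    return round(vote_share_pct / 100 * total_seats)


# Hypothetical national vote shares in %. Not real election data.
hypothetical_shares = {
    "Party A (large national party)": 30.0,
    "Party B (thinly spread, won no seats)": 4.0,
    "Party C (swept one state)": 2.0,
}

for party, share in hypothetical_shares.items():
    print(f"{party}: {proportional_seats(share)} seats")
```

Under this rule, a party with votes spread thinly across the country still picks up seats, while a party whose votes are concentrated in a single state gets only what its national share allows. Note that simple rounding does not guarantee the totals add up to exactly 543; real proportional systems use a formal apportionment method for that.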

At the time of writing this, interesting insights based on analysis of the data are still coming in. We will report anything interesting that comes in, either here or through our Twitter handle @datajournoin.


Welcome to DataJourno

Welcome to DataJourno, a small but sincere initiative to popularize data journalism in India.

Data journalism is an idea whose time has come.

As a concept, it is certainly not new. For years, journalists have used data to support their stories, to analyze trends and once in a while, to create hypotheses to probe. The stories created through those means have been as popular and successful as any other story.

But a few things have changed in recent years which have raised a renewed interest in data journalism.

The most practical reason is the availability of tools that make handling data far easier. Earlier, only those journalists who had a mastery over numbers could play with the data. Areas of journalism where this was important in any case, such as business journalism, saw fairly good examples of data journalism. This was also the reason why, in the minds of other journalists, data journalism came to be strongly associated with business journalism. But with the availability of easy-to-use tools these days, a journalist need not be an expert mathematician or statistician to analyze data and find interesting trends. That suddenly makes data journalism a viable tool for any journalist.

Another reason which has raised interest in data journalism, of late, is the availability of plenty of data. Again, earlier, it was only listed business firms that released data. With the open data movement getting stronger and stronger, governments across the world are releasing far more data (mostly online these days). More importantly, those data are comparatively recent and can be meaningfully used. This makes the work of journalists far easier. They can focus on their core area, that is, getting a scoop or analyzing a trend based on their understanding of the area, without doing the rounds of government ministries and agencies just to get a report.

The third phenomenon is the rise of social media. Social media combines the anecdotal with the statistical. Earlier, a reporter spoke to a handful of people, or media houses assigned time-consuming, costly research work to market research agencies. Today, thanks to Twitter, Facebook, SurveyMonkey and LinkedIn, a journalist can create a poll in minutes and can get a significant number of responses in a matter of days, even hours. And it hardly costs anything. In such dipsticks, the role of data analyst has to be played by the writer, as there is no agency involved.

All these have pushed journalists into the midst of a lot of data. And they are not complaining.

But a word of caution here. Data journalism is not primarily about data. It is about journalism. Just as you need a nose for stories in any kind of journalism, you need one in data journalism. Being an expert in data crunching does not make you a great data journalist (there are many in analytics and consulting companies who do that perhaps far better than you); being able to do great stories using data does. The final measure is how good the story is, not how good the data analysis is. In that sense, it is not drastically different from any other tool of journalism.

Here are some popular misconceptions about data journalism.

  1. Data journalism is about lots of numbers. You may just quote a single number in the final story, as long as that number tells you a story. In fact, the best stories are often those that are not number heavy. But numbers often take you to the story. Or, they make or break your hypothesis.
  2. Data journalism is about business journalism and social/developmental journalism. Not necessarily. True, traditionally, business journalists have done most of the data-based stories. And with the rise of open data, most of the good examples that you find today are about governance/development/social indicators. But they are by no means the only areas. Here is an Indian example from a very different area. Who is the actor for whom the great singer Mohd Rafi has sung the maximum number of songs? Perception suggests it must be Dilip Kumar or Shammi Kapoor. But analyze the data and you find it is neither; nor, for that matter, Rajendra Kumar. It is Johnny Walker. Now, that is a story. And that is an example of data journalism. It often shows you a truth that may be very counterintuitive. Isn’t that what every journalist wishes to do?
  3. Data journalism is about open data. A lot of good examples of data journalism that we are seeing today use data from government and multilateral development agencies that is proactively made available to all. That makes many associate data journalism with open data. Even Wikipedia, in its definition of data driven journalism, refers to that. But that should not be the case. Data journalism should not concern itself with the source of the data. It is about a set of practices and tools.
  4. It is journalism with a cause. With the rise of open data, many NGOs and activists have used data and data analysis to bring accountability to public servants and even initiate action in some long-ignored areas. Many of them have used traditional media or their own media to do good stories to argue their case and further the cause of development and governance. That is all very good. But the definition of data journalism should not be restricted by that. All that data journalism (or, for that matter, any journalism) should strive for is a good story. Nothing more, nothing less.

Data journalism is here to stay. We hope DataJourno will contribute in its own small way to further the cause.

Here are some things we plan to help with, as part of the community. We are looking for more ideas and suggestions.

  1. Showcase good examples of data journalism from Indian media
  2. Recognize the best among them
  3. Help journalists and students of journalism acquire skills in data journalism and data visualization by collaborating with employers and journalism schools
  4. Work actively with major sources of data including the government, NGOs, industry bodies, and research and consulting firms to make their data available to journalists
  5. Disseminate information about major happenings in data journalism globally
  6. Make available resources to one and all in the community
  7. Help school children appreciate data and use visualization

The list would be modified based on your feedback and suggestions.