First Data Journalism Bootcamp Starts in Delhi

A three-day data journalism bootcamp, co-hosted by the International Center for Journalists, the Hindustan Times, Hacks/Hackers New Delhi, Data{Meet} and the 9.9 School of Communication, started on Friday in the National Capital Region of Delhi, India. The program, the first of its kind, has attracted a mix of veteran data journalists, young media professionals, data science enthusiasts, developers and designers.

More than 60 people are taking part in the workshop, which is seeing good cross-fertilization of ideas between techies and journalists from different types of media. The speakers and mentors are the who’s who of the Indian open data and data journalism community, and some of the best-known data journalists and professionals are among the participants.

Over the next two days, the participants will work on real-life stories involving data.

David Lemayian from Code for Africa giving a few smart tips




Which state is India’s biggest drinking state?

Data journalism is journalism first, and last. The basic principles of good journalism apply to data journalism as well. One of those fundamental principles is to check the credibility of information. It starts with knowing where to get the most authentic information on a particular topic. Yes, even in the days of Google.

This story in The Hindu by Rukmini S, one of the very few practicing data journalists in India, beautifully illustrates that.

The topic is alcohol consumption in different states, and the context is Kerala’s decision to move towards prohibition. Some basic research, as the euphemism for a Google search goes, convinced the media that Kerala is indeed the top per capita alcohol-consuming state in India. And what can beat that? The top drinking state heading towards prohibition…

But is Kerala really India’s most drinking state? It took a real data journalist to ask that question and bust the myth.

As Kerala takes the first steps towards prohibition, here’s a question: is Kerala really India’s biggest drinker? The media sure seems to think so; here’s the Times of India, saying so today (but giving no source), The Indian Express said it in 2008 but the source study is nowhere on the internet and the Economist said so in 2013 citing a Kerala-based advocacy group director. Various other reports cite Kerala’s 2008 Economic Review but this isn’t available online either.

Anyone with an interest in tracking consumption patterns in India would know that the biggest agency tracking that information is the National Sample Survey Office (NSSO) under the Ministry of Statistics and Programme Implementation, through its various “rounds” of surveys. Rukmini used that data to show that it is not Kerala but Andhra Pradesh that is the biggest drinker.

Despite being a great reminder of what should not be passed off as data journalism, the story fails to excite. A simple and direct headline like “And you thought Kerala is the biggest drinker” could have been far more effective than a text-bookish headline like “India’s biggest drinkers.”

Nevertheless, it reassures us that data journalism in India is in good hands.

DJ Showcase: (22 August 2014)

This one chart tells you everything you need to know about the state of Indian cricket

Yet another brilliant data journalism story tells us the problem of Indian cricket in just one chart. Of course, you will appreciate the analysis a little better if you are above 40. The chart shows that the 40-pluses who grumble about T20 ruining test cricket may actually be right.

This chart is the gist of the story, “This one chart tells you everything you need to know about the state of Indian cricket.”

Well, the chart pretty much tells you the entire story, if you know the basics of cricket. We like the story for a number of reasons.

  1. It is an application of data journalism in an area that interests a large section of people in India. The story puts to rest a debate: whether or not the glamorous T20 format is affecting test cricket. What’s more, it is extremely topical in the wake of India’s disastrous test series in England.
  2. You do not have to read the text at all. If you are tuned in to contemporary cricket, you get the story right away from the chart.
  3. It busts the myth that you need a lot of data analysis in the background to produce a good data journalism story. In this case, it is just simple addition, which a class II student can do.

In our showcase series on good data journalism stories, the publication behind this story has clearly come out on top so far, even though it does not claim to be a data journalism site per se. It proves what DataJourno holds as its core philosophy: a good data journalism story must be a good story first. How much data it contains or how great the visualization looks does not determine the quality of a data journalism story.


Data Journalism in India: Nascent but noticeable

Shyamanuja Das

In a well-discussed (and well-tweeted) article published recently on the Global Investigative Journalism Network website, “India’s Media – Missing the Data Journalism Revolution”, journalist and academician Priya Rajasekar argues that Indian media, by and large, is yet to wake up to the opportunity of data journalism.

Probably the first in-depth analysis of the subject in India, the article is fairly complete in capturing the viewpoints of the entire spectrum of stakeholders: the academicians, the practitioners and other influencers. It even goes on to explore the reasons behind what it calls the Indian media’s “not subscribing to the idea (of data journalism)”.

The basic assumption — that Indian media has not really taken to data journalism seriously — is not exactly way off the mark, if one takes into account only the traditional media. India surely does not have the likes of a Guardian Datablog and NYT Upshot.

But then, how many of the traditional media brands, even in the developed markets, have such initiatives? India’s The Hindu actually has a dedicated section on data stories, though it is nowhere near the Guardian and NYT sites.

There are online ventures, though. While not quite the Vox and FiveThirtyEight of India, some of them are making an impact. A few are dedicated to data journalism, while others are news and analysis sites that carry a few good data journalism stories. Some traditional media houses have also started exploring the area in a more focused manner, which, interestingly, is being noticed even by common readers.

Two events in the recent past have helped the cause in a big way.

One, of course, was the General Elections held in April – May 2014. India’s elections are the largest on earth, not just in terms of the size of the electorate but also in terms of the number of political parties. India has more than a thousand political parties, of which about 60 are recognized national and state parties. That makes analyzing vote shares and linking them to seats won fairly complex and interesting. With the Election Commission sharing raw data, we saw a lot of good analysis this time. DataJourno carried a round-up of election coverage here.

The other was the release of crime data by the National Crime Records Bureau (NCRB). Though the NCRB has been sharing this data for many years now, thanks to the growing awareness about data analysis, almost all newspapers did multiple stories this time, analyzing the data. Crime against women and regional trends in crime dominated the coverage.

Here is a round-up of some of the data journalism initiatives in India. These are among the most noticeable efforts, though the list is not exactly comprehensive. One clarification: there are quite a few other sites that have fairly decent content based on analyzing data. But they are not really journalistic stories, for there are no ‘stories’ in most of them. In fact, that is a big confusion that exists in data journalism—what is journalism and what is not. But then, that is a topic by itself and is not restricted to India. So, we will keep that for another day.

Here is the list, with examples wherever possible.

In addition, two other newspapers must be mentioned for their data journalism efforts, though they do not call it by that name: Mint, a business newspaper, and The Times of India, India’s largest-selling English newspaper. Mint was the first newspaper to start visualizing stories, well before the excitement about data journalism started. It also does a number of data analysis stories, but they are mostly restricted to macroeconomics, which is not of immense interest to lay readers. The Times of India has started a regular section in its print version called STATOITICS (TOI is a shorter version of its full name), where it presents interesting data through simple visualization.

The trend is new but is surely catching on. One challenge, though, is that number crunchers who can write some English are posing as data journalists, taking advantage of the absence of real journalists, many of whom are intimidated by numbers. So, instead of being the hot new area within journalism, data journalism has ended up becoming a poor cousin of data science and analytics.

Data Journalism: Why definitions matter as much as the numbers…

Shyamanuja Das

Data journalism is journalism first—and last. And there should not be any doubt in anyone’s mind about that.

Even as we celebrate the increased access to authentic data and the availability of great analytics and visualization tools that have given a lot of power to the journalist community, we must not forget that the basic premises of journalism still stand. We must tell good stories. And we must question.

One of the most important questions about any data is the exact definition of what that data actually represents. One can say XXX is India’s largest e-commerce company. But what does “largest” mean? Highest revenue? Maximum number of users who buy from their site? Maximum number of transactions? It also depends on where the story appears and only a journalist knows what her readers would naturally assume it to be. For example, in a site like, the readers will assume the parameter to be either valuation or revenue; in a Times of India, few readers will naturally think about valuation.

The above may be an over-simplified example. In many cases, the fine print needs to be read carefully to question the data, especially when it looks counter-intuitive. Therein lies the irony. While it is true that the more counter-intuitive the conclusion, the bigger the story, it is also true that the more counter-intuitive the finding, the harder one must question it. And you come back to the basics: nothing in life comes easy; surely not a good story. Data journalism or no data journalism.

There is an excellent example in today’s Times of India. In a story, “India over-reporting green cover”, the report points out that the flaw may lie with the definition of what is called forest. “A large area that the government has been including under the forest category actually comprises commercial plantations, including those for coffee, arecanut, cashew, rubber, fruit orchards, parks and gardens,” the story says, quoting researchers from the Indian Institute of Science, Bangalore. The researchers attribute this to the definition of forest cover used by the Forest Survey of India (FSI). It defines forest cover to be “all lands more than one hectares in area, with tree canopy density of more than 10%, irrespective of ownership and legal status”. This definition could well mean that man-made forests or monocultures (farmland used to grow only one type of crop) are being considered forests, it says.
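To see why the definition matters, here is a minimal sketch in Python of the FSI rule as quoted in the story. The parcels, their areas and their canopy densities are entirely hypothetical and are there only to show how the rule behaves; the class and function names are ours, not the FSI's.

```python
from dataclasses import dataclass


@dataclass
class LandParcel:
    name: str
    area_hectares: float
    canopy_density_pct: float
    land_use: str  # e.g. "natural forest", "rubber plantation"


def counts_as_forest_cover(parcel: LandParcel) -> bool:
    # Rule as quoted in the story: more than one hectare in area and
    # tree canopy density above 10%, irrespective of ownership and
    # legal status. Note that land use plays no part in the test.
    return parcel.area_hectares > 1 and parcel.canopy_density_pct > 10


# Hypothetical parcels, invented purely for illustration.
parcels = [
    LandParcel("hypothetical natural forest", 50.0, 60.0, "natural forest"),
    LandParcel("hypothetical rubber plantation", 20.0, 45.0, "rubber plantation"),
    LandParcel("hypothetical small orchard", 0.5, 30.0, "fruit orchard"),
]

classified = {p.name: counts_as_forest_cover(p) for p in parcels}

for p in parcels:
    print(f"{p.name} ({p.land_use}): forest cover = {classified[p.name]}")
```

Under this rule the hypothetical rubber plantation is counted as forest cover just like the natural forest, while the small orchard is excluded only because of its size. That is the gap the researchers point to.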

If true, it challenges a basic fact all of us have believed: that India’s forest cover is growing. Over the last few years, media has reported this in a celebratory tone. Here is such a story published in Times of India in 2009: India’s forest cover rises to over 21%.

Here, no one had questioned it, even though it looked a little counter-intuitive, till the IISc researchers did. To prove their point, they have even given data on the exact area covered by plantations and orchards, though it is not clear from the story whether the FSI has actually included these areas in its survey.

While this is an example of how questioning the definition has brought out the truth, you need not go further than the same day’s Times of India to find an example where these basic questions have not been considered. Take the story, “Indian B-School graduates get jobs easily”. It claims, quoting a survey by the Graduate Management Admission Council, which conducts the GMAT, that 92% of Indian management students had an offer of employment. The survey was conducted among the 2014 batch of students.

So, what do you make of the story? That 92% of management graduates in India end up with a job. Now, take a look at the data from the All India Survey of Higher Education, which was the basis for the preceding post on this site, “And you thought an MBA degree is so exclusive”. According to it, as many as 5.6 lakh students enrolled in management programs in 2010-11, the latest year for which data is available. In 2012-13, which was the enrollment year for the 2014 batch, that number must have been higher. Even if we assume that it is the same as it was in 2010-11 (approx 5.5 lakh) and further assume that only 70% would complete the course, going by the 92% figure, it means more than 3.5 lakh of those will end up with a job.
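The back-of-the-envelope estimate above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic: the enrollment of roughly 5.5 lakh, the assumed 70% completion rate and the 92% placement figure are the ones used in the text, and the variable names are ours.

```python
LAKH = 100_000  # one lakh = 100,000

enrolled = 550_000    # approx 5.5 lakh enrolled (figure used in the text)
completion_pct = 70   # assumed share of students who complete the course
placement_pct = 92    # headline figure quoted in the survey story

# Integer arithmetic keeps the result exact.
graduates = enrolled * completion_pct // 100
implied_jobs = graduates * placement_pct // 100

print(f"Graduates: {graduates / LAKH:.2f} lakh")
print(f"Implied job offers: {implied_jobs / LAKH:.2f} lakh")
```

The implied figure comes to 354,200, or a little over 3.5 lakh, which is the number the headline claim would have to support if it applied to all Indian management students.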

Now, let us look at how the Graduate Management Admission Council arrives at this 92% figure: by doing a survey among 111 universities in 20 countries. What is India’s share? We do not know, but it is safe to assume that it cannot be more than 10-12 at best. And which are these institutes and universities? Are they representative of all of India’s management schools? Or are they just the tier-1 schools like the IIMs, ISB and FMS?

These are questions that must be asked. While the survey may be right in its own way if it says the finding is true only of tier-1 schools, the story does not even vaguely mention that, for instance with a phrase such as “top B-schools”. Without that qualifier, it implies that the finding is true for all of India’s management schools.

Exactly the kind of stuff that the mushrooming private management schools in India want to quote while selling a dream to unsuspecting students and parents.

This is the danger of relying on data without questioning what that data represents. Data today surely means more credibility. But it will soon lose that credibility if it is not questioned, not understood or not put in the proper context. Nice visualizations cannot compensate for lack of authenticity and context.

Data journalism is not so much about data as it is about journalism.

[Shyamanuja Das is a former editor and is currently a director at market research firm Juxt. He advises businesses, investors and marketers on effective use of public data and teaches data journalism. He is a co-founder of DataJourno.]


Welcome to DataJourno

Welcome to DataJourno, a small but sincere initiative to popularize data journalism in India.

Data journalism is an idea whose time has come.

As a concept, it is certainly not new. For years, journalists have used data to support their stories, to analyze trends and once in a while, to create hypotheses to probe. The stories created through those means have been as popular and successful as any other story.

But a few things have changed in recent years which have raised a renewed interest in data journalism.

The most practical reason is the availability of tools that make handling data far easier. Earlier, only those journalists who had a mastery over numbers could play with the data. Areas of journalism where this was important in any case, such as business journalism, saw fairly good examples of data journalism. This was also the reason why, in the minds of other journalists, data journalism came to be strongly associated with business journalism. But with the availability of easy-to-use tools these days, a journalist need not be an expert mathematician or statistician to analyze data and find interesting trends. That suddenly makes data journalism a viable tool for any journalist.

Another reason that has raised interest in data journalism of late is the availability of plenty of data. Again, earlier, it was only listed business firms that released data. With the open data movement getting stronger and stronger, governments across the world are releasing far more data (mostly online these days). More importantly, that data is comparatively recent and can be meaningfully used. This makes the work of journalists far easier. They can focus on their core area, that is, getting a scoop or analyzing a trend based on their understanding of the area, without doing the rounds of government ministries and agencies just to get a report.

The third phenomenon is the rise of social media. Social media combines the anecdotal with the statistical. Earlier, a reporter spoke to a handful of people, or media houses assigned time-consuming and costly research work to market research agencies. Today, thanks to Twitter, Facebook, SurveyMonkey and LinkedIn, a journalist can create a poll in minutes and get a significant number of responses in a matter of days, even hours. And it hardly costs anything. In such dipsticks, the role of data analyst has to be played by the writer, as there is no agency involved.

All these have pushed journalists into the midst of a lot of data. And they are not complaining.

But a word of caution here. Data journalism is not primarily about data. It is about journalism. Just as you need a nose for stories and news in any journalism, so do you need one in data journalism. You are a great data journalist not because you are an expert in data crunching (there are many in analytics and consulting companies who do that perhaps far better than you) but because you can do great stories using data. The final measure is how good the story is, not how good the data analysis is. In that sense, it is not drastically different from any other tool of journalism.

Here are some popular misconceptions about data journalism.

  1. Data journalism is about lots of numbers. You may just quote a single number in the final story, as long as that number tells you a story. In fact, the best stories are often those that are not number heavy. But numbers often take you to the story. Or, they make or break your hypothesis.
  2. Data journalism is about business journalism and social/developmental journalism. Not necessarily. True, traditionally, business journalists have done most of the data-based stories. And while, with the rise of open data, most of the good examples that you find today are about governance, development and social indicators, these are by no means the only areas. Here is an Indian example from a very different area. Who is the actor for whom the great singer Mohd Rafi has sung the maximum number of songs? Perception suggests it must be Dilip Kumar or Shammi Kapoor. But analyze the data and you find it is neither; nor, for that matter, even Rajendra Kumar. It is Johnny Walker. Now, that is a story. And that is an example of data journalism. It often shows you a truth that may be very counter-intuitive. Isn’t that what every journalist wishes to do?
  3. Data journalism is about open data. A lot of the good examples of data journalism that we are seeing today use data from government and multilateral development agencies that is proactively made available to all. That makes many associate data journalism with open data. Even Wikipedia, in its definition of data-driven journalism, refers to that. But that should not be the case. Data journalism should not concern itself with the source of the data. It is about a set of practices and tools.
  4. It is journalism with a cause. With the rise of open data, many NGOs and activists have used data and data analysis to hold public servants accountable and even to initiate action in some long-ignored areas. Many of them have used traditional media or their own media to do good stories that argue their case and further the cause of development and governance. That is all very good. But the definition of data journalism should not be restricted by that. All that data journalism, or for that matter any journalism, should strive for is a good story. Nothing more, nothing less.

Data journalism is here to stay. We hope DataJourno will contribute in its own small way to further the cause.

Here are some of the things we plan to help with, as part of the community. We are looking for more ideas and suggestions.

  1. Showcase good examples of data journalism from Indian media
  2. Recognize the best among them
  3. Help journalists and students of journalism acquire skills in data journalism and data visualization by collaborating with employers and journalism schools
  4. Work actively with major sources of data including the government, NGOs, industry bodies, and research and consulting firms to make their data available to journalists
  5. Disseminate information about major happenings in data journalism globally
  6. Make available resources to one and all in the community
  7. Help school children appreciate data and use visualization

The list would be modified based on your feedback and suggestions.