Data Journalism

First Data Journalism Bootcamp Starts in Delhi

A 3-day data journalism bootcamp, co-hosted by the International Center for Journalists, the Hindustan Times, Hacks/Hackers New Delhi, Data{Meet} and the 9.9 School of Communication, started on Friday in the National Capital Region of Delhi, India. The program, the first of its kind, has attracted a mix of veteran data journalists, young media professionals, data science enthusiasts, developers and designers.

More than 60 people are taking part in the workshop, which is seeing a good cross-fertilization of ideas between techies and journalists from different types of media. The speakers and mentors are the who’s who of the Indian open data and data journalism community, and some of the best-known data journalists and professionals are among the participants.

Over the next two days, the participants will work on real-life stories involving data.

David Lemayian from Code for Africa giving a few smart tips



Data Journalism in India: Nascent but noticeable

Shyamanuja Das

In a well-discussed (and well-tweeted) article, India’s Media – Missing the Data Journalism Revolution, published recently on the Global Investigative Journalism Network website, journalist and academic Priya Rajasekar argues that the Indian media, by and large, has yet to wake up to the opportunity of data journalism.

Probably the first in-depth analysis of the subject in India, the article is fairly complete in capturing the viewpoints of the entire spectrum of stakeholders: academics, practitioners and other influencers. It even goes on to explore the reasons behind what it calls the Indian media’s “not subscribing to the idea (of data journalism)”.

The basic premise, that the Indian media has not really taken data journalism seriously, is not far off the mark if one considers only the traditional media. India certainly does not have the likes of the Guardian Datablog or the NYT Upshot.

But then, how many traditional media brands, even in developed markets, have such initiatives? India’s The Hindu actually has a dedicated section for data stories, though it is nowhere near the Guardian and NYT sites.

There are online ventures, though. While they are not quite India’s Vox or FiveThirtyEight, some of them are making an impact. A few are dedicated to data journalism; others are news and analysis sites that carry a few good data journalism stories. Some traditional media houses have also started exploring the area in a more focused manner, which, interestingly, is being noticed even by ordinary readers.

Two events in the recent past have helped the cause in a big way.

One, of course, was the General Elections held in April–May 2014. India’s is the largest election on earth, not just in terms of the size of the electorate but also in terms of the number of political parties. India has more than a thousand political parties, of which about 60 are recognized national and state parties. That makes analyzing vote shares, and linking them to seats won, fairly complex and interesting. With the Election Commission sharing raw data, we saw a lot of good analysis this time. DataJourno carried a round-up of the election coverage.

The other was the release of crime data by the National Crime Records Bureau (NCRB). Though the NCRB has been sharing this data for many years, this time, thanks to growing awareness of data analysis, almost all newspapers did multiple stories analyzing it. Crimes against women and regional trends in crime dominated the coverage.

Here is a round-up of some of the data journalism initiatives in India. These are among the most noticeable efforts, though the list is not comprehensive. One clarification: quite a few other sites have fairly decent content based on data analysis, but it is not really journalism, because most of it contains no ‘story’. In fact, this is a big confusion in data journalism: what is journalism and what is not. But that is a topic in itself, and not one restricted to India, so we will keep it for another day.

Here is the list, with examples wherever possible.

In addition, two other newspapers deserve a mention for their data journalism efforts, though they do not call it by that name: Mint, a business newspaper, and the Times of India, India’s largest-selling English newspaper. Mint was the first newspaper to start visualizing stories, well before the excitement about data journalism began. It also does a number of data analysis stories, but they are mostly restricted to macroeconomics, which is of limited interest to lay readers. The Times of India has started a regular section in its print edition called STATOITICS (TOI being the paper’s abbreviated name), where it presents interesting data through simple visualizations.

The trend is new but is surely catching on. One challenge, though, is that number crunchers who can write some English are posing as data journalists, taking advantage of the absence of real journalists, many of whom are intimidated by numbers. So, instead of becoming the hot new area within journalism, data journalism has ended up as a poor cousin of data science and analytics.

Is opinion superior?

It is difficult to understand the fascination Indian journalists have with the words “column” and “opinion”, as compared to “stories” and “reports”. Many think that if you get an “opportunity” to write an opinion column, you have arrived as a journalist.

This perceived superiority of “opinion” often makes the media pass off good reporting, and even data analysis, as opinion. This piece in Mint, Why India’s sanitation crisis is a public health emergency, is a fairly good example of data journalism: it tries to correlate India’s widespread practice of open defecation with malnutrition. The accompanying map is also a fairly good, if not extraordinary, visualization.

But why the hell should it be labeled as opinion? Is it to give the piece that supposed importance, or are there no other sections the editors can fit it into?

In fact, data journalism is not as new or as rare in India as we think. Stories like this one are data journalism pieces; it is just that many publications do not realize it.



Data Journalism: Why definitions matter as much as the numbers…

Shyamanuja Das

Data journalism is journalism first—and last. And there should not be any doubt in anyone’s mind about that.

Even as we celebrate the increased access to authentic data and the availability of great analytics and visualization tools that have given the journalist community a lot of power, we must not forget that the basic premises of journalism still stand. We must tell good stories. And we must question.

One of the most important questions about any data is the exact definition of what that data represents. One can say XXX is India’s largest e-commerce company. But what does “largest” mean? Highest revenue? The largest number of users who buy from the site? The largest number of transactions? It also depends on where the story appears, and only the journalist knows what her readers would naturally assume. For example, in a site like, the readers will assume the parameter to be either valuation or revenue; in the Times of India, few readers will naturally think of valuation.

The above may be an over-simplified example. In many cases, the fine print needs to be read carefully to question the data, especially when it looks counter-intuitive. Therein lies the irony. While it is true that the more counter-intuitive the conclusion, the bigger the story, it is also true that the more counter-intuitive the finding, the harder one must question it. And you come back to the basics: nothing in life comes easy, and surely not a good story. Data journalism or no data journalism.

There is an excellent example in today’s Times of India. A story, India over-reporting green cover, points out that the flaw may lie in the definition of what is called a forest. “A large area that the government has been including under the forest category actually comprises commercial plantations, including those for coffee, arecanut, cashew, rubber, fruit orchards, parks and gardens,” the story says, quoting researchers from the Indian Institute of Science (IISc), Bangalore. The researchers attribute this to the definition of forest cover used by the Forest Survey of India (FSI), which defines forest cover as “all lands more than one hectare in area, with tree canopy density of more than 10%, irrespective of ownership and legal status”. This definition could well mean that man-made forests or monocultures (farmland used to grow only one type of crop) are being counted as forests, the story says.

If true, this challenges a basic fact all of us have believed: that India’s forest cover is growing. Over the last few years, the media has reported this in a celebratory tone. Here is one such story, published in the Times of India in 2009: India’s forest cover rises to over 21%.

No one had questioned this, even though it looked a little counter-intuitive, until the IISc researchers did. To prove their point, they have even given data on the exact area covered by plantations and orchards, though it is not clear from the story whether the FSI has actually included these areas in its survey.
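To see why plantations and orchards can slip into the count, here is a minimal sketch that encodes the FSI definition quoted above as a Python predicate. The function name, the `Parcel` type and the three example parcels are hypothetical, with numbers invented purely for illustration; only the two thresholds (more than one hectare, canopy density above 10%) come from the definition.

```python
from typing import NamedTuple


class Parcel(NamedTuple):
    """A hypothetical land parcel; the values used below are illustrative only."""
    name: str
    land_use: str        # recorded, but ignored by the definition
    area_ha: float       # area in hectares
    canopy_pct: float    # tree canopy density, in percent


def meets_fsi_forest_cover(parcel: Parcel) -> bool:
    """Apply the quoted FSI test: more than 1 ha and canopy density above 10%.

    Land use, ownership and legal status play no part, because the definition
    counts land "irrespective of ownership and legal status".
    """
    return parcel.area_ha > 1.0 and parcel.canopy_pct > 10.0


# Hypothetical parcels, invented to show how the definition behaves.
parcels = [
    Parcel("natural forest patch", "forest", 50.0, 60.0),
    Parcel("rubber plantation", "plantation", 5.0, 40.0),
    Parcel("city park", "park", 0.8, 30.0),
]

counted = [p.name for p in parcels if meets_fsi_forest_cover(p)]
print(counted)  # ['natural forest patch', 'rubber plantation']
```

The plantation passes because the test never looks at `land_use`; the park is left out only because it is smaller than one hectare. Checking the totals alone would not reveal this, which is why the definition itself has to be questioned.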

While this is an example of how questioning the definition has brought out the truth, you need go no further than the same day’s Times of India to find an example where these basic questions were not asked. Take the story, Indian B-School graduates get jobs easily. Quoting a survey by the Graduate Management Admission Council (GMAC), which conducts the GMAT, it claims that 92% of Indian management students had an offer of employment. The survey was conducted among the 2014 batch of students.

So, what do you make of the story? That 92% of management graduates in India end up with a job. Now, look at the data from the All India Survey of Higher Education, which was the basis for the preceding post on this site, And you thought an MBA degree is so exclusive. According to it, as many as 5.6 lakh students (1 lakh = 100,000) enrolled in management programs in 2010-11, the latest year for which data is available. In 2012-13, the enrollment year for the 2014 batch, that number must have been higher. Even if we assume that it stayed at roughly the 2010-11 level (rounding down to about 5.5 lakh) and further assume that only 70% would complete the program, the 92% figure would mean that more than 3.5 lakh of them end up with a job.
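The back-of-the-envelope estimate above can be written out as a short Python calculation. The 70% completion rate is the post’s own assumption, and reading the 92% as applying to every Indian graduate is the interpretation the story invites; the function name `implied_job_offers` is hypothetical. Running it with both the rounded 5.5 lakh and the reported 5.6 lakh shows that the conclusion does not hinge on the rounding.

```python
LAKH = 100_000  # 1 lakh = 100,000


def implied_job_offers(enrolled: float,
                       completion_rate: float = 0.70,
                       placement_rate: float = 0.92) -> float:
    """Graduates with a job offer if the survey's 92% applied to all of them."""
    graduates = enrolled * completion_rate  # assumption: only 70% complete the program
    return graduates * placement_rate       # survey figure, read as an all-India rate


for enrolled_lakh in (5.5, 5.6):  # rounded-down estimate and the 2010-11 AISHE figure
    offers = implied_job_offers(enrolled_lakh * LAKH)
    print(f"{enrolled_lakh} lakh enrolled -> about {offers / LAKH:.2f} lakh job offers")

# 5.5 lakh enrolled -> about 3.54 lakh job offers
# 5.6 lakh enrolled -> about 3.61 lakh job offers
```

Either way, the implied figure is above 3.5 lakh, which is the number the story’s unqualified 92% quietly asks readers to accept.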

Now, let us look at how the Graduate Management Admission Council arrived at this 92% figure: by surveying 111 universities in 20 countries. What is India’s share? We do not know, but it is safe to assume that it cannot be more than 10 to 12 institutions at best. And which are these institutes and universities? Are they representative of all of India’s management schools? Or are they just tier-1 schools like the IIMs, ISB and FMS?

These questions must be asked. The survey may be right in its own way if it says its finding holds only for tier-1 schools, but the story does not even hint at that, for instance with a qualifier like “top B-schools”. Without such a qualifier, readers will take it to be true of Indian management schools as a whole.

Exactly the kind of stuff that the mushrooming private management schools in India want to quote while selling a dream to unsuspecting students and parents.

This is the danger of relying on data without questioning what that data represents. Data today surely means more credibility. But it will soon lose that credibility if it is not questioned, not understood or not put in the proper context. Nice visualizations cannot compensate for lack of authenticity and context.

Data journalism is not so much about data as it is about journalism.

[Shyamanuja Das is a former editor and is currently a director at the market research firm Juxt. He advises businesses, investors and marketers on the effective use of public data and teaches data journalism. He is a co-founder of DataJourno.]


Welcome to DataJourno

Welcome to DataJourno, a small but sincere initiative to popularize data journalism in India.

Data journalism is an idea whose time has come.

As a concept, it is certainly not new. For years, journalists have used data to support their stories, to analyze trends and, once in a while, to form hypotheses worth probing. The stories created this way have been as popular and successful as any other.

But a few things have changed in recent years that have sparked renewed interest in data journalism.

The most practical reason is the availability of tools that make handling data far easier. Earlier, only journalists with a mastery over numbers could play with data. Areas of journalism where this was important in any case, such as business journalism, saw fairly good examples of data journalism. This is also why, in the minds of other journalists, data journalism came to be strongly associated with business journalism. But with today’s easy-to-use tools, a journalist need not be an expert mathematician or statistician to analyze data and find interesting trends. That suddenly makes data journalism a viable tool for any journalist.

Another reason for the recent rise in interest is the availability of plenty of data. Earlier, it was only listed companies that released data. With the open data movement growing stronger, governments across the world are releasing far more data, mostly online these days. More importantly, that data is comparatively recent and can be used meaningfully. This makes journalists’ work far easier. They can focus on their core work, getting a scoop or analyzing a trend based on their understanding of the area, without making rounds of government ministries and agencies just to get a report.

The third phenomenon is the rise of social media, which combines the anecdotal with the statistical. Earlier, a reporter spoke to a handful of people, or the media house assigned time-consuming, costly research to market research agencies. Today, thanks to Twitter, Facebook, SurveyMonkey and LinkedIn, a journalist can create a poll in minutes and get a significant number of responses within days, even hours. And it costs hardly anything. In such dipsticks, the writer has to play the role of data analyst, as no agency is involved.

All these have pushed the journalist to the midst of a lot of data. And they are not complaining.

But a word of caution here. Data journalism is not primarily about data; it is about journalism. Just as you need a nose for news in any journalism, you need one in data journalism. Being an expert at data crunching does not make you a great data journalist (many people at analytics and consulting firms probably do that far better than you); doing great stories using data does. The final measure is how good the story is, not how good the data analysis is. In that sense, data is not drastically different from any other tool of journalism.

Here are some popular misconceptions about data journalism.

  1. Data journalism is about lots of numbers. You may quote just a single number in the final story, as long as that number tells a story. In fact, the best stories are often not number-heavy. But numbers often lead you to the story, or they make or break your hypothesis.
  2. Data journalism is about business journalism and social/developmental journalism. Not necessarily. True, business journalists have traditionally done most of the data-based stories, and with the rise of open data, most of the good examples you find today are about governance, development and social indicators. But these are by no means the only areas. Here is an Indian example from a very different area. Who is the actor for whom the great singer Mohd Rafi sang the maximum number of songs? Perception suggests it must be Dilip Kumar or Shammi Kapoor. But analyze the data and you find it is neither of them, nor, for that matter, Rajendra Kumar. It is Johnny Walker. Now, that is a story. And that is an example of data journalism. It often shows you a truth that is very counter-intuitive. Isn’t that what every journalist wishes to do?
  3. Data journalism is about open data. Many good examples of data journalism today use data from governments and multilateral development agencies that is proactively made available to all. That makes many associate data journalism with open data. Even Wikipedia’s definition of data-driven journalism refers to it. But that should not be the case. Data journalism should not concern itself with the source of the data; it is about a set of practices and tools.
  4. It is journalism with a cause. With the rise of open data, many NGOs and activists have used data and data analysis to hold public servants accountable and even to initiate action in some long-ignored areas. Many of them have used traditional media or their own media to do good stories that argue their case and further the cause of development and governance. That is all very good. But the definition of data journalism should not be restricted by that. All that data journalism (or, for that matter, any journalism) should strive for is a good story. Nothing more, nothing less.

Data journalism is here to stay. We hope DataJourno will contribute in its own small way to furthering the cause.

Here are some of the things we plan to help with, as part of the community. We are looking for more ideas and suggestions.

  1. Showcase good examples of data journalism from Indian media
  2. Recognize the best among them
  3. Help journalists and students of journalism acquire skills in data journalism and data visualization by collaborating with employers and journalism schools
  4. Work actively with major sources of data including the government, NGOs, industry bodies, and research and consulting firms to make their data available to journalists
  5. Disseminate information about major happenings in data journalism globally
  6. Make resources available to one and all in the community
  7. Help school children appreciate data and use visualization

The list would be modified based on your feedback and suggestions.