Data journalism is journalism first—and last. And there should not be any doubt in anyone’s mind about that.
Even as we celebrate the increased access to authentic data and the availability of great analytics and visualization tools that have given the journalist community a lot of power, we must not forget that the basic premises of journalism still stand. We must tell good stories. And we must question.
One of the most important questions about any data is the exact definition of what that data actually represents. One can say XXX is India’s largest e-commerce company. But what does “largest” mean? Highest revenue? The largest number of customers buying from its site? The largest number of transactions? The answer also depends on where the story appears, and the journalist knows best what her readers would naturally assume. On a site like moneycontrol.com, for example, readers will assume the parameter to be either valuation or revenue; few readers of the Times of India will naturally think of valuation.
The above may be an over-simplified example. In many cases, the fine print needs to be read carefully to question the data, especially when it looks counter-intuitive. There lies the irony. It is true that the more counter-intuitive the conclusion, the bigger the story; it is also true that the more counter-intuitive the finding, the harder one must question it. And you come back to the basics: nothing in life comes easy, and surely not a good story. Data journalism or no data journalism.
There is an excellent example in today’s Times of India. In a story headlined “India over-reporting green cover,” the report points out that the flaw may lie in the definition of what is called forest. “A large area that the government has been including under the forest category actually comprises commercial plantations, including those for coffee, arecanut, cashew, rubber, fruit orchards, parks and gardens,” the story says, quoting researchers from the Indian Institute of Science (IISc), Bangalore. The researchers attribute this to the definition of forest cover used by the Forest Survey of India (FSI), which defines forest cover as “all lands more than one hectare in area, with tree canopy density of more than 10%, irrespective of ownership and legal status”. This definition could well mean that man-made forests or monocultures (farmland used to grow only one type of crop) are being counted as forests, the story says.
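To see why the definition matters, it can be written as a simple test. The sketch below is a minimal Python illustration that assumes the FSI definition exactly as quoted; the `Parcel` type, the `counts_as_forest_cover` function and the four parcels are hypothetical examples, not FSI or IISc data. Note that the test never looks at land use, so a plantation or orchard that clears the area and canopy thresholds passes just as a natural forest does.

```python
from dataclasses import dataclass


@dataclass
class Parcel:
    name: str
    area_ha: float         # area in hectares
    canopy_density: float  # tree canopy density as a fraction, 0.0 to 1.0
    land_use: str          # recorded for reference only; the definition ignores it


def counts_as_forest_cover(p: Parcel) -> bool:
    """FSI definition as quoted: more than one hectare in area and tree
    canopy density of more than 10%, irrespective of ownership and legal status."""
    return p.area_ha > 1.0 and p.canopy_density > 0.10


# Hypothetical parcels, chosen only to illustrate the definition.
parcels = [
    Parcel("natural forest", 50.0, 0.60, "natural forest"),
    Parcel("rubber plantation", 5.0, 0.40, "plantation"),
    Parcel("fruit orchard", 2.0, 0.25, "orchard"),
    Parcel("urban park", 0.8, 0.50, "park"),
]

for p in parcels:
    print(f"{p.name}: {counts_as_forest_cover(p)}")
```

Run as written, the natural forest, the rubber plantation and the fruit orchard all print `True`; only the park, which is under one hectare, prints `False`.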
If the researchers are right, their finding challenges something all of us have believed: that India’s forest cover is growing. Over the last few years, the media has reported this in a celebratory tone. Here is one such story, published in the Times of India in 2009: India’s forest cover rises to over 21%.
No one had questioned it, even though it looked a little counter-intuitive, until the IISc researchers did. To prove their point, they have even given data on exactly how much area is covered by plantations and orchards, though it is not clear from the story whether the FSI has actually included these areas in its survey.
While this is an example of how questioning a definition can bring out the truth, you need go no further than the same day’s Times of India to find an example where these basic questions were not asked. Take the story headlined “Indian B-School graduates get jobs easily.” Quoting a survey by the Graduate Management Admission Council, which conducts the GMAT, it claims that 92% of Indian management students had a job offer. The survey was conducted among the 2014 batch of students.
So, what do you make of the story? That 92% of management graduates in India land up with a job. Now, take a look at the data from the All India Survey of Higher Education, which was the basis for the preceding post on this site, “And you thought an MBA degree is so exclusive.” According to it, as many as 5.6 lakh students enrolled in management programs in 2010-11, the latest year for which data is available. In 2012-13, the enrollment year for the 2014 batch, that number was likely higher. Even if we assume that it stayed at the 2010-11 level (rounding down to about 5.5 lakh) and further assume that only 70% complete the program, the 92% figure implies that more than 3.5 lakh of them land up with a job.
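The back-of-the-envelope estimate can be written out as a small calculation, which makes it easy to test how sensitive the conclusion is to the assumptions. This is a minimal Python sketch; the `estimated_placed` function is a hypothetical helper, and its inputs are the assumptions stated above (enrollment unchanged from 2010-11 and rounded down to 5.5 lakh, 70% completion, 92% placement), not survey data.

```python
LAKH = 100_000  # 1 lakh = 100,000


def estimated_placed(enrolled: float, completion_rate: float, placement_rate: float) -> float:
    """Graduates implied to hold a job offer, given enrollment,
    the share who complete the program, and the share placed."""
    graduates = enrolled * completion_rate
    return graduates * placement_rate


# Assumptions taken from the text: 5.5 lakh enrolled, 70% complete, 92% placed.
placed = estimated_placed(5.5 * LAKH, 0.70, 0.92)
print(f"{placed / LAKH:.2f} lakh graduates with a job offer")  # 3.54 lakh
```

Even with a completion rate as low as 50%, the same 92% figure would imply about 2.5 lakh graduates with job offers, so the conclusion does not hinge on the 70% assumption.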
Now, let us look at how the Graduate Management Admission Council arrived at this 92% figure. It surveyed 111 universities in 20 countries. What is India’s share? We do not know, but it is safe to assume that it cannot be more than 10-12 institutions at best. And which are these institutes and universities? Are they representative of all of India’s management schools? Or are they just tier-1 schools like the IIMs, ISB and FMS?
These are questions that must be asked. The survey may well be right on its own terms if it claims to hold only for tier-1 schools, but the story does not even hint at that, not even with a qualifier such as “top B-schools”. Without one, readers will take the finding to be true of all Indian management schools.
It is exactly the kind of claim that the mushrooming private management schools in India want to quote while selling a dream to unsuspecting students and parents.
This is the danger of relying on data without questioning what that data represents. Data today surely lends more credibility. But it will soon lose that credibility if it is not questioned, not understood or not put in the proper context. Nice visualizations cannot compensate for a lack of authenticity and context.
Data journalism is not so much about data as it is about journalism.
[Shyamanuja Das is a former editor and is currently a director at the market research firm Juxt. He advises businesses, investors and marketers on the effective use of public data and teaches data journalism. He is a co-founder of DataJourno]