Metadata, “data about the data”, is crucial to interpreting data correctly and drawing valid conclusions from it. Below are three stories that show what can happen when metadata is lacking or ignored.

Foreign born population of Brussels

“62 percent of people in Brussels not born in Belgium” is a pretty spectacular statistic. It deserves its place in the headline of a newspaper article.

Source: 62 procent Brusselaars niet hier geboren, tijd.be

According to the article, Brussels is the city with the second-highest percentage of residents born abroad, after Dubai with 83%, but ahead of Toronto, Auckland and Sydney. Even considering that Brussels hosts the headquarters of the European Union, this is a strikingly high number.

The author of the article (me, Maarten Lambrechts, your trainer) found the statistic in a highly trusted source: the World Migration Report 2015, published by the International Organization for Migration (IOM), part of the UN system. The report contains the following chart:

Source: World Migration Report 2015, iom.int

The story proved “too good to be true”, and the source line below the chart (“Compiled by IOM from various sources”) should have been a red flag.

The source for the share of the foreign-born population in Brussels is the article “Belgium: A Country of Permanent Immigration”. That article states:

In the two biggest cities, demographic data is proof of the permanent diverse nature of Belgium: in Antwerp, nearly 38 percent of its population is of foreign origin, while approximately 18 percent have a foreign nationality; in Brussels, nearly 62 percent is of foreign origin and approximately 31 percent have a foreign nationality.

So the original source uses “of foreign origin”, not “foreign-born”. The article does not define “of foreign origin”, but a widely used definition in Belgium is that you are of foreign origin if at least one of your parents did not have Belgian nationality when he or she first registered in the country. Under such a definition, many people born in Belgium, such as the children of immigrants, count as being of foreign origin.

So the number for Brussels published in the World Migration Report was wrongly interpreted as the share of foreign-born residents. As a result, the newspaper article based on it contained false claims and had to be corrected.

The authors of the World Migration Report didn’t read the metadata of the source they were using and, as a result, misinterpreted “of foreign origin” as meaning “foreign-born”.

Kidnappings in Nigeria

In April 2014, 276 schoolgirls were kidnapped from a school in Chibok in northeastern Nigeria by the terrorist group Boko Haram. The US-based data journalism site FiveThirtyEight tried to put this horrific event in historical context by looking at the bigger picture.

Source: Kidnapping of Girls in Nigeria Is Part of a Worsening Problem, fivethirtyeight.com

The worsening trend mentioned in the headline of the article is illustrated with the following chart:

Source: Kidnapping of Girls in Nigeria Is Part of a Worsening Problem, fivethirtyeight.com

The article was quickly criticized for the data source behind this chart and for the analysis built on it. The numbers were taken from the Global Database of Events, Language and Tone (GDELT), a database that automatically collects data on events and their locations every day from thousands of broadcast, print and online news sources.

So instead of showing the “daily kidnappings in Nigeria”, the chart above shows the “daily news reports about kidnappings in Nigeria”. These are, of course, very different metrics. A rising trend can reflect a higher number of kidnappings, but it can just as well reflect a higher number of news reports in general, or news media paying more attention to the phenomenon.