Reading time: ~7min

Level of technicality: low, no previous knowledge required

 

Summary:

 

  • Data is a representation of facts. Data may or may not be accurate
  • Data is stored in digital format, which makes it processable by computers
  • Data is often said to be the oil of the 21st century, because it can be harnessed in many ways, notably to create AI systems with machine learning
  • The amount of available data in the world has been growing exponentially in recent years

 

Have you used any Google or Facebook service today? In almost 100% of cases, the answer will be ‘yes’. Have you paid for any of these services? In almost 100% of cases, the answer will be ‘no’. Yet these companies are among the top 10 most valuable publicly traded companies in the world as of 2020. You might ask then: “How have Google and Facebook become so rich while offering their services ‘for free’?” The answer is: data. They have centered their entire business models around monetizing the huge data assets they collect.

But what is data? Why is it so important in the 21st century? And how is it different from information, knowledge and facts? We will look at these questions in this article.

 

1. What is data?

 

Data is a representation of facts stored in digital form. This representation of facts may or may not be valid and accurate. Data can take many forms; that is why it helps to distinguish between the different types of data.


Figure 1: Data is a representation of some aspect of the real world. The real world is the sum of all facts.

The first thing to note is the difference between data and facts. Facts are assertions of the truth. In the words of the famous philosopher Wittgenstein, “The world is everything that is the case”, i.e. the totality of all facts. What data tries to do is depict and represent these facts and thereby the real world. This representation of the truth may or may not be valid and accurate. That means data can be wrong or inaccurate, and it is therefore not always a perfect reflection of the truth.

To clarify this point, consider these three examples:

| Fact | Data/information | How this data/information might be inaccurate or even invalid |
| --- | --- | --- |
| My current coordinates (physical location) | GPS sensor data of my smartphone | GPS sensors might be dysfunctional or might yield inaccurate results (everyone will have been in the situation where the GPS sensor of the smartphone was inaccurate during navigation). |
| Accident where I slipped and dropped my friend’s camera | Report for my liability insurance in which I describe how the accident happened | My description of how I slipped is based on my subjective perception of the accident and I might not have sensed all of the things that were happening. So, the picture I draw with the description is subjective at best, or may even be wrong. |
| Temperature in a room | Data of a temperature sensor | A temperature sensor might be inaccurate. For example, if it is located in a bad spot it might measure a temperature which is unrepresentative of the actual environment. |

 

2. What is the difference between data and information?

 

The terms information and data are often used interchangeably. However, they differ in two ways: first, in their degree of consumability and meaning; second, in how they are stored.

Data can be any sequence of values: numbers, text, pictures, files and so on. None of these is necessarily informative to a consumer of that data. In most cases, data needs to be processed and put into context to become informative for the consumer. Consider the table of cryptic sensor data in Figure 2. You would need context, or some processing of this raw data, to give it meaning. To make it both meaningful and consumable, we might visualize it in a graph, as done in Figure 3, which shows the temperature (OBJ_TEMP) of a temperature sensor over time (PK_SEN_TI).


Figure 2: Some cryptic sensor data


Figure 3: By visualizing data and providing a proper context we can turn it into information
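The step from raw data to information can also be sketched in a few lines of Python. The column names OBJ_TEMP and PK_SEN_TI are taken from the figures; everything else is an assumption made for illustration: the readings are invented, PK_SEN_TI is assumed to be a Unix timestamp in seconds, OBJ_TEMP is assumed to be in degrees Celsius, and the function name is hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical raw rows, mimicking a cryptic sensor table.
# Assumption: PK_SEN_TI is a Unix timestamp (seconds), OBJ_TEMP is in °C.
raw_rows = [
    {"PK_SEN_TI": 1577872800, "OBJ_TEMP": 21.4},
    {"PK_SEN_TI": 1577876400, "OBJ_TEMP": 21.9},
    {"PK_SEN_TI": 1577880000, "OBJ_TEMP": 22.6},
]


def to_information(rows):
    """Turn raw sensor rows into readable statements with context."""
    lines = []
    for row in rows:
        # Give the timestamp a meaning: convert it to a calendar date and time.
        moment = datetime.fromtimestamp(row["PK_SEN_TI"], tz=timezone.utc)
        # Give the number a meaning: attach a unit.
        lines.append(f"{moment:%Y-%m-%d %H:%M} UTC: {row['OBJ_TEMP']:.1f} °C")
    return lines


readable = to_information(raw_rows)
for line in readable:
    print(line)
```

The values themselves do not change. What changes is that each one now comes with a context (a point in time, a unit) that a human reader can interpret.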

The second distinction between information and data concerns their storage format. Data is stored on digital media, whereas information can also be stored in analogue form such as books, cassettes, tapes or vinyl. In 1986 the global information storage capacity stood at 2.8 exabytes[1], which is 2,800,000 terabytes. Back then, 93% of information was stored in analogue form and the remaining 7% in digital format. The year 2002 marks the point in time when we entered the “digital age”: from that point onwards, more information was stored digitally than in analogue form. By 2007 the global information storage capacity had increased to 299 exabytes, which is 299,000,000 terabytes, and the share of information stored in analogue form had fallen to a mere 6%. The other 94% was already stored digitally on hard drives, DVDs and other digital storage technologies.[2]
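The unit conversions and shares in this paragraph are simple arithmetic. The following sketch reproduces them from the figures quoted above, using the decimal convention from footnote [1] (1 exabyte = 1,000,000 terabytes). The variable and function names are illustrative only.

```python
TERABYTES_PER_EXABYTE = 1_000_000  # decimal (SI) convention, see footnote [1]


def exabytes_to_terabytes(exabytes):
    """Convert a capacity in exabytes to terabytes."""
    return exabytes * TERABYTES_PER_EXABYTE


# Figures as quoted in the text: year -> (capacity in exabytes, analogue share)
capacity = {1986: (2.8, 0.93), 2007: (299, 0.06)}

for year, (total_eb, analogue_share) in capacity.items():
    analogue_eb = total_eb * analogue_share
    digital_eb = total_eb - analogue_eb
    print(
        f"{year}: {exabytes_to_terabytes(total_eb):,.0f} TB in total, "
        f"{analogue_eb:.1f} EB analogue, {digital_eb:.1f} EB digital"
    )
```

Taking these figures at face value, the analogue share collapsed between 1986 and 2007 even though the absolute amount of analogue storage still grew. Digital storage simply grew far faster.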

 

3. Why is data important?

 

Data is information stored in digital form. Information has always been important (just think of the saying that information is knowledge, and knowledge is power). However, the rise of data is revolutionary for several reasons including:

  1. Harnessing computer power:

Storing data in digital rather than analogue form has enabled humankind to harness the power of computers and algorithms. With these we can collect, store and process information on unprecedented scales. Just imagine you had an entire bookshelf in digital and analogue form. Not only is the digital storage format much more convenient and efficient, but if, say, you are looking for a certain title you can simply have your computer search for it and get a result in no time. How long would you take to look for a certain title, phrase or word manually?

  2. Data is growing exponentially:

When information was stored exclusively in analogue form, the information storage capacity of humankind was extremely limited. By storing information as data in digital form, we have decoupled the growth of information from these limitations. Thanks to computers, hard drives, our smartphones and other technological innovations, we are able to create, store and process data at staggering levels. Today, data growth is following an exponential path. Remember that in 2007 the global information storage capacity stood at around 299 exabytes? It is estimated that the size of the global datasphere will increase from around 50,000 exabytes in 2020 to some 175,000 exabytes in 2025. It is really hard to wrap your head around these massive numbers.
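To make “exponential” a little more concrete, we can compute the constant yearly growth rate that would connect the two estimates quoted above. This is only a back-of-the-envelope sketch: it assumes smooth compound growth between 2020 and 2025, which the estimates themselves do not claim, and the function names are illustrative.

```python
import math


def annual_growth_rate(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` after `years`."""
    return (end / start) ** (1 / years) - 1


def doubling_time(rate):
    """Number of years needed to double at a constant yearly growth rate."""
    return math.log(2) / math.log(1 + rate)


# Estimates quoted in the text, in exabytes
size_2020 = 50_000
size_2025 = 175_000

rate = annual_growth_rate(size_2020, size_2025, years=5)
print(f"Implied growth per year: {rate:.1%}")
print(f"Implied doubling time: {doubling_time(rate):.1f} years")
```

Under these assumptions, the datasphere would grow by a little under 30% per year and double in somewhat less than three years.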

  3. Data is the oil of the 21st century and the driver of a whole new range of innovations and technologies, like artificial intelligence (AI):

Most importantly, data is often regarded as the fuel of the 21st century. Much like oil and electricity have powered innovations and economies in the past, data will be the (not so natural) resource that fuels these in the present and future. And as we have just seen, unlike oil, data is the opposite of depletable: it continues to grow.

Arguably the most powerful technological innovation fuelled by data is the creation of AI systems with machine learning. The steam engine is said to have equipped us with muscle power: through machinery, we were able to automate many tasks that required our physical effort. Artificial intelligence, i.e. algorithms or machines that can complete tasks that would usually require human intelligence, such as recognizing objects in an image, is often said to equip us with mind power. So, in the future, many of the tasks that require our intelligence will be automatable.


Figure 4: Why is data important? Because it is the fuel that is driving many technological innovations, first and foremost the creation of AI systems. Data is therefore also considered “the oil of the 21st century”. Unlike oil, however, data is a non-depletable resource.

 

So, to gauge just how important data is, you can think of it this way. The industrial revolutions were fuelled by oil. Just think about all the innovations that are based on the combustion engine. Where would we be now if we had not had that oil? It is almost unimaginable.

In the very same way, in a few years it will be unimaginable to think about a world without AI-powered innovations. The vast majority of these innovations will be brought about by the creation and use of AI systems, which in turn are fuelled by data. These innovations will bring about fundamental changes in essentially every aspect of our lives, economies and societies. As famous AI researcher Andrew Ng said: “Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don’t think AI will transform in the next several years”[3]. So, the decades that lie ahead of us are comparable to the previous industrial revolution(s), all because of data.

 

4. What are the different types of data?

 

Data can take many forms and there are many ways in which to classify different types of data. Arguably the most important dimension by which data is classified is by the degree of its organization. In that regard we distinguish between structured, semi-structured and unstructured data.

But there are many other criteria by which you can classify data. Another important one is the distinction between master data and transactional data. Many of us will also still be familiar with the distinction between qualitative and quantitative data that we learned in school or university.

I have written more about the various types of data, and why distinguishing between them is important, in another blog article, so make sure to check out my blog.

 

5. How can data be used?

 

We have said that data is a representation of some facts. You can think of data as giving us an image of some part of the real world. For example, customer data draws a picture of the entity “customer”, capturing for example sociodemographic variables such as age, gender and maybe address. Great, so we have this data, but how can it be used?

By analyzing the picture that data draws, we can draw valuable conclusions and even predict the future. The discipline that is concerned with that is called data analytics. There are many other concepts that fall within data analytics or are strongly related to it including machine learning, statistics, data visualization and many more.
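As a very small illustration of what “analyzing the picture” can mean, the sketch below summarizes a handful of customer records. All records are invented, and the field names simply follow the sociodemographic variables mentioned above; real data analytics would of course work with far more data and more sophisticated methods.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer records; every value here is invented for illustration.
customers = [
    {"age": 34, "gender": "female", "city": "Berlin"},
    {"age": 52, "gender": "male", "city": "Hamburg"},
    {"age": 28, "gender": "female", "city": "Munich"},
    {"age": 41, "gender": "male", "city": "Berlin"},
]


def average_age_by(records, key):
    """Group the records by `key` and return the average age per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["age"])
    return {group: mean(ages) for group, ages in groups.items()}


summary = average_age_by(customers, "gender")
print(summary)
```

Even such a simple summary turns individual records into a statement about the customer base as a whole, which is the starting point for drawing conclusions from data.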

 

6. Summary

 

In this post we have learned what data is and why it is important. Simply put, data is nothing other than a representation of facts stored in digital form.

Data is extremely important because it is the oil that is fuelling many technological innovations, most importantly the creation of AI systems with machine learning. These innovations are changing, and will continue to change, the lives of everyone in the decades to come. Unlike oil, however, data is a non-depletable resource that continues to grow exponentially in size.

To find out how you can start leveraging data in your organization and create your own data strategy, contact me.

[1] 1 exabyte = 1,000,000 terabytes = 1,000,000,000 gigabytes. Note that one exabyte is roughly equal to the storage capacity of 162,000,000,000 books, assuming that one book contains around 100,000 words. That’s enough books to create 12 piles of books from the earth to the moon, assuming an average thickness of 3 cm per book.

[2] Hilbert, M., & López, P. (2011). The world’s technological capacity to store, communicate, and compute information. Science, 332(6025), 60-65.

[3] https://www.gsb.stanford.edu/insights/andrew-ng-why-ai-new-electricity