Many artificial intelligence systems rely heavily on data. But what is data anyway? Are pictures of puppies on the internet data, or are they just information? With Big Data gaining momentum and supplying an influx of data that can be mined for information or used in machine learning projects, data has been called “the new oil”. In today’s digital world, we hear the word data all the time because it plays a crucial role in how computers process information. Digital data in its many forms is also the cornerstone of artificial intelligence.
In this digital age, it’s crucial to know what data is, how it gets stored, and how computers process it into valuable information. This article will guide you through everything you need to know about data.
What Is Digital Data?
The famous quote “Data is the new oil” by the British mathematician Clive Humby is often misunderstood. The phrase is usually taken to mean that data is as valuable as oil, but it also makes a second point: data, just like oil, isn’t useful in its raw state.
Data consists of raw, unorganized facts such as numbers, text, images, and sounds that must be processed before they become valuable. Raw, unprocessed data is meaningless without interpretation, be it by humans or machines.
Data can take different forms, such as bits and bytes in computers, text on paper, or even ideas in a person’s mind. However, in this digital era, data usually refers to information that a computer stores or transmits.
Is Data the Same as Information?
Not exactly. Information is data that has been processed and organized so that it carries meaning for a particular purpose. Information answers questions, provides insights, and supports decision-making. Data, on the other hand, is the raw facts and figures themselves, such as numbers, symbols, text, or observations.
Let’s better understand the difference between data and information through an example:
Data: You have a set of letters: A, C, E, B, D.
Information: When you arrange these letters alphabetically, you get the following information: “The letters in alphabetical order are A, B, C, D, E.”
So we deduce that:
Data consists of individual letters.
Information is derived by organizing the data in a specific way to provide a meaningful sequence.
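The letter example above can be sketched in a few lines of Python. This is only an illustration: the function name `to_information` is made up for this article, and sorting is just one of many ways to turn data into information.

```python
# Raw data: the letters from the example, in no particular order.
data = ["A", "C", "E", "B", "D"]


def to_information(letters):
    """Organize raw letters into a meaningful, ordered statement."""
    ordered = sorted(letters)
    return "The letters in alphabetical order are " + ", ".join(ordered) + "."


information = to_information(data)
print(information)
```

The list `data` on its own says nothing; the sentence produced by `to_information` answers a question about it.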
Here’s a table that further explains the difference:
How Data Gets Represented and Stored
Now that we understand what data is, let’s see how computers store and represent data. Computers represent data as binary values: 0 and 1. Binary is a base-2 number system, a way of counting that uses only two digits. The hardware implements this with switches that have two states, OFF and ON, so the binary code is 0 for OFF and 1 for ON.
One binary digit (0 or 1) is referred to as a bit, short for binary digit. Bits can be grouped into larger chunks of data; 8 bits make up 1 byte. As you see in the graphic below, an ON bulb represents 1 and an OFF bulb represents 0.
Let’s see for example what numbers as data can look like in a binary system.
Therefore, computers store and process data as binary 0s and 1s, then convert it back into a readable form when displaying it on the screen.
Memory and storage are measured in units such as megabytes, gigabytes, and terabytes.
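As a rough sketch of the ideas in this section, the Python snippet below shows a few numbers and a short piece of text as bits, then relates bytes to larger storage units. The helper `to_bits`, the sample values, and the `UNITS` table are illustrative assumptions; the unit sizes use the decimal (SI) definitions, while some systems use powers of 1024 instead.

```python
def to_bits(number, width=8):
    """Return the binary digits of a non-negative integer, padded to `width` bits."""
    return format(number, f"0{width}b")


# Numbers as 8-bit binary values (one byte each).
for n in (0, 1, 2, 5, 10, 255):
    print(n, "->", to_bits(n))

# Text is stored the same way: each character is first encoded as bytes.
text_bits = " ".join(to_bits(byte) for byte in "Hi".encode("utf-8"))
print(text_bits)

# Storage units in bytes, assuming decimal (SI) definitions.
UNITS = {"megabyte": 10**6, "gigabyte": 10**9, "terabyte": 10**12}
bits_in_megabyte = UNITS["megabyte"] * 8  # 8 bits per byte
print(bits_in_megabyte)
```

Here the number 5 becomes `00000101`, and the two characters of "Hi" become two bytes, `01001000 01101001`.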
Digital Data Processing Cycle
The data processing cycle, also known as the information processing cycle, is a series of steps that data (input) goes through when it gets collected, processed, and transformed into useful information (output). The steps occur in a specific order, but the whole process repeats in a cyclic manner: the output of the first cycle can be fed as input into the second cycle, and so forth. There are generally six steps in the data processing cycle:
- Step 1 – Collection: The first step in the data processing cycle is collecting raw data. The type of data collected determines the type of output data. Therefore, the raw data should be carefully collected in a well-defined manner.
- Step 2 – Preparation: Also known as data cleaning, this is the step where unnecessary and inaccurate data gets removed. This step checks data for errors, duplications, and missing data.
- Step 3 – Input: This is where the cleaned data gets transformed into machine-readable form.
- Step 4 – Data Processing: In this step, data gets processed using algorithms, often machine learning algorithms, to generate a specific output.
- Step 5 – Output: The data will be displayed in a readable format like graphs, audio, video, and other formats for users.
- Step 6 – Storage: The output data is stored in this step for later use.
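The six steps above can be sketched as a small Python pipeline. Everything here is a simplified assumption for illustration: the function names, the made-up temperature readings, and the use of a plain average in the processing step where a real system might use a machine learning algorithm.

```python
from statistics import mean


def collect():
    # Step 1: raw readings, invented for this sketch (note the gap, the junk, and the duplicate).
    return ["21.5", "22.0", "", "22.0", "abc", "23.1"]


def prepare(raw):
    # Step 2: drop missing, malformed, and duplicate entries.
    cleaned = []
    for item in raw:
        try:
            float(item)
        except ValueError:
            continue  # empty or non-numeric entry
        if item not in cleaned:
            cleaned.append(item)
    return cleaned


def to_input(cleaned):
    # Step 3: convert text into machine-readable numbers.
    return [float(item) for item in cleaned]


def process(values):
    # Step 4: a simple summary standing in for a real algorithm.
    return {"count": len(values), "average": round(mean(values), 2)}


def output(result):
    # Step 5: present the result in a human-readable form.
    return f"{result['count']} readings, average {result['average']}"


def store(report, storage):
    # Step 6: keep the output for later use (a list stands in for a database).
    storage.append(report)
    return storage


storage = []
report = output(process(to_input(prepare(collect()))))
store(report, storage)
print(report)
```

Because each step is its own function, the stored output of one run could be passed back in as input to the next, which mirrors the cyclic nature of the process.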
Types of Digital Data
There are many ways to classify data and sort it into different categories. However, data is commonly divided into two main categories, each with two subtypes:
- Qualitative Data:
Includes data that represent qualities or characteristics.
- Nominal Data: Data that consists of categories or labels without any specific order. Example: Colors, gender.
- Ordinal Data: Data that has a meaningful order or ranking. Example: The order of education degrees: high school, bachelor’s, master’s.
- Quantitative Data:
Includes data that represent quantities or numerical measurements.
- Discrete Data: Data that can only take specific, distinct values. Example: Number of students in a class.
- Continuous Data: Data that can take any value within a range. Example: Temperature, weight.
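To make the four types more concrete, here is a minimal Python sketch showing which operations make sense for each one. The sample values and the `DEGREE_RANK` mapping are invented for illustration.

```python
from collections import Counter

# Nominal: labels with no inherent order, so we can count them but not rank them.
colors = ["red", "blue", "red", "green"]
color_counts = Counter(colors)

# Ordinal: labels with a meaningful ranking, so sorting by rank makes sense.
DEGREE_RANK = {"high school": 1, "bachelor's": 2, "master's": 3}
degrees = ["master's", "high school", "bachelor's"]
ranked = sorted(degrees, key=DEGREE_RANK.get)

# Discrete: countable whole values, such as students per class.
students_per_class = [28, 31, 25]
total_students = sum(students_per_class)

# Continuous: measurements that can take any value within a range.
temperatures_c = [36.6, 37.2, 38.05]
warmest = max(temperatures_c)

print(color_counts["red"], ranked, total_students, warmest)
```

Counting works for every type, ordering only starts to make sense with ordinal data, and arithmetic such as sums is reserved for discrete and continuous values.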
Data can also be classified based on structure. This is especially important for machine learning algorithms.
- Structured Data: Data that is organized or follows a specific schema. Example: Databases, spreadsheets.
- Unstructured Data: Data that lacks a specific format. Example: Documents, images, audio files.
- Semi-Structured Data: Data that has some structure but is not as rigid as structured data. Examples: XML and JSON data.
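The difference between the three structures shows up in how a program reads them. The sketch below uses only Python's standard library; the sample CSV, JSON, and sentence are invented for illustration.

```python
import csv
import io
import json

# Structured: every row follows the same schema (name, age).
csv_text = "name,age\nAda,36\nAlan,41\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
ages = [int(row["age"]) for row in rows]

# Semi-structured: JSON has keys and nesting, but fields may vary per record.
json_text = '[{"name": "Ada", "tags": ["math"]}, {"name": "Alan"}]'
records = json.loads(json_text)
tags = [record.get("tags", []) for record in records]

# Unstructured: free text has no fields at all, so only generic operations apply.
note = "Ada met Alan on Tuesday to discuss the experiment."
word_count = len(note.split())

print(ages, tags, word_count)
```

Structured data can be queried by column right away, semi-structured data needs a fallback for missing fields (here `record.get`), and unstructured data needs further processing before anything specific can be extracted.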
What Is Big Data?
You’ve probably come across the term “Big Data” before. Big data refers to extremely large and complex datasets that cannot be easily processed using traditional data processing tools and methods. Big data can include both structured and unstructured data. Structured data includes data collected from organizations’ databases and spreadsheets, while unstructured data can be gathered from sources such as social media.
The three Vs of big data:
- Volume: Big data includes vast amounts of data that conventional databases cannot handle. The volume of data can range from terabytes to exabytes.
- Velocity: Big data is generated at high speeds. It often comes in real-time streams such as data from social media, sensors, and Internet of Things devices.
- Variety: Big data encompasses a variety of data types and formats. It includes structured, unstructured, and semi-structured data.
Big Data and AI
Since artificial intelligence requires massive amounts of data to learn and big data analytics leverages AI for better data analysis, big data and AI have a symbiotic relationship. The synergy between big data and AI has led to remarkable advancements in technology, data analysis, and decision-making. Big data provides the raw material and context for AI to operate, while AI adds intelligence and automation to the analysis of big data.
AI does not rely only on big data; it depends on data in general as its sustenance. Large, diverse datasets fuel machine learning algorithms, enabling AI to learn, adapt, and make decisions. Without digital data, AI lacks the foundation required to operate effectively and intelligently.