How can a machine that lacks a brain differentiate an apple from a cherry? Or tell what number a handwritten digit is? Or detect fraud and predict stock prices? It is the fascinating world of artificial intelligence that gives machines human abilities. There are many ways to do so, one of which is deep learning. It’s the process of building artificial neural networks inspired by the structure of the human brain to train computer systems. It’s like giving a brain to a brainless machine. While it’s more calculus than a sci-fi excerpt, it’s still very interesting to dive into. So, what are neural networks?
What Are Neural Networks?
Neural networks, also known as artificial neural networks, are computational models that form the core of deep learning algorithms. Their structure is inspired by the network of neurons in the human brain.
The brain is made up of billions of neurons. Let’s assume there are only two. When one neuron receives a message, the message passes through it and reaches its nerve endings. Then, the nerve endings pass the message to the following neuron through a synapse. The synapse is the space between two neurons that lets them communicate.
Artificial neural networks are similar. They’re made up of neurons structured into multiple layers. One neuron, also known as a node, receives a message (an input) and then passes it to another neuron in the following layer through channels that connect both.
The message transfer through neural networks is a bit more complex than that. We’ll get to how neural networks work, but first, let’s understand how they became a thing.
Neural Network History
Although the concept of smart machines has existed for centuries, the focus on neural networks has intensified over the past 100 years. Let’s dive into the history of neural networks.
- 1943: Warren S. McCulloch and Walter Pitts published a paper entitled “A Logical Calculus of the Ideas Immanent in Nervous Activity”. The research explained how the brain produces complex patterns that can be simplified down to a binary logic structure with only true/false connections.
- 1958: Frank Rosenblatt developed the first neural network, the perceptron, documented in his research. He enabled a computer to learn how to distinguish between cards marked on the left and cards marked on the right.
- 1959: Bernard Widrow and Marcian Hoff developed models called “Adaline” and “Madaline”. Madaline was the first neural network applied to real-life problems. It’s an adaptive filter that eliminated echoes on phone lines and it’s still in use today.
- 1974: Paul Werbos applied backpropagation within neural networks.
- 1986: Rumelhart et al. proposed the Multilayer Perceptron, a multilayer neural network.
- 1989: Yann LeCun published a paper demonstrating how the use of constraints in backpropagation and its integration into the neural network architecture can be used to train algorithms.
- 1997: Hochreiter and Schmidhuber proposed long short-term memory (LSTM), a recurrent neural network architecture.
- 2018: Jacob Devlin and his colleagues from Google published BERT, a transformer-based model for Natural Language Processing.
- 2020: OpenAI published GPT-3, a deep learning model that produces human-like text.
- 2022: OpenAI published ChatGPT, an advanced chatbot.
How Does A Neural Network Work?
We briefly explained what neural networks are and the history that led to where we are today. Now, let’s see how a neural network works.
Neural Network Structure
A neural network is made up of neurons structured into multiple layers. A neuron is a mathematical function that receives an input, processes it, and generates an output.
A neural network has an input layer, an output layer, and hidden layers in between. The input layer is the first layer of a neural network, to which you feed the data as inputs (x1, x2, and so on). The output layer is the last layer of the network and gives us the output. The hidden layers are where all the magic happens. The number of hidden layers, and the number of neurons in each layer, varies from one neural network to another.
Channels, Weights, and Biases
The channels are the connections through which neurons from one layer connect with neurons from another layer. Each channel has a “weight” (w1) which refers to how important the input is with regard to the output we want.
A bias (b1) refers to how easy it is to get a neuron to fire (give an output). If the bias is big, it is very easy for the node to give us an output. A low bias indicates that it’s difficult for the node to do so.
Steps of Training a Neural Network
Now that you know what each element of a neural network is, it’s time to understand how it works. Let’s suppose we want our model to recognize the number 9. How would it do that?
Neural Network Training
The neural network has to understand what makes the number 9 a 9 and not an 8. So, it has to pick up the “features” of the number 9. A 9 has a loop at the top, and a vertical stroke in the bottom right.
In the case of traditional machine learning algorithms, you’d have to feed in both the images and the features. However, in the case of deep learning, neural networks automatically extract these features. What you have to do, though, is provide enough data to increase the accuracy of your model.
So, if you want your model to recognize the number 9, you would have to gather many pictures of handwritten digits and feed them to the model. Feeding a neural network data so that it learns to give you the desired output is called “training a neural network”.
We agreed that you’d have to feed the model many pictures. Let’s narrow it down to feeding it one image of the digit 9. The digit is represented as a 28 x 28 pixel image, which amounts to a total of 784 pixels.
Each pixel is fed to a neuron in the first layer (input layer) in the neural network. This means that we would have 784 neurons in the first layer. The inputs would be referred to as x1, x2, x3, etc. The data passes through forward propagation – from the first layer to the last.
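As a rough illustration of this step, here is a minimal Python sketch that turns a 28 x 28 grid of pixels into the 784 inputs described above. The image contents and the helper name `flatten` are made up for the example; a real image would come from a dataset.

```python
# Hypothetical 28 x 28 grayscale image: every pixel is 0.0 (black),
# except for one bright pixel so there is something to find afterwards.
ROWS, COLS = 28, 28
image = [[0.0 for _ in range(COLS)] for _ in range(ROWS)]
image[3][5] = 1.0

def flatten(img):
    """Turn a 2-D grid of pixels into one flat list of inputs x1, x2, ..."""
    return [pixel for row in img for pixel in row]

inputs = flatten(image)
print(len(inputs))  # 784, one value per input neuron
```

Each entry of `inputs` is the value handed to one neuron of the input layer.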
The first layer now has 784 neurons each having a pixel as an input. How is that data passed to the next neurons in the following layer?
Each input is assigned a “weight” which represents the influence of this input on the desired output, that is, how important the input x1 is for getting the digit 9 as the final output. Each neuron is also assigned a “bias” which represents how easy it is for the neuron to give an output.
But, how does a neuron give an output? The weighted sum of the inputs plus the bias is applied to a mathematical function called the “activation function” which determines whether or not a node fires. This is done throughout all layers up until the final one.
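To make the weighted sum and the activation function concrete, here is a small sketch of a single neuron. It assumes a sigmoid activation, which is one common choice and not necessarily the one a given network uses; the input values, weights, and function names are invented for the example.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

x = [0.5, 0.1, 0.9]   # three example inputs
w = [0.4, -0.6, 0.2]  # one weight per input

print(neuron_output(x, w, bias=0.0))  # moderate activation
print(neuron_output(x, w, bias=3.0))  # a bigger bias makes the neuron fire more easily
```

With the same inputs and weights, raising the bias pushes the output closer to 1, which matches the idea that a big bias makes it easy for a neuron to fire.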
The final layer has 10 neurons, each representing one digit from 0 to 9. The neuron that lights up the most (the one with the highest activation) gives us our output.
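The whole forward pass can be sketched in a few lines. This is only an illustration: the hidden layer size of 16, the random starting weights, the sigmoid activation, and the random "image" are all assumptions made for the example, so the predicted digit is meaningless until the network is trained.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_layer(n_inputs, n_neurons, rng):
    """Small random starting weights and zero biases for one layer (an assumption)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases

def layer_forward(inputs, weights, biases):
    """Every neuron takes the weighted sum of all inputs plus its bias."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, neuron_w)) + b)
            for neuron_w, b in zip(weights, biases)]

def predict(pixels, layers):
    """Forward propagation: pass the data from the first layer to the last."""
    activations = pixels
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    # The output neuron with the highest activation is the predicted digit.
    digit = max(range(len(activations)), key=lambda i: activations[i])
    return digit, activations

rng = random.Random(0)
layers = [make_layer(784, 16, rng), make_layer(16, 10, rng)]
pixels = [rng.random() for _ in range(784)]  # stand-in for a real image

digit, outputs = predict(pixels, layers)
print(digit, len(outputs))
```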
Enhancing Neural Network’s Results
Let’s say we feed the model an image of the digit 9 and it gives the number 8 as an output. Does this mean our neural network isn’t working and we should build another one? Nope. It means that our model needs enhancement.
After getting the result, we calculate the “cost function”, which is the fancy name for “error”. In its simplest form, it is the squared difference between our expected output and the output we got. Then, we have to tweak the weights and biases in order to minimize the error.
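Here is a minimal sketch of that calculation for the digit example. It assumes the expected output is written as ten values, with a 1 for the correct digit and 0 elsewhere, and it sums the squared differences over the ten output neurons. The two sets of network outputs are invented.

```python
def one_hot(digit, n_classes=10):
    """Expected output: 1.0 for the correct digit, 0.0 for all others."""
    return [1.0 if i == digit else 0.0 for i in range(n_classes)]

def squared_error(expected, actual):
    """Sum of squared differences between expected and actual outputs."""
    return sum((e - a) ** 2 for e, a in zip(expected, actual))

expected = one_hot(9)

# Made-up outputs: the first network leans toward 8, the second toward 9.
says_eight = [0.0] * 8 + [0.9, 0.1]
says_nine = [0.0] * 8 + [0.1, 0.9]

print(squared_error(expected, says_eight))  # about 1.62 (large error)
print(squared_error(expected, says_nine))   # about 0.02 (small error)
```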
How Do We Optimize the Cost Function?
How do we change the weights and biases to minimize errors? We use what is known as “gradient descent”. Gradient descent is a standard optimization algorithm. It repeatedly adjusts the parameters (weights and biases) in the direction that lowers the cost function, step by step, until the cost reaches its lowest point, which indicates minimal error.
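The idea is easiest to see with a single parameter. The sketch below uses a made-up cost function, `(w - 3)²`, whose lowest point is at `w = 3`; the learning rate and the number of steps are arbitrary choices for the example.

```python
def cost(w):
    """Toy cost function with its minimum at w = 3."""
    return (w - 3.0) ** 2

def cost_gradient(w):
    """Slope of the cost at w (the derivative of (w - 3)^2)."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=50):
    for _ in range(steps):
        # Step downhill: move against the slope.
        w = w - learning_rate * cost_gradient(w)
    return w

w_final = gradient_descent(w=0.0)
print(w_final, cost(w_final))  # w ends up very close to 3, cost close to 0
```

A real network does the same thing with thousands of weights and biases at once, using the slope of the cost with respect to each of them.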
To know how much each weight and bias should change, the neural network uses backward propagation (backpropagation). Backward propagation goes from the last layer to the first layer, working out how much each weight and bias contributed to the error so that each can be adjusted accordingly. You keep repeating this until you reach maximum or close to maximum accuracy.
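The sketch below shows one way this can look for a tiny network with 2 inputs, 2 hidden neurons, and 1 output neuron. Everything specific here is an assumption for illustration: the sigmoid activation, the squared-error cost, the starting values, the learning rate, and the single training example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Arbitrary starting parameters, chosen only for illustration.
net = {
    "w_hidden": [[0.15, 0.20], [0.25, 0.30]],  # w_hidden[j][i]: input i -> hidden j
    "b_hidden": [0.35, 0.35],
    "w_out": [0.40, 0.45],                     # hidden j -> output neuron
    "b_out": 0.60,
}

def forward(net, x):
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, net["w_hidden"][j]))
                      + net["b_hidden"][j]) for j in range(2)]
    output = sigmoid(sum(h * w for h, w in zip(hidden, net["w_out"])) + net["b_out"])
    return hidden, output

def train_step(net, x, target, learning_rate=0.5):
    hidden, output = forward(net, x)
    # Last layer first: how the squared error changes with the output neuron's sum.
    delta_out = 2.0 * (output - target) * output * (1.0 - output)
    # Then one layer back: pass the error to the hidden neurons through w_out.
    delta_hidden = [delta_out * net["w_out"][j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(2)]
    # Gradient descent update for every weight and bias.
    for j in range(2):
        net["w_out"][j] -= learning_rate * delta_out * hidden[j]
        net["b_hidden"][j] -= learning_rate * delta_hidden[j]
        for i in range(2):
            net["w_hidden"][j][i] -= learning_rate * delta_hidden[j] * x[i]
    net["b_out"] -= learning_rate * delta_out
    return (output - target) ** 2  # error before this update

errors = [train_step(net, [0.05, 0.10], target=0.01) for _ in range(200)]
print(errors[0], errors[-1])  # the error shrinks as training repeats
```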
Types of Neural Networks
There are different types of neural networks that are used for different data and applications.
The perceptron model, invented in 1958, is the simplest and oldest neural network model. It consists of a single neuron that classifies data into two categories. It works the same way as we described above:
- Neuron receives inputs
- Sums them up
- Applies activation function
- Gives an output
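The four steps above can be sketched as a small perceptron that learns the logical AND operation. This is an illustration under assumptions: it uses a step activation and the classic perceptron learning rule, and the learning rate and number of passes over the data are arbitrary.

```python
def step(z):
    """Step activation: fire (1) if the sum reaches 0, otherwise stay silent (0)."""
    return 1 if z >= 0 else 0

def predict(inputs, weights, bias):
    # Receive inputs, sum them up with their weights, apply the activation.
    return step(sum(x * w for x, w in zip(inputs, weights)) + bias)

def train_perceptron(samples, learning_rate=1.0, epochs=20):
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(inputs, weights, bias)
            # Nudge the weights and bias only when the prediction was wrong.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

AND_SAMPLES = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(AND_SAMPLES)
predictions = [predict(x, weights, bias) for x, _ in AND_SAMPLES]
print(predictions)  # [0, 0, 0, 1]
```

AND works because a single straight line can separate the one positive case from the three negative ones. The same code would never settle on correct answers for XOR, which is not linearly separable.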
Applications of Perceptrons
- Data classification
Advantages of Perceptrons
- Easy to understand, implement, and train
- Performs well on problems that are linearly separable (for example, the logical AND and OR operations and other simple binary classification tasks)
Disadvantages of Perceptrons
- Limited expressive power and generalization ability
- Cannot solve problems that are not linearly separable, such as XOR
- Prone to overfitting and noise
Feedforward Neural Networks
A feedforward neural network is the simplest form of neural network after the perceptron. At a minimum, it’s made of two layers: an input layer and an output layer. Hidden layers are often present in between, depending on the use.
The data flows in one direction only, from the input layer to the output layer, with no loops, hence the name. The name describes how data moves when the network makes a prediction; during training, the error is still propagated backward to update the weights, as described above.
Applications of Feedforward Neural Networks
- Simple classification
- Face Recognition
- Speech Recognition
Advantages of Feedforward Neural Networks
- Simple to maintain
- Equipped to deal with data that contains a lot of noise
Disadvantages of Feedforward Neural Networks
- Keeps no memory of previous inputs, so it handles sequential data poorly
- Fully connected layers need many parameters for large inputs such as images
Convolutional Neural Networks
Convolutional neural networks contain a three-dimensional arrangement of neurons instead of the usual two-dimensional arrangement. They’re formed of multiple layers:
- Input layer: Responsible for taking in data as inputs
- Convolution layers: Responsible for feature extraction (they produce maps)
- Pooling Layer: Responsible for the aggregation of maps produced from the convolutional layer
- Fully connected layer and output layer: Responsible for giving out outputs
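To give a feel for what the convolution and pooling layers do, here is a small sketch in plain Python. It assumes a square image, a square filter, no padding, a stride of 1, and max pooling; the 5 x 5 image and the edge-detecting filter are made up. As in most deep learning libraries, the "convolution" slides the filter over the image without flipping it.

```python
def convolve2d(image, kernel):
    """Slide the filter over the image and record how strongly each patch matches."""
    k = len(kernel)
    out_size = len(image) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out_size)]
            for r in range(out_size)]

def max_pool(feature_map, size=2):
    """Aggregate the map by keeping the largest value in each size x size block."""
    n = len(feature_map) // size
    return [[max(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(n)]
            for r in range(n)]

# 5 x 5 image with a vertical edge: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
vertical_edge = [[-1, 1],
                 [-1, 1]]  # responds where brightness increases left to right

feature_map = convolve2d(image, vertical_edge)
pooled = max_pool(feature_map)

print(feature_map[0])  # [0, 2, 0, 0]: the filter lights up at the edge
print(pooled)          # [[2, 0], [2, 0]]
```

In a real convolutional network the filter values are not written by hand; they are weights learned during training.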
Applications of Convolutional Neural Networks
- Image processing
- Computer vision
- Speech recognition
- Machine translation
Advantages of Convolutional Neural Networks
- Efficient for deep learning
- Fewer parameters to learn in comparison with fully connected layers
Disadvantages of Convolutional Neural Networks
- Hard to maintain
- Slower than other networks depending on the number of layers
Recurrent Neural Networks
Recurrent neural networks are characterized by their ability to use information from previous inputs to influence the current inputs and outputs. So, the information cycles through a loop, unlike feed-forward networks where the information goes in a forward direction only.
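The loop can be sketched with a single recurrent unit that carries a hidden state from one input to the next. This is a simplified illustration: the tanh activation, the weight values, and the function names are assumptions for the example.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """New state = current input mixed with the state left by previous inputs."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0  # empty memory before the first input
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])

print(states_a)  # the first input still echoes in the later states
print(states_b)  # [0.0, 0.0, 0.0]
```

Both sequences end with the same two inputs, yet the final states differ, because the first sequence’s earlier input is still carried in the hidden state.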
Applications of Recurrent Neural Networks
- Text-to-speech processing
- Text processing like auto-suggest and grammar checks
- Sentiment analysis
Advantages of Recurrent Neural Networks
- Processes sequential data where each sample can be assumed dependent on previous ones
- Can map several inputs to several outputs (for example, a sequence in and a sequence out)
Disadvantages of Recurrent Neural Networks
- Difficult to train
- Difficult to process long sequential data
LSTM – Long short-term memory
Long short-term memory networks are recurrent neural networks with the addition of memory cells. These cells can store information for long periods of time. LSTM networks use three gates:
- Input gate: Controls what new data should be kept in memory
- Output gate: Controls the data given to the next step and the next layer
- Forget gate: Controls what data to dump and forget
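The three gates can be sketched for a single memory cell as follows. This follows the common textbook formulation, which also uses a "candidate" value (the new information the input gate may let in) that the list above does not name. All weights and names are invented for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM cell.
    `params` maps each name to (input weight, recurrent weight, bias)."""
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    forget = sigmoid(pre("forget"))          # how much old memory to keep
    inp = sigmoid(pre("input"))              # how much new data to let in
    candidate = math.tanh(pre("candidate"))  # the new data itself
    out = sigmoid(pre("output"))             # how much memory to reveal

    c = forget * c_prev + inp * candidate    # updated memory cell
    h = out * math.tanh(c)                   # output passed onward
    return h, c

# Hypothetical parameters: the gates are driven by their biases only.
keep_memory = {
    "forget": (0.0, 0.0, 10.0),    # forget gate almost fully open: keep memory
    "input": (0.0, 0.0, -10.0),    # input gate almost closed: ignore new data
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 10.0),
}
drop_memory = dict(keep_memory, forget=(0.0, 0.0, -10.0))

_, c_keep = lstm_step(x=0.0, h_prev=0.0, c_prev=1.0, params=keep_memory)
_, c_drop = lstm_step(x=0.0, h_prev=0.0, c_prev=1.0, params=drop_memory)
print(c_keep, c_drop)  # memory is preserved in the first case, wiped in the second
```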
Applications of LSTM
- Gesture recognition
- Speech recognition
- Text prediction
Advantages of LSTM
- Good at handling long-term dependencies
- Less susceptible to the vanishing gradient problem
- Very efficient at modeling complex sequential data
Disadvantages of LSTM
- More complicated than RNNs
- Requires more training data
- Doesn’t work well with highly non-linear data and data with a lot of noise
Applications of Neural Networks
We briefly touched upon how we can use each type of neural network. But, let’s actually see how common the use of these networks is.
- Security: These networks are the core of the facial recognition systems used in surveillance. They match human faces with the digital images in their database. Offices commonly use these systems for selective entry.
- Finance: Neural networks are used to predict stock market prices.
- Marketing: Neural networks are the core of social media algorithms that make sure you get ads that fit your needs and your taste.
- Defense: The USA, Britain, and Japan, among other countries, use NNs for developing solid defense strategies. They also use them for air patrols, maritime patrols, and controlling automated drones.
- Healthcare: Health professionals are using NNs in image processing to detect cancer and other anomalies. They also use them to keep track of patients’ data.
- Weather forecasting: NNs are used to help predict the weather as well as the likelihood of natural disasters.
Advantages and Disadvantages
Artificial neural networks are definitely revolutionary and have contributed to the massive expansion of many fields, but they also have their own set of limitations.
Advantages
- High learning ability: NNs are capable of learning, recognizing patterns, and adapting to new situations.
- Capability of handling non-linear relationships: NNs can learn non-linear relationships, which is especially useful in image and speech recognition.
- Ability to tolerate faults: Neural networks continue to function even if some neurons are no longer working.
- Parallel processing: NNs can handle many calculations at the same time, which is why they’re able to process large datasets.
- Ability to generalize: NNs learn from their inputs and apply what they learn to new data. So, they can make accurate predictions on data they have not seen before.
Disadvantages
- Overfitting: Some networks fail to generalize to new data and perform well only on the training data.
- High computational power needed: NNs require high computational power and time, especially for large datasets. This can be a disadvantage for those who have limited resources.
- Large training data required: NNs require large datasets to perform with high accuracy. If the dataset is small or biased, the model’s performance suffers.
- Limited interpretability: Developers often face the black box problem with neural networks, as they don’t understand how the networks arrive at their conclusions. This can be problematic when they need to know how the network reached a certain decision in order to fix it.
- Noise sensitivity: Noise in data can lead to inaccurate predictions and classifications.
Finally, there is still plenty of room for research and development of neural networks. But, so far we seem to be on the right track.