Computers are good at processing data, as long as you specify exactly how to process it and what data to expect. Real life does not work that way. In most cases, data is messy and full of noise, and we have to adjust our algorithm depending on the data we are processing. This can turn ugly quickly. Clearly we need a highly adaptive supercomputer which can change itself depending on the nature of our data and then give us appropriate output.
Smart people then realized that, you know, we kind of have a highly adaptive supercomputer already: our brain. And thus the artificial neural network, an algorithm mimicking how the human brain works, was born.
This tutorial will guide you through the basics of neural networks, culminating in your very own neural network system which you can immediately use for your data processing needs. The tutorial comes in two parts. The first part covers the theory of artificial neural networks (ANN), with some code for the basic components. The second part covers how the net learns, including the remaining code to wrap it all into a ready-to-use library.
Granted, the topic of artificial neural networks is a vast one. The research devoted to it could never fit in a two-part tutorial like this. However, we will try to get you up and running with the topic by the end of this tutorial.
In case you have already mastered this part, here is the link to Part 2.
You will need the following installed on your system to follow this tutorial.
- Java as our main programming language. Download the latest version here. Be sure to download the JDK and not just the JRE.
- An IDE or text editor for writing Java. We recommend NetBeans.
Furthermore, we assume that you have prior knowledge of Java programming in general.
An artificial neural network is an algorithm mimicking how the human brain works. Specifically, the algorithm mimics how the neurons in our central nervous system work. For those unfamiliar with the biological concept, neurons are cells in our nervous system which are electrically excitable, meaning that they conduct electrical signals through their body. A typical neuron accepts electricity (input) from an outside source, usually other neurons, and then transmits electricity (output) to somewhere else. Here is an illustration of a neuron.
We can think of a neuron as an object which accepts inputs and produces an output based on those inputs. In real life, the electricity emitted by a neuron as output depends on the type of neuron and how much electricity it has taken in. Substitute electricity with numbers, and suddenly a neuron is not so different from a computer! In that sense, all of us have tiny, fully working computers in our brain.
Our goal then is to create a model (in programming terms, an object) which accepts inputs and produces an output. The picture above shows us that a neuron can accept lots of inputs (dendrites) but can only give one output (axon). Thus our model would look like this.
If you compare the two pictures, you may find that the second is a simplification of the first. Both take a lot of inputs and give a single output. The second picture, however, shows the inner workings of a neuron. All of the inputs are summed into one value, which then goes through something called an activation function before going out as the output.
Now, what is an activation function?
Neurons in real life do not merely take all the electrical signals and output them as is. The exact output of a neuron depends on its action potential. Different neuron cells may have different action potentials, and thus the same input on each cell may produce different output. An activation function mimics this behavior by transforming the summed value before passing it on as output. An activation function is a mathematical function which takes a value and maps it to a different value. One example is the Linear function, which simply means "do not change anything". That is, output = input.
Different activation functions are used for different purposes. The Heaviside step function is used when we want the output to be strictly binary, that is 1 or 0. The sigmoid function is used when we want our output to be continuous, but only in the range between 0 and 1; the hyperbolic tangent function does the same for the range between -1 and 1. There is no definite rule on which function we must use for our neuron; we have to choose based on what kind of output we are expecting from the neuron.
This tutorial will not cover the exact calculation steps for each activation function, because most of them can be found in common math books and courses. However, we will still show the formulas represented in code. Which is to say, let us dive straight into the code. Our first piece of code will be the activation function.
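A minimal sketch of what that code could look like, assuming only the two method names mentioned in the text (evaluate and evaluateDerivate). The parameter names, the use of double, and the convention for what evaluateDerivate receives are assumptions made for illustration:

```java
// Sketch of the activation function contract. The method names come from the
// tutorial text; parameter names and types are assumptions.
interface ActivationFunction {

    // Transforms the summed input of a neuron into its output value.
    double evaluate(double value);

    // Returns the derivative of the function. In this sketch we assume the
    // argument is the value already produced by evaluate().
    double evaluateDerivate(double value);
}
```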
This is simply an interface for an activation function. The idea is that we will implement this interface in every concrete activation function class. Looking at the interface, we can see that an activation function object only needs to do two things for the user: evaluate, which takes an input value and produces another value, and evaluateDerivate, which produces the derivative of the function. The derivative becomes important at a later stage, but for now just think of it as a necessity for an activation function.
Let us create the concrete implementations. In this tutorial I will use three types of activation function: Linear, Sigmoid, and HyperbolicTangent. Basically we just fill in the two methods defined in the ActivationFunction interface with the appropriate calculation for each type. The code below shows how we turn the mathematical calculations into code. Consult your math sources to cross-check the validity of this code!
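A minimal sketch of the three implementations follows. It assumes that evaluateDerivate receives the value already produced by evaluate, which matches how getValueDerivate() is described later; under that assumption the sigmoid derivative is y * (1 - y) and the hyperbolic tangent derivative is 1 - y * y.

```java
// Repeated here so that the snippet compiles on its own.
interface ActivationFunction {
    double evaluate(double value);
    double evaluateDerivate(double value); // assumed to receive the activated value
}

// output = input, so the slope is always 1.
final class Linear implements ActivationFunction {
    @Override public double evaluate(double value) { return value; }
    @Override public double evaluateDerivate(double value) { return 1.0; }
}

// Logistic sigmoid: squashes any real number into the range (0, 1).
final class Sigmoid implements ActivationFunction {
    @Override public double evaluate(double value) {
        return 1.0 / (1.0 + Math.exp(-value));
    }
    // For y = sigmoid(x), the derivative is y * (1 - y).
    @Override public double evaluateDerivate(double value) {
        return value * (1.0 - value);
    }
}

// Hyperbolic tangent: squashes any real number into the range (-1, 1).
final class HyperbolicTangent implements ActivationFunction {
    @Override public double evaluate(double value) {
        return Math.tanh(value);
    }
    // For y = tanh(x), the derivative is 1 - y * y.
    @Override public double evaluateDerivate(double value) {
        return 1.0 - value * value;
    }
}
```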
Let us create the most basic element of an artificial neural network: the Neuron itself.
Our Neuron class contains an integer named id, which is used to identify the Neuron. It also contains a variable func of type ActivationFunction. This way, we can make the Neuron use any activation function class as long as it implements the ActivationFunction interface. Finally we have a boolean named bias and a double named value to store the content of the Neuron.
As we have discussed before, a Neuron holds a different ActivationFunction depending on the type of Neuron. In our class, it defaults to the Linear function if none is specified in the constructor. The third constructor overload has a boolean parameter named bias. If the value is true, it indicates that the neuron is simply a bias node. We will see how bias works later in the tutorial.
The operations getId(), getValue(), and isBias() are straightforward: they simply return the desired attribute. setValue() uses the activation function to transform the value going into the neuron and then stores the result in the value attribute. getValueDerivate() uses this stored value to return the derivative based on the activation function.
Remember the neuron model from before. Imagine that we have a sample case like this:
This is a simple neuron with 4 inputs, which sum to 0.75. This value is then converted using the HyperbolicTangent function, giving 0.63515 as the result. If we represent this example in code, using the Neuron and ActivationFunction classes that we have made, it will look like this:
Just like in the example, we specify HyperbolicTangent as the activation function for our neuron object. After that we simply supply the sum (0.75) to the setValue() method. Note that we pass the summed value into the method, not the individual input values. The neuron object will now hold the result of the activation function (0.63515). We can get this value by calling getValue(), or compute the derivative at this value with getValueDerivate().
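Pulling the Neuron description and this example together, a minimal sketch might look like the following. The field and method names come from the text; the constructor parameter order, the id of 1, and the NeuronDemo class are assumptions made for illustration.

```java
// Repeated here so that the snippet compiles on its own.
interface ActivationFunction {
    double evaluate(double value);
    double evaluateDerivate(double value); // assumed to receive the activated value
}

final class Linear implements ActivationFunction {
    @Override public double evaluate(double value) { return value; }
    @Override public double evaluateDerivate(double value) { return 1.0; }
}

final class HyperbolicTangent implements ActivationFunction {
    @Override public double evaluate(double value) { return Math.tanh(value); }
    @Override public double evaluateDerivate(double value) { return 1.0 - value * value; }
}

// Sketch of the Neuron class as described in the text.
final class Neuron {
    private final int id;
    private final ActivationFunction func;
    private final boolean bias;
    private double value;

    // Defaults to the Linear function when none is given.
    public Neuron(int id) { this(id, new Linear(), false); }

    public Neuron(int id, ActivationFunction func) { this(id, func, false); }

    public Neuron(int id, ActivationFunction func, boolean bias) {
        this.id = id;
        this.func = func;
        this.bias = bias;
    }

    public int getId() { return id; }
    public double getValue() { return value; }
    public boolean isBias() { return bias; }

    // Applies the activation function to the summed input and stores the result.
    public void setValue(double sum) { value = func.evaluate(sum); }

    // Derivative of the activation function, computed from the stored value.
    public double getValueDerivate() { return func.evaluateDerivate(value); }
}

// Hypothetical demo class, not part of the tutorial's library.
final class NeuronDemo {
    public static void main(String[] args) {
        Neuron neuron = new Neuron(1, new HyperbolicTangent());
        neuron.setValue(0.75); // the summed input, not the individual inputs
        System.out.println(neuron.getValue());         // tanh(0.75), approximately 0.635149
        System.out.println(neuron.getValueDerivate()); // approximately 0.596586
    }
}
```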
A single Neuron is mainly a transformer which changes an input value based on its activation function. It is not really that useful on its own. It is the same in real life: a single neuron cell in the brain cannot do much. However, our brain is actually an interweaving web of many, many neurons connected to each other, hence the name neural network. It is by being a network that our brain can harness its unbelievable processing power, way more powerful than any supercomputer of the current age.
In real life, neurons mesh together to form a web of interconnected neurons. The result looks somewhat like this. Pretty, is it not?
Source: BioEd Online
Remember our neuron picture. A neuron takes input through its dendrites, processes it inside the body, and throws the result out through the axon. The inputs coming into the dendrites are, most of the time, axons from other neurons. The network is a vast chain of neurons passing electricity around. There are also an initial input and a final output: the very first input coming into the dendrites comes from our senses, while the final result of this processing will be either stored in the brain's memory system or passed back out to the body to trigger movement.
Our job is to simulate this electricity passing process in our artificial neural network program. Let us begin with the abstract concept. We simplified the picture above into this diagram.
We call the neurons nodes because they can be thought of as points where all the connections (the lines) enter or leave. Usually the nodes are divided into three types: input nodes, hidden nodes, and output nodes. Input nodes represent the very first input coming into the neural system, which in real life would mostly come from the senses. Output nodes represent the final result of the processing done by the neural system. Although the picture only shows one output node, it is possible to have two or more output nodes.
Hidden nodes are the core of a neural network system; they represent the entirety of real-life neurons in which electrical signals jump from one neuron to another. They are called "hidden" because in an artificial neural network, the user simply puts values into the system (input nodes) and receives the processing result from the output nodes. The user has no knowledge of how the values were processed by the hidden nodes. Because of this property, an artificial neural network is often labeled a black box: the user does not see what happens inside the system.
In the picture above, you may see what we call a bias node. While I call it a "node", a bias node is not really a node per se. Bias is actually an arbitrary value determined by the system which is included in the calculation within the nodes. Usually, bias is set to 1 and its value does not change during the entire run of the calculation. Why do we use bias at all? We use bias to shift the result of the activation function. By putting bias into the calculation, we can force the activation function to give an entirely different result. In short, bias is a control unit, like a button or switch, which we can manipulate to tweak the result of our system.
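To see why this matters, consider a node whose only input is 0. The sketch below assumes a sigmoid activation and made-up weights (0.9 for the input line, 2.0 for the bias line); the class and method names are hypothetical. Without a bias line, no weight can move the output away from 0.5, while the bias line can:

```java
// Hypothetical demo class for illustration only.
final class BiasDemo {

    // Logistic sigmoid: maps any real number into the range (0, 1).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Output of a single node with one input line and one bias line.
    // Passing bias = 0 is the same as having no bias line at all.
    static double output(double input, double weight, double bias, double biasWeight) {
        return sigmoid(input * weight + bias * biasWeight);
    }

    public static void main(String[] args) {
        System.out.println(output(0.0, 0.9, 0.0, 0.0)); // 0.5, whatever the input weight is
        System.out.println(output(0.0, 0.9, 1.0, 2.0)); // approximately 0.8808
    }
}
```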
The last component which has to be explained is the weighted line. As you can see, the lines connect the nodes with each other. For a hidden node, the lines come in from all nodes on its left and stretch out to all nodes on its right. It is a bit different for input, bias, and output nodes: input and bias nodes only have outgoing lines, while output nodes take in lines but give out nothing. Because of this, we usually separate the network into three layers. First is the input layer, which only sends out values. Second is the hidden layer containing all the interconnected hidden nodes, which both receive and send out values. This layer can have sub-layers inside it, as each column of hidden nodes is considered a layer by itself. Finally, there is the output layer, which only receives values from the hidden layer and calculates the final result.
Think of these lines as highways. Values travel from one node to the others through these lines. But there is a catch: remember that I labeled these lines weighted. Each line has a multiplier of its own, and values traveling through it are multiplied by that multiplier. So if the line has a multiplier of 0.6 and the value from the sending node is 23, the actual value received is 23 * 0.6 = 13.8.
With all these concepts in hand, we can try peeking at what really happens in a neural network system. Here is a simple example.
This is a simple neural network consisting of 2 input nodes and 1 output node. The question is, what would the result in the output node be?
Remember our discussion about the neuron. We know that a neuron takes all its inputs, sums them, then puts the sum through its activation function to get the final result. In our previous discussion, we simply summed the inputs. In a network, all incoming values must go through weighted lines, therefore we have to factor the weights into the calculation as well.
S = (2.7 * 0.7) + (3.42 * 0.6) + (1.0 * 0.8) = 4.742
Now we simply pass the result to the activation function, in this case a sigmoid function.
O = sigmoid(S) = sigmoid(4.742) ≈ 0.9914
Aaaaand we are done! Simple, is it not?
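The same calculation can be sketched in code. The class and method names here are made up for illustration, and the third term (1.0 * 0.8) is assumed to be the bias line:

```java
// Hypothetical demo class for illustration only.
final class WeightedSumDemo {

    // Logistic sigmoid: maps any real number into the range (0, 1).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Multiplies every incoming value by the weight of its line and adds them up.
    static double weightedSum(double[] values, double[] weights) {
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i] * weights[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] values  = {2.7, 3.42, 1.0}; // two inputs, then the (assumed) bias value
        double[] weights = {0.7, 0.6, 0.8};  // weight of each incoming line
        double s = weightedSum(values, weights);
        System.out.println(s);          // approximately 4.742
        System.out.println(sigmoid(s)); // approximately 0.9914
    }
}
```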
Of course, very rarely will we encounter a problem which only needs 2 input nodes and 1 output node to solve. Some may take hundreds of input nodes, thousands of output nodes, and even millions or billions of hidden nodes! Here is one sample of a more complex neural network system.
Do not let the scale fool you; no matter how big the system is, the calculation is done exactly as we have illustrated above. The only thing you need to do is repeat it for all the nodes. In the case of hidden nodes, the value that we get from the above calculation is stored inside the node, and then sent on to the next nodes. In this way all values from the input nodes are calculated and propagated throughout the network until they finally reach the output nodes. This type of network, where values propagate in only one direction from the leftmost layer (input) to the rightmost layer (output), is called a feed forward neural network.
Let us try a simulation of value propagation with the model above. In this example, the input values are 14, 7, and 3. All biases are 1, while the weights for all weighted lines are randomized. All neurons use the sigmoid function as their activation function.
For the sub-layer 1, the neuron values would be:
Sublayer1 Node1 = S(14 * 0.3 + 7 * 0.2 + 3 * 0.3 + 1 * 0.12) = S(6.62) = 0.998668345
Sublayer1 Node2 = S(14 * 0.4 + 7 * 0.1 + 3 * 0.7 + 1 * 0.27) = S(8.67) = 0.99982837
Sublayer1 Node3 = S(14 * 0.6 + 7 * 0.5 + 3 * 0.8 + 1 * 0.6) = S(14.9) = 0.999999662
For the sub-layer 2, the neuron values would be:
Sublayer2 Node1 = S(0.998668345 * 0.3 + 0.99982837 * 0.6 + 0.999999662 * 0.82 + 1 * 0.4) = S(2.119497248) = 0.892783815
Sublayer2 Node2 = S(0.998668345 * 0.8 + 0.99982837 * 0.9 + 0.999999662 * 0.27 + 1 * 0.5) = S(2.468780118) = 0.921924003
Finally for the output layer, the neuron values would be:
Output Node1 = S(0.892783815 * 0.2 + 0.921924003 * 0.1 + 1 * 0.8) = S(1.070749163) = 0.74473936
Output Node2 = S(0.892783815 * 0.4 + 0.921924003 * 0.16 + 1 * 0.7) = S(1.204621367) = 0.769345879
The final result of our processing is in the output nodes: 0.74473936 and 0.769345879. These values have no meaning by themselves. Usually the significance of these values depends on the context of the processing; e.g. if you are calculating the probability of credit card acceptance, values nearing 1 would mean the application is accepted, while values nearing 0 would mean the application is rejected.
Of course, we do not recommend calculating all of this by hand; that would be a task both herculean and pointless. Remember that computers are really good at doing repetitive tasks. Let us just leave it to the machines!
We have reached the end of the first part of this tutorial. I hope that by now you have a picture of how an artificial neural network works. In the next part, we will learn how an artificial neural network learns to recognize patterns and adjusts itself to the problem at hand.
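Leaving it to the machine could look something like the sketch below, which repeats the simulation above layer by layer. The class, method, and variable names are made up for illustration, and each weight row is assumed to list the input weights first and the bias weight last. The second weight of Output Node1 is taken as 0.1, the value implied by the sum 1.070749163.

```java
// Hypothetical demo class for illustration only.
final class FeedForwardDemo {

    // Logistic sigmoid: maps any real number into the range (0, 1).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Computes the values of one layer from the values of the previous one.
    // weights[j] holds the weights of the lines entering node j: one per input,
    // followed by the weight of the bias line as the last element.
    static double[] propagate(double[] inputs, double[][] weights, double bias) {
        double[] outputs = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double sum = bias * weights[j][inputs.length];
            for (int i = 0; i < inputs.length; i++) {
                sum += inputs[i] * weights[j][i];
            }
            outputs[j] = sigmoid(sum);
        }
        return outputs;
    }

    // Runs the three-layer example from the text and returns the output nodes.
    static double[] run() {
        double bias = 1.0;
        double[][] subLayer1 = {
            {0.3, 0.2, 0.3, 0.12},
            {0.4, 0.1, 0.7, 0.27},
            {0.6, 0.5, 0.8, 0.6}
        };
        double[][] subLayer2 = {
            {0.3, 0.6, 0.82, 0.4},
            {0.8, 0.9, 0.27, 0.5}
        };
        double[][] outputLayer = {
            {0.2, 0.1, 0.8},   // 0.1 is inferred from the stated sum
            {0.4, 0.16, 0.7}
        };

        double[] values = {14, 7, 3};                 // input layer
        values = propagate(values, subLayer1, bias);  // hidden sub-layer 1
        values = propagate(values, subLayer2, bias);  // hidden sub-layer 2
        return propagate(values, outputLayer, bias);  // output layer
    }

    public static void main(String[] args) {
        for (double v : run()) {
            System.out.println(v); // approximately 0.744739 and 0.769346
        }
    }
}
```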