Welcome back to the Artificial Neural Network Tutorial. In this second and final part, we will turn the theory from the first part into a concrete artificial neural network system written in Java. But before that, we will introduce two new concepts: **weight tables** and **backpropagation**.

In case you missed it, here is Part 1.

## Weight Tables

In case you have not noticed, weights are central to artificial neural network calculations. While humans can trace weight lines in a diagram quite easily, our code needs a better representation. One way is a two-dimensional array in which the row number represents the outgoing node and the column number represents the recipient node. Remember that our Neuron class has an attribute called **id**. This number is the Neuron's identity, and it is incremented each time a new Neuron is created. Let us take a look at our previous example. The number inside each Neuron is its identity number, not the value to be propagated.

How do we convert this diagram into an array? Like this:

We made an array of X rows and X columns, with X being the number of neurons in our neural network. To find the weight between neuron A and neuron B, we simply read weightTable[A][B]. As you can see in the table, we store the same value in both cells of each opposite row-column pair, so weightTable[A][B] and weightTable[B][A] give us the same value. This makes our code easier. The remaining cells, which belong to neuron pairs that are not connected, are simply not used and left blank.
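A minimal sketch of this idea, assuming neuron ids run from 0 to X - 1. The class name WeightTableDemo and its get/set methods are hypothetical helpers for illustration only, not part of the tutorial's code; the point is that every write updates both copies of a weight so the table stays symmetric.

```java
import java.util.Random;

// Hypothetical helper (not from the tutorial): a symmetric weight table
// indexed by neuron id.
public class WeightTableDemo {
    private final double[][] weights;

    WeightTableDemo(int neuronCount, long seed) {
        weights = new double[neuronCount][neuronCount];
        Random rand = new Random(seed);
        for (int i = 0; i < neuronCount; i++) {
            // start at i + 1 so the diagonal (self-links) stays 0.0
            for (int j = i + 1; j < neuronCount; j++) {
                weights[i][j] = weights[j][i] = rand.nextDouble();
            }
        }
    }

    double get(int a, int b) {
        return weights[a][b];
    }

    // keep both copies in sync whenever a weight changes
    void set(int a, int b, double w) {
        weights[a][b] = weights[b][a] = w;
    }

    public static void main(String[] args) {
        WeightTableDemo table = new WeightTableDemo(4, 42L);
        table.set(0, 2, 0.75);
        System.out.println(table.get(0, 2)); // 0.75
        System.out.println(table.get(2, 0)); // 0.75
        System.out.println(table.get(1, 1)); // 0.0
    }
}
```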

## Backpropagation

The calculation we have done with our neural network so far is called **propagation**, as I mentioned in the first part. Using it is called making a **prediction**: we enter input values into the neural network and read the result from the output layer. The other process in a neural network is called **training**. In this process, we know in advance which result we want, which we call the **target result**.

Why would we know the result we want in advance? Because a newly created neural network is not yet adjusted to the problem at hand. The known results are used to train the network, much like humans learn by practicing on exams. For each result produced from the entered input values, we compare the actual result with the target result. We call this comparison the **error value**. Error values tell us how far our neural network is from doing a "good" prediction job, and we then adjust the network according to them: bigger values mean we have to adjust more, while smaller values mean we only need to tweak the network a bit.

This is how we will do the adjustment. Take a look at this sample network:

In this network, the value from A is propagated along the weight lines to neurons B and C. If B and C are output neurons, their values are the actual result of this network. Now suppose we are using training data, so for this input we also have the target result. We want to find the error values, numbers representing how far the actual result is from the target result. We do that using the following formula, which we will call the **Output Neuron Error Formula**.
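In symbols, using notation of our own reconstructed from the description below: for an output neuron $k$ with activation function $f$, weighted input sum $net_k$, actual output $o_k = f(net_k)$, and target value $t_k$, the error value $\delta_k$ is

$$\delta_k = f'(net_k)\,(t_k - o_k)$$

For the sample network this gives $\delta_B = f'(net_B)\,(t_B - o_B)$ and $\delta_C = f'(net_C)\,(t_C - o_C)$ for the output neurons B and C.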

The formula works like this: instead of the usual activation function, we take the derivative of that activation function and evaluate it at the neuron's value. The result is then multiplied by the difference between the target value (the value we should have achieved) and the actual value (the value our output neuron produced). Look back at the Activation Function interface in part one: this is the reason it has an **evaluateDerivative()** method.
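To make this concrete, here is a small sketch using the sigmoid function, whose derivative is s(x)(1 - s(x)). The Activation interface and the SigmoidFn class are hypothetical stand-ins modeled on the idea of an evaluateDerivative() method; the real Part 1 interface may use different names or signatures.

```java
// Hypothetical stand-in for the Part 1 activation interface; the real one may differ.
interface Activation {
    double evaluate(double x);
    double evaluateDerivative(double x);
}

class SigmoidFn implements Activation {
    public double evaluate(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // derivative of the sigmoid: s(x) * (1 - s(x))
    public double evaluateDerivative(double x) {
        double s = evaluate(x);
        return s * (1.0 - s);
    }
}

public class OutputErrorDemo {
    // Output Neuron Error Formula: f'(value) * (target - actual)
    static double outputError(Activation f, double value, double target) {
        double actual = f.evaluate(value);
        return f.evaluateDerivative(value) * (target - actual);
    }

    public static void main(String[] args) {
        Activation sigmoid = new SigmoidFn();
        // value 0 gives actual = 0.5 and f'(0) = 0.25
        System.out.println(outputError(sigmoid, 0.0, 1.0)); // 0.125
        System.out.println(outputError(sigmoid, 0.0, 0.0)); // -0.125
    }
}
```

Note the second case: when the actual output overshoots the target, the error value is negative.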

Okay, now we have the error value. As I said before, we will adjust the neural network so that its output becomes what we wanted in the first place (the target value). While we could change the activation function, a computer cannot invent new functions out of nowhere. That leaves us with changing the **weight values** instead, which makes sense: each incoming value is multiplied by its weight when the neuron's output is calculated, so changing the weights directly changes the output. The question is: how much do we need to change the weight values?

The weight values in the sample network above are adjusted using the following formula, which we will call the **Weight Value Update Formula**.
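In symbols (notation of our own, reconstructed from the description below), with $w_{AB}$ the weight on the line from neuron A to neuron B, $\eta$ the learning rate, $\delta_B$ the error value of neuron B, and $o_A$ the output of neuron A:

$$w_{AB}^{\text{new}} = w_{AB}^{\text{old}} + \eta\,\delta_B\,o_A$$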

This one is simpler. For the weight line connecting neuron A and neuron B, we multiply the error value of neuron B (the neuron on the right side), the output of neuron A (the neuron on the left side), and the **learning rate**, then add that product to the old weight value. The result is our new weight value. The learning rate is a constant that keeps each weight change from going too far. There is a whole mathematical justification behind it, but the point is that if the changes are too drastic, the network will not be able to settle into a stable state. Because of this, we usually set the learning rate to a small number. A higher learning rate makes the system finish its adjustment faster, but it also becomes more unstable when there are significant changes in the input values. Because of this trade-off, the choice of learning rate is left to the user.
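The following sketch applies the Weight Value Update Formula with a few learning rates to show how the rate scales the step. The method name updateWeight and the sample numbers are illustrative assumptions; they are powers of two so that the printed results are exact.

```java
public class WeightUpdateDemo {
    // Weight Value Update Formula:
    // newWeight = oldWeight + learningRate * errorB * outputA
    static double updateWeight(double oldWeight, double learningRate,
                               double errorB, double outputA) {
        return oldWeight + learningRate * errorB * outputA;
    }

    public static void main(String[] args) {
        double oldWeight = 0.5;
        double errorB = 0.125; // error value of neuron B (right side)
        double outputA = 1.0;  // output of neuron A (left side)
        for (double rate : new double[]{0.25, 0.5, 1.0}) {
            // a larger learning rate moves the weight further in one step
            System.out.println("rate " + rate + " -> new weight "
                    + updateWeight(oldWeight, rate, errorB, outputA));
        }
    }
}
```

With these numbers the new weight is 0.53125, 0.5625, and 0.625 for the three rates.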

Those are all the steps if neuron A is an input neuron. However, most of the time we are not using a 2-layer network like this sample, and neuron A would most likely belong to a hidden layer. It therefore also has weight lines connecting into it. Those weight lines need to be adjusted as well, so we move back to them and apply the formula above. Which might leave you asking: "If we need the error value of neuron B for the weight lines going into neuron B, and the error value of neuron C for the weight lines going into neuron C, does that mean **we need the error value of neuron A for the weight lines going into neuron A**?"

The answer is yes, and to do that we need to use another formula.

Backpropagation is about **propagating the error values backward**, from the output neurons to all hidden layer neurons, so that we can adjust the weight lines. That is why we always start from the output neurons: there we can calculate the error, because we know the target results. Using those error values as a starting point, we calculate the error values of the other neurons. Look at neuron A now. We calculate its error value using the following formula, which we will call the **Hidden Neuron Error Formula**.
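In symbols (our own notation, reconstructed from the description below), with $f$ the activation function of hidden neuron $j$, $net_j$ its weighted input sum, $w_{jk}$ the weight on the line from $j$ to neuron $k$ in the next layer, and $\delta_k$ the error value of $k$. The first line is the generic form; the second applies it to neuron A, whose next-layer neurons are B and C:

$$\delta_j = f'(net_j)\sum_{k} w_{jk}\,\delta_k$$

$$\delta_A = f'(net_A)\left(w_{AB}\,\delta_B + w_{AC}\,\delta_C\right)$$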

The first equation is the generic form of the formula; the one below it shows how it applies to neuron A in the sample network. Basically, we multiply the error value of each neuron in the next layer by the weight of the line connecting it to A, and sum all the products. It is similar to how we calculate neuron values in propagation, is it not? The last step is multiplying that sum by the derivative of the activation function, just as we did for the output neurons.
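Here is the same calculation for neuron A as a short sketch. The method name hiddenError, its parameters, and all sample numbers (including the derivative value assumed for A) are made up for illustration; in a real network the weights and error values would come from the network itself.

```java
public class HiddenErrorDemo {
    // Hidden Neuron Error Formula:
    // error(A) = f'(value of A) * sum over next-layer neurons k of weight(A, k) * error(k)
    static double hiddenError(double derivativeA, double[] weightsToNext, double[] nextErrors) {
        double sum = 0;
        for (int k = 0; k < nextErrors.length; k++) {
            sum += weightsToNext[k] * nextErrors[k];
        }
        return derivativeA * sum;
    }

    public static void main(String[] args) {
        double[] weights = {0.5, 0.25};  // weight(A, B), weight(A, C)
        double[] errors = {0.125, -0.5}; // error(B), error(C)
        double derivativeA = 0.5;        // f'(value of A), assumed for this example
        System.out.println(hiddenError(derivativeA, weights, errors)); // -0.03125
    }
}
```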

These three formulas are all we need to finish our neural network processing. We calculate the output errors once with the Output Neuron Error Formula, then repeat the Weight Value Update Formula and the Hidden Neuron Error Formula for every layer before the output layer until we reach the input layer.

## The Code to Connect Them All

We have finally reached the point where we use all those theories and formulas in our code. There is only one class we need to make now, and it will be the main engine of our neural network system. We are going to create a **3-layer artificial neural network**, with one input layer, one hidden layer, and one output layer. The design should be easy to modify if you want to expand it to more hidden layers.

```java
package com.rd.ann;

import com.rd.ann.functions.HyperbolicTangent;
import com.rd.ann.functions.Sigmoid;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class NeuralNetwork {
    //data
    private final int inputCount;       // neurons in the input layer (excluding bias)
    private final int outputCount;      // neurons in the output layer
    private final int hiddenLayerCount; // neurons in the single hidden layer (excluding bias)
    private final double learningRate;

    //layers
    private List<List<Neuron>> layers;
    private double[][] weightTable;     // weightTable[from id][to id]
    private double[] errorTable;         // error value per neuron id

    public NeuralNetwork(int inputCount, int outputCount, int hiddenLayerCount, double learningRate){
        this.inputCount = inputCount;
        this.outputCount = outputCount;
        this.hiddenLayerCount = hiddenLayerCount;
        this.learningRate = learningRate;
        createNetwork();
    }
    // ...
```

We are creating the skeleton of our neural network and passing the necessary constants, such as the learning rate, through the constructor. Attributes whose names end in ~Count hold how many neurons exist in each layer, not counting bias nodes; note that **hiddenLayerCount** is the number of neurons in the single hidden layer, not the number of hidden layers. After setting the attributes, we call the method **createNetwork()** to establish the nodes (with their activation functions), the weight lines, and the error table. Let us create this method now.

```java
    // ...
    private void createNetwork(){
        /* create structure */
        int nodeCount = 0; // also used to hand out sequential neuron ids
        this.layers = new ArrayList<>();

        //input layer
        List<Neuron> inLayer = new ArrayList<>();
        for(int anni = 0; anni < inputCount; anni++){
            inLayer.add(new Neuron(nodeCount++));
        }
        inLayer.add(new Neuron(nodeCount++, true)); // bias node
        layers.add(inLayer);

        //hidden layer
        List<Neuron> hidLayer = new ArrayList<>();
        for(int anni = 0; anni < hiddenLayerCount; anni++){
            hidLayer.add(new Neuron(nodeCount++, new HyperbolicTangent()));
        }
        hidLayer.add(new Neuron(nodeCount++, true)); // bias node
        layers.add(hidLayer);

        //output layer (no bias node)
        List<Neuron> outLayer = new ArrayList<>();
        for(int anni = 0; anni < outputCount; anni++){
            outLayer.add(new Neuron(nodeCount++, new Sigmoid()));
        }
        layers.add(outLayer);

        /* create weight table: symmetric, random values in [0, 1), 0 on the diagonal */
        Random rand = new Random();
        this.weightTable = new double[nodeCount][nodeCount];
        for(int i = 0; i < nodeCount; i++){
            for(int j = i; j < nodeCount; j++){
                if(i == j){
                    weightTable[i][j] = 0;
                }
                else{
                    weightTable[i][j] = weightTable[j][i] = rand.nextDouble();
                }
            }
        }

        /* create error table: one entry per neuron id */
        this.errorTable = new double[nodeCount];
    }
    // ...
```

We represent the layers as a **List** of **Lists of Neurons**, named **layers**. The method creates the Lists of Neurons (the layers) one by one, starting with the input layer, and fills each one with as many Neuron objects as we specified in the constructor. Each List is then added to **layers**. Note that for the input and hidden layers, we add a single extra Neuron object after the loop. These single Neurons act as the **bias** nodes, which we mark by passing **true** to the bias parameter of the Neuron class. Finally, we create the weight table and populate it with random numbers between 0 and 1, storing each value in both [i][j] and [j][i]. Variables **i** and **j** denote the identity numbers of the Neurons, so when i and j are the same we put 0 in the table (because it makes no sense for a Neuron to link to itself in the system we are making). We also initialize an array named **errorTable**, which we will use to store the error values calculated during backpropagation.
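Because ids are handed out sequentially, each layer occupies a contiguous range of ids, with each bias node sitting right after its layer. The sketch below is a hypothetical standalone helper that mirrors the order used in createNetwork(); it prints the layout for the 2-3-1 network from the sample usage and shows why the first output id is inputCount + hiddenLayerCount + 2.

```java
public class NodeIdLayout {
    // Before the outputs come: the inputs, 1 input bias, the hidden neurons, 1 hidden bias.
    static int firstOutputId(int inputCount, int hiddenCount) {
        return inputCount + 1 + hiddenCount + 1;
    }

    public static void main(String[] args) {
        int inputCount = 2, hiddenCount = 3, outputCount = 1;
        int id = 0;
        System.out.println("input ids:   " + id + ".." + (id + inputCount - 1));
        id += inputCount;
        System.out.println("input bias:  " + id++);
        System.out.println("hidden ids:  " + id + ".." + (id + hiddenCount - 1));
        id += hiddenCount;
        System.out.println("hidden bias: " + id++);
        System.out.println("output ids:  " + id + ".." + (id + outputCount - 1));
        id += outputCount;
        System.out.println("total nodes: " + id); // 8 for this 2-3-1 network
    }
}
```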

Now we create the propagation process. The same method also performs backpropagation along the way.

```java
    // ...
    private double propagate(double data[]){
        /* PROPAGATE METHOD: PUSH ALL VALUES FROM INPUT TO EXIT */
        int layerCount = layers.size();
        // total neurons, including one bias node per non-output layer
        int nodeCount = inputCount + outputCount + hiddenLayerCount + (layerCount-1);
        int it;

        //put values to input layer
        it = 0;
        for(Neuron in : layers.get(0)){
            if(!in.isBias()) in.setValue(data[it++]);
        }

        //start from hidden layer because input doesn't need any calculation
        for(int i = 1; i < layerCount; i++){
            //for each node, get value from previous layer
            for(Neuron cur : layers.get(i)){
                if(!cur.isBias()){
                    double val = 0;
                    for(Neuron prev : layers.get(i-1)){
                        val += prev.getValue() * weightTable[prev.getId()][cur.getId()];
                    }
                    cur.setValue(val);
                }
            }
        }

        /* BACKPROPAGATION METHOD: PUSH ERROR TO ALL NODES AND ADJUST WEIGHT ACCORDINGLY */
        //calculate error for output layer (targets follow the inputs in data[])
        it = 0;
        for(Neuron out : layers.get(layerCount-1)){
            errorTable[out.getId()] = (data[inputCount + it++] - out.getValue()) * out.getValueDerivate();
        }

        //start from layer before output, going backward down to the input layer
        for(int i = layerCount-2; i >= 0; i--){
            for(Neuron cur : layers.get(i)){
                int a = cur.getId();
                //update weight lines (Weight Value Update Formula)
                for(Neuron next : layers.get(i+1)){
                    if(next.isBias()) continue; // bias nodes have no incoming weight lines
                    int b = next.getId();
                    double updated = weightTable[a][b] + (learningRate * errorTable[b] * cur.getValue());
                    // write both copies so the symmetric table stays consistent
                    weightTable[a][b] = weightTable[b][a] = updated;
                }
                //calculate error using backpropagation (Hidden Neuron Error Formula)
                double err = 0;
                for(Neuron next : layers.get(i+1)){
                    if(next.isBias()) continue;
                    err += errorTable[next.getId()] * weightTable[a][next.getId()];
                }
                errorTable[a] = err * cur.getValueDerivate();
            }
        }

        /* RETURN MSE OF THE OUTPUT ERRORS */
        double totalError = 0;
        for(int i = inputCount + hiddenLayerCount + (layerCount-1); i < nodeCount; i++){
            totalError += Math.pow(errorTable[i], 2);
        }
        return totalError / outputCount;
    }
    // ...
```

This. This is the result of all those theories and formulas crammed into your brain from the previous two sections and the first part. Let us take a closer look.

The first half of this method is the **propagation** part, where we push our input values into the input nodes and then propagate them all the way to the output nodes. As you can see, the process is an iteration over all layers and neurons: for each neuron, we compute its value using the propagation formula and then store it in the neuron. The process finishes when we reach the last neuron in the output layer.

The second half is the backpropagation process. We start by calculating the error for the output nodes and storing it in **errorTable**. Next, we iterate over the layers, going backward from the hidden layer right before the output layer all the way to the input layer. In each layer, we iterate over each neuron and do the following:

- Adjust all the weight values of weight lines coming from the neuron using Weight Value Update Formula.
- Calculate error value for this neuron using Hidden Neuron Error Formula.

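We keep doing this until we have processed the input layer and, with it, all neurons and weight lines in the neural network. Next, we calculate the **Mean Squared Error** from the output node entries of **errorTable**. Before adding them up, we square each error value. This makes every term non-negative, so positive and negative errors cannot cancel each other out (an error value is negative whenever the actual output is larger than the target). Finally, we divide **totalError** by the number of output nodes. Strictly speaking, these entries already include the derivative factor from the Output Neuron Error Formula, so the result is a derivative-weighted variant of the textbook mean squared error of target minus actual output.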
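The squaring step matters because error values can have either sign. This sketch is a hypothetical standalone helper, not the tutorial's code; it compares a plain average with the mean of squares for two output errors that would otherwise cancel out.

```java
public class MseDemo {
    // Mean of squared error values, as computed at the end of propagate()
    static double meanSquaredError(double[] errors) {
        double total = 0;
        for (double e : errors) {
            total += e * e;
        }
        return total / errors.length;
    }

    // Plain average, shown only for comparison
    static double meanError(double[] errors) {
        double total = 0;
        for (double e : errors) {
            total += e;
        }
        return total / errors.length;
    }

    public static void main(String[] args) {
        double[] errors = {0.5, -0.5};
        System.out.println("plain mean: " + meanError(errors));          // 0.0, hides both errors
        System.out.println("mean squared: " + meanSquaredError(errors)); // 0.25
    }
}
```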

We have almost completed our NeuralNetwork class. There are two more methods we need to make: **train()** and **predict()**.

```java
    // ...
    public double[] train(double data[][]){
        int dataCount = data.length;
        double[] error = new double[dataCount];
        for(int i = 0; i < dataCount; i++){
            // each row must hold inputCount inputs followed by outputCount targets
            if(inputCount + outputCount == data[i].length){
                error[i] = propagate(data[i]);
            }
            else{
                System.out.println("Error in data. Input and output number does not match");
                break;
            }
        }
        return error;
    }
    // ...
```

**train()** takes a two-dimensional array. Each row of the array contains a set of input values followed by target values, with the input values on the left and the target values on the right. The method simply calls **propagate()** once for each row of data passed in. Each error value returned by propagate() is saved in the **error** array, so we can see the error for each propagation. Hopefully, as more data is used to train the neural network, the weight values become better suited to the problem set and the error goes down.

```java
    // ...
    public double[] predict(double input[]){
        double[] output = new double[outputCount];
        int layerCount = layers.size();
        int it;

        //put values to input layer
        it = 0;
        for(Neuron in : layers.get(0)){
            if(!in.isBias()) in.setValue(input[it++]);
        }

        //start from hidden layer because input doesn't need any calculation
        for(int i = 1; i < layerCount; i++){
            //for each node, get value from previous layer
            for(Neuron cur : layers.get(i)){
                if(!cur.isBias()){
                    double val = 0;
                    for(Neuron prev : layers.get(i-1)){
                        val += prev.getValue() * weightTable[prev.getId()][cur.getId()];
                    }
                    cur.setValue(val);
                }
            }
        }

        //read the results from the output layer
        it = 0;
        for(Neuron out : layers.get(layerCount - 1)){
            output[it++] = out.getValue();
        }
        return output;
    }
}
```

Our last method is **predict()**. You might notice that **predict()** is a copy of the first half of **propagate()**, namely the propagation phase. We do not call propagate() because there is no need for backpropagation here (propagate() would also expect target values and would change the weights). When we use **predict()**, we assume that our neural network has been trained and is stable enough to use. The input parameter of this method contains only input values, with no target values, so there is no way to know the error of the result. That is fine as long as we trust that the neural network is now "correct": whatever the result is, we take it as the final result.

## Training and Predicting with the Neural Network

Our artificial neural network is now ready to be used. Here is a sample usage: a small program whose **main()** function trains the network and then asks it for a prediction.

```java
import com.rd.ann.NeuralNetwork;

public class Main {
    public static void main(String[] args){
        int dataCount = 50000; // must be a multiple of 4 for the loop below
        int inputCount = 2;
        int outputCount = 1;
        int hiddenCount = 3;
        NeuralNetwork ann = new NeuralNetwork(inputCount, outputCount, hiddenCount, 0.5);

        // AND training data: each row is {input1, input2, target}, the 4 cases repeated
        double[][] data = new double[dataCount][inputCount + outputCount];
        for(int i = 0; i < dataCount; i+=4){
            data[i]   = new double[]{1, 1, 1};
            data[i+1] = new double[]{0, 0, 0};
            data[i+2] = new double[]{0, 1, 0};
            data[i+3] = new double[]{1, 0, 0};
        }

        // one error value (as computed by propagate()) per training row
        double[] rmsList = ann.train(data);
        double[] output = ann.predict(new double[]{0, 0});

        System.out.println("Error per training row:");
        for(double rms : rmsList){
            System.out.println(rms);
        }
        System.out.println("Output:");
        for(double out : output){
            System.out.println(out);
        }
    }
}
```

This code produces 50,000 training rows, each with 2 input values and 1 target value. The data represents a simple AND operation: the output is 1 if both inputs are 1, and 0 in every other case. We repeat these 4 cases again and again until the entire **data** array is filled (so dataCount must be a multiple of 4). We then pass this array to the **train()** method of our artificial neural network (named **ann** in this example) and store the returned error values in an array named **rmsList**. This trains the neural network on the problem set, so it should now follow the AND rule.

To test it, we call the **predict()** method with an array of input values. In this code, it should return a value reaaaaally close to 0. Give it a try with other combinations of input values, like {1, 1} or {0, 1}. A neural network will rarely, if ever, give an exact prediction such as 0 or 1. To get one, we need to post-process the returned data, for example by converting values higher than 0.5 into 1.
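A minimal sketch of that post-processing step, assuming a fixed cut-off of 0.5. The class and method names are hypothetical, and the raw values below are made-up stand-ins, not actual outputs of the trained network.

```java
import java.util.Arrays;

public class ThresholdDemo {
    // Map each raw output to 1 if it is above the cut-off, otherwise 0.
    static int[] toBinary(double[] outputs, double cutoff) {
        int[] labels = new int[outputs.length];
        for (int i = 0; i < outputs.length; i++) {
            labels[i] = outputs[i] > cutoff ? 1 : 0;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[] raw = {0.0213, 0.9712, 0.5}; // illustrative values only
        System.out.println(Arrays.toString(toBinary(raw, 0.5))); // [0, 1, 0]
    }
}
```

In the sample program, this could be applied as `ThresholdDemo.toBinary(ann.predict(new double[]{1, 1}), 0.5)`.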

You can try other combinations of training data to make the neural network do whatever you want. For example, you can change the training data so that the network follows the XOR rule instead of AND. One thing to note is that more training data generally makes the neural network more accurate: a set of only 10 training entries would produce a poor network. In our case, 50,000 entries produce a really accurate network, although you can try a lower number and find that the accuracy does not decrease much. It is up to experimentation to find a good balance between accuracy and the amount of training data!
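For instance, XOR training rows could be built like this. XorData and build() are hypothetical names; using i % 4 to pick the case also removes the need for the row count to be a multiple of 4.

```java
import java.util.Arrays;

public class XorData {
    // The four XOR cases as {input1, input2, target}
    private static final double[][] CASES = {
        {0, 0, 0},
        {0, 1, 1},
        {1, 0, 1},
        {1, 1, 0}
    };

    // Repeats the four cases until dataCount rows are filled
    static double[][] build(int dataCount) {
        double[][] data = new double[dataCount][];
        for (int i = 0; i < dataCount; i++) {
            data[i] = CASES[i % CASES.length].clone(); // copy so rows can be modified safely
        }
        return data;
    }

    public static void main(String[] args) {
        for (double[] row : build(5)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In the sample program, `double[][] data = XorData.build(dataCount);` could replace the AND loop; whether the 2-3-1 network learns XOR well with those settings is something to check experimentally.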

## Afterword

This ends our Artificial Neural Network tutorial. In it, we learned how an artificial neural network is constructed and how to code one in an object-oriented programming language, Java. As we said in the first part, this is merely a first step in learning about Artificial Neural Networks. More advanced techniques and theories exist, but they are beyond the scope of this tutorial. We encourage you to keep learning on your own if you are interested in this subject.

We have also packaged the source code described in this tutorial as a .zip file. You can download it from the attachment link below.

Hopefully this tutorial can serve as the starting point for your wonderful journey in mastering the subject of Artificial Neural Network.