First of all, prepare to run the Python script by installing the required libraries in the Anaconda terminal:

```
conda install pandas
pip install tensorflow
pip install keras
```

If you have problems installing TensorFlow, try the following commands instead (according to anaconda.com):

```
conda create -n tensorflow_env tensorflow
conda activate tensorflow_env
```

I commented the following Python code, taken from Sandro Skansi's book “*Introduction to Deep Learning. From Logical Calculus to Artificial Intelligence*” (pp. 103-105).

The scenario is a webshop selling books and other items, and we want to know whether a customer will abandon a shopping basket at checkout. So we build a neural network to predict it.

It uses *data.csv* for the training and test phases and then applies the resulting model to predict labels for the rows of *new_data.csv*, which contains the same columns as *data.csv* but without the target column. All neurons use the logistic (sigmoid) activation function. The script prints the prediction accuracy in the terminal.
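The logistic (sigmoid) activation mentioned above squashes any real input into the interval (0, 1). A minimal NumPy sketch of it (not part of the book's code):

```python
import numpy as np

def sigmoid(x):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))    # 0.5, the midpoint of the logistic curve
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```

This is why both the hidden layer and the single output neuron below can use `activation="sigmoid"`: the output is directly interpretable as a probability for the target class.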

This is *data.csv*:

```
includes_a_book,purchase_after_21,total,user_action
1,1,13.43,1
1,0,23.45,1
0,0,45.56,0
1,1,56.43,0
1,0,44.44,0
1,1,667.65,1
1,0,56.66,0
0,1,43.44,1
0,0,4.98,1
1,0,43.33,0
```
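The script below splits these rows into train and test sets with a random boolean mask. A small standalone sketch of that idea, using a few of the rows above and the same 0.5 split rate as the script:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "includes_a_book":   [1, 1, 0, 1],
    "purchase_after_21": [1, 0, 0, 1],
    "total":             [13.43, 23.45, 45.56, 56.43],
    "user_action":       [1, 1, 0, 0],
})

mask = np.random.rand(len(df)) < 0.5  # each row lands in the train set with probability 0.5
train, test = df[mask], df[~mask]     # complementary subsets of the original dataframe
print(len(train), len(test))          # together they cover all rows exactly once
```

Note that with a probabilistic mask the split is only *about* 50/50; on a dataset this small the two halves can easily end up unbalanced.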

This is *new_data.csv*:

```
includes_a_book,purchase_after_21,total
1,0,73.75
0,0,64.97
1,0,3.78
0,0,60
```

Main code, called *ffnn.py*:

```python
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

TARGET_VARIABLE = "user_action"  # the variable we want to predict
TRAIN_TEST_SPLIT = 0.5           # training part / test part
HIDDEN_LAYER_SIZE = 30

raw_data = pd.read_csv("data.csv")
mask = np.random.rand(len(raw_data)) < TRAIN_TEST_SPLIT  # select a random set of rows, following the split rate
tr_dataset = raw_data[mask]   # train dataframe taken from the original Pandas dataframe
te_dataset = raw_data[~mask]  # test dataframe taken from the original Pandas dataframe

# conversion of the dataframes into NumPy arrays, as needed by Keras
tr_data = np.array(tr_dataset.drop(TARGET_VARIABLE, axis=1))
tr_labels = np.array(tr_dataset[[TARGET_VARIABLE]])
te_data = np.array(te_dataset.drop(TARGET_VARIABLE, axis=1))
te_labels = np.array(te_dataset[[TARGET_VARIABLE]])

ffnn = Sequential()  # a Sequential model is a linear stack of layers; new layers are added via the .add() method
# The hidden layer size is set (= HIDDEN_LAYER_SIZE).
# The input layer size is set (3-dimensional vectors as single data inputs, since only 3 columns are used as inputs).
# The activation function is set to the logistic nonlinearity ("sigmoid").
ffnn.add(Dense(HIDDEN_LAYER_SIZE, input_shape=(3,), activation="sigmoid"))
ffnn.add(Dense(1, activation="sigmoid"))  # one output neuron (also with a logistic activation function)
# The error function is set, "stochastic gradient descent" is chosen as optimizer and accuracy is the metric to compute:
ffnn.compile(loss="mean_squared_error", optimizer="sgd", metrics=['accuracy'])
# verbose=1 means accuracy and loss are printed after each epoch of training
ffnn.fit(tr_data, tr_labels, epochs=150, batch_size=2, verbose=1)
metrics = ffnn.evaluate(te_data, te_labels, verbose=1)
print("%s: %.2f%%" % (ffnn.metrics_names[1], metrics[1] * 100))

new_data = np.array(pd.read_csv("new_data.csv"))
results = ffnn.predict(new_data)
# the predicted values for the column TARGET_VARIABLE are printed,
# according to the new inputs given in new_data
print(results)
```

(Note: the code as printed built `tr_data` and `tr_labels` from `raw_data` instead of `tr_dataset`, which would train on the whole dataset, test rows included; it is corrected above. The import of `Dense` is also updated from the deprecated `keras.layers.core` to `keras.layers`.)
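`ffnn.predict` returns the raw sigmoid outputs in (0, 1), not hard 0/1 labels. If you want class labels for the new data, you can threshold the probabilities at 0.5; a sketch with hypothetical prediction values (the book's code just prints the probabilities):

```python
import numpy as np

# hypothetical predict() output: one probability per row of new_data.csv
results = np.array([[0.81], [0.27], [0.55], [0.49]])

# threshold at 0.5 to get a 0/1 label per row
# (whether 1 means "abandons" or "completes" depends on how user_action is coded)
labels = (results > 0.5).astype(int)
print(labels.ravel())  # [1 0 1 0]
```

Probabilities near 0.5 mean the network is uncertain about that basket, which is common on a training set of only ten rows.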

Execute it by typing in the Anaconda terminal:

```
python ffnn.py
```

Screenshot of result with 150 epochs:

Screenshot of result with 300 epochs:

The results are expected to vary since the input training samples are taken randomly.
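If you want reproducible runs, you can fix NumPy's random seed at the top of the script, before the mask is drawn, so the same train/test split is selected every time (an addition of mine; the book's code does not do this):

```python
import numpy as np

np.random.seed(42)  # any fixed integer makes the draw repeatable
a = np.random.rand(5)

np.random.seed(42)  # re-seeding reproduces exactly the same draw
b = np.random.rand(5)

print(np.array_equal(a, b))  # True: identical masks, hence identical splits
```

Keep in mind that TensorFlow's weight initialization has its own randomness, so accuracy can still vary slightly between runs even with a fixed NumPy seed.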
