In this installment, we will train an image recognition model that takes an image, such as a scanned copy, and tells whether it is an Australian ID, for example a driver's license or a visa.


We will use an approach called Transfer Learning. In this approach, you take an existing Convolutional Neural Network and retrain its last few layers. Think of it this way: you’ve already got a network that can tell an aeroplane from a dog, but you need to retrain it to pick up more subtle differences, such as the difference between a scanned invoice and a scanned passport.

TensorFlow is built around the concept of a tensor, which is a multi-dimensional array of numbers. The tensor we care about here is a feature vector that captures the features of an image. We will take the output of the network’s penultimate layer for some sample images of a Medicare card, an Australian Visa and a Victoria Driver’s license, and train on those features.

Once the model is trained, we will use a simple Support Vector Machine to classify the uploaded image and predict how likely it is to be an Australian ID. The output of the SVC is a predicted class along with a probability. Example:

(Visa, 0.83): the model is 83% confident the image is an Australian Visa
(Medicare, 0.89): 89% confident it is a Medicare card
(License, 0.45): 45% confident it is a license

If the confidence is low, the image probably does not belong to any of the classes we are interested in. In the last example, the uploaded image is most likely not a license. As a rule of thumb, a probability of 0.80 or higher is a good mark for treating the prediction as reliable.
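To make the rule of thumb concrete, here is a minimal sketch of how a (class, probability) pair could be turned into a decision. The function name `interpret` and the "Unknown" fallback label are my own illustrative choices, not something the classifier provides.

```python
# Rule-of-thumb cut-off from the text; tune it for your own data.
RELIABLE_THRESHOLD = 0.80


def interpret(predicted_class, probability, threshold=RELIABLE_THRESHOLD):
    """Return the predicted class if the probability meets the threshold.

    Otherwise return "Unknown" (a hypothetical fallback label), meaning the
    image is probably not one of the classes we trained on.
    """
    if probability >= threshold:
        return predicted_class
    return "Unknown"


# The three example outputs from above.
examples = [("Visa", 0.83), ("Medicare", 0.89), ("License", 0.45)]
results = [interpret(cls, prob) for cls, prob in examples]
```

With the three examples above, the first two predictions are accepted and the last one is rejected.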

Training Pool

Below are screenshots of the samples I used to train my image classification model. As you can see, the images differ in angle, positioning, colour and so on. The system can still learn from the important properties and disregard the irrelevant ones.


Image 1: Australian Visa Training Set



Image 2: Medicare Training Set



Image 3: Victoria Driver’s License Training Set


Training Phase

Training starts with sorting all the training images into folders, each named after its class. As you can see in the screenshots above, the Windows folders are named after the classes: DriversLicense, Medicare and Visa.
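A folder-per-class layout like this is easy to walk in code. The sketch below is one way to do it, not the exact code behind this post; the function name `list_labelled_images` and the set of accepted file extensions are assumptions for illustration.

```python
from pathlib import Path

# Assumed image file types; adjust to whatever your scans are saved as.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_labelled_images(root):
    """Return sorted (image_path, class_name) pairs found under root.

    Each sub-folder of root is treated as one class, and the folder name
    (e.g. DriversLicense, Medicare, Visa) becomes the label.
    """
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files sitting next to the class folders
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((image_path, class_dir.name))
    return samples
```

Calling `list_labelled_images` on the training folder gives one (path, label) pair per image, ready for feature extraction.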

We then iterate over all these images and pass each one through the network up to its penultimate layer, which gives us a feature tensor (a 2048-dimensional array describing that image). We then label each feature tensor with the class of its image.
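Here is a rough sketch of that loop. It is written under assumptions: this post does not say which pretrained network was used, so `make_inception_extractor` uses Keras' InceptionV3 purely as an example of a network whose pooled penultimate layer has 2048 values. `build_dataset` and `FEATURE_SIZE` are illustrative names of my own.

```python
import numpy as np

FEATURE_SIZE = 2048  # length of the feature vector mentioned in the text


def make_inception_extractor():
    """Build a function that maps an image path to a 2048-value vector.

    Assumption: Keras' InceptionV3 stands in for the pretrained network.
    With include_top=False and pooling="avg", the model stops at the
    pooled layer just before the final classification layer.
    """
    import tensorflow as tf  # imported here so the rest works without it

    model = tf.keras.applications.InceptionV3(
        weights="imagenet", include_top=False, pooling="avg"
    )
    preprocess = tf.keras.applications.inception_v3.preprocess_input

    def extract(image_path):
        image = tf.keras.utils.load_img(image_path, target_size=(299, 299))
        batch = np.expand_dims(tf.keras.utils.img_to_array(image), axis=0)
        return model.predict(preprocess(batch), verbose=0)[0]

    return extract


def build_dataset(samples, extract):
    """Turn (image_path, class_name) pairs into a feature matrix and labels."""
    features, labels = [], []
    for image_path, class_name in samples:
        vector = np.asarray(extract(image_path), dtype=np.float32)
        if vector.shape != (FEATURE_SIZE,):
            raise ValueError(f"unexpected feature shape {vector.shape}")
        features.append(vector)
        labels.append(class_name)
    return np.stack(features), np.array(labels)
```

Passing the extractor in as an argument keeps the loop independent of the network, so you can swap in a different pretrained model, or a dummy function while testing, without touching `build_dataset`.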

Support Vector Machine

Once we have the feature tensor and label of every image, our training dataset is complete. We feed it to a Support Vector Machine to train the model. To save time, I pickled the trained model so that it can be reused for all predictions.
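The sketch below shows what this step can look like with scikit-learn's `SVC`. It makes several assumptions: the linear kernel is my choice for illustration, the helper names are hypothetical, and the training data is random synthetic vectors standing in for real image features, so the numbers it produces mean nothing.

```python
import pickle

import numpy as np
from sklearn.svm import SVC


def train_classifier(features, labels):
    """Fit an SVM that can also report class probabilities."""
    # probability=True enables predict_proba; the kernel is an assumption.
    clf = SVC(kernel="linear", probability=True, random_state=0)
    clf.fit(features, labels)
    return clf


def predict_with_confidence(clf, feature_vector):
    """Return (most likely class, its probability) for one feature vector."""
    row = np.asarray(feature_vector).reshape(1, -1)
    probabilities = clf.predict_proba(row)[0]
    best = int(np.argmax(probabilities))
    return str(clf.classes_[best]), float(probabilities[best])


# Synthetic stand-in for real features: 20 noisy vectors around each of
# three random class centres, 2048 values each.
rng = np.random.default_rng(0)
class_names = ["DriversLicense", "Medicare", "Visa"]
centres = rng.normal(size=(3, 2048))
features = np.vstack([c + 0.1 * rng.normal(size=(20, 2048)) for c in centres])
labels = np.repeat(class_names, 20)

model = train_classifier(features, labels)

# Pickle round trip; in practice you would write the bytes to a file once
# and load them again whenever a prediction is needed.
restored = pickle.loads(pickle.dumps(model))
label, confidence = predict_with_confidence(restored, centres[2])
```

One caveat on pickling: only unpickle files you created yourself, and load them with the same scikit-learn version that saved them.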

I know some of this terminology may be new to you. In the next post, I will explain the architecture and walk through some sample code that generates the predictions. Then it will start falling into place.

See you then!