This is how an image generator in TensorFlow works.

Before discussing what an image generator is, one should be aware of Convolutional Neural Networks.

In each image there is a lot of wasted space, so it is interesting to ask whether there is a way to condense the image down to the important features that distinguish what makes it a shoe, or a handbag, or a shirt. That’s where convolutions come in. In any kind of image processing, a convolution usually involves having a filter and passing that filter over the image in order to change the underlying image. The process works a little bit like this: for every pixel, take its value and the values of its neighbors, multiply them by the corresponding filter weights, and sum the results to get the new pixel value. The idea here is that some convolutions will change the image in such a way that certain features in the image get emphasized.
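
As a minimal sketch of that arithmetic in NumPy (the filter values here are just an illustrative edge-detection kernel, not anything from the model below):

import numpy as np

# A 3x3 filter that emphasizes vertical edges (an illustrative choice).
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])

def convolve(image, kernel):
    """Slide the 3x3 filter over every interior pixel of a grayscale image."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Multiply the pixel and its 8 neighbors by the filter weights
            # and sum the results to get the new pixel value.
            out[y - 1, x - 1] = np.sum(image[y - 1:y + 2, x - 1:x + 2] * kernel)
    return out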

Now, that’s a very basic introduction to what convolutions do, and when combined with something called pooling, they can become really powerful. Simply put, pooling is a way of compressing an image. A quick and easy way to do this is to go over the image four pixels at a time, i.e., the current pixel and its neighbors underneath and to the right of it. Of these four, pick the biggest value and keep just that. So, for example, 16 pixels are turned into four pixels by looking at them in two-by-two grids and picking the biggest value in each. This preserves the features that were highlighted by the convolution, while simultaneously quartering the size of the image, halving it on both the horizontal and vertical axes.
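
For instance, here is a minimal sketch of 2x2 max pooling in NumPy, turning a grid of 16 pixels into 4 by keeping the biggest value in each two-by-two block (the pixel values are made up for illustration):

import numpy as np

pixels = np.array([[  0,  64, 128,  32],
                   [ 48, 192,  16,  96],
                   [144,  80, 240,   8],
                   [ 24, 112,  56, 160]])

def max_pool_2x2(image):
    """Keep the biggest value from each non-overlapping 2x2 block."""
    h, w = image.shape
    out = np.zeros((h // 2, w // 2))
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            # Of the current pixel and its three neighbors, keep the biggest.
            out[y // 2, x // 2] = np.max(image[y:y + 2, x:x + 2])
    return out

print(max_pool_2x2(pixels))  # 16 pixels in, 4 pixels out: [[192, 128], [144, 240]]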

We don’t have to do all the maths for filtering and compressing ourselves; we simply define convolutional and pooling layers to do the job for us. So now let’s take a look at convolutions and pooling in code.

import tensorflow as tf

model = tf.keras.models.Sequential([
    # Note the input shape is the desired size of the image 150x150 with 3 bytes color
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2), 
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'), 
    tf.keras.layers.MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(), 
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'), 
    # Only 1 output neuron. It will contain a value from 0-1, where 0 is for one class ('horses') and 1 for the other ('humans')
    tf.keras.layers.Dense(1, activation='sigmoid')  
])

Every input image is 150×150 pixels, with 3 bytes to define the color (RGB; for a grayscale image we would specify 1 byte). Pooling helps to reduce the information in an image while maintaining its features.

Flatten (one of the neural network layers) takes the two-dimensional output of the convolutions and turns it into a simple linear array.

The interesting stuff happens in the middle layers, sometimes also called hidden layers.

The model.summary() method prints a summary of the neural network; it allows you to inspect the layers of the model and see the journey of the image through the convolutions.

model.summary()
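
For the model defined above, the summary comes out roughly as follows (the layer names may differ from run to run; the shapes and parameter counts here are worked out from the layer definitions):

Layer (type)                    Output Shape              Param #
=================================================================
conv2d (Conv2D)                 (None, 148, 148, 16)      448
max_pooling2d (MaxPooling2D)    (None, 74, 74, 16)        0
conv2d_1 (Conv2D)               (None, 72, 72, 32)        4640
max_pooling2d_1 (MaxPooling2D)  (None, 36, 36, 32)        0
conv2d_2 (Conv2D)               (None, 34, 34, 64)        18496
max_pooling2d_2 (MaxPooling2D)  (None, 17, 17, 64)        0
flatten (Flatten)               (None, 18496)             0
dense (Dense)                   (None, 512)               9470464
dense_1 (Dense)                 (None, 1)                 513
=================================================================
Total params: 9,494,561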

The “output shape” column shows how the size of your feature map evolves in each successive layer. The convolution layers reduce the size of the feature maps slightly because no padding is used: a 3×3 filter cannot be centered on the border pixels, so each convolution trims one pixel from every edge. Each pooling layer then halves the dimensions.

You looked at convolutions and got a glimpse of how they work. By passing filters over an image to reduce the amount of information, they allow the neural network to effectively extract features that can distinguish one class of image from another. You also saw how pooling compresses the information to make it more manageable. This is a really nice way to improve image recognition performance.

The algorithms we are learning are really the real stuff used today in many commercial applications. For example, if you look at the way a real self-driving car uses cameras to detect other vehicles or pedestrians in order to avoid them, it uses convolutional neural networks for that part of the task, very similar to what you are learning. And in other contexts, using a convolutional neural network, we can for example take a picture of a crop and try to tell whether a disease is coming.

Image Generator in TensorFlow:

One of the features of the image generator in TensorFlow is that you can point it at a directory, and its sub-directories will automatically generate labels for you. For example, consider this directory structure: you have an image directory, and within it, sub-directories for training and validation. When you put sub-directories in these for horses and humans and store the requisite images in them, the image generator can create a feeder for those images and auto-label them for you. So if I point an image generator at the training directory, the labels will be horses and humans, and all of the images in each sub-directory will be loaded and labeled accordingly. (Each epoch loads the data, calculates the convolutions, and then tries to match the convolutions to the labels.)
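
As a minimal sketch of that (the directory path here is an assumption matching the structure described above), pointing an ImageDataGenerator at the training directory looks like this:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Normalize pixel values from 0-255 down to 0-1.
train_datagen = ImageDataGenerator(rescale=1./255)

# 'horse-or-human/training' is an assumed path; it should contain one
# sub-directory per class (here 'horses' and 'humans').
train_generator = train_datagen.flow_from_directory(
    'horse-or-human/training',  # point at the directory above the class folders
    target_size=(150, 150),     # resize every image as it is loaded
    batch_size=32,
    class_mode='binary')        # two classes -> binary labels

# The labels come straight from the sub-directory names:
print(train_generator.class_indices)  # e.g. {'horses': 0, 'humans': 1}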

Convolutions improve image recognition by isolating the features in images. Applying convolutions on top of a deep neural network affects training in ways that depend on many factors: it might make your training faster or slower, and a poorly designed convolutional layer may even be less efficient than a plain deep neural network!

Sometimes our training accuracy is close to 1.000 while our validation accuracy isn’t. The risk here is that we are probably over-fitting on the training data; this means our model is just memorizing things. Even with 100% training accuracy, the model cannot predict correctly on data it wasn’t trained on. For example, if we trained our model to distinguish what is a human and what is a horse, and it comes across some outlier, the model will fail to predict the correct one.

Our model may fail to label such an image correctly because even we wouldn’t know what to call it: a horse, a human, or a horseman?

Convolutional Neural Networks are better for classifying images like horses and humans because, in these images, the features may be in different parts of the frame. There’s a wide variety of horses, and there’s a wide variety of humans.

If we reduce the size of the images, the training results will be different, because we remove some convolutions to handle the smaller images. (We need to adjust how many convnet layers, i.e., convolutional layers, are needed for the desired accuracy, and we also need to adjust the neurons accordingly; there’s no formula for choosing these, but some good practices exist.)
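
For instance, a sketch of a trimmed-down model for smaller inputs (the 75×75 size and the use of only two convolution/pooling pairs are illustrative choices, not prescribed values):

import tensorflow as tf

smaller_model = tf.keras.models.Sequential([
    # A smaller input image leaves room for fewer convolution/pooling pairs
    # before the feature maps become too small to pool again.
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(75, 75, 3)),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])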

So far, we have learned how to use TensorFlow to implement a basic neural network, going all the way up to a basic Convolutional Neural Network.

So with a smaller data set, you are at greater risk of over-fitting; with a larger data set, you have less risk of over-fitting, but over-fitting can still happen.

We’ll learn another method for dealing with over-fitting: TensorFlow provides very easy-to-use tools for data augmentation. For example, take a picture of a cat; its mirror image still looks like a cat, so why not throw that into the training set too? Or you might only have upright pictures of cats, but since a cat could be lying down or on its side, one of the things you can do is rotate the image. Rotation, skewing, flipping, and moving the subject around the frame are all part of image augmentation. One of the things I find really fascinating, particularly if you’re using a large public data set, is that you flow the images from the directory and the augmentation happens as they’re flowing. You’re not editing the images themselves directly, and you’re not changing the data set; it all just happens in memory. This is all done as part of TensorFlow’s image generation.
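
In code, these augmentations are just parameters on the same ImageDataGenerator; here is a sketch (the specific ranges are illustrative choices):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,        # rotate images by up to 40 degrees
    width_shift_range=0.2,    # move the subject around the frame horizontally
    height_shift_range=0.2,   # ... and vertically
    shear_range=0.2,          # skew the image
    zoom_range=0.2,
    horizontal_flip=True,     # the mirror image of a cat still looks like a cat
    fill_mode='nearest')      # fill in pixels uncovered by a transform

# The same flow_from_directory call from earlier is then used unchanged;
# the transformed images exist only in memory, never on disk.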

So then another strategy for avoiding over-fitting is to use existing models and apply transfer learning. No one has as much data as they wish for the problems we really care about. Transfer learning lets you download a neural network that someone else has trained, perhaps on a million images or even more.

So take an Inception network that someone else has trained, download those parameters, and use them to bootstrap your own learning process, maybe with a smaller data set. That network has been able to spot features that you may not have been able to spot in your data set, so why not take advantage of that and speed up training your own model? Transfer learning does exactly that, and TensorFlow lets you do it easily.
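
A sketch of that workflow with Keras (the input size and the choice of classifier head are illustrative assumptions, not the only option):

import tensorflow as tf

# Download an Inception network trained on ImageNet, dropping its
# original classification head.
base_model = tf.keras.applications.InceptionV3(
    input_shape=(150, 150, 3),
    include_top=False,
    weights='imagenet')

# Freeze the downloaded parameters so training only adjusts our new layers.
base_model.trainable = False

# Bootstrap our own classifier on top of the pre-trained features.
model = tf.keras.models.Sequential([
    base_model,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])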