Myths About Data Augmentation
Explaining what Data Augmentation actually means, its variations, and dispelling a common myth

Data Augmentation is an important topic in deep learning and a key ingredient of many models that generalize well. It exposes a model to more variation in the images it learns from.
Data Augmentation is used mainly with image data. Image classification models are usually trained with Data Augmentation to achieve better results, for example higher validation accuracy. Common examples of deep learning models used for image classification are CNNs (Convolutional Neural Networks) such as LeNet and ResNet.
What is Data Augmentation?
It is a technique for generating new data from the original data by applying random transformations within ranges specified by the user. For example, look at the thumbnail image of this course. The dog on the left is the original image, and it is clearly a dog. The image on the right is also a dog. Humans can tell this easily, because the brain recognizes an object regardless of how it is presented. A computer, however, works with numbers: it sees an image as an array of pixel values, and a transformed image produces a very different array. That is why variations are difficult for a computer to recognize.
This is where Data Augmentation comes into the picture. Its job is to generate variety from the dataset, like the variation shown in the thumbnail image. New data can be generated with several methods, which are provided as inbuilt functions (discussed later in this blog).
New data can be generated, for example, by rotating the image by some angle, flipping it, or increasing or decreasing its brightness. One more example of an augmented image is shown below.
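To make these transformations concrete, here is a minimal sketch using NumPy. The tiny 2x3 array is a hypothetical stand-in for a grayscale image, and the brightness shift of 200 is an arbitrary value chosen only to show clipping; real pipelines apply the same ideas to full-size images with randomly chosen parameters.

```python
import numpy as np

# Hypothetical 2x3 grayscale "image" with pixel values in [0, 255].
image = np.array([[10, 20, 30],
                  [40, 50, 60]], dtype=np.uint8)

# Flip: mirror the image left to right.
flipped = np.fliplr(image)

# Rotate: turn the image 90 degrees counter-clockwise.
rotated = np.rot90(image)

# Brightness: add a constant, widening the dtype first so values do not
# wrap around, then clip back into the valid pixel range.
brighter = np.clip(image.astype(np.int16) + 200, 0, 255).astype(np.uint8)

print(flipped)
print(rotated)
print(brighter)
```

Each result still shows the same content as the original, but the array of numbers the computer sees is different in every case.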

Data Augmentation helps a model generalize, and this is the reason behind the improved validation accuracy. After adding Data Augmentation, the training accuracy may be lower than that of a model trained without it. The reason is that the data provided to the model is no longer redundant: data augmentation keeps supplying randomly varied data that the model has not seen before, which is exactly the goal when developing a model. Without Data Augmentation, the dataset can contain samples that hardly differ from one another. In that case the training accuracy is likely to be very high, but this comes with overfitting and poor generalization.
Therefore, Data Augmentation is key to making a model generalize, and it is highly recommended for deep learning models.
A popular way to implement Data Augmentation is the ImageDataGenerator class in the “keras.preprocessing.image” module.
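As a rough illustration, the sketch below builds an ImageDataGenerator and draws one augmented batch from it. It assumes TensorFlow is installed and imports the class through `tensorflow.keras`; the random array is a hypothetical stand-in for a real dataset. Only flips are enabled here to keep the example small. The class also accepts options such as `rotation_range` and `brightness_range`, which may need extra packages (SciPy, Pillow) to be installed. Recent TensorFlow releases mark this class as deprecated in favor of preprocessing layers, so treat this as a sketch rather than a recommended setup.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical stand-in dataset: 8 random RGB images of 32x32 pixels.
rng = np.random.default_rng(0)
images = rng.random((8, 32, 32, 3)).astype("float32")
labels = np.arange(8)

# Each image is randomly flipped horizontally and/or vertically.
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True)

# flow() returns an iterator that yields augmented batches on demand.
batches = datagen.flow(images, labels, batch_size=4, shuffle=False, seed=0)
batch_images, batch_labels = next(batches)

print(batch_images.shape)  # (4, 32, 32, 3)
```

In a training script, the iterator returned by `flow()` is typically passed to `model.fit` in place of the raw arrays.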
The myth about Data Augmentation!
The biggest myth about Data Augmentation, believed even by many professionals, is that it adds randomly generated data to the original data and trains the model on both. This is wrong. With generator-based augmentation, each original image is replaced on the fly by a randomly transformed version before it reaches the model. The dataset does not grow: the model sees the same number of samples per epoch, but in a different random variation each time. An unmodified original reaches the model only when the sampled transformation happens to leave the image unchanged. Training on these random variations is what makes the model more generalized.
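The behavior described above can be sketched with a plain Python generator. This is not how Keras implements it internally; `augmented_batches` is a hypothetical helper written for illustration, and the flip and 90-degree rotations are arbitrary choices of transformation. The point is that every batch contains transformed copies only, and the number of images per epoch never changes.

```python
import numpy as np

def augmented_batches(images, batch_size, rng):
    """Yield batches in which every image is a randomly transformed copy."""
    for start in range(0, len(images), batch_size):
        # Copy the slice so the original dataset is never modified.
        batch = images[start:start + batch_size].copy()
        for i in range(len(batch)):
            if rng.random() < 0.5:
                batch[i] = np.fliplr(batch[i])
            # Rotate by a random multiple of 90 degrees (images must be square).
            batch[i] = np.rot90(batch[i], k=int(rng.integers(0, 4)))
        yield batch

# Hypothetical dataset: 6 square grayscale images of 4x4 pixels.
rng = np.random.default_rng(0)
images = np.arange(6 * 4 * 4).reshape(6, 4, 4)

for epoch in range(2):
    seen = sum(len(batch) for batch in augmented_batches(images, 3, rng))
    print(epoch, seen)  # the model is fed 6 images in every epoch
```

Because the random choices are drawn again in every epoch, the same original image usually reaches the model in a different variation each time. It arrives unchanged only when no flip and a zero rotation are sampled.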
Types of Data Augmentation
It can be classified into 3 categories:
- Data Expansion
- Data Augmentation
- Data Expansion with Data Augmentation
Depending on the requirement and use case, any one of the 3 categories is used.
Data Expansion
This type of data augmentation is rarely used. It applies when very little data is available: variations are generated from that small dataset, and the originals together with the variations are fed to the model for training.
This approach is not very effective. Even though augmented data is generated, it is derived from very little real data, so the model gains only limited generalization.
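Data Expansion can be sketched as building one larger, fixed dataset ahead of training. The helper names `random_variant` and `expand_dataset` are hypothetical, and the flip and rotation are again arbitrary example transformations. Unlike the generator-based approach, the originals stay in the training set and the dataset size grows.

```python
import numpy as np

def random_variant(image, rng):
    """Return a randomly flipped and rotated copy of a square image."""
    if rng.random() < 0.5:
        image = np.fliplr(image)
    return np.rot90(image, k=int(rng.integers(0, 4)))

def expand_dataset(images, labels, copies, rng):
    """Return the originals plus `copies` random variants of each image."""
    all_images, all_labels = [images], [labels]
    for _ in range(copies):
        variants = np.stack([random_variant(img, rng) for img in images])
        all_images.append(variants)
        all_labels.append(labels)  # a variant keeps the label of its original
    return np.concatenate(all_images), np.concatenate(all_labels)

# Hypothetical small dataset: 5 square grayscale images of 4x4 pixels.
rng = np.random.default_rng(0)
images = np.arange(5 * 4 * 4).reshape(5, 4, 4)
labels = np.array([0, 1, 0, 1, 0])

expanded_images, expanded_labels = expand_dataset(images, labels, copies=3, rng=rng)
print(expanded_images.shape)  # (20, 4, 4): 5 originals + 3 * 5 variants
```

All 20 images are derived from the same 5 originals, which is why this approach adds less real variety than the dataset size suggests.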
Data Augmentation
This is the most popular approach. Augmented data is generated and the model is trained on it, which produces a more generalized trained model.
Data Expansion with Data Augmentation
In this approach, both of the approaches explained above are combined, and everything else remains the same.
That concludes this blog!
I hope this article explained the topic and the concepts behind it clearly. Thank you for investing your time in reading my blog and boosting your knowledge. If you like my work, please give this blog some applause!