Hi,

Over the past two weeks, we've learned how to build highly efficient data pipelines for training deep neural networks with TensorFlow and the "tf.data" module.

However, one thing we haven't discussed is how to apply data augmentation inside of a "tf.data" training pipeline.

Applying data augmentation when training a neural network to be deployed to a real-world environment is a must. Data augmentation improves the ability of the model to generalize, and therefore to make better, more accurate predictions.

In today's brand new tutorial, I'll show you two methods to incorporate data augmentation into your "tf.data" pipeline.


The big picture: As we've shown in the past two tutorials in this series, the "tf.data" module is >4x faster than Keras' "ImageDataGenerator" for loading and processing datasets.

However, one of the benefits of "ImageDataGenerator" is that it has data augmentation built in. We just set a few parameters and data augmentation is automatically applied.

That's not the case with "tf.data."

Instead, we need to define the sequence of augmentation operations ourselves. Luckily, that's not as hard as it sounds.

How it works: There are two ways to apply data augmentation using "tf.data."

  1. Using the "Sequential" class and the "preprocessing" module: This is the most similar to "ImageDataGenerator," and therefore the easiest to use.
  2. Utilizing built-in TensorFlow operations inside the "tf.image" module: This method is a little more tedious but gives you full control over every single operation applied to the image.
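The first approach can be sketched in a few lines. This is a minimal, illustrative example (not the tutorial's exact code); the layer names assume a recent TF 2.x release — in older versions they live under "tf.keras.layers.experimental.preprocessing":

```python
import tensorflow as tf

# build a Sequential model of preprocessing layers that perform augmentation
augmenter = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

# toy dataset of random float "images" standing in for real training data
ds = tf.data.Dataset.from_tensor_slices(tf.random.uniform((8, 64, 64, 3)))
ds = ds.batch(4)

# apply the augmentations inside the tf.data pipeline;
# training=True ensures the random transforms are active
ds = ds.map(lambda x: augmenter(x, training=True),
            num_parallel_calls=tf.data.AUTOTUNE)
```

Because the augmentations run inside "map," they execute as part of the input pipeline and benefit from the same parallelism and prefetching as the rest of your "tf.data" operations.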

My thoughts: For most deep learning practitioners, the first method is sufficient. It's easy to use and achieves essentially the same result as "ImageDataGenerator."
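For comparison, here is what the second, full-control approach might look like — a minimal sketch (the specific ops and parameter values are illustrative, not the tutorial's exact choices):

```python
import tensorflow as tf

def augment(image):
    # manually chain individual tf.image ops; you decide exactly
    # which transforms run, in what order, and with what ranges
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
    return image

# toy dataset of random float "images" standing in for real training data
ds = tf.data.Dataset.from_tensor_slices(tf.random.uniform((8, 64, 64, 3)))
ds = ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE).batch(4)
```

More verbose, but every single operation applied to the image is explicit and under your control.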

Yes, but: You guessed it — applying data augmentation with "tf.data" requires more code than just calling "ImageDataGenerator" itself. However, the speedup you obtain in data throughput is well worth the additional effort.

Stay smart: Data augmentation is a must — it improves the ability of your model to generalize and make better, more accurate predictions on data it hasn't seen before.

Furthermore, "tf.data" is the key to building fast, efficient data pipelines that allow you to train your neural networks in a fraction of the time.

Yes, you will have to write a bit more code to leverage both together, but it's 100% worth it.

Click here to learn how to incorporate data augmentation into your "tf.data" pipeline.


Adrian Rosebrock
Chief PyImageSearcher