Hi,

When I was 13 years old, two movies had a big impact on me: the first was Gone in 60 Seconds, and the second, The Fast and the Furious.

The common theme between the two?

I was a kid only a few short years away from getting my license, and all I could think about was getting a car and driving fast, just like Nicolas Cage, Vin Diesel, and Paul Walker...

...except that when I turned 16, I didn't get a super sweet, souped-up Honda Civic or a Dodge Charger with 400+ horsepower.

No, instead, I got my mom's cherry red 1996 Ford Windstar. That baby pulled along its hefty 4,200-lb weight with a very modest 3.8L V6 and an ultra-slow 4-speed automatic.

That van could go 0-60 in about 12 seconds... provided that the rusted-out floorboards didn't rattle, shake, and disintegrate to the point where I dropped out the bottom of the car.

The big picture: No, this email isn't about how I was the laughingstock of my high school in my mom's minivan (okay, maybe it is, just a little bit).

But as you already know, training a deep neural network on a large dataset can take a long time, so anything we can do to speed up the data loading/preprocessing pipeline can dramatically cut the wall-clock time it takes to train a network.

Up until the release of TensorFlow v2, we primarily used Keras' ImageDataGenerator class to build data pipelines — this is the equivalent of trying to drag race my mom's 1996 Ford Windstar minivan.

But with TensorFlow v2, we have a PyTorch-like tf.data submodule that can lead to tremendous improvements in data pipeline speed (up to 38x faster, in fact).

The new tf.data module is like taking a brand-new Dodge Challenger Hellcat and putting four nitrous oxide boosters in the trunk. But unlike with four NOS boosters, you won't have to worry about blowing yourself up when using tf.data.

How it works: Under the hood, the tf.data module uses multi-threading to load and preprocess data in parallel and take full advantage of your CPU, ensuring data batches are always ready, pre-fetched, and cached for your neural network.

Essentially, whenever your network needs another batch of data, tf.data ensures it's there!
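
To make the idea concrete, here is a minimal, standard-library Python sketch of prefetching: a background thread keeps a small buffer of batches filled while the training loop consumes them. This is only a conceptual stand-in, not how tf.data is actually implemented. The names prefetch, load_batches, and train, along with the sleep timings, are hypothetical and chosen purely for illustration.

```python
import queue
import threading
import time


def prefetch(batches, buffer_size=2):
    """Yield items from `batches` while a background thread loads ahead.

    Conceptual illustration only; this is NOT tf.data's implementation.
    """
    buf = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel that marks the end of the stream

    def producer():
        try:
            for batch in batches:
                buf.put(batch)  # blocks while the buffer is full
        finally:
            buf.put(done)  # always signal the end so the consumer never hangs

    threading.Thread(target=producer, daemon=True).start()

    while True:
        batch = buf.get()  # waits only if the producer has fallen behind
        if batch is done:
            return
        yield batch


def load_batches(num_batches, load_seconds=0.01):
    """Hypothetical loader: each batch takes `load_seconds` to prepare."""
    for i in range(num_batches):
        time.sleep(load_seconds)  # stands in for disk reads and preprocessing
        yield [i] * 4


def train(batches, step_seconds=0.01):
    """Hypothetical training loop: each step takes `step_seconds`."""
    seen = []
    for batch in batches:
        time.sleep(step_seconds)  # stands in for a forward/backward pass
        seen.append(batch)
    return seen


if __name__ == "__main__":
    start = time.perf_counter()
    train(load_batches(20))
    print(f"without prefetch: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    train(prefetch(load_batches(20)))
    print(f"with prefetch:    {time.perf_counter() - start:.2f}s")
```

Because loading and training overlap in the second run, it should finish noticeably sooner than the first. In tf.data itself you would not write any of this by hand: calling dataset.prefetch(tf.data.AUTOTUNE) at the end of a pipeline asks for the same kind of overlap between data preparation and training.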

My thoughts: The tf.data module is a no-brainer option to improve the speed of your TensorFlow data pipelines. If you're not using it already, you should be.

Yes, but: The biggest downside of replacing an ImageDataGenerator with tf.data is that it takes at least 10-20 more lines of code.

Arguably more important, data augmentation can become more of a hassle (which I'll discuss in a couple of weeks).

That said, the performance gains are well worth it.

Stay smart: Don't train neural networks with a 1996 Ford Windstar with rusted out floorboards.

Instead, upgrade to tf.data. You'll feel like you're driving a Maserati (and you won't have to shell out $100K+ for it either).

Click here to learn how to use tf.data to improve your data pipeline speeds

Go Deeper: This lesson is part of PyImageSearch University, the best place to master computer vision and deep learning.  

If you are already a customer, you can find the video, Colab notebook, code, and assessment for this lesson at A gentle introduction to tf.data with TensorFlow which is part of the Deep Learning 125 — Data Pipelines with tf.data course.

If you are not a member, you can join here.
 
Adrian Rosebrock
Chief PyImageSearcher

P.S. If you're interested in learning how to successfully apply deep learning to your own projects, I would recommend reading my book, Deep Learning for Computer Vision with Python.

Inside the book you'll find:
  • Super-practical walkthroughs that present solutions to real-world image classification (ResNet, VGG, etc.), object detection (Faster R-CNN, SSDs, RetinaNet, etc.), and segmentation (Mask R-CNN) problems
  • Hands-on tutorials (with lots of code) that show you not only the algorithms behind deep learning for computer vision but their implementations as well
  • A no-nonsense teaching style that is guaranteed to help you master deep learning for image understanding and visual recognition

If you're interested in learning more about the book, I'd be happy to send you a PDF containing the Table of Contents and a few sample chapters:

Click here to grab the PDF of sample chapters and Table of Contents

After clicking the above link, you'll receive a separate email with the PDF in a few short moments.