Automatic differentiation and gradient tape

In this tutorial we will cover automatic differentiation, a key technique for optimizing machine learning models.

Setup

We will use the TensorFlow R package:

library(tensorflow)

Gradient Tapes

TensorFlow provides the tf$GradientTape API for automatic differentiation, that is, computing the gradient of a computation with respect to its input variables.

TensorFlow “records” all operations executed inside the context of a tf$GradientTape onto a “tape”. TensorFlow then uses that tape and the gradients associated with each recorded operation to compute the gradients of the “recorded” computation using reverse-mode differentiation.

For example:
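The listing that produced the output below is not shown here. The following is a minimal sketch that would yield it, assuming a 2x2 tensor of ones as input; the variable names x, y, z, and dz_dx are chosen for illustration. Because x is a plain tensor rather than a tf$Variable, the tape has to be told to watch it.

```r
library(tensorflow)

# Assumed input: a 2x2 tensor of ones
x <- tf$ones(shape(2, 2))

with(tf$GradientTape() %as% t, {
  t$watch(x)               # plain tensors are not watched automatically
  y <- tf$reduce_sum(x)    # y = 4
  z <- tf$multiply(y, y)   # z = y^2
})

# Derivative of z with respect to the input tensor x: 2 * y for each element
dz_dx <- t$gradient(z, x)
dz_dx
```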

## tf.Tensor(
## [[8. 8.]
##  [8. 8.]], shape=(2, 2), dtype=float32)

You can also request gradients of the output with respect to intermediate values computed inside a recorded tf$GradientTape context.
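As a sketch of how the result below could be obtained, the same assumed computation (z = y^2 with y the sum of a 2x2 tensor of ones) can be differentiated with respect to the intermediate value y instead of the input x. The names here are illustrative.

```r
library(tensorflow)

# Assumed input: a 2x2 tensor of ones
x <- tf$ones(shape(2, 2))

with(tf$GradientTape() %as% t, {
  t$watch(x)
  y <- tf$reduce_sum(x)    # intermediate value, y = 4
  z <- tf$multiply(y, y)   # z = y^2
})

# Derivative of z with respect to the intermediate value y: 2 * y
dz_dy <- t$gradient(z, y)
dz_dy
```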

## tf.Tensor(8.0, shape=(), dtype=float32)

By default, the resources held by a GradientTape are released as soon as the GradientTape$gradient() method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. A persistent tape allows multiple calls to gradient(), because its resources are released only when the tape object is garbage collected. For example:
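The code for this example is not shown here; a minimal sketch consistent with the two values printed below, assuming x = 3, y = x^2, and z = y^2 (illustrative names and values), might look like this:

```r
library(tensorflow)

# Assumed input: the scalar constant 3
x <- tf$constant(3)

with(tf$GradientTape(persistent = TRUE) %as% t, {
  t$watch(x)
  y <- x * x   # y = x^2
  z <- y * y   # z = x^4
})

# A persistent tape can be queried more than once
dz_dx <- t$gradient(z, x)   # 4 * x^3 at x = 3
dz_dx
dy_dx <- t$gradient(y, x)   # 2 * x at x = 3
dy_dx
```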

## tf.Tensor(108.0, shape=(), dtype=float32)
## tf.Tensor(6.0, shape=(), dtype=float32)
rm(t)  # Drop the reference to the tape

Recording control flow

Because tapes record operations as they are executed, R control flow (if statements and for or while loops, for example) is handled naturally:
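The following sketch shows one way the three results below could arise. The function f, the helper grad, and the loop bounds are assumptions made for illustration: f multiplies its output by x only on iterations 3 to 5, so the recorded computation is x^3 when y is 5 or 6 and x^2 when y is 4.

```r
library(tensorflow)

# Hypothetical function whose computation depends on R control flow
f <- function(x, y) {
  output <- 1
  for (i in seq_len(y)) {
    if (i > 2 && i <= 5) {
      output <- tf$multiply(output, x)
    }
  }
  output
}

# Hypothetical helper: gradient of f(x, y) with respect to x
grad <- function(x, y) {
  with(tf$GradientTape() %as% t, {
    t$watch(x)
    out <- f(x, y)
  })
  t$gradient(out, x)
}

x <- tf$constant(2)

grad(x, 6)   # f is x^3, so the gradient is 3 * x^2
grad(x, 5)   # f is x^3, so the gradient is 3 * x^2
grad(x, 4)   # f is x^2, so the gradient is 2 * x
```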

## tf.Tensor(12.0, shape=(), dtype=float32)
## tf.Tensor(12.0, shape=(), dtype=float32)
## tf.Tensor(4.0, shape=(), dtype=float32)

Higher-order gradients

Operations inside the GradientTape context manager are recorded for automatic differentiation. If gradients are computed inside that context, the gradient computation is recorded too. As a result, the same API works for higher-order gradients. For example:
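A minimal sketch that matches the values printed below, assuming y = x^3 evaluated at x = 1 and two nested tapes named t and t2 (illustrative choices), could be:

```r
library(tensorflow)

# Assumed input: a variable initialized to 1 (variables are watched automatically)
x <- tf$Variable(1)

with(tf$GradientTape() %as% t, {
  with(tf$GradientTape() %as% t2, {
    y <- x * x * x
  })
  # Computed inside the outer tape's context, so this gradient
  # computation is itself recorded and can be differentiated.
  dy_dx <- t2$gradient(y, x)   # 3 * x^2
})

d2y_dx2 <- t$gradient(dy_dx, x)   # 6 * x

dy_dx
d2y_dx2
```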

## tf.Tensor(3.0, shape=(), dtype=float32)
## tf.Tensor(6.0, shape=(), dtype=float32)