Ragged tensors

Overview

Your data comes in many shapes; your tensors should too. Ragged tensors are the TensorFlow equivalent of nested variable-length lists. They make it easy to store and process data with non-uniform shapes, including:

  • Variable-length features, such as the set of actors in a movie.
  • Batches of variable-length sequential inputs, such as sentences or video clips.
  • Hierarchical inputs, such as text documents that are subdivided into sections, paragraphs, sentences, and words.
  • Individual fields in structured inputs, such as protocol buffers.

What you can do with a ragged tensor

Ragged tensors are supported by more than a hundred TensorFlow operations, including math operations (such as tf$add and tf$reduce_mean), array operations (such as tf$concat and tf$tile), string manipulation ops (such as tf$strings$substr), and many others:

library(tensorflow)
digits <- tf$ragged$constant(
  list(list(3, 1, 4, 1), list(), list(5, 9, 2), list(6), list())
)
words <- tf$ragged$constant(
  list(list("So", "long"), list("thanks", "for", "all", "the", "fish"))
)
tf$add(digits, 3)
## tf.RaggedTensor(values=Tensor("Add_1:0", shape=(8,), dtype=float32), row_splits=Tensor("RaggedConstant/RaggedFromRowSplits/row_splits:0", shape=(6,), dtype=int64))
tf$reduce_mean(digits, axis=1L)
## Tensor("RaggedReduceMean/truediv:0", shape=(5,), dtype=float32)
tf$concat(list(digits, list(list(5, 3))), axis=0L)
## tf.RaggedTensor(values=Tensor("RaggedConcat/concat:0", shape=(10,), dtype=float32), row_splits=Tensor("RaggedConcat/concat_1:0", shape=(7,), dtype=int64))
tf$tile(digits, list(1L, 2L))
## tf.RaggedTensor(values=Tensor("RaggedTile/Tile:0", shape=(?,), dtype=float32), row_splits=Tensor("RaggedTile/concat_1:0", shape=(6,), dtype=int64))
tf$strings$substr(words, 0L, 2L)
## tf.RaggedTensor(values=Tensor("Substr_1:0", shape=(7,), dtype=string), row_splits=Tensor("RaggedConstant_1/RaggedFromRowSplits/row_splits:0", shape=(3,), dtype=int64))
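A reduction along the ragged dimension collapses each row to a single value. That is why reducing a ragged tensor with an operation such as tf$reduce_mean or tf$reduce_sum yields an ordinary dense Tensor rather than a RaggedTensor. The following is a minimal sketch. It assumes TensorFlow 2.x with eager execution, so the results hold concrete values that as.array() can bring back into R.

```r
library(tensorflow)

# Assumes TensorFlow 2.x eager execution, so results hold concrete values
digits <- tf$ragged$constant(
  list(list(3, 1, 4, 1), list(), list(5, 9, 2), list(6), list())
)

# axis = 1L uses Python's 0-based axis numbering: sum within each row.
# Every ragged row collapses to one number; empty rows sum to 0.
row_sums <- tf$reduce_sum(digits, axis = 1L)

# The result is a dense float32 tensor with one entry per row
as.array(row_sums)  # values: 9 0 16 6 0
```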

There are also a number of methods and operations that are specific to ragged tensors, including factory methods, conversion methods, and value-mapping operations.
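The conversion methods move data between ragged and dense representations. The sketch below assumes TensorFlow 2.x eager execution. It pads digits into a dense tensor with to_tensor(), rebuilds it with tf$RaggedTensor$from_tensor() and its lengths argument, and inspects the rows as nested R lists with to_list(). The name restored is only illustrative.

```r
library(tensorflow)

# Assumes TensorFlow 2.x eager execution (to_list() needs concrete values)
digits <- tf$ragged$constant(
  list(list(3, 1, 4, 1), list(), list(5, 9, 2), list(6), list())
)

# Pad every row to the longest row length (4); missing slots are filled with 0
dense <- digits$to_tensor(default_value = 0)
as.array(dense)  # a 5 x 4 matrix

# Rebuild a ragged tensor from the padded one; `lengths` gives the number
# of real (non-padding) values at the start of each row
restored <- tf$RaggedTensor$from_tensor(
  dense, lengths = as.integer(c(4, 0, 3, 1, 0))
)

# Inspect the per-row values as nested R lists
str(restored$to_list())
```

The lengths argument is what lets genuine zeros and padding zeros be told apart. Without it, from_tensor() would keep every padded slot.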

As with normal tensors, you can use R-style indexing to access specific slices of a ragged tensor. For more information, see the section on indexing in the complete ragged tensor guide.

digits[1, ]      # First row
## Tensor("RaggedGetItem/strided_slice_5:0", shape=(4,), dtype=float32)
digits[, 1:2]    # First two values in each row
## tf.RaggedTensor(values=Tensor("RaggedGetItem_1/GatherV2:0", shape=(?,), dtype=float32), row_splits=Tensor("RaggedGetItem_1/RaggedRange:0", shape=(6,), dtype=int64))
digits[, -2:-1]  # Last two values in each row (negative indices count from the end)
## Warning: Negative numbers are interpreted python-style when subsetting tensorflow tensors.(they select items by counting from the back). For more details, see: https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#basic-slicing-and-indexing
## To turn off this warning, set 'options(tensorflow.extract.warn_negatives_pythonic = FALSE)'
## tf.RaggedTensor(values=Tensor("RaggedGetItem_2/GatherV2:0", shape=(?,), dtype=float32), row_splits=Tensor("RaggedGetItem_2/RaggedRange:0", shape=(6,), dtype=int64))

And just like normal tensors, you can use R arithmetic and comparison operators to perform elementwise operations. For more information, see the section on overloaded operators in the complete ragged tensor guide.

digits + 3
## tf.RaggedTensor(values=Tensor("Add_3:0", shape=(8,), dtype=float32), row_splits=Tensor("RaggedConstant/RaggedFromRowSplits/row_splits:0", shape=(6,), dtype=int64))
digits + tf$ragged$constant(list(list(1, 2, 3, 4), list(), list(5, 6, 7), list(8), list()))
## tf.RaggedTensor(values=Tensor("Add_5:0", shape=(?,), dtype=float32), row_splits=Tensor("RaggedTile_1/concat_1:0", shape=(6,), dtype=int64))

If you need to apply an elementwise transformation to the values of a RaggedTensor, you can use tf$ragged$map_flat_values. It takes a function plus one or more arguments and applies the function to the RaggedTensor’s flat values, leaving the row structure unchanged:

times_two_plus_one <- function(x) x * 2 + 1
tf$ragged$map_flat_values(times_two_plus_one, digits)
## tf.RaggedTensor(values=Tensor("Add_6:0", shape=(8,), dtype=float32), row_splits=Tensor("RaggedConstant/RaggedFromRowSplits/row_splits:0", shape=(6,), dtype=int64))
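The function passed to tf$ragged$map_flat_values receives a single flat 1-D tensor holding every value, not one row at a time. The sketch below assumes TensorFlow 2.x eager execution. It reproduces the same result by hand, using the flat_values property and the with_flat_values() method. Both keep the row splits and swap in new values.

```r
library(tensorflow)

# Assumes TensorFlow 2.x eager execution
digits <- tf$ragged$constant(
  list(list(3, 1, 4, 1), list(), list(5, 9, 2), list(6), list())
)

times_two_plus_one <- function(x) x * 2 + 1

# map_flat_values calls the function once on all 8 values
mapped <- tf$ragged$map_flat_values(times_two_plus_one, digits)

# Manual equivalent: transform the flat values, keep the row partitioning
manual <- digits$with_flat_values(times_two_plus_one(digits$flat_values))

str(mapped$to_list())
```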

Constructing a ragged tensor

The simplest way to construct a ragged tensor is using tf$ragged$constant, which builds the RaggedTensor corresponding to a given nested list:

sentences <- tf$ragged$constant(list(
    list("Let's", "build", "some", "ragged", "tensors", "!"),
    list("We", "can", "use", "tf.ragged.constant", ".")))
paragraphs <- tf$ragged$constant(list(
    list(list('I', 'have', 'a', 'cat'), list('His', 'name', 'is', 'Mat')),
    list(list('Do', 'you', 'want', 'to', 'come', 'visit'), list("I'm", 'free', 'tomorrow'))
))
paragraphs
## tf.RaggedTensor(values=tf.RaggedTensor(values=Tensor("RaggedConstant_4/values:0", shape=(17,), dtype=string), row_splits=Tensor("RaggedConstant_4/RaggedFromRowSplits/row_splits:0", shape=(5,), dtype=int64)), row_splits=Tensor("RaggedConstant_4/RaggedFromRowSplits_1/row_splits:0", shape=(3,), dtype=int64))
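The printed paragraphs object is a RaggedTensor nested inside another RaggedTensor, one level per ragged dimension. The sketch below assumes TensorFlow 2.x eager execution. It inspects that structure with the ragged_rank property (the number of ragged dimensions), the static shape, and the size of the flat values.

```r
library(tensorflow)

# Assumes TensorFlow 2.x eager execution
paragraphs <- tf$ragged$constant(list(
  list(list("I", "have", "a", "cat"), list("His", "name", "is", "Mat")),
  list(list("Do", "you", "want", "to", "come", "visit"), list("I'm", "free", "tomorrow"))
))

# Two ragged dimensions: sentences per paragraph, words per sentence
paragraphs$ragged_rank

# Static shape: only the outer dimension (2 paragraphs) is known;
# the ragged dimensions are reported as None
paragraphs$shape

# All 17 words live in one flat string tensor underneath
tf$size(paragraphs$flat_values)
```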

Ragged tensors can also be constructed by pairing flat values tensors with row-partitioning tensors indicating how those values should be divided into rows, using factory classmethods such as tf$RaggedTensor$from_value_rowids, tf$RaggedTensor$from_row_lengths, and tf$RaggedTensor$from_row_splits.

tf$RaggedTensor$from_value_rowids

If you know which row each value belongs to, you can build a RaggedTensor using a value_rowids row-partitioning tensor:

tf$RaggedTensor$from_value_rowids(
    values=as.integer(c(3, 1, 4, 1, 5, 9, 2, 6)),
    value_rowids=as.integer(c(0, 0, 0, 0, 2, 2, 2, 3)))
## tf.RaggedTensor(values=Tensor("RaggedFromValueRowIds/values:0", shape=(8,), dtype=int32), row_splits=Tensor("RaggedFromValueRowIds/concat:0", shape=(5,), dtype=int64))
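A value_rowids tensor can only describe rows up to the largest row id it contains. Trailing empty rows must therefore be requested explicitly with the nrows argument. This is a minimal sketch, assuming TensorFlow 2.x eager execution; the name rt is illustrative.

```r
library(tensorflow)

# Assumes TensorFlow 2.x eager execution.
# Without nrows, the largest row id (3) implies 4 rows; nrows = 5L adds a
# trailing empty row that no value points to.
rt <- tf$RaggedTensor$from_value_rowids(
  values = as.integer(c(3, 1, 4, 1, 5, 9, 2, 6)),
  value_rowids = as.integer(c(0, 0, 0, 0, 2, 2, 2, 3)),
  nrows = 5L
)

# Row lengths implied by the row ids
as.array(rt$row_lengths())  # 4 0 3 1 0
```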

tf$RaggedTensor$from_row_lengths

If you know how long each row is, then you can use a row_lengths row-partitioning tensor:

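tf$RaggedTensor$from_row_lengths(
    values=as.integer(c(3, 1, 4, 1, 5, 9, 2, 6)),
    row_lengths=as.integer(c(4, 0, 3, 1)))
## tf.RaggedTensor(values=Tensor("RaggedFromRowLengths/values:0", shape=(8,), dtype=int32), row_splits=Tensor("RaggedFromRowLengths/concat:0", shape=(5,), dtype=int64))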
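Row lengths and row splits carry the same information. The splits are the running total of the lengths, with a leading 0. In Python slice notation, row i (0-based) holds values[row_splits[i]:row_splits[i+1]]. The sketch below assumes TensorFlow 2.x eager execution. It computes the splits in plain R and checks that both factory methods produce the same ragged tensor.

```r
library(tensorflow)

# Assumes TensorFlow 2.x eager execution
values <- as.integer(c(3, 1, 4, 1, 5, 9, 2, 6))
row_lengths <- as.integer(c(4, 0, 3, 1))

# Splits = cumulative sum of the lengths, starting at 0
row_splits <- c(0L, cumsum(row_lengths))  # 0 4 4 7 8

a <- tf$RaggedTensor$from_row_lengths(values = values, row_lengths = row_lengths)
b <- tf$RaggedTensor$from_row_splits(values = values, row_splits = row_splits)

# Both constructors describe the same ragged tensor
identical(a$to_list(), b$to_list())
```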

tf$RaggedTensor$from_row_splits

If you know the index where each row starts and ends, then you can use a row_splits row-partitioning tensor:

tf$RaggedTensor$from_row_splits(
    values=as.integer(c(3, 1, 4, 1, 5, 9, 2, 6)),
    row_splits=as.integer(c(0, 4, 4, 7, 8)))

## tf.RaggedTensor(values=Tensor("RaggedFromRowSplits/values:0", shape=(8,), dtype=int32), row_splits=Tensor("RaggedFromRowSplits/row_splits:0", shape=(5,), dtype=int64))
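Whichever factory method built a ragged tensor, all three row-partitioning encodings can be read back from it. This sketch assumes TensorFlow 2.x eager execution. row_splits is a property, while row_lengths() and value_rowids() are methods.

```r
library(tensorflow)

# Assumes TensorFlow 2.x eager execution
rt <- tf$RaggedTensor$from_row_splits(
  values = as.integer(c(3, 1, 4, 1, 5, 9, 2, 6)),
  row_splits = as.integer(c(0, 4, 4, 7, 8))
)

as.array(rt$row_splits)      # 0 4 4 7 8
as.array(rt$row_lengths())   # 4 0 3 1
as.array(rt$value_rowids())  # 0 0 0 0 2 2 2 3
```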

See the tf.RaggedTensor class documentation for a full list of factory methods.

What you can store in a ragged tensor

As with normal tensors, the values in a RaggedTensor must all have the same type, and they must all be at the same nesting depth (the rank of the tensor):

tf$ragged$constant(list(list("Hi"), list("How", "are", "you"))) # ok: type=string, rank=2
## tf.RaggedTensor(values=Tensor("RaggedConstant_5/values:0", shape=(4,), dtype=string), row_splits=Tensor("RaggedConstant_5/RaggedFromRowSplits/row_splits:0", shape=(3,), dtype=int64))
try(tf$ragged$constant(list(list("one", "two"), list(3, 4)))) # bad: multiple types
try(tf$ragged$constant(list("A", list("B", "C")))) # bad: multiple nesting depths
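The element type is inferred from the R values. R doubles reach TensorFlow as Python floats and are stored as float32, and R integers (written with an L suffix) are stored as int32. The dtype argument requests a type explicitly. This sketch assumes TensorFlow 2.x eager execution. It also shows one way to detect a mixed-type input with tryCatch() instead of letting the error stop a script.

```r
library(tensorflow)

# Assumes TensorFlow 2.x eager execution
x <- tf$ragged$constant(list(list(1, 2), list(3)))     # R doubles
x$dtype  # float32

y <- tf$ragged$constant(list(list(1L, 2L), list(3L)))  # R integers
y$dtype  # int32

# Request a type explicitly
z <- tf$ragged$constant(list(list(1, 2), list(3)), dtype = tf$float64)
z$dtype  # float64

# Mixing strings and numbers raises an error; turn it into FALSE here
ok <- tryCatch({
  tf$ragged$constant(list(list("one", "two"), list(3, 4)))
  TRUE
}, error = function(e) FALSE)
ok
```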

This has been a short introduction to ragged tensors in TensorFlow. The complete ragged tensor guide on tensorflow.org (written for Python) covers these topics in more depth.