Ragged tensors
Overview
Your data comes in many shapes; your tensors should too. Ragged tensors are the TensorFlow equivalent of nested variable-length lists. They make it easy to store and process data with non-uniform shapes, including:
- Variable-length features, such as the set of actors in a movie.
- Batches of variable-length sequential inputs, such as sentences or video clips.
- Hierarchical inputs, such as text documents that are subdivided into sections, paragraphs, sentences, and words.
- Individual fields in structured inputs, such as protocol buffers.
What you can do with a ragged tensor
Ragged tensors are supported by more than a hundred TensorFlow operations, including math operations (such as tf$add and tf$reduce_mean), array operations (such as tf$concat and tf$tile), string manipulation ops (such as tf$strings$substr), and many others:
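library(tensorflow)
digits <- tf$ragged$constant(
  list(list(3, 1, 4, 1), list(), list(5, 9, 2), list(6), list())
)
words <- tf$ragged$constant(
  list(list("So", "long"), list("thanks", "for", "all", "the", "fish"))
)
tf$add(digits, 3)                                     # add 3 to every value
tf$reduce_mean(digits, axis = 1L)                     # mean of each row
tf$concat(list(digits, list(list(5, 3))), axis = 0L)  # append one extra row
tf$tile(digits, list(1L, 2L))                         # repeat each row twice
tf$strings$substr(words, 0L, 2L)                      # first two characters of each word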
## tf.RaggedTensor(values=Tensor("Add_1:0", shape=(8,), dtype=float32), row_splits=Tensor("RaggedConstant/RaggedFromRowSplits/row_splits:0", shape=(6,), dtype=int64))
## Tensor("RaggedReduceMean/truediv:0", shape=(5,), dtype=float32)
## tf.RaggedTensor(values=Tensor("RaggedConcat/concat:0", shape=(10,), dtype=float32), row_splits=Tensor("RaggedConcat/concat_1:0", shape=(7,), dtype=int64))
## tf.RaggedTensor(values=Tensor("RaggedTile/Tile:0", shape=(?,), dtype=float32), row_splits=Tensor("RaggedTile/concat_1:0", shape=(6,), dtype=int64))
## tf.RaggedTensor(values=Tensor("Substr_1:0", shape=(7,), dtype=string), row_splits=Tensor("RaggedConstant_1/RaggedFromRowSplits/row_splits:0", shape=(3,), dtype=int64))
There are also a number of methods and operations that are specific to ragged tensors, including factory methods, conversion methods, and value-mapping operations.
As with normal tensors, you can use R-style indexing to access specific slices of a ragged tensor. For more information, see the section on Indexing below.
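The indexing calls themselves are not shown on this page. The following is a minimal sketch of what such calls might look like, not necessarily the exact calls behind the output listed here. It assumes that the tensorflow R package's default R-style (1-based) extraction method applies to ragged tensors, and that `NULL` is accepted as an open slice endpoint.

```r
library(tensorflow)

digits <- tf$ragged$constant(
  list(list(3, 1, 4, 1), list(), list(5, 9, 2), list(6), list())
)

# Assumption: R-style extraction, so row indices start at 1.
digits[1, ]          # first row
digits[, 1:2]        # first two values in each row
# Negative indices count from the end (Python-style) and trigger a warning.
digits[, -2:NULL]    # last two values in each row
```

Selecting a single row yields an ordinary tensor, while slicing within rows yields another ragged tensor, because the rows can still differ in length.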
## Tensor("RaggedGetItem/strided_slice_5:0", shape=(4,), dtype=float32)
## tf.RaggedTensor(values=Tensor("RaggedGetItem_1/GatherV2:0", shape=(?,), dtype=float32), row_splits=Tensor("RaggedGetItem_1/RaggedRange:0", shape=(6,), dtype=int64))
## Warning: Negative numbers are interpreted python-style when subsetting tensorflow tensors.(they select items by counting from the back). For more details, see: https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#basic-slicing-and-indexing
## To turn off this warning, set 'options(tensorflow.extract.warn_negatives_pythonic = FALSE)'
## tf.RaggedTensor(values=Tensor("RaggedGetItem_2/GatherV2:0", shape=(?,), dtype=float32), row_splits=Tensor("RaggedGetItem_2/RaggedRange:0", shape=(6,), dtype=int64))
And just like normal tensors, you can use R arithmetic and comparison operators to perform elementwise operations. For more information, see the section on Overloaded Operators below.
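A minimal sketch of elementwise arithmetic on ragged tensors, assuming the tensorflow R package's overloaded operators dispatch on ragged tensors. The `offsets` tensor is a hypothetical second operand introduced for illustration; it is not necessarily the operand behind the output listed here.

```r
library(tensorflow)

digits <- tf$ragged$constant(
  list(list(3, 1, 4, 1), list(), list(5, 9, 2), list(6), list())
)
# Hypothetical second operand with the same row lengths as digits.
offsets <- tf$ragged$constant(
  list(list(1, 2, 3, 4), list(), list(5, 6, 7), list(8), list())
)

digits + 3         # add a scalar to every value
digits + offsets   # add two ragged tensors row by row
```

Both operands of the second expression have the same row lengths, so every value in `digits` has a matching value in `offsets`.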
## tf.RaggedTensor(values=Tensor("Add_3:0", shape=(8,), dtype=float32), row_splits=Tensor("RaggedConstant/RaggedFromRowSplits/row_splits:0", shape=(6,), dtype=int64))
## tf.RaggedTensor(values=Tensor("Add_5:0", shape=(?,), dtype=float32), row_splits=Tensor("RaggedTile_1/concat_1:0", shape=(6,), dtype=int64))
If you need to perform an elementwise transformation to the values of a RaggedTensor, you can use tf$ragged$map_flat_values, which takes a function plus one or more arguments, and applies the function to transform the RaggedTensor’s values.
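A minimal sketch of such a transformation, assuming an R function can be passed directly as the callable. The helper name `times_two_plus_one` is only illustrative.

```r
library(tensorflow)

digits <- tf$ragged$constant(
  list(list(3, 1, 4, 1), list(), list(5, 9, 2), list(6), list())
)

# The function receives the flat values tensor, not the individual rows.
times_two_plus_one <- function(x) x * 2 + 1

tf$ragged$map_flat_values(times_two_plus_one, digits)
```

The row partitioning of `digits` is carried over unchanged; only the flat values are replaced by the function's result.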
## tf.RaggedTensor(values=Tensor("Add_6:0", shape=(8,), dtype=float32), row_splits=Tensor("RaggedConstant/RaggedFromRowSplits/row_splits:0", shape=(6,), dtype=int64))
Constructing a ragged tensor
The simplest way to construct a ragged tensor is using tf$ragged$constant, which builds the RaggedTensor corresponding to a given nested list:
sentences <- tf$ragged$constant(list(
  list("Let's", "build", "some", "ragged", "tensors", "!"),
  list("We", "can", "use", "tf.ragged.constant", ".")))
paragraphs <- tf$ragged$constant(list(
  list(list("I", "have", "a", "cat"), list("His", "name", "is", "Mat")),
  list(list("Do", "you", "want", "to", "come", "visit"), list("I'm", "free", "tomorrow"))
))
paragraphs
## tf.RaggedTensor(values=tf.RaggedTensor(values=Tensor("RaggedConstant_4/values:0", shape=(17,), dtype=string), row_splits=Tensor("RaggedConstant_4/RaggedFromRowSplits/row_splits:0", shape=(5,), dtype=int64)), row_splits=Tensor("RaggedConstant_4/RaggedFromRowSplits_1/row_splits:0", shape=(3,), dtype=int64))
Ragged tensors can also be constructed by pairing flat values tensors with row-partitioning tensors indicating how those values should be divided into rows, using factory classmethods such as tf$RaggedTensor$from_value_rowids, tf$RaggedTensor$from_row_lengths, and tf$RaggedTensor$from_row_splits.
tf$RaggedTensor$from_value_rowids
If you know which row each value belongs in, then you can build a RaggedTensor using a value_rowids row-partitioning tensor:
tf$RaggedTensor$from_value_rowids(
  values = as.integer(c(3, 1, 4, 1, 5, 9, 2, 6)),
  value_rowids = as.integer(c(0, 0, 0, 0, 2, 2, 2, 3)))
## tf.RaggedTensor(values=Tensor("RaggedFromValueRowIds/values:0", shape=(8,), dtype=int32), row_splits=Tensor("RaggedFromValueRowIds/concat:0", shape=(5,), dtype=int64))
tf$RaggedTensor$from_row_lengths
If you know how long each row is, then you can use a row_lengths row-partitioning tensor:
tf$RaggedTensor$from_row_lengths(
  values = as.integer(c(3, 1, 4, 1, 5, 9, 2, 6)),
  row_lengths = as.integer(c(4, 0, 3, 1)))
## tf.RaggedTensor(values=Tensor("RaggedFromRowLengths/values:0", shape=(8,), dtype=int32), row_splits=Tensor("RaggedFromRowLengths/concat:0", shape=(5,), dtype=int64))
tf$RaggedTensor$from_row_splits
If you know the index where each row starts and ends, then you can use a row_splits row-partitioning tensor:
tf$RaggedTensor$from_row_splits(
  values = as.integer(c(3, 1, 4, 1, 5, 9, 2, 6)),
  row_splits = as.integer(c(0, 4, 4, 7, 8)))
## tf.RaggedTensor(values=Tensor("RaggedFromRowSplits/values:0", shape=(8,), dtype=int32), row_splits=Tensor("RaggedFromRowSplits/row_splits:0", shape=(5,), dtype=int64))
See the tf.RaggedTensor class documentation for a full list of factory methods.
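The three row-partitioning encodings used in the examples above describe the same division of values into rows. The following base R sketch makes no TensorFlow calls, and its variable names are only illustrative. It derives row_splits and value_rowids from row_lengths, then rebuilds the rows as a plain R list.

```r
values <- c(3L, 1L, 4L, 1L, 5L, 9L, 2L, 6L)
row_lengths <- c(4L, 0L, 3L, 1L)

# row_splits: the start offset of each row, followed by the final end offset.
row_splits <- c(0L, cumsum(row_lengths))

# value_rowids: the zero-based row index, repeated once per value in that row.
value_rowids <- rep(seq_along(row_lengths) - 1L, times = row_lengths)

# Rebuild the rows as a plain R list; R vectors are 1-based, hence the shift.
rows <- lapply(seq_along(row_lengths), function(i) {
  values[seq_len(row_lengths[i]) + row_splits[i]]
})

print(row_splits)     # 0 4 4 7 8
print(value_rowids)   # 0 0 0 0 2 2 2 3
str(rows)
```

The derived vectors match the row_splits and value_rowids arguments used in the factory-method examples. The second row is empty, which shows up as a repeated offset in row_splits and as a skipped row index in value_rowids.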
What you can store in a ragged tensor
As with normal Tensors, the values in a RaggedTensor must all have the same type; and the values must all be at the same nesting depth (the rank of the tensor):
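A minimal sketch of both rules, assuming that nested R lists are converted to nested Python lists as in the earlier examples. The second call is expected to fail, so it is wrapped in `tryCatch` to print the error message instead of stopping.

```r
library(tensorflow)

# OK: every value is a string, and every value sits at nesting depth 2.
tf$ragged$constant(list(list("Hi"), list("How", "are", "you")))

# Not OK: "A" sits at depth 1, while "B" and "C" sit at depth 2.
tryCatch(
  tf$ragged$constant(list("A", list("B", "C"))),
  error = function(e) message(conditionMessage(e))
)
```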
## tf.RaggedTensor(values=Tensor("RaggedConstant_5/values:0", shape=(4,), dtype=string), row_splits=Tensor("RaggedConstant_5/RaggedFromRowSplits/row_splits:0", shape=(3,), dtype=int64))
This is a small introduction to Ragged Tensors in TensorFlow. See the complete tutorial (in Python) here.