Reference

tftables.open_file(filename, batch_size, **kw_args)

Open an HDF5 file for streaming with multitables. Batches will be retrieved with size batch_size. Additional keyword arguments will be passed to the multitables.Streamer object.

Parameters:
  • filename – Filename for the HDF5 file to be read.
  • batch_size – The size of the batches to be fetched by this reader.
  • kw_args – Optional arguments to pass to multitables.
Returns:

A FileReader instance.
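
The cyclic, fixed-size batching behaviour that the reader provides can be illustrated with a minimal pure-Python sketch. This is a conceptual illustration only, not tftables internals; the real reader streams batches from disk in background processes via multitables:

```python
def cyclic_batches(data, batch_size):
    # Yield fixed-size batches, wrapping around the end of the dataset.
    i = 0
    n = len(data)
    while True:
        yield [data[(i + j) % n] for j in range(batch_size)]
        i = (i + batch_size) % n

gen = cyclic_batches([0, 1, 2, 3, 4], batch_size=2)
first = next(gen)   # [0, 1]
second = next(gen)  # [2, 3]
third = next(gen)   # [4, 0] -- wraps around to the start
```

Note how the final batch wraps around rather than being truncated; this is the cyclic access described below.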

tftables.load_dataset(filename, dataset_path, batch_size, queue_size=8, input_transform=None, ordered=False, cyclic=True, processes=None, threads=None)

Convenience function to quickly and easily load a dataset using best-guess defaults. If a table is loaded, then the input_transform argument is required. Returns an instance of FIFOQueueLoader that loads this dataset into a FIFO queue.

The input_transform function takes a single argument: either a tensorflow placeholder for the requested array, or a dictionary of tensorflow placeholders for the columns in the requested table. It should return a single tensorflow tensor, a tuple of tensorflow tensors, or a list of tensorflow tensors. A subsequent call to loader.dequeue() will return tensors in the same order as input_transform.

For example, if an array is stored in uint8 format, but we want to cast it to float32 format to do work on the GPU, the input_transform would be:

def input_transform(ary_batch):
    return tf.cast(ary_batch, tf.float32)

If, instead, we were loading a table with column names labels and data, we would need to transform the batch into a list or tuple. We might use something like the following, which also performs a one-hot transform on the labels.

def input_transform(tbl_batch):
    labels = tbl_batch['labels']
    data = tbl_batch['data']

    truth = tf.to_float(tf.one_hot(labels, num_labels, 1, 0))
    data_float = tf.to_float(data)

    return truth, data_float

Then the subsequent call to loader.dequeue() returns these in the same order:

truth_batch, data_batch = loader.dequeue()
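
The tf.one_hot step in the transform above maps each integer label to a vector; a plain-Python equivalent (a sketch only, independent of tensorflow) is:

```python
def one_hot(labels, num_labels, on_value=1, off_value=0):
    # Each label becomes a vector with on_value at index `label`.
    return [[on_value if j == label else off_value for j in range(num_labels)]
            for label in labels]

encoded = one_hot([0, 2], num_labels=3)  # [[1, 0, 0], [0, 0, 1]]
```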

By default, this function does not preserve on-disk ordering, and gives cyclic access. The disk ordering can be preserved using the ordered argument; however, this may result in slower read performance.

Parameters:
  • filename – The filename to the HDF5 file.
  • dataset_path – The internal HDF5 path to the dataset.
  • batch_size – The size of the batches to be loaded into tensorflow.
  • queue_size – The size of the tensorflow FIFO queue.
  • input_transform – A function that transforms the batch before being loaded into the queue.
  • ordered – Preserve the on-disk ordering of the requested dataset.
  • cyclic – Data will be loaded in an endless loop that wraps around the end of the dataset.
  • processes – Number of concurrent processes that multitables should use to read data from disk.
  • threads – Number of threads to use to preprocess data and load the FIFO queue.
Returns:

A FIFOQueueLoader instance for the dataset.

class tftables.FileReader(filename, batch_size, **kw_args)

This class reads batches from datasets in an HDF5 file.

close()

Closes the internal queue, signaling the background processes to stop. This calls the multitables.Streamer.Queue.close method.

Returns:None
feed()

Generator for feeding a tensorflow operation. Each iteration returns a feed_dict that contains the data for one batch. This method reads data for all placeholders created.

Returns:A generator which yields tensorflow feed_dicts
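
The pattern feed() implements — yielding one feed_dict per batch, pairing each placeholder with its data — can be sketched in plain Python. The placeholder names here are illustrative stand-ins, not the library's internals:

```python
def feed(placeholders, batch_stream):
    # Yield one feed_dict per batch, keyed by placeholder.
    for batches in batch_stream:
        yield dict(zip(placeholders, batches))

# "x_ph" and "y_ph" stand in for tensorflow placeholders.
dicts = list(feed(["x_ph", "y_ph"],
                  [([1, 2], [0, 1]), ([3, 4], [1, 0])]))
# dicts[0] == {"x_ph": [1, 2], "y_ph": [0, 1]}
```
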
get_batch(path, **kw_args)

Get a Tensorflow placeholder for a batch that will be read from the dataset located at path. Additional keyword arguments will be forwarded to the get_queue method in multitables. This defaults the multitables arguments cyclic and ordered to True.

When ordering of batches is unimportant, the ordered argument can be set to False for potentially better performance. When reading from multiple datasets (e.g. when examples and labels are in two different arrays), it is recommended to leave ordered as True to preserve synchronisation.

If the dataset is a table (or other compound-type array) then a dictionary of placeholders will be returned instead. The keys of this dictionary correspond to the column names of the table (or compound sub-types).

Parameters:
  • path – The internal HDF5 path to the dataset to be read.
  • kw_args – Optional arguments to be forwarded to multitables.
Returns:

Either a placeholder or a dictionary, depending on the type of dataset. If the dataset is a plain array, a placeholder representing one batch is returned. If the dataset is a table or compound type, a dictionary of placeholders is returned.

get_fifoloader(queue_size, inputs, threads=None)

Convenience method for creating a FIFOQueueLoader object. See the FIFOQueueLoader constructor for documentation on parameters.

Parameters:
  • queue_size – The size of the FIFO queue to create.
  • inputs – The input tensors (e.g. the result of input_transform) to be placed into the queue.
  • threads – Defaults to 1 if ordered access to this reader was requested, otherwise defaults to 2.
Returns:

A FIFOQueueLoader instance.

class tftables.FIFOQueueLoader(reader, size, inputs, threads=1)

A class to handle the creation and population of a Tensorflow FIFOQueue.

begin(*args, **kwds)

Convenience context manager for starting and stopping the loader.

Parameters:
  • tf_session – The current Tensorflow session.
  • catch_termination – Catch the termination of the loop for non-cyclic access.

static catch_termination()

In non-cyclic access, once the end of the dataset is reached, an exception is raised to halt all access to the queue. This context manager catches that exception for silent handling of the termination condition.
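
Such a termination-catching context manager can be sketched with contextlib. QueueClosed here is a stand-in exception class for illustration, not the actual error the loader raises:

```python
from contextlib import contextmanager

class QueueClosed(Exception):
    # Stand-in for the exception raised when the queue is exhausted.
    pass

@contextmanager
def catch_termination():
    # Silently absorb the end-of-dataset exception.
    try:
        yield
    except QueueClosed:
        pass

results = []
with catch_termination():
    for batch in [[1, 2], [3, 4]]:
        results.append(batch)
    raise QueueClosed()  # simulates the dataset running out
# Execution continues here because the exception was swallowed.
```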

dequeue()

Returns a dequeue operation. Elements defined by the input tensors and supplied by the reader are returned from this operation. This calls the dequeue method on the internal Tensorflow FIFOQueue.

Returns:A dequeue operation.
start(sess)

Starts the background threads. The enqueue operations are run in the given Tensorflow session.

Parameters:sess – Tensorflow session.
Returns:None
stop(sess)

Stops the background threads, and joins them. This should be called after all operations are complete.

Parameters:sess – The Tensorflow session that this queue loader was started with.
Returns:None
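
The start/dequeue/stop lifecycle can be sketched with the standard library, using threading and queue as stand-ins for the Tensorflow session and FIFOQueue. The names and structure below are illustrative only, not the library's implementation:

```python
import threading
import queue

class MiniLoader:
    # Sketch: a background thread fills a bounded FIFO queue from a reader.
    def __init__(self, reader, size):
        self.reader = reader            # an iterable of batches
        self.q = queue.Queue(maxsize=size)
        self.thread = None

    def _enqueue(self):
        for batch in self.reader:
            self.q.put(batch)
        self.q.put(None)                # sentinel: no more data

    def start(self):
        self.thread = threading.Thread(target=self._enqueue)
        self.thread.start()

    def dequeue(self):
        return self.q.get()

    def stop(self):
        self.thread.join()              # join the background thread

loader = MiniLoader(reader=[[1, 2], [3, 4]], size=8)
loader.start()
batches = []
while (b := loader.dequeue()) is not None:
    batches.append(b)
loader.stop()
# batches == [[1, 2], [3, 4]]
```

The bounded queue gives the same back-pressure behaviour as the Tensorflow FIFOQueue: the producer blocks when the queue is full, so disk reads stay only a few batches ahead of the consumer.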