Batch generators

BatchGenerator class

class keras_batchflow.base.batch_generators.BatchGenerator(data, x_structure, y_structure=None, batch_transforms=None, batch_size=32, shuffle=True, train_mode=True, encoder_adaptor=None)

Basic batch generator. It is also a root class for all other batch generators.

This batch generator is a Python generator object that returns new data at each iteration. It is built to be used in Keras's *_generator functions.

At every iteration, it selects a small chunk of the dataset and sends it to the stack of transformers specified at creation time. The generator makes sure that each datapoint is selected once in one end-to-end walk over the dataset.
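The walk described above can be pictured with a minimal pure-Python sketch. It is not the library's implementation: the function name iterate_batches and its seed argument are hypothetical, and a plain list stands in for the dataframe.

```python
import random


def iterate_batches(rows, batch_size=32, shuffle=True, seed=None):
    """Yield consecutive batches that together cover every row once."""
    indices = list(range(len(rows)))
    if shuffle:
        # Randomise the visiting order for this walk over the dataset.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        # The final slice is shorter when len(rows) is not a multiple
        # of batch_size.
        yield [rows[i] for i in indices[start:start + batch_size]]


rows = list(range(10))
batches = list(iterate_batches(rows, batch_size=4, seed=0))
print([len(b) for b in batches])  # [4, 4, 2]
```

Every row appears in one and only one batch, and only the last batch is allowed to be shorter than batch_size.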

Parameters:

  • data - a Pandas dataframe containing a dataset with both x and y
  • x_structure - tuple or tuple of tuples - a structure describing the mapping of dataframe columns to pre-fitted encoders and to Keras model inputs. When the model has a single input, x_structure looks like x_structure=('column_name', encoder). When the model has multiple inputs, Keras expects a tuple of numpy arrays as the model's X, and the structure looks like x_structure=(('column_name1', encoder1), ('column_name2', encoder2)). If encoder is None, the column values are converted to a numpy array and passed through unchanged. To add a constant to inputs or outputs, add a tuple with column_name = None and the constant value in place of the encoder, like so: (None, value). Example: x_structure=(('column_name1', encoder1), ('column_name2', None), (None, 1)) - values in column_name1 are encoded by encoder1, values from column_name2 are passed through unchanged, and the third element of the x structure is a constant of 1. So the batch could be (np.array(...), np.array(...), np.array([1, 1, ...]))
  • y_structure - (optional) tuple or list of tuples - a structure describing the mapping of dataframe columns to pre-fitted encoders and to Keras model outputs. When the model has multiple outputs, Keras expects a list of numpy arrays as the model's Y. Default: None. The same rules and the same format apply (see x_structure)
  • batch_transforms - (optional) a single instance or list of BatchTransformer - a stack of batch transformers that are applied to batches before splitting into columns. These are useful when variables interact during a transform, for example in feature dropout, where only one randomly selected feature out of multiple input features has to be dropped. Default: None
  • batch_size - (optional) int, maximum length of a generated batch. The last batch of a dataset can be smaller if the total size of the dataframe is not a multiple of batch_size. Default: 32
  • shuffle - (optional) bool. If true, the input dataframe is shuffled before each new epoch. Default: True
  • train_mode - (optional) bool. If true, both X and Y are returned, otherwise only X is returned. Default: True
  • encoder_adaptor - (optional) str or a single instance of a class derived from the keras_batchflow.base.batch_shapers.IEncoderAdaptor class. Supported string values: 'numpy' and 'pandas'. If not provided, 'numpy' is used. This parameter sets the data format that the encoders work with. Sklearn encoders are built for numpy arrays, hence the default value 'numpy'. If your encoders require pandas format, use 'pandas'. Alternatively, if your encoders need some special format, pass an instance of your own class derived from IEncoderAdaptor
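
The x_structure and y_structure rules above can be illustrated with a small pure-Python sketch. It is not the library's code: ScaleEncoder, apply_structure and make_batch are hypothetical names, a dict of lists stands in for the dataframe, and plain lists stand in for numpy arrays. Returning only X when no y_structure is given is an assumption of this sketch.

```python
class ScaleEncoder:
    """Hypothetical stand-in for a pre-fitted encoder with transform()."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, values):
        return [v * self.factor for v in values]


def apply_structure(batch, structure):
    """Map a batch (dict of column name -> list) to model inputs/outputs."""
    # A bare (column, encoder) pair describes a single input or output.
    single = not isinstance(structure[0], tuple)
    pairs = [structure] if single else list(structure)
    n_rows = len(next(iter(batch.values())))
    outputs = []
    for column, encoder in pairs:
        if column is None:
            # (None, value): a constant repeated for every row of the batch.
            outputs.append([encoder] * n_rows)
        elif encoder is None:
            # (column, None): values are passed through unchanged.
            outputs.append(list(batch[column]))
        else:
            outputs.append(encoder.transform(batch[column]))
    return outputs[0] if single else tuple(outputs)


def make_batch(batch, x_structure, y_structure=None, train_mode=True):
    """Return (X, Y) in train mode, otherwise only X."""
    x = apply_structure(batch, x_structure)
    # Assumption of this sketch: without a y_structure only X is returned.
    if train_mode and y_structure is not None:
        return x, apply_structure(batch, y_structure)
    return x


batch = {"age": [20, 30], "city": ["Oslo", "Rome"], "label": [0, 1]}
x_structure = (("age", ScaleEncoder(2)), ("city", None), (None, 1))
x, y = make_batch(batch, x_structure, y_structure=("label", None))
print(x)  # ([40, 60], ['Oslo', 'Rome'], [1, 1])
print(y)  # [0, 1]
```

With train_mode=False the same call returns only the X tuple, which is the shape needed for prediction.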

Triplet PK Generator class

class keras_batchflow.base.batch_generators.TripletPKGenerator(data, triplet_label, classes_in_batch, samples_per_class, x_structure, y_structure=None, **kwargs)

This class implements a batch generator for a generic triplet network described in this paper. TODO: add more details
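
The source does not yet describe how batches are composed. The parameter names classes_in_batch and samples_per_class suggest P×K sampling: pick P classes, then K rows from each. The sketch below illustrates that idea only, as an assumption; sample_pk_batch and its rng argument are hypothetical and are not part of the library.

```python
import random
from collections import defaultdict


def sample_pk_batch(labels, classes_in_batch, samples_per_class, rng=None):
    """Return row indices for one batch of P classes with K rows each."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    # Only classes with at least K rows can supply K distinct samples.
    eligible = [c for c, rows in by_class.items()
                if len(rows) >= samples_per_class]
    chosen = rng.sample(eligible, classes_in_batch)
    batch = []
    for label in chosen:
        batch.extend(rng.sample(by_class[label], samples_per_class))
    return batch


labels = ["a", "a", "a", "b", "b", "c", "c", "c", "d"]
batch = sample_pk_batch(labels, classes_in_batch=2, samples_per_class=2,
                        rng=random.Random(0))
print(len(batch))  # 4
```

Here triplet_label would play the role of labels, and the resulting batch size is classes_in_batch * samples_per_class rather than a separate batch_size.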