Training works best if the examples are in random order. Use tf.data.Dataset.shuffle to randomize entries, setting buffer_size to a value larger than the number of examples (120 in this case). To train the model faster, set the dataset's batch size to 32 so that 32 examples are processed at once.

    train_dataset = tf.data.TextLineDataset(train_dataset_fp)
    train_dataset = train_dataset.skip(1)                    # skip the header row
    train_dataset = train_dataset.map(parse_csv)             # parse each row
    train_dataset = train_dataset.shuffle(buffer_size=1000)  # randomize
    train_dataset = train_dataset.batch(32)                  # group into batches of 32
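To see why buffer_size should be at least the number of examples, here is a toy, pure-Python model of a buffered shuffle. It is only a sketch of the general idea (fill a buffer, then emit a random slot and refill it), not TensorFlow's actual implementation. The function name buffered_shuffle and the integer stand-ins for rows are made up for illustration.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Toy model of a buffered shuffle (assumes buffer_size >= 1).

    Illustration only; this is not TensorFlow's code.
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer; nothing is emitted yet.
            buffer.append(item)
        else:
            # Buffer is full: emit a random slot, then refill it with the new item.
            idx = rng.randrange(buffer_size)
            yield buffer[idx]
            buffer[idx] = item
    # Input exhausted: emit whatever is left, in random order.
    rng.shuffle(buffer)
    yield from buffer


examples = list(range(120))  # hypothetical stand-ins for the 120 training rows

small = list(buffered_shuffle(examples, buffer_size=10, seed=0))
large = list(buffered_shuffle(examples, buffer_size=1000, seed=0))

# With a small buffer, the first emitted row always comes from the first 10 rows.
print("first row, buffer_size=10:  ", small[0])
# With a buffer larger than the dataset, any of the 120 rows can come first.
print("first row, buffer_size=1000:", large[0])
```

In this model a buffer smaller than the dataset only mixes rows locally, and buffer_size=1 leaves the order unchanged. A buffer that holds every row gives a full shuffle, which is why the tutorial picks a value above 120.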
    >>> features, label = next(iter(train_dataset))
    >>> print("example features:", features[0])
    example features: tf.Tensor([5.1 3.7 1.5 0.4], shape=(4,), dtype=float32)
    >>> print("example label:", label[0])
    example label: tf.Tensor(0, shape=(), dtype=int32)
TensorFlow output:

    example features: tf.Tensor([6. 2.7 5.1 1.6], shape=(4,), dtype=float32)
    example label: tf.Tensor(1, shape=(), dtype=int32)

Because the dataset is shuffled, the example printed varies from run to run.
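The effect of batching on 120 examples can be sketched in plain Python. This is only an illustration with made-up placeholder rows and a hypothetical batch helper, not the tf.data API. It assumes the final partial batch is kept, which is the default behavior of Dataset.batch.

```python
def batch(rows, batch_size):
    """Group consecutive rows into lists of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# 120 placeholder rows: (four feature values, integer label). Values are made up.
rows = [([float(i), 0.0, 0.0, 0.0], i % 3) for i in range(120)]

batches = list(batch(rows, 32))
print([len(b) for b in batches])  # [32, 32, 32, 24]

# Split the first batch into parallel feature and label lists,
# mirroring the (features, label) pair that the dataset yields.
features = [f for f, _ in batches[0]]
labels = [l for _, l in batches[0]]
print(len(features), len(features[0]))  # 32 4
```

Since 120 = 3 × 32 + 24, one pass over the data yields three full batches and one batch of 24. Indexing features[0] selects the first example within a batch, which is why the printed tensor has shape (4,).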