In Python/TensorFlow: how to convert string representations of 2D arrays from text files into something TF can use

Question:

I need to load rows containing string representations of 2D arrays from text files, for later use in training a TensorFlow CNN, but I cannot get the strings converted into a format TensorFlow likes. I have tried all sorts of combinations of apply/map/various functions, but always get some cryptic error. Below is a toy example that is close to working, but still throws an error:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported
object type numpy.ndarray)

import tensorflow as tf
import numpy as np
import pandas as pd
from ast import literal_eval

def df_to_dataset(dataframe):
    Y = tf.convert_to_tensor( dataframe['Y'].values )
    X = tf.convert_to_tensor(
         dataframe['X'].apply(literal_eval).apply(np.array).values
       )
    return tf.data.Dataset.from_tensor_slices( ( X , Y ) )

data = [[ 1, "[[0,1],[0,1]]" ] , [ 0 , "[[1,0],[1,0]]" ]]
df = pd.DataFrame(data, columns=['Y','X'])
dataset = df_to_dataset(df)
for feature in dataset.take(1):
    print( feature )
Asked By: deprekate


Answers:

So your dataframe displays as:

In [161]: df
Out[161]: 
   Y              X
0  1  [[0,1],[0,1]]
1  0  [[1,0],[1,0]]

Though that doesn’t show the string quotes.

In [162]: df['Y'].values
Out[162]: array([1, 0])

The X column is a 1d array of strings, object dtype:

In [163]: df['X'].values
Out[163]: array(['[[0,1],[0,1]]', '[[1,0],[1,0]]'], dtype=object)

After the literal_eval, values is now an array of lists:

In [164]: from ast import literal_eval
In [165]: df['X'].apply(literal_eval)
Out[165]: 
0    [[0, 1], [0, 1]]
1    [[1, 0], [1, 0]]
Name: X, dtype: object
In [166]: df['X'].apply(literal_eval).values
Out[166]: array([list([[0, 1], [0, 1]]), list([[1, 0], [1, 0]])], dtype=object)

But if instead we extract it as a list:

In [168]: df['X'].apply(literal_eval).to_list()
Out[168]: [[[0, 1], [0, 1]], [[1, 0], [1, 0]]]

We can easily turn that into an array:

In [169]: np.array(_)
Out[169]: 
array([[[0, 1],
        [0, 1]],

       [[1, 0],
        [1, 0]]])

Going back to the array form, we can "reduce" that using np.stack:

In [170]: np.stack(df['X'].apply(literal_eval).values)
Out[170]: 
array([[[0, 1],
        [0, 1]],

       [[1, 0],
        [1, 0]]])

np.stack is like np.concatenate or np.vstack, except it adds a new dimension, acting more like np.array.
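A small illustration of that difference, using two of the 2x2 arrays from above (expected shapes shown as comments):

import numpy as np

a = np.array([[0, 1], [0, 1]])
b = np.array([[1, 0], [1, 0]])

np.stack([a, b]).shape        # (2, 2, 2) -- new leading axis, like np.array([a, b])
np.concatenate([a, b]).shape  # (4, 2)    -- joined along the existing first axis
np.vstack([a, b]).shape       # (4, 2)    -- same as concatenate here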

Now the TensorFlow conversion should work.
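Putting it together, a minimal sketch of the question's helper with np.stack added (same toy data as in the question):

import tensorflow as tf
import numpy as np
import pandas as pd
from ast import literal_eval

def df_to_dataset(dataframe):
    Y = tf.convert_to_tensor(dataframe['Y'].values)
    # stack the per-row nested lists into a single (n, 2, 2) numeric array
    # before handing it to TensorFlow
    X = tf.convert_to_tensor(
        np.stack(dataframe['X'].apply(literal_eval).values)
    )
    return tf.data.Dataset.from_tensor_slices((X, Y))

data = [[1, "[[0,1],[0,1]]"], [0, "[[1,0],[1,0]]"]]
df = pd.DataFrame(data, columns=['Y', 'X'])
dataset = df_to_dataset(df)
for feature in dataset.take(1):
    print(feature)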

Your second apply only changes the array of lists into an array of arrays:

In [174]: df['X'].apply(literal_eval).apply(np.array).values
Out[174]: 
array([array([[0, 1],
              [0, 1]]), array([[1, 0],
                               [1, 0]])], dtype=object)

np.stack works on that as well.
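For example, with the same toy dataframe (expected shape shown as a comment):

np.stack(df['X'].apply(literal_eval).apply(np.array).values).shape
# (2, 2, 2) -- same result as stacking the array of lists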

Answered By: hpaulj