Why can't we use a fill_value when reshaping a dataframe (array)?
Question:
I have this dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame([list("ABCDEFGHIJ")])
0 1 2 3 4 5 6 7 8 9
0 A B C D E F G H I J
I get an error when trying to reshape the dataframe/array:
np.reshape(df, (-1, 3))
ValueError: cannot reshape array of size 10 into shape (3)
I’m expecting this array (or a dataframe with the same shape):
array([['A', 'B', 'C'],
['D', 'E', 'F'],
['G', 'H', 'I'],
['J', nan, nan]], dtype=object)
Why can’t NumPy infer the expected shape by filling the missing values with nan?
Answers:
One option is to use divmod() on the column labels, which gives each value a (row, position) pair, and then stack the row level:
df.set_axis(list(divmod(df.columns,3)),axis=1).stack(level=0).to_numpy()
Output:
array([['A', 'B', 'C'],
['D', 'E', 'F'],
['G', 'H', 'I'],
['J', nan, nan]])
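To see why the divmod() approach works, it can help to look at the two arrays it produces. This is only a minimal sketch: np.arange(10) is a stand-in for df.columns, and the names labels, row_id and col_id are made up for illustration.

```python
import numpy as np

# Stand-in for the column labels 0..9 of the 1x10 dataframe (assumption).
labels = np.arange(10)

# divmod splits each label into (target row, position within that row).
row_id, col_id = divmod(labels, 3)

print(row_id.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
print(col_id.tolist())  # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
```

Stacking on the first level turns each distinct row_id into a row. Row 3 only has a value at position 0, so positions 1 and 2 come out as nan.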
Another possible solution, based on numpy.pad, which inserts the needed np.nan into the array:
n = 3
s = df.shape[1]
m = s // n + 1*(s % n != 0)
np.pad(df.values.flatten(), (0, m*n - s),
mode='constant', constant_values=np.nan).reshape(m,n)
Explanation:
- s // n is the integer division of the length of the original array by the number of columns (after reshape).
- s % n gives the remainder of the division s // n. For instance, if s = 9, then s // n is equal to 3 and s % n is equal to 0.
- However, if s = 10, s // n is equal to 3 and s % n is equal to 1. Thus, s % n != 0 is True. Consequently, 1*(s % n != 0) is equal to 1, which makes m = 3 + 1.
- (0, m*n - s) means the number of np.nan to insert at the left of the array (0, in this case) and the number of np.nan to insert at the right of the array (m*n - s).
Output:
array([['A', 'B', 'C'],
['D', 'E', 'F'],
['G', 'H', 'I'],
['J', nan, nan]], dtype=object)
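If this is needed in more than one place, the same idea can be wrapped in a small helper. This is a rough sketch, not a NumPy feature: the name reshape_with_fill and its parameters are hypothetical, and it uses np.full plus slice assignment instead of np.pad to do the padding. The row count uses -(-size // n_cols), which is another way of writing the ceiling division s // n + 1*(s % n != 0) from above.

```python
import numpy as np

def reshape_with_fill(values, n_cols, fill_value=np.nan):
    """Reshape to (-1, n_cols), padding the last row with fill_value.

    Hypothetical helper for illustration only.
    """
    flat = np.asarray(values, dtype=object).ravel()
    # Ceiling division: number of rows needed to hold every element.
    n_rows = -(-flat.size // n_cols)
    # Start from an array full of fill_value, then copy the data in front.
    padded = np.full(n_rows * n_cols, fill_value, dtype=object)
    padded[:flat.size] = flat
    return padded.reshape(n_rows, n_cols)

result = reshape_with_fill(list("ABCDEFGHIJ"), 3)
print(result.shape)  # (4, 3)

# Row counts for the two cases discussed in the explanation (n = 3).
rows_for_9 = reshape_with_fill(range(9), 3).shape[0]    # 3
rows_for_10 = reshape_with_fill(range(10), 3).shape[0]  # 4
```

With the dataframe from the question, reshape_with_fill(df.to_numpy(), 3) should give the same 4x3 layout, since the values are flattened first.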