Splitting and pivoting a column of strings with different lengths in Python
Question:
I have this dataframe.
+-----+--------+--------------------------------+
|ID   |Date    |Text                            |
+-----+--------+--------------------------------+
|1    |1 Jan   |This is a text                  |
|2    |2 Jan   |Text can be of variant length   |
+-----+--------+--------------------------------+
How can I split the Text column into words and pivot them into rows, one word per row, alongside the ID and Date?
+-----+--------+-------+
|ID   |Date    |Text   |
+-----+--------+-------+
|1    |1 Jan   |This   |
|1    |1 Jan   |is     |
|1    |1 Jan   |a      |
|1    |1 Jan   |text   |
|2    |2 Jan   |Text   |
|2    |2 Jan   |can    |
|2    |2 Jan   |be     |
|2    |2 Jan   |of     |
|2    |2 Jan   |variant|
|2    |2 Jan   |length |
+-----+--------+-------+
I know that for the pivot I can use df.stack(), but I am having trouble with the split because each text contains a different number of words.
I would really appreciate any help.
Answers:
Split each string into a list of words, then use DataFrame.explode to turn every list element into its own row. See the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html
import pandas as pd
df = pd.DataFrame({'col1':[1,2],'col2':['1 Jan', '2 Jan'],'col3':['This is a text','Text can be of variant length']})
# Turn each string into a list of words
df['col3'] = df['col3'].str.split(' ')
# One row per list element; the other columns and the original index are repeated
a = df.explode('col3')
print(a)
Output:
   col1   col2     col3
0     1  1 Jan     This
0     1  1 Jan       is
0     1  1 Jan        a
0     1  1 Jan     text
1     2  2 Jan     Text
1     2  2 Jan      can
1     2  2 Jan       be
1     2  2 Jan       of
1     2  2 Jan  variant
1     2  2 Jan   length
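For reference, the same idea applied to the column names from the question (ID, Date, Text) might look like the sketch below. The helper name split_text_to_rows is made up for illustration and is not part of pandas. It assumes words are separated by whitespace, so it calls str.split() with no argument, which also copes with repeated spaces, and it resets the index so the rows are numbered 0..n-1 instead of repeating the original labels.

```python
import pandas as pd


def split_text_to_rows(df, column="Text"):
    """Return a copy of df with one row per whitespace-separated word in `column`.

    Hypothetical helper for illustration; the column name defaults to the
    one used in the question.
    """
    out = df.copy()  # leave the caller's frame untouched
    # str.split() with no separator splits on any run of whitespace
    out[column] = out[column].str.split()
    # explode repeats the other columns for each word; reset_index
    # replaces the repeated index labels with a fresh 0..n-1 range
    return out.explode(column).reset_index(drop=True)


df = pd.DataFrame({
    "ID": [1, 2],
    "Date": ["1 Jan", "2 Jan"],
    "Text": ["This is a text", "Text can be of variant length"],
})
result = split_text_to_rows(df)
print(result)
```

Because explode works on lists of any length, the differing word counts per row need no padding or special handling.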