Extracting three columns from a single column using pandas

Question

I have a CSV file in which one column holds a dictionary in every row. This column contains three attributes, each of which I want as a separate column in the resulting DataFrame.

Following the answer to "How to split a single column into three columns in pandas (python)?", I am trying to use the following line of code to achieve the desired result:

df[['one', 'two', 'three']] = pd.DataFrame([ x.split(',') for x in df['statistics'].tolist() ])

But when I execute the above line of code I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~AppDataLocalTempipykernel_150843622721730.py in <module>
----> 1 df[['one', 'two', 'three']] = pd.DataFrame([ x.split(',') for x in df['statistics'].tolist() ])

~AppDataLocalTempipykernel_150843622721730.py in <listcomp>(.0)
----> 1 df[['one', 'two', 'three']] = pd.DataFrame([ x.split(',') for x in df['statistics'].tolist() ])

AttributeError: 'dict' object has no attribute 'split'

Here is the df for reference:

kind            etag                        id          statistics
youtube#video   WerWpr9_ht7SPd646jeYvOrMdFU 1isoZVQ9DxY {'viewCount': '133155', 'likeCount': '9199', 'favoriteCount': '0', 'commentCount': '1271'}
youtube#video   IkNRr3T_bPnPpfsRPJ9ZmpFLWQI 2izTju-uxrk {'viewCount': '103436', 'likeCount': '3930', 'favoriteCount': '0', 'commentCount': '712'}
youtube#video   ea_8Q2h6XDamfLZNhIL0HM3UZw4 oUOUI4_mS5c {'viewCount': '61008', 'likeCount': '3119', 'favoriteCount': '0', 'commentCount': '210'}
youtube#video   LjxX4UdBSR88LO41UtUf6cSBsV4 ONrmi30DkJc {'viewCount': '58111', 'likeCount': '2885', 'favoriteCount': '0', 'commentCount': '141'}
youtube#video   D98h38VbjEri485pD7dYrOyfoGM RA7t76Ie1TE {'viewCount': '77895', 'likeCount': '3394', 'favoriteCount': '0', 'commentCount': '216'}
youtube#video   4sa3me5UXvRmHb_4rNUKG0XhuVs boomn3StWJ0 {'viewCount': '57257', 'likeCount': '3187', 'favoriteCount': '0', 'commentCount': '159'}
youtube#video   e37d1Q_PIJj0ckLAE1Sv-ukVHDw AV3vptOJVaE {'viewCount': '67967', 'likeCount': '3371', 'favoriteCount': '0', 'commentCount': '207'}
youtube#video   Ly4sowP9gxeM-3iNgLUUWydTiaU vq6PEiPXGVk {'viewCount': '213144', 'likeCount': '8917', 'favoriteCount': '0', 'commentCount': '550'}
youtube#video   ubupKrV7LSJJCmyw4PBPY91BmPo toDp4JS5cwI {'viewCount': '316336', 'likeCount': '9160', 'favoriteCount': '0', 'commentCount': '747'}
youtube#video   g6W6BiuT7Af1alJmvmNtgXzZVLw qFOcxBGmOjQ {'viewCount': '468641', 'likeCount': '16106', 'favoriteCount': '0', 'commentCount': '1021'}
youtube#video   jhRggyXoTq_PAghKVfqVaZptT8I 6SOKGnf84Ik {'viewCount': '210653', 'likeCount': '10222', 'favoriteCount': '0', 'commentCount': '591'}
youtube#video   2kXYv_ycWt_AhVLV7ZfQ7KR6zFo q-wZ1819y7c {'viewCount': '214089', 'likeCount': '11232', 'favoriteCount': '0', 'commentCount': '571'}
youtube#video   p7RePnFd9fXm6PU_UEBCSDs-iyQ 8I4S5Ery92s {'viewCount': '352246', 'likeCount': '15854', 'favoriteCount': '0', 'commentCount': '655'}
youtube#video   mJ3OiBk5QpRTlJs-TH_rzEDHLJE aeSqTAwm5NI {'viewCount': '347399', 'likeCount': '13567', 'favoriteCount': '0', 'commentCount': '713'}
youtube#video   iQWVTcoYkgmNjJTy93eo6fqdbrM yPwIprzFfF0 {'viewCount': '361987', 'likeCount': '15262', 'favoriteCount': '0', 'commentCount': '559'}
youtube#video   XArq68sxje-985r9BAvs05Jj-HA Gg0wYPxbmjA {'viewCount': '1466364', 'likeCount': '52941', 'favoriteCount': '0', 'commentCount': '4278'}
youtube#video   F0_58PVsa6pPEmphN1sEYZBe0sU ZcjXo8KtWRY {'viewCount': '230492', 'likeCount': '7322', 'favoriteCount': '0', 'commentCount': '622'}
youtube#video   emkAGoMq-kgWTEwJeNOh3EshkiU ur7hLYv404I {'viewCount': '279350', 'likeCount': '9968', 'favoriteCount': '0', 'commentCount': '1187'}
youtube#video   fXqmKxY3vFPYnutf0MqQKoyZQV4 wpgA-rRBqs8 {'viewCount': '215555', 'likeCount': '7564', 'favoriteCount': '0', 'commentCount': '451'}
youtube#video   2ml-vwsPQ_5jdgA2UdxoTc4ZXnk sG5rnRb-FI8 {'viewCount': '283075', 'likeCount': '9599', 'favoriteCount': '0', 'commentCount': '747'}

I require the resultant df to be as follows:

Asked By: Huzefa Sadikot

Answer 1

Each entry in your "statistics" column is actually a dictionary (not a string), which explains the error message you were receiving: str.split only exists on strings, so calling x.split(',') on a dict raises AttributeError.
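
To confirm this on your own data, you can inspect the Python type of the entries before choosing an approach. The snippet below is a minimal sketch on a two-row stand-in for your df (the small frame is rebuilt here only so the example runs on its own); it reproduces the same AttributeError from a single entry.

```python
import pandas as pd

# Two-row stand-in for the df in the question (values copied from it).
df = pd.DataFrame({
    "id": ["1isoZVQ9DxY", "2izTju-uxrk"],
    "statistics": [
        {"viewCount": "133155", "likeCount": "9199",
         "favoriteCount": "0", "commentCount": "1271"},
        {"viewCount": "103436", "likeCount": "3930",
         "favoriteCount": "0", "commentCount": "712"},
    ],
})

# Which Python types actually appear in the column?
entry_types = df["statistics"].map(type).unique()
print(entry_types)

# A dict has no .split method, so the list comprehension from the
# question fails on the very first entry.
try:
    df["statistics"].iloc[0].split(",")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

If the only type reported is dict, the entries can be expanded into columns directly without any string handling.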

You can use the following code to create a DataFrame from these entries:

new_df = pd.DataFrame.from_records(df['statistics'])

print(new_df.head())
   viewCount likeCount favoriteCount commentCount
0     133155      9199             0         1271
1     103436      3930             0          712
2      61008      3119             0          210
3      58111      2885             0          141
4      77895      3394             0          216
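
One caveat, since you mention the data comes from a CSV file: if the frame is read back with pd.read_csv, such a column usually arrives as text that merely looks like a dict. The following is a hedged sketch for that case, assuming the strings are valid Python literals like the ones shown in your question; the names raw and parsed are illustrative only.

```python
import ast

import pandas as pd

# Assumption: the column holds the *string* form of each dict,
# as it would after a round trip through a CSV file.
raw = pd.Series(
    [
        "{'viewCount': '133155', 'likeCount': '9199', "
        "'favoriteCount': '0', 'commentCount': '1271'}",
        "{'viewCount': '103436', 'likeCount': '3930', "
        "'favoriteCount': '0', 'commentCount': '712'}",
    ],
    name="statistics",
)

# ast.literal_eval safely parses Python literals (no code execution).
# Entries that are already dicts are passed through unchanged.
parsed = raw.map(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)

new_df = pd.DataFrame.from_records(parsed.tolist())
print(new_df)
```

After this step the entries are real dictionaries, so the rest of the answer applies unchanged.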

Then we can join the two DataFrames on their shared index like so:

final_df = df.drop(columns=['statistics']).join(new_df)

print(final_df.head())
            kind                         etag           id viewCount likeCount favoriteCount commentCount
0  youtube#video  WerWpr9_ht7SPd646jeYvOrMdFU  1isoZVQ9DxY    133155      9199             0         1271
1  youtube#video  IkNRr3T_bPnPpfsRPJ9ZmpFLWQI  2izTju-uxrk    103436      3930             0          712
2  youtube#video  ea_8Q2h6XDamfLZNhIL0HM3UZw4  oUOUI4_mS5c     61008      3119             0          210
3  youtube#video  LjxX4UdBSR88LO41UtUf6cSBsV4  ONrmi30DkJc     58111      2885             0          141
4  youtube#video  D98h38VbjEri485pD7dYrOyfoGM  RA7t76Ie1TE     77895      3394             0          216
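
For reference, here is a self-contained sketch of the whole procedure on the first three rows of your data. It makes two choices that go beyond the code above, both of which are assumptions about what you want: it builds the new frame with index=df.index so the join stays aligned even if df does not have a default 0..n-1 index, and it converts the counts from strings to numbers with pd.to_numeric.

```python
import pandas as pd

# First three rows of the df from the question.
df = pd.DataFrame({
    "kind": ["youtube#video"] * 3,
    "etag": [
        "WerWpr9_ht7SPd646jeYvOrMdFU",
        "IkNRr3T_bPnPpfsRPJ9ZmpFLWQI",
        "ea_8Q2h6XDamfLZNhIL0HM3UZw4",
    ],
    "id": ["1isoZVQ9DxY", "2izTju-uxrk", "oUOUI4_mS5c"],
    "statistics": [
        {"viewCount": "133155", "likeCount": "9199",
         "favoriteCount": "0", "commentCount": "1271"},
        {"viewCount": "103436", "likeCount": "3930",
         "favoriteCount": "0", "commentCount": "712"},
        {"viewCount": "61008", "likeCount": "3119",
         "favoriteCount": "0", "commentCount": "210"},
    ],
})

# Expand each dict into its own row of columns, reusing df's index
# so that the join below matches rows correctly.
stats = pd.DataFrame(df["statistics"].tolist(), index=df.index)

# The counts are stored as strings; convert every column to numbers.
stats = stats.apply(pd.to_numeric)

# Replace the original dict column with the expanded columns.
final_df = df.drop(columns=["statistics"]).join(stats)
print(final_df)
```

If you only need some of the keys, you can select them from stats (for example stats[["viewCount", "likeCount", "commentCount"]]) before joining.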

Answered By: Cameron Riddell
