Is there a way to create a new column for values that are in the original column either using Excel or Python?
Question:
I have the below Pandas Series with values that look like this:
Info
# ID 3.22.33.2
Location: Texas
Address: 1321 madeupstreet
Name: mike b
Address: 6.3.1
There are almost 1000 rows with this data so the problem I am having is:
Question
Can I run a code in python or Excel to extract the values from these rows in such a way that it would place ID#s on a separate column, Location in a another column …etc
So it would look something like this:
IDs
Location
Address
Name
3.22.33.2
Texas
1321 Madeupstreet
mike b
Some items wont have a Name and in that case, could it just leave it blank? or write No name found?
I tried creating separate lists (this data came from a text file) but that method is not working for me, and I do not have any code to share at the moment.so I copy pasted all the values into an excel sheet.
Note (I do not care about the second Address line, so if it is easier to ignore that is fine).
Answers:
Based on the specific series and for the sake of easiness I suggest you can try:
df['Index'] = df['Info'].replace("# ","",regex=True).str.split().str[0]
df['Values'] = [' '.join(x) for x in df['Info'].replace("# ","",regex=True).str.split().str[1:]]
output = df.set_index('Index').drop(columns='Info').T
Returning:
Index ID Location: Address: Name: Address:
Values 3.22.33.2 Texas 1321 madeupstreet mike b 6.3.1
IIUC, split
the header and values, then pivot_table
:
out = (df['Info']
.str.split(':|(?<=ID)s', expand=True)
.set_axis(['col', 'value'], axis=1)
.assign(index=lambda d: d['col'].str.endswith('ID').cumsum())
.pivot_table(index='index', columns='col', values='value', aggfunc='first')
.rename_axis(index=None, columns=None)
)
output:
# ID Address Location Name
1 3.22.33.2 1321 madeupstreet Texas mike b
It was the row with ID
entry that caused hastle but here is my solution
I first changed the entry with ID
to have :
just like all other rows
ID_index = df.Info[0].index('ID')
df.Info[0] = df.Info[0][:ID_index+2] + ':' + df.Info[0][ID_index+2:]
Then split the rows by :
and transpose the series
new_df = df.Info.str.split(':', expand=True).T
new_df.columns = new_df.iloc[0].values
new_df = new_df.iloc[1:]
gives
# ID Location Address Name Address
1 3.22.33.2 Texas 1321 madeupstreet mike b 6.3.1
I have the below Pandas Series with values that look like this:
Info |
---|
# ID 3.22.33.2 |
Location: Texas |
Address: 1321 madeupstreet |
Name: mike b |
Address: 6.3.1 |
There are almost 1000 rows with this data so the problem I am having is:
Question
Can I run a code in python or Excel to extract the values from these rows in such a way that it would place ID#s on a separate column, Location in a another column …etc
So it would look something like this:
IDs | Location | Address | Name |
---|---|---|---|
3.22.33.2 | Texas | 1321 Madeupstreet | mike b |
Some items wont have a Name and in that case, could it just leave it blank? or write No name found?
I tried creating separate lists (this data came from a text file) but that method is not working for me, and I do not have any code to share at the moment.so I copy pasted all the values into an excel sheet.
Note (I do not care about the second Address line, so if it is easier to ignore that is fine).
Based on the specific series and for the sake of easiness I suggest you can try:
df['Index'] = df['Info'].replace("# ","",regex=True).str.split().str[0]
df['Values'] = [' '.join(x) for x in df['Info'].replace("# ","",regex=True).str.split().str[1:]]
output = df.set_index('Index').drop(columns='Info').T
Returning:
Index ID Location: Address: Name: Address:
Values 3.22.33.2 Texas 1321 madeupstreet mike b 6.3.1
IIUC, split
the header and values, then pivot_table
:
out = (df['Info']
.str.split(':|(?<=ID)s', expand=True)
.set_axis(['col', 'value'], axis=1)
.assign(index=lambda d: d['col'].str.endswith('ID').cumsum())
.pivot_table(index='index', columns='col', values='value', aggfunc='first')
.rename_axis(index=None, columns=None)
)
output:
# ID Address Location Name
1 3.22.33.2 1321 madeupstreet Texas mike b
It was the row with ID
entry that caused hastle but here is my solution
I first changed the entry with ID
to have :
just like all other rows
ID_index = df.Info[0].index('ID')
df.Info[0] = df.Info[0][:ID_index+2] + ':' + df.Info[0][ID_index+2:]
Then split the rows by :
and transpose the series
new_df = df.Info.str.split(':', expand=True).T
new_df.columns = new_df.iloc[0].values
new_df = new_df.iloc[1:]
gives
# ID Location Address Name Address
1 3.22.33.2 Texas 1321 madeupstreet mike b 6.3.1