Changing pipe-separated data to a pandas DataFrame

Question:

I have pipe-separated values like this:

https|clients4.google.com|application/octet-stream|2296|
https|clients4.google.com|text/html; charset=utf-8|0|
....
....
https|clients4.google.com|application/octet-stream|2291|

I have to create a Pandas DataFrame out of this data, with each column given a name.

Asked By: itsaruns


Answers:

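Here you go. Every line ends with a trailing |, so index_col=False is passed to stop pandas from treating the first column as the index: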

>>> import pandas as pd
>>> pd.read_csv('data.csv', sep='|', index_col=False,
...             names=['protocol', 'server', 'type', 'value'])
   protocol               server                      type  value
0     https  clients4.google.com  application/octet-stream   2296
1     https  clients4.google.com  text/html; charset=utf-8      0
2     https  clients4.google.com  application/octet-stream   2291

Answered By: elyase

If the data is in a string rather than a file, wrap it in StringIO from the standard io module to get a file-like object that read_csv can parse. Since the data has no header row, pass header=None so that pandas does not treat the first row of data as column names. The columns are then labelled 0, 1, 2, and so on; add_prefix() turns these into string labels (col_0, col_1, ...). Each line also ends with a trailing |, which produces an extra, entirely empty column, and dropna(how='all', axis=1) removes it.

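import pandas as pd
from io import StringIO

data = """
https|clients4.google.com|application/octet-stream|2296|
https|clients4.google.com|text/html; charset=utf-8|0|
https|clients4.google.com|application/octet-stream|2291|
"""

# Wrap the string in a file-like object so read_csv can parse it
sio = StringIO(data)
# header=None: the data has no header row
# dropna(...): drops the all-NaN column created by the trailing '|'
df = pd.read_csv(sio, sep='|', header=None).add_prefix('col_').dropna(how='all', axis=1)

df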
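
If descriptive names are wanted instead of the col_ prefix, one possible variation is to drop the empty column first and then assign the names directly. This is only a sketch: the column names are borrowed from the other answer, and the sample string is the same three rows shown above.

```python
from io import StringIO

import pandas as pd

# Same sample rows as above; every line ends with a trailing '|'
data = """\
https|clients4.google.com|application/octet-stream|2296|
https|clients4.google.com|text/html; charset=utf-8|0|
https|clients4.google.com|application/octet-stream|2291|
"""

# Read without a header row; the trailing '|' yields a fifth, all-NaN column
df = pd.read_csv(StringIO(data), sep="|", header=None)

# Drop the empty column, then label the four remaining ones
df = df.dropna(how="all", axis=1)
df.columns = ["protocol", "server", "type", "value"]

print(df)
```

Assigning df.columns after dropna keeps the number of names equal to the number of columns that actually hold data.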

Answered By: cottontail