Pandas: how to read from markdown string?

Question:

I have a table on a git issue which has data about weekly workers.

The table is written in Markdown and looks like this:

start | end | main | sub   
-- | -- | -- | --    
1/30 | 2/6 | Alice | Bob   
2/6 | 2/13 | Charlie | Dave   

I can get the current date, and I can fetch the Markdown text from a REST API.

What the REST API returns is a single string whose lines are separated by \r\n.

What I want to do next is figure out the workers for the current week, but I'm stuck on this.

Do you have any good ideas?

Asked By: yoon


Answers:

To make the data easy to work with later, I think you should turn the table into a list of records, where each record is a dictionary keyed by column name.

First, get the table headers:

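# Sample input: lines are separated by '\r\n', as returned by the REST API
tableStr = ('start | end | main | sub'
    + '\r\n' + '-- | -- | -- | --'
    + '\r\n' + '1/30 | 2/6 | Alice | Bob'
    + '\r\n' + '2/6 | 2/13 | Charlie | Dave')

# The header row is everything before the first line break
headersStr = tableStr[:tableStr.find('\r\n')]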
headers = [h.strip() for h in headersStr.split('|')]

Then parse the remaining rows into records:

records = []
# [2:] skips the header row and the '--' separator row
for rowStr in tableStr.split('\r\n')[2:]:
    row = [entry.strip() for entry in rowStr.split('|')]
    record = {headers[i]:row[i] for i in range(len(headers))}
    records.append(record)

print(records)

This prints:

[{'start': '1/30', 'end': '2/6', 'main': 'Alice', 'sub': 'Bob'}, {'start': '2/6', 'end': '2/13', 'main': 'Charlie', 'sub': 'Dave'}]
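The two steps above can also be folded into one helper. This is only a minimal sketch: the name parse_markdown_table is hypothetical, and it assumes that cells never contain escaped pipes and that rows may or may not have a leading and trailing '|'.

```python
def parse_markdown_table(text):
    """Parse a simple Markdown table into a list of dicts (hypothetical helper)."""

    def split_row(line):
        line = line.strip()
        # Drop at most one leading and one trailing pipe, if present
        if line.startswith('|'):
            line = line[1:]
        if line.endswith('|'):
            line = line[:-1]
        return [cell.strip() for cell in line.split('|')]

    # splitlines() handles '\r\n' as well as '\n'; blank lines are ignored
    lines = [line for line in text.splitlines() if line.strip()]
    headers = split_row(lines[0])
    # lines[1] is the '--' separator row, so data starts at index 2
    return [dict(zip(headers, split_row(line))) for line in lines[2:]]


table_str = ('start | end | main | sub\r\n'
             '-- | -- | -- | --\r\n'
             '1/30 | 2/6 | Alice | Bob\r\n'
             '2/6 | 2/13 | Charlie | Dave')
parsed = parse_markdown_table(table_str)
print(parsed)
```

Using splitlines() instead of split('\r\n') means the helper keeps working if the API ever returns plain '\n' line endings.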

I’m not sure how you define a weekly worker, but once you have the records you can do something with them. For example, to sort them by sub in reverse order:

reverseSortedBySub = sorted(records, key=lambda x: x['sub'], reverse=True)
print(reverseSortedBySub)

Then you’d get:

[{'start': '2/6', 'end': '2/13', 'main': 'Charlie', 'sub': 'Dave'}, {'start': '1/30', 'end': '2/6', 'main': 'Alice', 'sub': 'Bob'}]
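If "weekly worker" means the record whose date range covers today, something like the following might work. It is a sketch under assumptions the question does not confirm: the dates are month/day in the current year, and each range is treated as start-inclusive and end-exclusive (because 2/6 appears as both an end and a start). The helper names to_date and find_current are made up for illustration.

```python
from datetime import date, datetime

records = [
    {'start': '1/30', 'end': '2/6', 'main': 'Alice', 'sub': 'Bob'},
    {'start': '2/6', 'end': '2/13', 'main': 'Charlie', 'sub': 'Dave'},
]


def to_date(month_day, year):
    # The table has no year, so one is supplied explicitly (assumption)
    return datetime.strptime(f"{month_day}/{year}", "%m/%d/%Y").date()


def find_current(records, today):
    """Return the first record with start <= today < end, or None."""
    for record in records:
        start = to_date(record['start'], today.year)
        end = to_date(record['end'], today.year)
        if start <= today < end:
            return record
    return None


# A fixed example date; in real use this would be date.today()
on_duty = find_current(records, date(2023, 2, 8))
print(on_duty)
```

Ranges that cross a year boundary (for example 12/28 to 1/4) would need extra handling that this sketch leaves out.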
Answered By: Goomoonryong

I prefer reading the table with pandas, which lets you do a lot more:

import pandas as pd
from io import StringIO

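tableStr = StringIO('start | end | main | sub'
    + '\r\n' + '-- | -- | -- | --'
    + '\r\n' + '1/30 | 2/6 | Alice | Bob'
    + '\r\n' + '2/6 | 2/13 | Charlie | Dave')

# dropna removes all-empty columns (e.g. from leading/trailing pipes);
# iloc[1:] drops the '--' separator row.
# Note: skipinitialspace only strips spaces after each '|', so column names
# and values can keep trailing spaces.
df = pd.read_table(tableStr, sep="|", header=0, skipinitialspace=True).dropna(axis=1, how='all').iloc[1:]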
df

Output:

start   end   main      sub
1/30    2/6   Alice     Bob
2/6     2/13  Charlie   Dave
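To go from this DataFrame to the workers for a given week, one option is to strip the leftover whitespace, parse the dates, and filter on them. The sketch below assumes the dates are month/day in a known year (2023 is picked arbitrarily here) and treats each range as start-inclusive, end-exclusive; neither assumption comes from the question.

```python
import pandas as pd
from io import StringIO

table_str = ('start | end | main | sub\r\n'
             '-- | -- | -- | --\r\n'
             '1/30 | 2/6 | Alice | Bob\r\n'
             '2/6 | 2/13 | Charlie | Dave')

# Normalize line endings, then read every column as text
normalized = "\n".join(table_str.splitlines())
df = pd.read_csv(StringIO(normalized), sep="|", skipinitialspace=True, dtype=str)

# Column names and values can keep trailing spaces, so strip both
df.columns = df.columns.str.strip()
df = df.iloc[1:]                             # drop the '--' separator row
df = df.apply(lambda col: col.str.strip())   # strip every cell
df = df.reset_index(drop=True)

year = 2023  # assumption: the table itself carries no year
for col in ("start", "end"):
    df[col] = pd.to_datetime(df[col] + f"/{year}", format="%m/%d/%Y")

# A fixed example date; in real use this could be pd.Timestamp.today().normalize()
today = pd.Timestamp(2023, 2, 8)
current = df[(df["start"] <= today) & (today < df["end"])]
print(current[["main", "sub"]])
```

With the dates stored as real timestamps, the usual pandas comparisons and sorting work on them directly.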
Answered By: Minh-Long Luu