Convert a Markdown table to JSON with Python

Question:

I am trying to figure out the easiest way to convert Markdown table text into JSON using only Python. For example, consider this input string:

| Some Title | Some Description             | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game           | 5           |
| Bloodborne | This one is even better      | 2           |
| Sekiro     | This one is also pretty good | 110101      |

The output should be like this:

[
    {"Some Title":"Dark Souls","Some Description":"This is a fun game","Some Number":5},
    {"Some Title":"Bloodborne","Some Description":"This one is even better","Some Number":2},
    {"Some Title":"Sekiro","Some Description":"This one is also pretty good","Some Number":110101}
]

Note: Ideally, the output should be RFC 8259 compliant, i.e. use double quotes " instead of single quotes ' around the keys and string values.

I’ve seen some JS libraries that do this, but nothing that is Python-only.

Asked By: Kyu96


Answers:

You can treat it as a multi-line string and parse it line by line, splitting at \n and |.

Simple code that does that:

import json

my_str='''| Some Title | Some Description             | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game           | 5           |
| Bloodborne | This one is even better      | 2           |
| Sekiro     | This one is also pretty good | 110101      |'''

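def mrkd2json(inp):
    lines = inp.split('\n')
    ret = []
    keys = []
    for i, l in enumerate(lines):
        if i == 0:
            # Header row; the outer pipes leave empty strings at both ends of keys.
            keys = [_i.strip() for _i in l.split('|')]
        elif i == 1:
            # Separator row (|---|---|) carries no data.
            continue
        else:
            # Skip the empty first and last pieces produced by the outer pipes.
            ret.append({keys[_i]: v.strip() for _i, v in enumerate(l.split('|')) if 0 < _i < len(keys) - 1})
    return json.dumps(ret, indent=4)

print(mrkd2json(my_str))

Output:
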
[
    {
        "Some Title": "Dark Souls",
        "Some Description": "This is a fun game",
        "Some Number": "5"
    },
    {
        "Some Title": "Bloodborne",
        "Some Description": "This one is even better",
        "Some Number": "2"
    },
    {
        "Some Title": "Sekiro",
        "Some Description": "This one is also pretty good",
        "Some Number": "110101"
    }
]
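
Note that every cell in the output above is a string, while the question asks for "Some Number" as a number. One possible way to handle that is sketched below. It assumes that any cell `int()` can parse should become an integer; the names `coerce` and `md_table_to_json` are made up for this illustration and are not part of the answer above.

```python
import json

def coerce(cell):
    """Hypothetical helper: return an int if int() can parse the cell, else the stripped string."""
    text = cell.strip()
    try:
        return int(text)
    except ValueError:
        return text

def md_table_to_json(table):
    lines = table.strip().split('\n')
    # Header cells sit between the outer pipes, so drop the empty first and last pieces.
    keys = [k.strip() for k in lines[0].split('|')[1:-1]]
    rows = []
    for line in lines[2:]:  # lines[1] is the |---|---| separator row
        cells = line.split('|')[1:-1]
        rows.append({k: coerce(c) for k, c in zip(keys, cells)})
    return json.dumps(rows, indent=4)

table = '''| Some Title | Some Description             | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game           | 5           |
| Bloodborne | This one is even better      | 2           |
| Sekiro     | This one is also pretty good | 110101      |'''

print(md_table_to_json(table))
```

Floats, booleans and empty cells are left as strings here; extending `coerce` would be the place to handle them.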

PS: Don’t know about any library that does that, will update if I find anything!

Answered By: Kuldeep Singh Sidhu

My approach was very similar to @Kuldeep Singh Sidhu’s:


md_table = """
| Some Title | Some Description             | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game           | 5           |
| Bloodborne | This one is even better      | 2           |
| Sekiro     | This one is also pretty good | 110101      |
"""

result = []

for n, line in enumerate(md_table[1:-1].split('\n')):
    data = {}
    if n == 0:
        # Header row: drop the empty pieces outside the outer pipes.
        header = [t.strip() for t in line.split('|')[1:-1]]
    if n > 1:
        # Data rows start after the separator row (n == 1).
        values = [t.strip() for t in line.split('|')[1:-1]]
        for col, value in zip(header, values):
            data[col] = value
        result.append(data)

Result is:

[{'Some Title': 'Dark Souls',
  'Some Description': 'This is a fun game',
  'Some Number': '5'},
 {'Some Title': 'Bloodborne',
  'Some Description': 'This one is even better',
  'Some Number': '2'},
 {'Some Title': 'Sekiro',
  'Some Description': 'This one is also pretty good',
  'Some Number': '110101'}]
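
The result shown above is the Python repr of a list, which uses single quotes and so is not valid JSON. A minimal sketch of the missing last step, assuming the standard `json` module is acceptable: the loop is a condensed version of the same approach, and `json_text` is just an illustrative name.

```python
import json

md_table = """
| Some Title | Some Description             | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game           | 5           |
| Bloodborne | This one is even better      | 2           |
| Sekiro     | This one is also pretty good | 110101      |
"""

result = []
for n, line in enumerate(md_table.strip().split('\n')):
    # Cells sit between the outer pipes, so drop the empty first and last pieces.
    cells = [t.strip() for t in line.split('|')[1:-1]]
    if n == 0:
        header = cells
    elif n > 1:  # n == 1 is the separator row
        result.append(dict(zip(header, cells)))

# json.dumps serializes the list with double-quoted keys and strings.
json_text = json.dumps(result)
print(json_text)
```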
Answered By: mullinscr

You could let csv do the main work and do something like the following:

import csv
import json

markdown_table = """| Some Title | Some Description             | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game           | 5           |
| Bloodborne | This one is even better      | 2           |
| Sekiro     | This one is also pretty good | 110101      |"""

lines = markdown_table.split("\n")

dict_reader = csv.DictReader(lines, delimiter="|")
data = []
# skip first row, i.e. the row between the header and data
for row in list(dict_reader)[1:]:
    # strip spaces and ignore first empty column
    r = {k.strip(): v.strip() for k, v in row.items() if k != ""}
    data.append(r)

print(json.dumps(data, indent=4))

This is the output:

[
    {
        "Some Title": "Dark Souls",
        "Some Description": "This is a fun game",
        "Some Number": "5"
    },
    {
        "Some Title": "Bloodborne",
        "Some Description": "This one is even better",
        "Some Number": "2"
    },
    {
        "Some Title": "Sekiro",
        "Some Description": "This one is also pretty good",
        "Some Number": "110101"
    }
]
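
The code above skips the separator row by position. If the table might use alignment markers such as `:---` or `---:`, or you would rather not rely on the row index, the row can be recognised by its content instead. This is only a sketch under that assumption; `is_separator` and `table_to_records` are hypothetical names, and a data row made up entirely of dashes would be dropped as well.

```python
import csv
import json
import re

def is_separator(row):
    """Hypothetical check: every non-empty cell looks like ---, :---, ---: or :---:."""
    cells = [v.strip() for k, v in row.items() if k and v and v.strip()]
    return bool(cells) and all(re.fullmatch(r":?-+:?", c) for c in cells)

def table_to_records(markdown_table):
    reader = csv.DictReader(markdown_table.splitlines(), delimiter="|")
    return [
        # Drop the empty columns created by the outer pipes and strip padding.
        {k.strip(): v.strip() for k, v in row.items() if k and k.strip() and v is not None}
        for row in reader
        if not is_separator(row)
    ]

aligned_table = """| Name | Score |
|:-----|------:|
| A    | 1     |"""

print(json.dumps(table_to_records(aligned_table), indent=4))
```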
Answered By: wolfrevo