Fastest method to post a pandas dataframe from Jupyter Notebooks into a Stack Overflow problem?


I just posted my first Stack Overflow question. I spent ages formatting a pandas dataframe table in the post. What steps do you use to post a pandas dataframe in a Stack Overflow question?

I searched for an answer and found this detailed post: How to make good reproducible pandas examples. I followed the instructions and used pd.read_clipboard, but I still had to spend a significant amount of time formatting the table to make it look correct.

I also found a similar question with only one answer with a downvote: How to display a pandas dataframe on a Stack Overflow question body.

I tried copying the dataframe from Jupyter and pasting it into a Blockquote. As mentioned, I also ran pd.read_clipboard(sep='\s\s+') in Jupyter on the copied table and then pasted the result into a Blockquote. I also tried creating a table and entering the values in it. All of these methods required me to tweak the formatting before the table looked right.

Here is an example dataframe to use:

import pandas as pd

df = pd.DataFrame(
    [['Captain', 'Crunch', 72],
     ['Trix', 'Rabbit', 36],
     ['Count', 'Chocula', 41],
     ['Tony', 'Tiger', 54],
     ['Buzz', 'Bee', 28],
     ['Toucan', 'Sam', 38]],
    columns=['first_name', 'last_name', 'age'])
Asked By: swilson




The easiest method I found was to use print(df.to_markdown()).

This converts the data into Markdown format, which Stack Overflow can render. For example, with your dataframe, the output is:

|    | first_name   | last_name   |   age |
|---:|:-------------|:------------|------:|
|  0 | Captain      | Crunch      |    72 |
|  1 | Trix         | Rabbit      |    36 |
|  2 | Count        | Chocula     |    41 |
|  3 | Tony         | Tiger       |    54 |
|  4 | Buzz         | Bee         |    28 |
|  5 | Toucan       | Sam         |    38 |

Note that you might need to install the tabulate module.
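If the row labels carry no information, the index column can be left out of the Markdown table. The following is a minimal sketch, assuming a pandas version whose to_markdown accepts index=False; the try/except fallback and the shortened dataframe are only for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [['Captain', 'Crunch', 72],
     ['Trix', 'Rabbit', 36],
     ['Count', 'Chocula', 41]],
    columns=['first_name', 'last_name', 'age'])

try:
    # index=False drops the 0, 1, 2 row labels from the table
    md = df.to_markdown(index=False)
except ImportError:
    # to_markdown relies on the optional tabulate package
    md = None
    print("tabulate is missing; try: pip install tabulate")
else:
    print(md)  # paste this output straight into the post
```

Dropping the index keeps the table narrower, but keep it if the question is about the row labels themselves.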


Another option is to use df.head().to_dict('list'). It might not be the best choice for large datasets, but it works well for minimal reproducible examples:

{'first_name': ['Captain', 'Trix', 'Count', 'Tony', 'Buzz'], 'last_name': ['Crunch', 'Rabbit', 'Chocula', 'Tiger', 'Bee'], 'age': [72, 36, 41, 54, 28]}

Anyone can rebuild the dataframe by passing this dictionary to pd.DataFrame().
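As a quick check that nothing is lost on the way, the dictionary can be turned back into a dataframe and compared with the original. A minimal sketch (the variable names snippet and rebuilt are only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [['Captain', 'Crunch', 72],
     ['Trix', 'Rabbit', 36],
     ['Count', 'Chocula', 41],
     ['Tony', 'Tiger', 54],
     ['Buzz', 'Bee', 28],
     ['Toucan', 'Sam', 38]],
    columns=['first_name', 'last_name', 'age'])

# What the asker pastes into the question
snippet = df.head().to_dict('list')
print(snippet)

# What a reader runs to rebuild the data
rebuilt = pd.DataFrame(snippet)

# Compare the rebuilt frame with the first five rows of the original
print(rebuilt.equals(df.head()))
```

Note that to_dict('list') does not store the index, so this only round-trips cleanly when the index is the default 0, 1, 2, ... range.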

Answered By: Suraj Shourie

Here is how I would share your example data in a post on SO, leaving out the comments that I included here for guidance:

# Paste the contents of the comma-separated file between the two sets of triple quotes.
s = '''first_name,last_name,age
Captain,Crunch,72
Trix,Rabbit,36
Count,Chocula,41
Tony,Tiger,54
Buzz,Bee,28
Toucan,Sam,38
'''
# Include the code that makes the df in the post, instead of assuming people know to use read_table.
# This also catches any issues, because displaying `df` should give the starting point.
import io
import pandas as pd
df = pd.read_csv(io.StringIO(s))

(See another example here.)

The nice thing is it lets you draft that by hand or customize it some in a text editor if you want.

Preparation behind-the-scenes

If it is already a dataframe, there is no reason to fuss with formatting a table. Let pandas make it.

To make that, I took your dataframe code and did this:

import pandas as pd
df = pd.DataFrame([['Captain', 'Crunch', 72],
                   ['Trix', 'Rabbit', 36],
                   ['Count', 'Chocula', 41],
                   ['Tony', 'Tiger', 54],
                   ['Buzz', 'Bee', 28],
                   ['Toucan', 'Sam', 38]],
                  columns=['first_name', 'last_name', 'age'])
df.to_csv("df_as_csv.csv", index=False)

Then I pasted the content of the .csv file into the string s in the block above.

I prefer .tsv and find it more human-readable; however, as @wjandrea pointed out, Stack Overflow converts tabs to spaces when rendering posts, so that doesn't work well. Fortunately, comma-delimited text can easily be edited and customized by hand to some extent. (And if you really prefer .tsv like me, you can encode the tabs in the SO post as \t and it will still work in Python, like so for the first line: `s='''first_name\tlast_name\tage'''`. You can use Python to do the replacement if you want, and it remains hand-editable this way.)
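The tab replacement mentioned above could look something like the following. This is only a sketch: the names tsv, escaped and df_small are made up for illustration, and the short string s stands in for what a reader would copy out of the post.

```python
import io
import pandas as pd

df = pd.DataFrame(
    [['Captain', 'Crunch', 72],
     ['Trix', 'Rabbit', 36],
     ['Count', 'Chocula', 41]],
    columns=['first_name', 'last_name', 'age'])

# Real tab characters would be turned into spaces by Stack Overflow,
# so write each one as the two-character escape sequence \t instead.
tsv = df.to_csv(sep="\t", index=False)
escaped = tsv.replace("\t", "\\t")
print("s = '''" + escaped + "'''")  # paste this output into the post

# In the post, a normal (non-raw) string literal turns \t back into tabs:
s = '''first_name\tlast_name\tage
Captain\tCrunch\t72
Trix\tRabbit\t36
'''
df_small = pd.read_csv(io.StringIO(s), sep="\t")
print(df_small)
```

The escaped text stays easy to edit by hand, which is the point of preferring it over a rendered table.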

Answered By: Wayne


The most reproducible option is to_dict('tight'), which preserves the data, the index names, and indexes with multiple levels:

data = df.to_dict('tight')
print(data) # this is the output to provide in the question

Then to load the data (here with a more complex example):

data = {
 'index': [0, 1, 2, 3, 4, 5],
 'columns': [('level_0', 'first_name'),
  ('level_0', 'last_name'),
  ('level_0', 'age')],
 'data': [['Captain', 'Crunch', 72],
  ['Trix', 'Rabbit', 36],
  ['Count', 'Chocula', 41],
  ['Tony', 'Tiger', 54],
  ['Buzz', 'Bee', 28],
  ['Toucan', 'Sam', 38]],
 'index_names': ['index'],
 'column_names': [None, None]}

df = pd.DataFrame.from_dict(data, orient='tight')


         level_0
      first_name last_name age
index
0        Captain    Crunch  72
1           Trix    Rabbit  36
2          Count   Chocula  41
3           Tony     Tiger  54
4           Buzz       Bee  28
5         Toucan       Sam  38
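To see that the 'tight' format really keeps the column levels and the index name, here is a small round-trip sketch. It assumes pandas 1.4 or newer (where the 'tight' orientation is available), and the two-row dataframe is just a shortened stand-in for the example above:

```python
import pandas as pd

df = pd.DataFrame(
    [['Captain', 'Crunch', 72],
     ['Trix', 'Rabbit', 36]],
    columns=pd.MultiIndex.from_product(
        [['level_0'], ['first_name', 'last_name', 'age']]))
df.index.name = 'index'

# Export: this dictionary is what goes into the question
data = df.to_dict('tight')
print(data)

# Import: what a reader runs to get the dataframe back
rebuilt = pd.DataFrame.from_dict(data, orient='tight')

print(rebuilt.columns.nlevels)  # number of column levels after the round trip
print(rebuilt.index.name)       # index name after the round trip
```

Other orientations such as 'list' or 'records' would flatten or drop this information, which is why 'tight' is the safer choice for non-trivial indexes.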


For small datasets, I like to copy the dataframe to the clipboard with its to_clipboard method, which directly copies a nice padded table:

  first_name last_name  age
0    Captain    Crunch   72
1       Trix    Rabbit   36
2      Count   Chocula   41
3       Tony     Tiger   54
4       Buzz       Bee   28
5     Toucan       Sam   38

This can be read, after copying the block of text, with:

df = pd.read_clipboard()
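Clipboard access depends on the environment, so here is a sketch of the same round trip with a plain string standing in for the clipboard. It assumes the copied text is what df.to_string() produces (which, as far as I can tell, is what df.to_clipboard(excel=False) copies) and relies on pd.read_clipboard passing the text to pd.read_csv with sep=r'\s+' by default:

```python
import io
import pandas as pd

df = pd.DataFrame(
    [['Captain', 'Crunch', 72],
     ['Trix', 'Rabbit', 36],
     ['Count', 'Chocula', 41],
     ['Tony', 'Tiger', 54],
     ['Buzz', 'Bee', 28],
     ['Toucan', 'Sam', 38]],
    columns=['first_name', 'last_name', 'age'])

# Stand-in for the clipboard: the padded text form of the dataframe
text = df.to_string()
print(text)

# Parse it the way read_clipboard would, splitting on runs of whitespace.
# The header has one field fewer than the rows, so the first
# column is taken as the index.
df2 = pd.read_csv(io.StringIO(text), sep=r'\s+')
print(df2)
```

This works here because no value contains a space; cells with embedded spaces would be split into several fields.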

The poor man’s version is:



Another interesting option is to provide the data as CSV. When no file name is given, to_csv returns the CSV output as a string:
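A minimal sketch of that call, assuming index=False so that the row labels are left out (which matches a header line of first_name,last_name,age); csv_text is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame(
    [['Captain', 'Crunch', 72],
     ['Trix', 'Rabbit', 36],
     ['Count', 'Chocula', 41],
     ['Tony', 'Tiger', 54],
     ['Buzz', 'Bee', 28],
     ['Toucan', 'Sam', 38]],
    columns=['first_name', 'last_name', 'age'])

# No path is given, so to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text)  # copy this output into the question
```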



Then reading with:

import io

df = pd.read_csv(io.StringIO('''first_name,last_name,age
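A complete, runnable version of this read-back step might look like the sketch below. The rows are taken from the example dataframe in the question, and it assumes the CSV was written without the index column:

```python
import io
import pandas as pd

# The CSV text is embedded directly in the code, so readers can copy
# one block and get the dataframe without any file on disk.
df = pd.read_csv(io.StringIO('''first_name,last_name,age
Captain,Crunch,72
Trix,Rabbit,36
Count,Chocula,41
Tony,Tiger,54
Buzz,Bee,28
Toucan,Sam,38
'''))
print(df)
```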
Answered By: mozway