Generating Challenging plots with Matplotlib

Question:

I have a data set that takes the following form

Sample Data

The test date above may not necessarily be in chronological sequence.
I want to have 4 vertical subplots (4 rows x 1 column) representing Flow Rate, Temperature, Pressure and Concentration. Each subplot will have plots for Sample A, Sample B and Sample C.
The legend will have the Sample Names as well.

Any help in achieving this goal will greatly be appreciated.

I have tried generating the dictionary for the dataset given the constraints above without success.

Asked By: Normad68


Answers:

Step by step.

  1. Import packages.

    These are what I used.

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    
  2. We need a figure and 4 subplots stacked vertically.

    This is my favourite way to do that, although there are many ways.

    fig, axs = plt.subplots(4, 1, figsize=(10, 12), constrained_layout=True)
    

    Here we are asking for subplots in 4 rows and 1 column, and setting figsize to 10 inches wide by 12 inches tall.

    I chose the figsize by testing different sizes and choosing the best.

    constrained_layout is a shortcut for getting matplotlib to balance the spacing of artists (‘artists’ are any objects we add to a figure). It is similar to tight_layout but is a bit slower and usually gives better results.

    This gives (shown at half size to save space):

enter image description here
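
As a quick check of what plt.subplots hands back for a 4 x 1 grid, here is a minimal sketch. It only reuses the call from this step; nothing in it depends on the asker's data.

```python
import matplotlib.pyplot as plt

# Same call as in the step above: 4 rows, 1 column.
fig, axs = plt.subplots(4, 1, figsize=(10, 12), constrained_layout=True)

# With a single column, axs is a 1-D NumPy array of Axes objects,
# so axs[0] is the top subplot and axs[3] is the bottom one.
print(axs.shape)  # (4,)
```

This is why the later loop can index the subplots with a single integer, as in axs[i].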

  3. I got your data by importing it into Excel from the image, then cleaned up the errors:

    [google drive link][2]

enter image description here
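
The later steps refer to a DataFrame called data, but the loading step itself is not shown. A minimal sketch of one way to get such a frame follows. The rows are made-up placeholders, the column names are assumed from the question (Sample, Test Date, then the four measurements), and the file name in the comment is hypothetical.

```python
import pandas as pd

# Made-up placeholder rows, NOT the asker's data. Column names are assumed
# from the question text; the real spreadsheet headers may differ.
data = pd.DataFrame({
    "Sample": ["Sample A", "Sample B", "Sample C",
               "Sample A", "Sample B", "Sample C"],
    "Test Date": pd.to_datetime(["2021-03-01", "2021-01-15", "2021-02-10",
                                 "2021-01-05", "2021-03-20", "2021-01-25"]),
    "Flow Rate": [10.2, 11.0, 9.8, 10.5, 11.3, 9.9],
    "Temperature": [21.0, 22.5, 20.1, 21.4, 22.9, 20.6],
    "Pressure": [1.01, 1.03, 0.99, 1.02, 1.04, 1.00],
    "Concentration": [0.50, 0.55, 0.48, 0.52, 0.57, 0.49],
})

# With the real spreadsheet, a call along these lines would replace the
# block above ("sample_data.xlsx" is a hypothetical file name):
# data = pd.read_excel("sample_data.xlsx", parse_dates=["Test Date"])

print(data.dtypes)
```

Making sure the Test Date column holds real datetimes rather than strings lets matplotlib place the points on a proper date axis.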

  4. Build some lists to help shorten the code later:

    samples = list(data['Sample'].unique())  # sample names, in order of first appearance
    columns = list(data.columns[2:])         # every column after the first two
    
  5. This is the part I think you really needed help with. There are many ways to achieve the same result; I used a nested for loop.

    1st loop : iterate the columns.
    2nd loop : iterate the samples.

    for i, col in enumerate(columns):
      for s in samples:
    

    Here we enumerate the columns list because each subplot shows one column's data, so we index the array of subplots (i.e. ‘axs’) using i.

  6. We now need to select the rows for each sample and pull the values from the 'Test Date' column and the col column (remember, col is the current column we are iterating over in the columns list). Again, there are many ways to do this, but my usual method is of the form:

    DataFrame[DataFrame['Column'] == 'category']
    

    We are saying: from DataFrame, get all rows where the value in DataFrame['Column'] is equal to 'category'.

    Then we can get the values we want from those rows by selecting a column from the filtered frame in the usual way, i.e. filtered['Column'].

    So we have:

    data[data['Sample'] == s]['Test Date']
    

    and

    data[data['Sample'] == s][col]
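
To make the filtering step concrete, here is a small sketch on a tiny made-up frame (the values are invented purely for illustration). It shows the boolean mask, the chained form used in this answer, and the equivalent single-step .loc form.

```python
import pandas as pd

# Tiny made-up frame, only to show the mechanics.
df = pd.DataFrame({
    "Sample": ["Sample A", "Sample B", "Sample A"],
    "Pressure": [1.0, 2.0, 3.0],
})

mask = df["Sample"] == "Sample A"   # boolean Series: True, False, True
chained = df[mask]["Pressure"]      # filter rows, then pick the column
single = df.loc[mask, "Pressure"]   # same rows and column in one step

print(single.tolist())  # [1.0, 3.0]
```

Both forms return the same values when reading; the .loc form is generally the safer habit if you later assign to the selection.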
    
  7. We can now plot the data we grabbed. I wasn’t sure which type of plot you wanted, so I used the generic plot() method of each subplot (Axes.plot). When we call it several times on the same subplot, it automatically cycles through different colors for the lines.

    axs[i].plot(data[data['Sample'] == s]['Test Date'],data[data['Sample'] == s][col], label=s) 
    

So this is our nested loop to plot the data and set the legend:

fig, axs = plt.subplots(4, 1, figsize=(10, 11), constrained_layout=True)
for i, col in enumerate(columns):           # one subplot per measured column
    for s in samples:                       # one line per sample
        subset = data[data['Sample'] == s]  # rows belonging to this sample
        axs[i].plot(subset['Test Date'], subset[col], label=s)
    # Labels, title and legend only need to be set once per subplot.
    axs[i].set_ylabel(col)
    axs[i].set_xlabel('Date')
    axs[i].set_title(str(col) + ' vs. Date')
    axs[i].legend(loc='center right', bbox_to_anchor=(.9655, 0.655), ncol=1)
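
The question mentions that the test dates may not be in chronological order. A line plot connects points in row order, so unsorted dates can make the lines double back on themselves. The sketch below is one possible variation, not the original answer's method: it sorts by date first and uses groupby instead of the explicit filter. The rows are made up and deliberately shuffled, and only two measurement columns are included to keep it short.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows in deliberately shuffled date order; not the asker's data.
data = pd.DataFrame({
    "Sample": ["Sample A", "Sample A", "Sample A",
               "Sample B", "Sample B", "Sample B"],
    "Test Date": pd.to_datetime(["2021-03-01", "2021-01-01", "2021-02-01",
                                 "2021-02-15", "2021-03-15", "2021-01-15"]),
    "Flow Rate": [3.0, 1.0, 2.0, 2.5, 3.5, 1.5],
    "Temperature": [23.0, 21.0, 22.0, 22.5, 23.5, 21.5],
})
columns = list(data.columns[2:])

# Sort once so every line is drawn left to right in time.
ordered = data.sort_values("Test Date")

fig, axs = plt.subplots(len(columns), 1, figsize=(10, 6), constrained_layout=True)
for ax, col in zip(axs, columns):
    # groupby yields (sample name, rows for that sample), keeping the sorted order.
    for name, group in ordered.groupby("Sample"):
        ax.plot(group["Test Date"], group[col], label=name)
    ax.set_ylabel(col)
    ax.set_xlabel("Date")
    ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to view the result. With the full data set, len(columns) would be 4 and the figure size from the answer could be used instead.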

Which produces this set of subplots:

enter image description here

Here is a link to a Colab version of the Python notebook where I worked through your question.

Python Notebook

Answered By: Kiefer_Michael