Aggregate time series data to make a scatter plot

Question:

I want to make a time series scatter plot. My data has categorical columns that need to be aggregated by group to build the plotting data first; then I want to make the scatter plot with either seaborn or matplotlib. My data is a time series of product sales prices, and I want to see each product owner's price trend at different market thresholds over time. I tried using pandas.pivot_table and groupby to shape the plotting data, but couldn't get the plot I want.

reproducible data:

Here is the example product data I used; I want to see each dealer's price trend for different protein types with respect to threshold.

my attempt

Here is my current attempt to aggregate my data for plotting, but it is not giving me the right plot. I suspect my way of aggregating the data is not correct. Can anyone point out how to fix this to get the desired plot?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn

mydf = pd.read_csv('foo.csv')
mydf=mydf.drop(mydf.columns[0], axis=1)
mydf['expected_price'] = mydf['price']*76/mydf['threshold']

g = mydf.groupby(['dealer','protein_type'])
newdf= g.apply(lambda x: pd.Series([np.average(x['threshold'])])).unstack()

But the above attempt does not work, because I want plotting data for each dealer's market purchase price for each protein_type at each threshold along the daily time series. I don't know the best way to deal with this time series. Can anyone suggest how to get this right?

I also tried pandas.pivot_table to aggregate my data, but the result still does not represent the plotting data I need.

pv_df= pd.pivot_table(mydf, index=['date'], columns=['dealer', 'protein_type', 'threshold'],values=['price'])
pv_df= pv_df.fillna(0)
pv_df.groupby(['dealer', 'protein_type', 'threshold'])['price'].unstack().reset_index()

But the above attempt still does not work. Also, the dates in my data are not continuous, so I assume I could plot a monthly time series line chart instead.

my attempt at making the plot:

Here is my plotting function:

def scatterplot(x_data, y_data, x_label, y_label, title):
    fig, ax = plt.subplots()
    ax.scatter(x_data, y_data, s = 30, color = '#539caf', alpha = 0.75)

    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    fig.autofmt_xdate()

desired output:

I want either a line chart or a scatter plot where the x-axis shows the monthly time series and the y-axis shows the price of each protein_type at each threshold value for each dealer. Here is an example of the kind of line chart I want:

example line chart

Asked By: kim

||

Answers:

Updated with threshold

Option 1

  • This option was implemented after seeing the results of the line plots in Option 2 below.
    • Those plots contain a lot of unexplained information and do not present the data clearly.
  • To present the data clearly, each plot should contain only 3 dimensions of data (e.g. date, values and cats) for one dealer, one threshold, and one protein_type.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import timedelta

# read the data in and parse the date column
df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])

# calculate expected price
df['expected_price'] = df.price*76/df.threshold

# set threshold as a category
df.threshold = df.threshold.astype('category')

# set the index
df = df.set_index(['date', 'dealer', 'protein_type', 'threshold'])

# form the dataframe into a long form
dfl = df.drop(columns=['destination', 'quantity']).stack().reset_index().rename(columns={'level_4': 'cats', 0: 'values'})

# plot
for pt in dfl.protein_type.unique():
    for t in dfl.threshold.unique():
        data = dfl[(dfl.protein_type == pt) & (dfl.threshold == t)]
        if not data.empty:
            utc = len(data.threshold.unique())
            f, axes = plt.subplots(nrows=utc, ncols= 2, figsize=(20, 4), squeeze=False)
            for j in range(utc):
                for i, d in enumerate(dfl.dealer.unique()):
                    data_d = data[data.dealer == d].sort_values(['cats', 'date']).reset_index(drop=True)
                    p = sns.scatterplot(x='date', y='values', data=data_d, hue='cats', ax=axes[j, i])
                    if not data_d.empty:
                        p.set_title(f'{d}\nThreshold: {t}\n{pt}')
                        p.set_xlim(data_d.date.min() - timedelta(days=60), data_d.date.max() + timedelta(days=60))
                    else:
                        p.set_title(f'{d}: No Data Available\nThreshold: {t}\n{pt}')
                    
            plt.show()

First four plots
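
The nested loops above visit every protein_type and threshold pair and then skip the empty ones. As a minimal sketch of an alternative, assuming a long-form frame shaped like dfl (the rows below are made up for illustration), groupby can yield only the combinations that actually occur:

```python
import pandas as pd

# Toy long-form frame; the column names mirror `dfl` from the answer,
# but these rows are invented for illustration.
dfl = pd.DataFrame({
    'date': pd.to_datetime(['2002-01-08', '2002-01-08', '2002-01-09', '2002-01-10']),
    'dealer': ['Alpha Food Corps'] * 4,
    'protein_type': ['beef', 'chicken', 'beef', 'chicken'],
    'threshold': pd.Categorical([85, 65, 85, 65]),
    'cats': ['price'] * 4,
    'values': [1.8, 0.9, 1.8, 0.9],
})

# observed=True skips category combinations that never occur in the data,
# so no emptiness check is needed inside the loop.
combos = []
for (pt, t), data in dfl.groupby(['protein_type', 'threshold'], observed=True):
    combos.append((pt, t, len(data)))

print(combos)
```

Each `data` chunk could then be passed to the same plotting code as in the loop above.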


Option 2

  • This results in 4 separate figures with threshold as a category type.
  • threshold must first be left as an int for the expected_price calculation, and then converted.
  • Note that my data does not have the extra unnamed column, so that will still need to be dropped, which is not shown in the following code.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read the data in and parse the date column
df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])

# calculate expected price
df['expected_price'] = df.price*76/df.threshold

# set threshold as a category
df.threshold = df.threshold.astype('category')

# set the index
df = df.set_index(['date', 'dealer', 'protein_type', 'threshold'])

# form the dataframe into a long form
dfl = df.drop(columns=['destination', 'quantity']).stack().reset_index().rename(columns={'level_4': 'cats', 0: 'values'})

# plot four plots with threshold
for d in dfl.dealer.unique():
    for pt in dfl.protein_type.unique():
        plt.figure(figsize=(13, 7))
        data = dfl[(dfl.protein_type == pt) & (dfl.dealer == d)]
        sns.lineplot(x='date', y='values', data=data, hue='threshold', style='cats')
        plt.yscale('log')
        plt.title(f'{d}: {pt}')
        plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)
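
The note that threshold must stay numeric until expected_price has been computed can be checked on a toy frame. This is only a sketch with two made-up rows; the point is that a categorical column is a set of labels, so arithmetic on it is expected to fail:

```python
import pandas as pd

df = pd.DataFrame({'price': [0.5, 1.8], 'threshold': [50, 85]})

# Arithmetic works while threshold is still numeric.
df['expected_price'] = df['price'] * 76 / df['threshold']

# After the conversion, threshold holds labels rather than numbers.
df['threshold'] = df['threshold'].astype('category')

try:
    df['price'] * 76 / df['threshold']
    division_failed = False
except TypeError:
    # pandas refuses arithmetic on a categorical column
    division_failed = True

print(df.dtypes)
print('division on categorical failed:', division_failed)
```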


Original without threshold as a category

  • I don’t understand what you’re doing with the following:
    • newdf= g.apply(lambda x: pd.Series([np.average(x['threshold'])])).unstack()
    • I don’t think this is integral to the main issue, which is plotting the data
  • First, the dataframe needs to be converted to a long format and 'destination' needs to be dropped
  • There are too many dimensions to plot on a single figure
    • x='date', y='values', hue='cats', style='dealer'
    • 'protein_type' needs to have a separate figure
    • However, the data overlaps too much to be readable with 'dealer' included, so 4 plots are required.
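
For reference, as far as I can tell the groupby line quoted above reduces to a mean of threshold per dealer and protein_type. The sketch below uses a few made-up rows and a plain mean in place of the apply/np.average combination; note that the date column disappears, which is why the result cannot be plotted as a time series:

```python
import pandas as pd

# Made-up rows with the same column names as the question's data.
mydf = pd.DataFrame({
    'date': pd.to_datetime(['2002-01-08', '2002-01-09', '2002-01-08', '2002-02-05']),
    'dealer': ['Alpha Food Corps', 'Alpha Food Corps', 'Alpha Food Corps', 'German Food Corps'],
    'protein_type': ['beef', 'beef', 'chicken', 'chicken'],
    'threshold': [94, 85, 65, 50],
})

# Mean threshold per (dealer, protein_type), with protein_type moved to columns.
# One row per dealer remains; every date has been averaged away.
newdf = mydf.groupby(['dealer', 'protein_type'])['threshold'].mean().unstack()
print(newdf)
```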

DataFrame Setup:

  • Note that my data does not have the extra unnamed column, so that will still need to be dropped, which is not shown in the following code.
  • Use pandas.DataFrame.stack to convert the dataframe to a long form
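
If the file does carry the extra unnamed column, one way to handle it is at read time. The sketch below assumes that column is a saved index, which is a guess based on the question's drop of the first column; the CSV text is a hypothetical stand-in for the real file:

```python
import io
import pandas as pd

# Hypothetical CSV text with a leading unnamed column, as produced by
# DataFrame.to_csv() when the default index is written out.
csv_text = """,date,dealer,threshold,quantity,price,protein_type,destination
0,2001-12-22,Alpha Food Corps,50,39037,0.5,chicken,UK
1,2001-12-27,Alpha Food Corps,85,35432,1.8,beef,UK
"""

# index_col=0 consumes the unnamed column while reading.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=['date'])

# Same columns as reading everything and dropping the first column afterwards,
# which is what the question does.
df_alt = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])
df_alt = df_alt.drop(columns=df_alt.columns[0])

print(list(df.columns))
```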

Option 1:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read the data in
df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])

# your calculation
df['expected_price'] = df['price']*76/df['threshold']

# set the index
df = df.set_index(['date', 'dealer', 'protein_type'])

# form the dataframe into a long form
dfl = df.drop(columns=['destination']).stack().reset_index().rename(columns={'level_3': 'cats', 0: 'values'})

# display(dfl.head())
        date            dealer protein_type            cats    values
0 2001-12-22  Alpha Food Corps      chicken       threshold     50.00
1 2001-12-22  Alpha Food Corps      chicken        quantity  39037.00
2 2001-12-22  Alpha Food Corps      chicken           price      0.50
3 2001-12-22  Alpha Food Corps      chicken  expected_price      0.76
4 2001-12-27  Alpha Food Corps         beef       threshold     85.00
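
The reshaping can be checked on a small frame. This sketch uses the first two rows of the CSV sample and names the new columns with rename_axis and reset_index(name=...) instead of renaming 'level_3' afterwards; that naming step is a variation on the code above, not what the answer uses:

```python
import pandas as pd

# First two rows of the CSV sample from this answer.
df = pd.DataFrame({
    'date': pd.to_datetime(['2001-12-22', '2001-12-27']),
    'dealer': ['Alpha Food Corps', 'Alpha Food Corps'],
    'protein_type': ['chicken', 'beef'],
    'threshold': [50, 85],
    'price': [0.5, 1.8],
})
df['expected_price'] = df['price'] * 76 / df['threshold']

# Identifier columns go into the index; stack() then turns every remaining
# column into one (name, value) row.
wide = df.set_index(['date', 'dealer', 'protein_type'])
dfl = (wide.stack()
           .rename_axis(['date', 'dealer', 'protein_type', 'cats'])
           .reset_index(name='values'))

print(dfl)
```

Two input rows with three value columns give six long-form rows, one per date and category pair.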

Option 2: Rolling Mean

df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])
df['expected_price'] = df['price']*76/df['threshold']
df = df.set_index('date')

# groupby aggregate rolling mean and stack
dfl = df.groupby(['dealer', 'protein_type'])[['expected_price', 'price']].rolling(7).mean().stack().reset_index().rename(columns={'level_3': 'cats', 0: 'values'})
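
Note that rolling(7) averages over 7 consecutive rows within each group, not over 7 calendar days. Since the question asks for a monthly series, a monthly mean is another possible aggregation. The following is a sketch of that idea rather than part of the original answer; the four rows are Alpha Food Corps chicken rows at threshold 65 taken from the CSV sample:

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2002-01-08', '2002-01-23', '2002-02-02', '2002-02-06']),
    'dealer': ['Alpha Food Corps'] * 4,
    'protein_type': ['chicken'] * 4,
    'threshold': [65, 65, 65, 65],
    'price': [0.9, 1.1, 1.2, 1.2],
})

# Label each row with its calendar month, then average within
# dealer / protein_type / threshold / month.
df['month'] = df['date'].dt.to_period('M')
monthly = (df.groupby(['dealer', 'protein_type', 'threshold', 'month'])['price']
             .mean()
             .reset_index())

# Convert the Period labels back to timestamps (start of month) for plotting.
monthly['month'] = monthly['month'].dt.to_timestamp()
print(monthly)
```

The resulting frame has one row per group and month, so it can be passed to a line plot with x='month' and y='price'.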

Option 1: Two Plots

  • The 'dealer' data is too similar to be differentiated (price collusion anyone?)

from datetime import datetime

for pt in dfl.protein_type.unique():
    plt.figure(figsize=(9, 5))
    data = dfl[dfl.protein_type == pt]
    sns.lineplot(x='date', y='values', data=data, hue='cats', style='dealer')
    plt.xlim(datetime(2001, 11, 1), datetime(2004, 8, 1))
    plt.yscale('log')
    plt.title(pt)
    plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)


  • Even with only 'price' and 'expected_price', 'dealer' can’t be determined.


Option 2: Four Plots

seaborn.FacetGrid

g = sns.FacetGrid(data=dfl, col='dealer', row='protein_type', hue='cats', height=5, aspect=1.5)
g.map(sns.lineplot, 'date', 'values').add_legend()
plt.yscale('log')
g.set_xticklabels(rotation=90)


  • Plot of data from rolling mean


Nested Loop

  • This will produce one column of 4 figures, selected first by dealer and then by protein_type.
  • Optionally, swap the order of dealer and protein_type

from datetime import datetime

for d in dfl.dealer.unique():
    for pt in dfl.protein_type.unique():
        plt.figure(figsize=(10, 5))
        data = dfl[(dfl.protein_type == pt) & (dfl.dealer == d)]
        sns.lineplot(x='date', y='values', data=data, hue='cats')
        plt.xlim(datetime(2001, 11, 1), datetime(2004, 8, 1))
        plt.yscale('log')
        plt.title(f'{d}: {pt}')
        plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)

CSV Sample:

date,dealer,threshold,quantity,price,protein_type,destination
2001-12-22,Alpha Food Corps,50,39037,0.5,chicken,UK
2001-12-27,Alpha Food Corps,85,35432,1.8,beef,UK
2001-12-29,Alpha Food Corps,50,32142,0.5,chicken,UK
2001-12-30,Alpha Food Corps,85,34516,1.8,beef,UK
2002-01-02,Alpha Food Corps,85,39930,1.8,beef,UK
2002-01-04,Alpha Food Corps,85,40709,1.8,beef,UK
2002-01-08,Alpha Food Corps,94,37641,2.2,beef,UK
2002-01-08,Alpha Food Corps,85,37545,1.8,beef,UK
2002-01-08,Alpha Food Corps,85,37564,1.8,beef,UK
2002-01-08,Alpha Food Corps,85,37607,1.8,beef,UK
2002-01-08,Alpha Food Corps,85,41706,1.8,beef,UK
2002-01-08,Alpha Food Corps,90,41628,2.1,beef,UK
2002-01-08,Alpha Food Corps,65,35720,0.9,chicken,UK
2002-01-09,Alpha Food Corps,94,1581,2.2,beef,UK
2002-01-09,Alpha Food Corps,85,11426,1.8,beef,UK
2002-01-09,Alpha Food Corps,85,37489,1.8,beef,UK
2002-01-09,Alpha Food Corps,90,15630,2.1,beef,UK
2002-01-09,Alpha Food Corps,80,3136,1.6,beef,UK
2002-01-10,Alpha Food Corps,85,41919,1.8,beef,UK
2002-01-10,Alpha Food Corps,90,39932,2.1,beef,UK
2002-01-10,Alpha Food Corps,90,41665,2.1,beef,UK
2002-01-10,Alpha Food Corps,90,41860,2.1,beef,UK
2002-01-10,Alpha Food Corps,65,39879,0.9,chicken,UK
2002-01-10,Alpha Food Corps,65,39884,0.9,chicken,UK
2002-01-11,Alpha Food Corps,90,37613,2.1,beef,UK
2002-01-12,Alpha Food Corps,90,41855,2.1,beef,UK
2002-01-13,Alpha Food Corps,90,37585,2.1,beef,UK
2002-01-15,Alpha Food Corps,85,41618,1.8,beef,UK
2002-01-15,Alpha Food Corps,85,41721,1.8,beef,UK
2002-01-15,Alpha Food Corps,85,41869,1.8,beef,UK
2002-01-15,Alpha Food Corps,85,41990,1.8,beef,UK
2002-01-15,Alpha Food Corps,90,41744,2.1,beef,UK
2002-01-15,Alpha Food Corps,90,41936,2.1,beef,UK
2002-01-15,Alpha Food Corps,65,41684,1.0,chicken,UK
2002-01-15,Alpha Food Corps,65,41776,1.0,chicken,UK
2002-01-16,Alpha Food Corps,94,35891,2.2,beef,UK
2002-01-16,Alpha Food Corps,85,39985,1.8,beef,UK
2002-01-16,Alpha Food Corps,85,41754,1.8,beef,UK
2002-01-16,Alpha Food Corps,85,41811,1.8,beef,UK
2002-01-16,Alpha Food Corps,90,39838,2.1,beef,UK
2002-01-16,Alpha Food Corps,80,3244,1.7,beef,UK
2002-01-17,Alpha Food Corps,94,22245,2.2,beef,UK
2002-01-17,Alpha Food Corps,85,5186,1.8,beef,UK
2002-01-17,Alpha Food Corps,90,2016,2.1,beef,UK
2002-01-17,Alpha Food Corps,90,40875,2.1,beef,UK
2002-01-17,Alpha Food Corps,65,41440,1.0,chicken,UK
2002-01-18,Alpha Food Corps,94,12525,2.2,beef,UK
2002-01-18,Alpha Food Corps,94,31325,2.2,beef,UK
2002-01-18,Alpha Food Corps,85,15486,1.8,beef,UK
2002-01-18,Alpha Food Corps,85,29992,1.8,beef,UK
2002-01-18,Alpha Food Corps,85,39938,1.8,beef,UK
2002-01-18,Alpha Food Corps,85,41777,1.8,beef,UK
2002-01-18,Alpha Food Corps,90,9475,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,9960,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,41676,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,41816,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,42036,2.1,beef,UK
2002-01-18,Alpha Food Corps,65,41673,1.0,chicken,UK
2002-01-19,Alpha Food Corps,85,19961,1.8,beef,UK
2002-01-19,Alpha Food Corps,90,19955,2.1,beef,UK
2002-01-19,Alpha Food Corps,90,40437,2.1,beef,UK
2002-01-19,Alpha Food Corps,65,41574,1.0,chicken,UK
2002-01-19,Alpha Food Corps,65,41700,1.0,chicken,UK
2002-01-20,Alpha Food Corps,94,23278,2.2,beef,UK
2002-01-20,Alpha Food Corps,85,9230,1.8,beef,UK
2002-01-20,Alpha Food Corps,85,38842,1.8,beef,UK
2002-01-20,Alpha Food Corps,90,9173,2.1,beef,UK
2002-01-20,Alpha Food Corps,90,38608,2.1,beef,UK
2002-01-20,Alpha Food Corps,50,39191,0.8,chicken,UK
2002-01-22,Alpha Food Corps,94,41741,2.2,beef,UK
2002-01-22,Alpha Food Corps,85,39879,1.8,beef,UK
2002-01-22,Alpha Food Corps,85,41683,1.8,beef,UK
2002-01-22,Alpha Food Corps,85,41958,1.8,beef,UK
2002-01-22,Alpha Food Corps,90,41833,2.1,beef,UK
2002-01-23,Alpha Food Corps,94,20294,2.2,beef,UK
2002-01-23,Alpha Food Corps,85,15553,1.8,beef,UK
2002-01-23,Alpha Food Corps,85,40753,1.8,beef,UK
2002-01-23,Alpha Food Corps,85,41740,1.8,beef,UK
2002-01-23,Alpha Food Corps,90,1892,2.1,beef,UK
2002-01-23,Alpha Food Corps,90,39850,2.1,beef,UK
2002-01-23,Alpha Food Corps,80,3231,1.7,beef,UK
2002-01-23,Alpha Food Corps,65,41415,1.1,chicken,UK
2002-01-24,Alpha Food Corps,90,35473,2.1,beef,UK
2002-01-24,Alpha Food Corps,90,41824,2.1,beef,UK
2002-01-24,Alpha Food Corps,65,41721,1.1,chicken,UK
2002-01-25,Alpha Food Corps,85,19983,1.8,beef,UK
2002-01-25,Alpha Food Corps,85,35823,1.8,beef,UK
2002-01-25,Alpha Food Corps,90,19949,2.1,beef,UK
2002-01-25,Alpha Food Corps,90,41800,2.1,beef,UK
2002-01-25,Alpha Food Corps,65,40990,1.1,chicken,UK
2002-01-26,Alpha Food Corps,90,39938,2.1,beef,UK
2002-01-26,Alpha Food Corps,90,40641,2.1,beef,UK
2002-01-26,Alpha Food Corps,90,41550,2.1,beef,UK
2002-01-27,Alpha Food Corps,94,16589,2.2,beef,UK
2002-01-27,Alpha Food Corps,85,11669,1.8,beef,UK
2002-01-27,Alpha Food Corps,90,24982,2.1,beef,UK
2002-01-27,Alpha Food Corps,65,29819,1.1,chicken,UK
2002-01-29,Alpha Food Corps,94,37516,2.2,beef,UK
2002-01-29,Alpha Food Corps,85,37378,1.8,beef,UK
2002-01-29,Alpha Food Corps,85,37535,1.8,beef,UK
2002-01-29,Alpha Food Corps,85,40174,1.8,beef,UK
2002-01-29,Alpha Food Corps,90,37831,2.1,beef,UK
2002-01-30,Alpha Food Corps,94,34435,2.2,beef,UK
2002-01-30,Alpha Food Corps,94,39640,2.2,beef,UK
2002-01-30,Alpha Food Corps,85,1619,1.8,beef,UK
2002-01-30,Alpha Food Corps,85,3058,1.8,beef,UK
2002-01-30,Alpha Food Corps,85,20929,1.8,beef,UK
2002-01-30,Alpha Food Corps,90,3641,2.1,beef,UK
2002-01-30,Alpha Food Corps,90,20974,2.1,beef,UK
2002-01-30,Alpha Food Corps,90,31160,2.1,beef,UK
2002-01-30,Alpha Food Corps,92,38189,2.3,beef,UK
2002-01-31,Alpha Food Corps,94,8804,2.2,beef,UK
2002-01-31,Alpha Food Corps,85,17398,1.8,beef,UK
2002-01-31,Alpha Food Corps,90,13963,2.1,beef,UK
2002-01-31,Alpha Food Corps,90,37673,2.1,beef,UK
2002-01-31,Alpha Food Corps,90,40330,2.1,beef,UK
2002-01-31,Alpha Food Corps,90,40511,2.2,beef,UK
2002-01-31,Alpha Food Corps,80,38290,1.9,beef,UK
2002-01-31,Alpha Food Corps,92,37193,2.3,beef,UK
2002-02-01,Alpha Food Corps,94,5011,2.2,beef,UK
2002-02-01,Alpha Food Corps,85,18783,1.8,beef,UK
2002-02-01,Alpha Food Corps,85,41827,1.8,beef,UK
2002-02-01,Alpha Food Corps,90,16394,2.1,beef,UK
2002-02-01,Alpha Food Corps,90,23013,2.1,beef,UK
2002-02-01,Alpha Food Corps,90,39923,2.1,beef,UK
2002-02-01,Alpha Food Corps,90,41417,2.1,beef,UK
2002-02-01,Alpha Food Corps,80,15592,1.7,beef,UK
2002-02-01,Alpha Food Corps,80,38364,1.9,beef,UK
2002-02-01,Alpha Food Corps,92,37605,2.3,beef,UK
2002-02-01,Alpha Food Corps,92,39234,2.3,beef,UK
2002-02-02,Alpha Food Corps,90,34578,2.1,beef,UK
2002-02-02,Alpha Food Corps,90,41661,2.1,beef,UK
2002-02-02,Alpha Food Corps,80,3157,1.7,beef,UK
2002-02-02,Alpha Food Corps,65,41272,1.2,chicken,UK
2002-02-02,Alpha Food Corps,65,41503,1.2,chicken,UK
2002-02-02,Alpha Food Corps,92,36207,2.3,beef,UK
2002-02-05,Alpha Food Corps,94,41559,2.2,beef,UK
2002-02-05,Alpha Food Corps,85,41549,1.8,beef,UK
2002-02-05,Alpha Food Corps,85,41753,1.8,beef,UK
2002-02-05,Alpha Food Corps,85,41908,1.8,beef,UK
2002-02-05,Alpha Food Corps,90,39813,2.1,beef,UK
2002-02-05,Alpha Food Corps,90,41526,2.1,beef,UK
2002-02-05,German Food Corps,80,36031,1.9,beef,UK
2002-02-05,German Food Corps,50,38538,0.9,chicken,UK
2002-02-05,Alpha Food Corps,50,38772,0.9,chicken,UK
2002-02-05,German Food Corps,50,39099,0.9,chicken,UK
2002-02-05,German Food Corps,50,39132,0.9,chicken,UK
2002-02-05,German Food Corps,50,39207,0.9,chicken,UK
2002-02-06,Alpha Food Corps,85,41947,1.8,beef,UK
2002-02-06,German Food Corps,80,37287,1.9,beef,UK
2002-02-06,Alpha Food Corps,89,43201,2.1,beef,UK
2002-02-06,German Food Corps,50,38553,0.9,chicken,UK
2002-02-06,German Food Corps,50,38837,0.9,chicken,UK
2002-02-06,Alpha Food Corps,50,38985,0.9,chicken,UK
2002-02-06,German Food Corps,65,40386,1.4,chicken,UK
2002-02-06,Alpha Food Corps,65,41851,1.2,chicken,UK
2002-02-06,Alpha Food Corps,92,38405,2.3,beef,UK
2002-02-06,German Food Corps,73,37731,1.5,chicken,UK
2002-02-07,Alpha Food Corps,85,41097,1.9,beef,UK
2002-02-07,Alpha Food Corps,90,39582,2.1,beef,UK
2002-02-07,German Food Corps,65,38832,1.4,chicken,UK
2002-02-07,German Food Corps,50,39269,0.9,chicken,UK
2002-02-07,German Food Corps,50,40129,0.9,chicken,UK
2002-02-07,German Food Corps,50,41124,0.8,chicken,UK
2002-02-07,German Food Corps,65,41739,1.2,chicken,UK
2002-02-08,Alpha Food Corps,85,20034,1.8,beef,UK
2002-02-08,German Food Corps,85,33503,1.9,beef,UK
2002-02-08,German Food Corps,85,40780,1.9,beef,UK
2002-02-08,Alpha Food Corps,90,19913,2.1,beef,UK
2002-02-08,Alpha Food Corps,90,36682,2.1,beef,UK
2002-02-08,Alpha Food Corps,90,41624,2.1,beef,UK
2002-02-08,German Food Corps,65,37503,1.4,chicken,UK
2002-02-08,German Food Corps,50,38973,0.9,chicken,UK
2002-02-08,German Food Corps,50,39069,0.9,chicken,UK
2002-02-08,German Food Corps,50,40697,0.9,chicken,UK
2002-02-08,German Food Corps,92,36103,2.3,beef,UK
2002-02-08,Alpha Food Corps,92,38278,2.3,beef,UK
2002-02-09,Alpha Food Corps,90,39842,2.1,beef,UK
2002-02-09,Alpha Food Corps,90,16553,2.3,beef,UK
2002-02-09,Alpha Food Corps,80,18739,1.9,beef,UK
2002-02-09,German Food Corps,80,36349,1.9,beef,UK
2002-02-09,German Food Corps,65,35238,1.4,chicken,UK
2002-02-09,German Food Corps,50,38391,0.9,chicken,UK
2002-02-09,Alpha Food Corps,50,38819,0.9,chicken,UK
2002-02-09,German Food Corps,50,41691,0.9,chicken,UK
2002-02-09,Alpha Food Corps,92,40245,2.3,beef,UK
2002-02-09,German Food Corps,73,37323,1.5,chicken,UK
2002-02-09,German Food Corps,90,40312,2.2,beef,UK
2002-02-10,Alpha Food Corps,90,42108,2.1,beef,UK
2002-02-10,German Food Corps,65,37831,1.4,chicken,UK
2002-02-11,Alpha Food Corps,50,38591,0.9,chicken,UK
2002-02-12,Alpha Food Corps,94,41559,2.3,beef,UK
2002-02-12,Alpha Food Corps,85,40968,1.8,beef,UK
2002-02-12,Alpha Food Corps,85,41985,1.8,beef,UK
2002-02-12,German Food Corps,50,38931,0.9,chicken,UK
2002-02-12,German Food Corps,50,38986,0.9,chicken,UK
2002-02-12,German Food Corps,92,39684,2.3,beef,UK
2002-02-12,German Food Corps,73,36619,1.5,chicken,UK
2002-02-13,Alpha Food Corps,85,41291,1.8,beef,UK
2002-02-13,Alpha Food Corps,85,41892,1.8,beef,UK

Answered By: Trenton McKinney

In a lineplot, as far as I know, you can represent only 4 dimensions:

  • x axis, you can use it for the date
  • y axis, you can use it for the price
  • line hue, you can use it for the threshold
  • line style, you can use it for the dealer

But you want to take into account a 5th dimension: protein_type. For that, I suggest using subplots, as in the code below:

# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read dataframe
mydf = pd.read_csv('foo.csv')
mydf = mydf.drop(mydf.columns[0], axis = 1)

# convert 'date' type to datetime and sort values by threshold, then by date
mydf['date'] = pd.to_datetime(mydf['date'], format = '%m/%d/%Y')
mydf['threshold'] = mydf['threshold'].astype('category')
mydf.sort_values(['threshold', 'date'], inplace = True)

# set up subplots layout, one row for each protein_type
fig, ax = plt.subplots(nrows = len(mydf['protein_type'].unique()),
                       ncols = 1,
                       figsize = (10, 10),
                       sharex = True)

# loop over protein_type
for i, protein_type in enumerate(mydf['protein_type'].unique(), 0):

    # filter dataframe
    df_filtered = mydf[mydf['protein_type'] == protein_type]

    # set up plot
    sns.lineplot(ax = ax[i],
                 data = df_filtered,
                 x = 'date',
                 y = 'price',
                 hue = 'threshold',
                 style = 'dealer',
                 legend = 'full',
                 ci = False)

    # set up subplot title and legend
    ax[i].set_title(f'Protein type = {protein_type}')
    ax[i].legend(bbox_to_anchor = (1.02, 1), loc = 'upper left')

# adjust general layout
plt.subplots_adjust(top = 0.95,
                    right = 0.85,
                    bottom = 0.05,
                    left = 0.05,
                    hspace = 0.15)

# show the plot
plt.show()



In the above plot it could be difficult to appreciate the differences between dealers, so you can separate them in another subplot grid, as in the code below:

# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read dataframe
mydf = pd.read_csv('foo.csv')
mydf = mydf.drop(mydf.columns[0], axis = 1)

# convert 'date' type to datetime and sort values by threshold, then by date
mydf['date'] = pd.to_datetime(mydf['date'], format = '%m/%d/%Y')
mydf['threshold'] = mydf['threshold'].astype('category')
mydf.sort_values(['threshold', 'date'], inplace = True)

# set up subplots layout, one row for each protein_type, one column for each dealer
fig, ax = plt.subplots(nrows = len(mydf['protein_type'].unique()),
                       ncols = len(mydf['dealer'].unique()),
                       figsize = (10, 10),
                       sharex = True,
                       sharey = True)

# loop over protein_type
for i, protein_type in enumerate(mydf['protein_type'].unique(), 0):

    # loop over dealer
    for j, dealer in enumerate(mydf['dealer'].unique(), 0):

        # filter dataframe
        df_filtered = mydf[(mydf['protein_type'] == protein_type) & (mydf['dealer'] == dealer)]

        # set up plot
        sns.lineplot(ax = ax[i, j],
                     data = df_filtered,
                     x = 'date',
                     y = 'price',
                     hue = 'threshold',
                     legend = 'full',
                     ci = False)

        # set up subplot title and legend
        ax[i, j].set_title(f'Protein type = {protein_type} | Dealer = {dealer}')
        ax[i, j].legend(bbox_to_anchor = (1.02, 1), loc = 'upper left')

# adjust general layout
plt.subplots_adjust(top = 0.95,
                    right = 0.9,
                    bottom = 0.05,
                    left = 0.05,
                    wspace = 0.3,
                    hspace = 0.2)

# show the plot
plt.show()
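
One caveat with the grid above: indexing ax[i, j] relies on plt.subplots returning a 2D array, which by default only happens when there is more than one row and more than one column. As a minimal sketch, assuming a data set with a single protein type and a single dealer (the lists below are placeholders), squeeze=False keeps the array 2D in every case:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

protein_types = ['beef']            # placeholder: a single row
dealers = ['Alpha Food Corps']      # placeholder: a single column

# squeeze=False always returns a 2D array of Axes, even for a 1x1 grid.
fig, ax = plt.subplots(nrows=len(protein_types), ncols=len(dealers), squeeze=False)

for i, protein_type in enumerate(protein_types):
    for j, dealer in enumerate(dealers):
        ax[i, j].set_title(f'Protein type = {protein_type} | Dealer = {dealer}')

grid_shape = ax.shape
plt.close(fig)
print(grid_shape)
```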



Finally, if you want to compare price with expected_price, you can use the style dimension for this task.
This requires a different reshaping of the dataframe: you have to stack the price and expected_price columns into a single column. You can do this with the pd.melt function.
Check the code below as a reference:

# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read dataframe
mydf = pd.read_csv('foo.csv')
mydf = mydf.drop(mydf.columns[0], axis = 1)
mydf['expected_price'] = mydf['price']*76/mydf['threshold']

# convert 'date' type to datetime
mydf['date'] = pd.to_datetime(mydf['date'], format = '%m/%d/%Y')
mydf['threshold'] = mydf['threshold'].astype('category')

# reshape dataframe
mydf = pd.melt(frame = mydf,
               id_vars = ['date', 'dealer', 'threshold', 'quantity', 'protein_type', 'destination'],
               value_vars = ['price', 'expected_price'],
               var_name = 'price type',
               value_name = 'price value')

# sort values by threshold, then by date
mydf.sort_values(['threshold', 'date'], inplace = True)

# set up subplots layout, one row for each protein_type, one column for each dealer
fig, ax = plt.subplots(nrows = len(mydf['protein_type'].unique()),
                       ncols = len(mydf['dealer'].unique()),
                       figsize = (10, 10),
                       sharex = True,
                       sharey = True)

# loop over protein_type
for i, protein_type in enumerate(mydf['protein_type'].unique(), 0):

    # loop over dealer
    for j, dealer in enumerate(mydf['dealer'].unique(), 0):

        # filter dataframe
        df_filtered = mydf[(mydf['protein_type'] == protein_type) & (mydf['dealer'] == dealer)]

        # set up plot
        sns.lineplot(ax = ax[i, j],
                     data = df_filtered,
                     x = 'date',
                     y = 'price value',
                     hue = 'threshold',
                     style = 'price type',
                     legend = 'full',
                     ci = False)

        # set up subplot title and legend
        ax[i, j].set_title(f'Protein type = {protein_type} | Dealer = {dealer}')
        ax[i, j].legend(bbox_to_anchor = (1.02, 1), loc = 'upper left')

# adjust general layout
plt.subplots_adjust(top = 0.95,
                    right = 0.9,
                    bottom = 0.05,
                    left = 0.05,
                    wspace = 0.3,
                    hspace = 0.2)

# show the plot
plt.show()
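
To see what the pd.melt step does to the frame, here is a small sketch using two rows from the other answer's CSV sample, with the quantity and destination columns left out for brevity:

```python
import pandas as pd

mydf = pd.DataFrame({
    'date': pd.to_datetime(['2001-12-22', '2001-12-27']),
    'dealer': ['Alpha Food Corps', 'Alpha Food Corps'],
    'threshold': [50, 85],
    'protein_type': ['chicken', 'beef'],
    'price': [0.5, 1.8],
})
mydf['expected_price'] = mydf['price'] * 76 / mydf['threshold']

# id_vars are repeated on every output row; each value_vars column becomes
# a block of rows labelled by its name in 'price type'.
long_df = pd.melt(frame=mydf,
                  id_vars=['date', 'dealer', 'threshold', 'protein_type'],
                  value_vars=['price', 'expected_price'],
                  var_name='price type',
                  value_name='price value')

print(long_df)
```

Each original row appears twice in the result, once per price type, which is what lets style='price type' draw the two series with different line styles.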


Answered By: Zephyr