How to format datetimes in Power BI for a matplotlib visualization
Question:
I have a problem with a Power BI dataset that I have not been able to resolve for the past month. The picture below shows my steps, which are numbered.
In the first step, you can see my data source, which is an Excel file.
In steps 2 to 5, you can see my Power Query steps: transforming every column to text, running the Python script in Power Query to populate the dataset data frame, obtaining the dataset in text format, and a final transformation that changes everything to date again.
My problems are visible in pictures 6 and 7.
In picture 6, I get an ISO 8601 format and I cannot convert it to a date (e.g., I tried pd.to_datetime and datetime.date.fromisoformat).
In picture 7, the problem gets worse. Not only do I have problems with the conversion, but I also run into problems with NaT and other issues from time to time, and nothing works.
Also, the reason I changed every column to text in step 2 is what I read here: Python script in Power BI returns date as Microsoft.OleDb.Date
So I would be grateful if you could help me with this. I do not know how to make those plots because I keep running into various errors.
Also, here is the code:
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset = pandas.DataFrame(Scale, Y 1, Y 2, Y 3, Y 4, Y 5, Y 6)
# dataset = dataset.drop_duplicates()
# Paste or type your script code here:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime
plt.figure(figsize=(15, 16))
# In below plot everything is text (ISO 8601 format) and not date
#plt.plot(dataset['Scale'],dataset['Scale'])
#Below line does not create any plot at all
plt.plot(dataset['Y 1'],dataset['Y 1'])
plt.show()
Answers:
- I think it's preferable to let the script do all of the work. As such, manage the data transformation and plotting with pandas.
- In Data view, set all the columns to Text.
- Load dataset into pandas with df = pd.DataFrame(dataset).
- Use pd.to_datetime to format the columns. To format multiple columns with the same format, use .apply, which runs the vectorized pd.to_datetime on each column; use df['col'] = pd.to_datetime(df['col'], format='...') for a single column.
- Note that pandas formats the xtick labels differently depending on the time range of the data. There are many existing questions on Stack Overflow dealing with formatting the look and frequency of the ticks and labels.
- This answer shows that when the columns don't contain a time component, using .dt.date will cause the plot API to center the date xtick labels. However, within Power BI, that results in TypeError: no numeric data to plot, so use one of the options in the code block to center the xtick labels.
- Use pd.DataFrame.plot to directly plot the dataframe. This uses matplotlib as the default backend.
- Tested with python 3.11.3, pandas 2.0.1, matplotlib 3.7.1, Microsoft Power BI Desktop v2.117.984.0 64-bit (May 2023)
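The conversion steps in the list above can be tried outside Power BI first. The following is a minimal sketch, assuming the text columns hold dates in `%m/%d/%Y` form; the sample `dataset` frame and its values are hypothetical stand-ins for what Power BI passes to the script, and the names `single` and `converted` are only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the `dataset` frame that Power BI provides.
# Every column is text because the columns were set to Text in Data view.
dataset = pd.DataFrame({
    "Scale": ["01/01/2023", "02/01/2023", "03/01/2023"],
    "Y 1": ["01/15/2023", "02/15/2023", "03/15/2023"],
    "Y 2": ["01/20/2023", "02/20/2023", "03/20/2023"],
})

df = pd.DataFrame(dataset)

# Convert a single column, stating the format explicitly.
single = pd.to_datetime(df["Scale"], format="%m/%d/%Y")

# Convert every column that shares the same format.
converted = df.apply(lambda col: pd.to_datetime(col, format="%m/%d/%Y"))

print(converted.dtypes)
```

If the printed dtypes are datetime types rather than `object`, the columns are ready to be plotted as dates.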
In Python Script editor
import pandas as pd
import matplotlib.pyplot as plt
# load the data into a dataframe
df = pd.DataFrame(dataset)
# format all of the columns, specifying the format
df = df.apply(lambda col: pd.to_datetime(col, format='%m/%d/%Y'), axis=0)
# plot the dataframe directly with pandas.DataFrame.plot
# use the y= parameter to set specific columns instead of all columns, e.g. y=['Y 1']
ax = df.plot(x='Scale', title='In Data view, set each column as Text', figsize=(12, 7), rot=0)
# change the tick label to center aligned
for tick in ax.xaxis.get_major_ticks():
    tick.label1.set_horizontalalignment('center')
# optionally, this can be used to center the labels instead
# plt.setp(ax.get_xticklabels(), ha="center")
# show the plot
plt.show()
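If the text columns arrive as ISO 8601 strings instead, as described in the question, a variation along the following lines may help to see where the NaT values come from. This is only a sketch: the sample values are hypothetical, and it assumes that unparseable or missing entries should become NaT and be dropped before plotting.

```python
import pandas as pd

# Hypothetical ISO 8601 text values, including one bad entry and one missing entry.
iso_text = pd.Series(
    ["2023-01-05T00:00:00", "2023-02-10T00:00:00", "not a date", None]
)

# errors="coerce" turns anything that cannot be parsed into NaT instead of raising.
parsed = pd.to_datetime(iso_text, errors="coerce")

# Count the entries that failed to parse, then keep only the valid dates.
n_bad = int(parsed.isna().sum())
clean = parsed.dropna()

print(n_bad)
print(clean)
```

Checking the NaT count before plotting shows whether the issue is in the source text or in the conversion step.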
Screenshots (not reproduced here): Data view, Report view, Python script editor, Visualization