Extract data from array – Python

Question:

I am using the unleashed_py library to extract Unleashed data.

The sample of the output is as below where there could be several items in the invoice:

[{
 'OrderNumber': 'SO-00000742',
  'QuoteNumber': None,
  'InvoiceDate': '/Date(1658496322067)/',
  'InvoiceLines': [{'LineNumber': 1,
    'LineType': None},
   {'LineNumber': 2,
    'LineType': None}],
  'Guid': '8f6b89da-1e6e-42288a24-902a-038041e04f06',
  'LastModifiedOn': '/Date(1658496322221)/'}]

I need to get a df:

enter image description here

If I run the below script, the invoice lines just get appended with the common fields such as ordernumber, quotenumber, invoicedate, guide, and lastmodifiedon not getting repeated.

order_number = []
quote_number = []
invoice_date = []
invoicelines = []
invoice_line_number = []
invoice_line_type = []
guid = []
last_modified = []


for item in df:
    order_number.append(item.get('OrderNumber'))
    quote_number.append(item.get('QuoteNumber'))
    invoice_date.append(item.get('InvoiceDate'))
    guid.append(item.get('Guid'))
    last_modified.append(item.get('LastModifiedOn'))
    lines = item.get('InvoiceLines')
    for item_sub_2 in lines:
        invoice_line_number.append('LineNumber')
        invoice_line_type.append('LineType')

df_order_number = pd.DataFrame(order_number)
df_quote_number = pd.DataFrame(quote_number)
df_invoice_date = pd.DataFrame(invoice_date)
df_invoice_line_number  = pd.DataFrame(invoice_line_number)
df_invoice_line_type = pd.DataFrame(invoice_line_type)
df_guid = pd.DataFrame(guid)
df_last_modified = pd.DataFrame(last_modified)

df_row = pd.concat([
df_order_number,
df_quote_number,
df_invoice_date,
df_invoice_line_number,
df_invoice_line_type,
df_guid,
df_last_modified
], axis = 1)

What am I doing wrong?

Asked By: eponkratova

||

Answers:

You don’t need to iterate, just create the dataframe out of the list of dictionaries you have, then explode InvoiceLines columns then apply pd.Series and join it with the original dataframe:

data = [{
    'OrderNumber': 'SO-00000742',
    'QuoteNumber': None,
    'InvoiceDate': '/Date(1658496322067)/',
    'InvoiceLines': [{'LineNumber': 1,
                      'LineType': None},
                     {'LineNumber': 2,
                      'LineType': None}],
    'Guid': '8f6b89da-1e6e-42288a24-902a-038041e04f06',
    'LastModifiedOn': '/Date(1658496322221)/'}]


df=pd.DataFrame(data).explode('InvoiceLines')
out=pd.concat([df['InvoiceLines'].apply(pd.Series),
           df.drop(columns=['InvoiceLines'])],
          axis=1)

OUTPUT:

#out

   LineNumber  LineType  OrderNumber QuoteNumber            InvoiceDate  
0         1.0       NaN  SO-00000742        None  /Date(1658496322067)/   
0         2.0       NaN  SO-00000742        None  /Date(1658496322067)/   
                                       Guid         LastModifiedOn  
0  8f6b89da-1e6e-42288a24-902a-038041e04f06  /Date(1658496322221)/  
0  8f6b89da-1e6e-42288a24-902a-038041e04f06  /Date(1658496322221)/  

I’m leaving the date conversion and column renames for you cause I believe you can do that yourself.

Answered By: ThePyGuy
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.