Reading an Excel table into a DataFrame and splitting values into lists
Question:
I have an Excel file with several columns, and each column contains values to be searched for in a database.
I want to read this file (I am using pandas because it's a very simple way to read Excel files) and extract the information into variables:
Desired information extracted from each row:
Company: Ebay (str)
company_name_for_search: [EBAY, eBay, Ebay] (list of strings)
company_register: [4722, 4721] (list of ints)
With this information I will run a search script. Some fields must be lists because the script runs a search for every item in the list (in a for loop).
When I read the Excel file, each column is loaded into the DataFrame with dtype object, so I can't access the individual values inside each cell.
How can I split the values, convert their types, and deal with this?
Answers:
Your values are stored as single comma-separated strings rather than as separate rows of strings and numbers.
Instead of:
company_name | register |
---|---|
eBay | 4722 |
eBay | 4721 |
Amazon | 9999 |
You have:
company_name | register |
---|---|
ebay,ebay | 4722,4721 |
amazon | 9999 |
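A quick way to confirm this diagnosis is to inspect the frame directly. The sketch below builds the messy shape by hand as a stand-in for what `pd.read_excel` might return (column names taken from the tables above). It also shows a pitfall worth checking in real data: if Excel stores a single value such as 9999 as a number rather than text, that cell is not a string, and the `.str` accessor returns NaN for it.

```python
import pandas as pd

# Hand-built stand-in for the sheet; 9999 is an int, as Excel may store it as a number
df = pd.DataFrame(
    {
        "company_name": ["ebay,ebay", "amazon"],
        "register": ["4722,4721", 9999],
    }
)

print(df["register"].dtype)         # object: a mix of str and int
print(type(df.loc[0, "register"]))  # <class 'str'>: one string, not a list

# .str methods return NaN for non-string cells such as the int 9999
split_raw = df["register"].str.split(",")
# Converting every cell to str first keeps all values
split_str = df["register"].astype(str).str.split(",")
print(split_raw.tolist())  # the 9999 cell has become NaN
print(split_str.tolist())  # [['4722', '4721'], ['9999']]
```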
You can split each string into a list and then explode the resulting lists to get a long-form DataFrame with one value per row.
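```python
import pandas as pd

mess = pd.DataFrame(
    {
        "letters": ["A,B", "C,D", "E,F,G,H"],
        "nums": ["100,200", "300,400", "500, 600, 700, 800"],
    }
)

# Split every cell into a list, then explode both columns together so the
# items stay aligned row by row (multi-column explode needs pandas >= 1.3)
mess = mess.apply(lambda col: col.str.split(","))
mess = mess.explode(["letters", "nums"], ignore_index=True)
# Trim the stray spaces in "500, 600, ..." and convert the numbers to ints
mess["letters"] = mess["letters"].str.strip()
mess["nums"] = mess["nums"].str.strip().astype(int)
```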
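If you want the per-row variables described in the question (a string plus lists) rather than a long-form frame, you can skip `explode` and keep the split lists. This is a minimal sketch under some assumptions: the sheet's columns are named `company`, `company_name_for_search` and `company_register` (hypothetical names borrowed from the question's variables), a hand-built DataFrame stands in for `pd.read_excel(...)`, and `run_search` is a hypothetical placeholder for your search script. Passing `dtype=str` to `pd.read_excel` is one way to make every cell arrive as text.

```python
import pandas as pd

# Stand-in for pd.read_excel("file.xlsx", dtype=str); column names are hypothetical
df = pd.DataFrame(
    {
        "company": ["Ebay", "Amazon"],
        "company_name_for_search": ["EBAY, eBay, Ebay", "Amazon"],
        "company_register": ["4722,4721", 9999],
    }
)


def to_str_list(cell):
    """Split a comma-separated cell into a list of trimmed, non-empty strings."""
    if pd.isna(cell):  # empty Excel cells arrive as NaN
        return []
    return [part.strip() for part in str(cell).split(",") if part.strip()]


def to_int_list(cell):
    """Split a comma-separated cell into a list of ints."""
    return [int(part) for part in to_str_list(cell)]


# Replace each cell with a Python list, one row per company
df["company_name_for_search"] = df["company_name_for_search"].map(to_str_list)
df["company_register"] = df["company_register"].map(to_int_list)


def run_search(term):
    """Hypothetical placeholder for the real database search."""
    return f"searched {term!r}"


results = []
for row in df.itertuples(index=False):
    company = row.company                # str
    names = row.company_name_for_search  # list of str
    registers = row.company_register     # list of int
    # One search per item, as in the question's for loop
    for term in names + registers:
        results.append((company, run_search(term)))
```

Keeping the lists in the cells matches the question's loop-per-item design; the exploded long form is usually more convenient if you later want to filter, merge, or deduplicate with pandas itself.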