Pandas how to read CSV and update specific cells that are currently blank

Question:

I have a CSV File that is blank in row 1 for Columns A: D. I need the file to populate the header labels.

Is there a simple way to insert values for Row 1 only for columns A, B, C, and D? Columns E through Z already have header values, it is just columns A, B, C, and D I need to force-populate.

Code I"m trying to populate the values as expected, but also seems to add a row that displays numbers for each column 1, 2,3,4,5 etc.. and forces all of the correct labels to the second row.

gg = pd.read_csv(r'\pathfilename.csv')
gg.columns[0:4] = ["Date", "Type", "SubType1", "SubType2"]

Edit , here’s sample data, that hopefully shows that the csv is blank in row 1 for columns A:D (this is when I print(gg)

 Unnamed:0   Unnamed:1 Unnamed:2  Unnamed:3   Name1             Name2          Name1     Name2
    Date      Type    Subtype Subtype2         Metric 1            metric1      metric2  metric2
    9/8/2022  Vision  LS-54   LS                  54               .0234234   .923423    .83
    9/8/2022  Vision  LS-55   LS                  55               .023234    .23423     .23
    9/8/2022  Storm   LS-54   LS                  54               .0234234   .923423    .534
    9/8/2022  Storm   LS-55   LS                  55               .023234    .23423     .343

Here’s what I want it to display:

Date     Type     SubType  SubType2             Name2          Name1     Name1       Name2
Date      Type    Subtype Subtype2         Metric 1            metric1     metric2  metric2
9/8/2022  Vision  LS-54   LS                  54               .0234234   .923423    .83
9/8/2022  Vision  LS-55   LS                  55               .023234    .23423     .23
9/8/2022  Storm   LS-54   LS                  54               .0234234   .923423    .534
9/8/2022  Storm   LS-55   LS                  55               .023234    .23423     .343

The current code does not work, but seeking to force those cells to populate the values while leaving the rest of the csv as is if that makes sense.

Asked By: hoodcodie

||

Answers:

Solution1:

You are on right track… it’s just column names of a data frame needed to change for all of them in one go; So try something like this…

gg.columns = ["Date", "Type", "SubType1", "SubType2"] + list(gg.columns[4:])

Solution2:

In case, if you don’t wish to rename the column names via indexing then this is another way, where we don’t need to know where are these columns…

gg.rename(columns={'Unnamed: 0':'Date',
                   'Unnamed: 1':'Type',
                   'Unnamed: 2':'SubType1',
                   'Unnamed: 3':'SubType2'}, inplace=True)

Another Problem Statement:

How to deal with duplicated column name while loading csv file using pandas?

Answer: If you see this link (https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) or below attached screenshot, as of now there is one parameter to allow duplicated columns, but has not implemented yet…

So, loaded data frame will have .1,.2 etc. as suffix if column names are same… A quick fix would be again the renaming… So following code will get rid of everything written after "." So use cautiously…

gg.rename(columns={col:col.split('.')[0] for col in gg.columns}, inplace=True)

enter image description here
Hope this Helps…

Answered By: Sachin Kohli
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.