How to concatenate columns in CSV file using Python and Count the Total per UniqueID?

Question:

This question have been asked multiple times in this community but I couldn’t find the correct answers since I am beginner in Python. I got 2 questions actually:

  1. I want to concatenate 3 columns (A,B,C) with its value into 1 Column. Header would be ABC.

import os
import pandas as pd

directory = ‘C:/Path’
ext = (‘.csv’)

for filename in os.listdir(directory):
f = os.path.join(directory, filename)

if f.endswith(ext):

    head_tail = os.path.split(f)
    head_tail1 = 'C:/Output'
    k =head_tail[1]
    r=k.split(".")[0]

    p=head_tail1 + "/" + r + " - Revised.csv"

    mydata = pd.read_csv(f)

    new =mydata[["A","B","C","D"]]
    new = new.rename(columns={'D': 'Total'})
    new['Total'] = 1
    new.to_csv(p ,index=False)
  1. Once concatenated, is it possible to count the uniqueid and put the total in Column D? Basically, to get the total count per uniqueid (Column ABC),the data can be found on a link when you click that UniqueID. For ex: Column ABC – uniqueid1, -> click -> go to the next page, total of that uniqueid.

On the link page, you can get the total numbers of uniqueid by Serial ID

I have no idea how to do this, but I would really appreciate if someone can help me on this project and would learn a lot from this.

Thank you very much. God Bless

Searched in Google, Youtube and Stackoverflow, couldn’t find the correct answer.

Asked By: QuickSilver42

||

Answers:

Next time, try to specify your issues and give a minimal reproducible example.
This is just an example how to use pd.melt and pd.groupby.
I hope it helps with your question.

import pandas as pd

### example dataframe
df = pd.DataFrame([['first', 1, 2, 3], ['second', 4, 5, 6], ['third', 7, 8, 9]], columns=['ID', 'A', 'B', 'C'])

### directly sum up A, B and C
df['total'] = df.sum(axis=1, numeric_only=True)
print(df)

### how to create a so called long dataframe with melt
df_long = pd.melt(df, id_vars='ID', value_vars=['A', 'B', 'C'], var_name='ABC')
print(df_long)

### group long dataframe by column and sum up all values with this ID
df_group = df_long.groupby(by='ID').sum()
print(df_group)
Answered By: steTATO

I’m not sure that I understand your question correctly. However, if you know exactly the column names (e.g., A, B, and C) that you want to concatenate you can do something like code below.

''.join(merge_columns) is to concatenate column names.

new[merge_columns].apply(lambda x: ''.join(x), axis=1) is to concatenate their values.

Then, you can count unique values of the new column using groupby().count().

new = mydata[["A","B","C","D"]]
new = new.rename(columns={'D': 'Total'})
new['Total'] = 1

# added lines
merge_columns = ['A', 'B', 'C']
merged_col = ''.join(merge_columns)
new[merged_col] = new[merge_columns].apply(lambda x: ''.join(x), axis=1)
new.drop(merge_columns, axis=1, inplace=True)
new = new.groupby(merged_col).count().reset_index()

new.to_csv(p ,index=False)

example:

# before

>     new

   A  B  C  Total
0  a  b  c      1
1  x  y  z      1
2  a  b  c      1

# after execute added lines

>     new

   ABC  Total
0  abc      2
1  xyz      1
Answered By: JayPeerachai