Merge two or more lines of text into one line with Python pandas

Question:

I have a text file like the one below:

140037|1|TOP SOIL DARK BROWN CLAY RICH ORGANIC|0|0.8
140037|2|MATER SOFT||
140037|3|SANDY CLAY SOFT MOTTLED GREY/ORANGE|0.8|1
140037|4|BROWN <15% SAND GRAINS||
140037|5|CLAY MOTTLED DARK GREY/ORANGE BROWN|1|3
140037|6|SOFT BECOMING FIRM MINOR SILT AND||
140037|7|FINE SAND IN SOME LAYERS||

and I want to turn it into this:

140037|1|TOP SOIL DARK BROWN CLAY RICH ORGANIC MATER SOFT|0|0.8
140037|2|SANDY CLAY SOFT MOTTLED GREY/ORANGE BROWN <15% SAND GRAINS|0.8|1
140037|3|CLAY MOTTLED DARK GREY/ORANGE BROWN SOFT BECOMING FIRM MINOR SILT AND FINE SAND IN SOME LAYERS|1|3

I am using pandas to read the file, but I am not sure how to use merge. Any help will be appreciated.

Asked By: ZYLUO


Answers:

Use:

import pandas as pd

#create DataFrame (file is the path to the input txt file)
df = pd.read_csv(file, sep="|", header=None)

#start a new group at each row with a non-missing value in column 3 or 4,
#then join the text of each group and keep the first values of columns 3 and 4
df1 = (df.assign(g = df[[3,4]].notna().any(axis=1).cumsum())
        .groupby([0,'g'], as_index=False)
        .agg({2:' '.join, 3:'first', 4:'first'}))

print (df1)
        0  g                                                  2    3    4
0  140037  1   TOP SOIL DARK BROWN CLAY RICH ORGANIC MATER SOFT  0.0  0.8
1  140037  2  SANDY CLAY SOFT MOTTLED GREY/ORANGE BROWN <15%...  0.8  1.0
2  140037  3  CLAY MOTTLED DARK GREY/ORANGE BROWN SOFT BECOM...  1.0  3.0

#write to new file
df1.to_csv(new_file, index=False, header=False, sep='|')
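
The grouping step can be checked end to end with a small self-contained sketch. It assumes the sample rows from the question are held in an in-memory string instead of a file; the names `raw`, `starts`, `group_id`, `merged` and `out` are made up here for illustration.

```python
import io
import pandas as pd

# Sample rows from the question, kept in memory instead of a file (assumption)
raw = """140037|1|TOP SOIL DARK BROWN CLAY RICH ORGANIC|0|0.8
140037|2|MATER SOFT||
140037|3|SANDY CLAY SOFT MOTTLED GREY/ORANGE|0.8|1
140037|4|BROWN <15% SAND GRAINS||
140037|5|CLAY MOTTLED DARK GREY/ORANGE BROWN|1|3
140037|6|SOFT BECOMING FIRM MINOR SILT AND||
140037|7|FINE SAND IN SOME LAYERS||
"""

df = pd.read_csv(io.StringIO(raw), sep="|", header=None)

# True on rows that start a new record (column 3 or 4 is filled in)
starts = df[[3, 4]].notna().any(axis=1)

# Running count of record starts: continuation rows share the id of the row above
group_id = starts.cumsum()

# Join the text fragments per group and keep the first values of columns 3 and 4
merged = (df.assign(g=group_id)
            .groupby([0, "g"], as_index=False)
            .agg({2: " ".join, 3: "first", 4: "first"}))

# Write to an in-memory buffer in the same pipe-separated layout
out = io.StringIO()
merged.to_csv(out, index=False, header=False, sep="|")
lines = out.getvalue().splitlines()
```

Because the group id `g` restarts the numbering at 1, it also serves as the renumbered second column in the output. Columns 3 and 4 are read as floats, so whole numbers are written with a decimal point.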
Answered By: jezrael

You can use the following syntax to combine two text columns into one in a pandas DataFrame:

df['new_column'] = df['column1'] + df['column2']

To merge multiple lines of text into one line, first use the replace() method to turn newline characters into spaces. Then group the rows by a common key with groupby() and concatenate the text of each group with agg(). Here is an example:

df['text'] = df['text'].replace(r'\n', ' ', regex=True)
df = df.groupby('key')['text'].agg(' '.join).reset_index()
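
A minimal runnable sketch of this pattern follows. The sample data is made up for illustration; the column names `key` and `text` are taken from the snippet above. It only works when the rows to be merged already share a key value.

```python
import pandas as pd

# Made-up sample data; rows with the same key belong to the same record
df = pd.DataFrame({
    "key": [1, 1, 2],
    "text": ["first\nline", "continued", "another record"],
})

# Replace embedded newline characters with spaces
df["text"] = df["text"].str.replace("\n", " ", regex=False)

# Join all text fragments that share a key, separated by a space
result = df.groupby("key")["text"].agg(" ".join).reset_index()
```

Here `result` has one row per key, with the fragments joined in their original order.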

I hope this helps!

Answered By: Suryanandan Mahesh