How can I create a function that analyzes text in the rows of one column and writes it into other columns?

Question

For example, I have something like this, with all the info in one column:

	header	Company	price	data
1	done
2	nan
3	comp1
4	nan
5	price:222
6	result:ok, 22.05.19
7	nan
8	done
9	nan
10	comp2
11	nan
12	price:2895
13	result:no, 15.05.19
14	nan
15	done
16	and ctr

I am trying to find out how to create a function that will put the info into the other columns.
The expected result:
header 1

num	Company	price	result	date
1	Comp1	222	ok	22.05.19
2	Comp2	2895	no	15.05.19
etc

Asked By: Ali sh

Answer 1

The best idea is likely to import the data correctly from the beginning.
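
As a rough illustration of parsing the data before it ever reaches a DataFrame, here is a minimal sketch. It assumes the raw rows are available as a plain list of strings (with `None` or `nan` for empty rows) and that the line formats match the sample in the question. `parse_records` is a hypothetical helper name, not part of pandas.

```python
import re

# Assumed line formats, taken from the sample in the question.
PRICE_RE = re.compile(r"price:(\d+)")
RESULT_RE = re.compile(r"result:(\w+), ([\d.]+)")
SKIP = {"done", "nan", ""}  # marker rows and empty rows to ignore


def parse_records(lines):
    """Group raw lines into one dict per company (hypothetical helper)."""
    records = []
    current = None
    for raw in lines:
        text = "" if raw is None else str(raw).strip()
        if text.lower() in SKIP:
            continue
        price = PRICE_RE.fullmatch(text)
        result = RESULT_RE.fullmatch(text)
        if price and current is not None:
            current["price"] = int(price.group(1))
        elif result and current is not None:
            current["result"], current["date"] = result.groups()
        elif not price and not result:
            # Any other text is treated as the start of a new company block.
            current = {"Company": text}
            records.append(current)
    # Keep only blocks that actually received a price or a result.
    return [r for r in records if len(r) > 1]


lines = ["done", None, "comp1", None, "price:222", "result:ok, 22.05.19", None,
         "done", None, "comp2", None, "price:2895", "result:no, 15.05.19", None,
         "done", "and ctr"]
records = parse_records(lines)
```

Passing `records` to `pd.DataFrame` should then give one row per company, with the trailing `and ctr` row dropped because it never receives a price or a result.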

That said, and mostly for fun, you can fix this format using a regex and reshaping:

# remove NaNs, and "done" rows
s = df['header'].loc[df['header'].ne('done')].dropna()

# extract the Price/Result/Date
pattern = r'price:(?P<Price>\d+)|result:(?P<Result>\w+), (?P<Date>[\d.]+)'
df2 = s.str.extract(pattern)

# identify potential company names
m = df2.isna().all(axis=1)

# reshape
out = (df2
    .assign(Company=s.where(m))
    .set_index(m.cumsum(), append=True)
    .droplevel(0)
    .stack().unstack()
    .dropna(subset=['Price', 'Result', 'Date'], how='all')
)

Output:

  Price Result      Date Company
1   222     ok  22.05.19   comp1
2  2895     no  15.05.19   comp2
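
To see why rows where every extracted column is empty can be treated as company names, it may help to try the same pattern on single strings with the standard `re` module. This is only a sketch of the matching behaviour. `extract_fields` is a hypothetical helper, and pandas reports a non-matching row as a row of NaN rather than `None`.

```python
import re

pattern = re.compile(
    r"price:(?P<Price>\d+)|result:(?P<Result>\w+), (?P<Date>[\d.]+)"
)


def extract_fields(text):
    """Return the named groups of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.groupdict() if match else None


price_row = extract_fields("price:222")
# {'Price': '222', 'Result': None, 'Date': None}
result_row = extract_fields("result:ok, 22.05.19")
# {'Price': None, 'Result': 'ok', 'Date': '22.05.19'}
company_row = extract_fields("comp1")
# None: neither alternative of the pattern matches a company name
```

Each alternative of the pattern fills only its own groups, so a price row and a result row complement each other, and a company row matches nothing at all.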

Answered By: mozway

Answer 2

The company, result, and date rows appear at fixed positions relative to the price row. If your actual data follows the same pattern, you can do the following:

Input:

import pandas as pd

df = pd.DataFrame({'header':['done', None, 'comp1', None, 'price:222', 'result:ok, 22.05.19', None, 'Done', None,
                           'comp2', None, 'price:2895',  'result:no, 15.05.19', None, 'Done', 'and ctr'],'Company':None, 'price':None, 'data':None})

Code:

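# split every row on ':'; the company sits 2 rows above each price row, the result 1 row below
ls = list(df.header.str.split(':'))
pd.DataFrame([{'Company': ls[i-2][0], 'price': v[1], 'result': ls[i+1][1].split(', ')[0], 'date': ls[i+1][1].split(', ')[1]}
              for i, v in enumerate(ls) if 'price' in str(v)])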

Output:

   Company  price   result  date
0   comp1   222     ok      22.05.19
1   comp2   2895    no      15.05.19
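
The fixed offsets (company two rows above the price, result one row below) can also be written out with explicit guards, so that an incomplete block is skipped instead of raising an error. This is a minimal plain-Python sketch, assuming the same layout as the input above. The extra `price:999` row at the end is made up here to show the guard.

```python
rows = ['done', None, 'comp1', None, 'price:222', 'result:ok, 22.05.19', None,
        'Done', None, 'comp2', None, 'price:2895', 'result:no, 15.05.19', None,
        'Done', 'and ctr', 'price:999']  # last row: made-up truncated block

records = []
for i, value in enumerate(rows):
    if not (isinstance(value, str) and value.startswith('price:')):
        continue
    # Guard the fixed offsets so a truncated block is skipped.
    if i < 2 or i + 1 >= len(rows):
        continue
    company, result_row = rows[i - 2], rows[i + 1]
    if not (isinstance(company, str) and isinstance(result_row, str)
            and result_row.startswith('result:') and ',' in result_row):
        continue
    status, date = result_row.split(':', 1)[1].split(',', 1)
    records.append({'Company': company,
                    'price': value.split(':', 1)[1],
                    'result': status.strip(),
                    'date': date.strip()})
```

The resulting `records` list can be passed to `pd.DataFrame` to get the same table as above.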

Answered By: R. Baraiya

Answer 3

This answer builds on mozway's good idea.

Example

import pandas as pd

data = ['done', None, 'comp1', None, 'price:222', 'result:ok, 22.05.19', None, 
        'done', None, 'comp2', None, 'price:2895','result:no, 15.05.19', None, 'done']
df = pd.DataFrame(data, columns=['header'])

df

    header
0   done
1   None
2   comp1
3   None
4   price:222
5   result:ok, 22.05.19
6   None
7   done
8   None
9   comp2
10  None
11  price:2895
12  result:no, 15.05.19
13  None
14  done

Code

pattern = r'(?P<Company>comp\d)|price:(?P<Price>\d+)|result:(?P<Result>\w+), (?P<Date>[\d.]+)'
grouper = df['header'].str.contains('^comp').cumsum()
out = (df['header'].str.extract(pattern).groupby(grouper).first()
       .dropna(subset=['Company']).set_index('Company'))

out

        Price   Result  Date
Company         
comp1   222     ok      22.05.19
comp2   2895    no      15.05.19
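
Since the question asks for a function, the extract-and-group idea could be wrapped roughly as follows. This is a sketch, not a definitive implementation. `tidy_header` is a hypothetical name, and it assumes that any non-empty text that is neither "done" nor a price/result row is a company name, rather than requiring the `comp` prefix.

```python
import pandas as pd


def tidy_header(df, column="header"):
    """Hypothetical wrapper: one row per company from a single text column."""
    s = df[column]
    pattern = r"price:(?P<price>\d+)|result:(?P<result>\w+), (?P<date>[\d.]+)"
    fields = s.str.extract(pattern)
    # Assumption: a company row is any non-empty, non-"done" text the pattern skips.
    is_company = s.notna() & s.str.lower().ne("done") & fields.isna().all(axis=1)
    fields.insert(0, "Company", s.where(is_company))
    # Each company row starts a new group; take the first non-null value per column.
    out = fields.groupby(is_company.cumsum()).first()
    # Drop groups with no price, result, or date (e.g. the trailing "and ctr" row).
    out = out.dropna(subset=["price", "result", "date"], how="all")
    return out.reset_index(drop=True)


data = ["done", None, "comp1", None, "price:222", "result:ok, 22.05.19", None,
        "done", None, "comp2", None, "price:2895", "result:no, 15.05.19", None,
        "done", "and ctr"]
result = tidy_header(pd.DataFrame(data, columns=["header"]))
```

The extracted values are strings; converting `price` to a number is left out here to keep the sketch short.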

Answered By: Panda Kim
