convert data to 1NF
Question:
I have a dataset like this
import pandas as pd
df = pd.read_csv("music.csv")
df
name
date
singer
language
phase
1
Yes or No
02.01.20
Benjamin Smith
en
1
2
Parabens
01.06.21
Rafael Galvao;Simon Murphy
pt;en
2
3
Love
12.11.20
Michaela Condell
en
1
4
Paz
11.07.19
Ana Perez; Eduarda Pinto
es;pt
3
5
Stop
12.01.21
Michael Conway;Gabriel Lee
en;en
1
6
Shalom
18.06.21
Shimon Cohen
hebr
1
7
Habibi
22.12.19
Fuad Khoury
ar
3
8
viva
01.08.21
Veronica Barnes
en
1
9
Buznanna
23.09.20
Kurt Azzopardi
mt
1
10
Frieden
21.05.21
Gabriel Meier
dt
1
11
Uruguay
11.04.21
Julio Ramirez
es
1
12
Beautiful
17.03.21
Cameron Armstrong
en
3
13
Holiday
19.06.20
Bianca Watson
en
3
14
Kiwi
21.10.20
Lachlan McNamara
en
1
15
Amore
01.12.20
Vasco Grimaldi
it
1
16
La vie
28.04.20
Victor Dubois
fr
3
17
Yom
21.02.20
Ori Azerad; Naeem al-Hindi
hebr;ar
2
18
EleftherĂa
15.06.19
Nikolaos Gekas
gr
1
This table is not in 1NF. I would like to convert in the form of pd.DataFrame, that satiesfy 1NF.
How can I do that?
I did this, but seems not work
import pandas as pd
import numpy as np
df = pd.read_csv("music.csv")
lens = list(map(len, df['singer','language].values))
res = pd.DataFrame({'name': np.repeat(
df['name'], lens), 'singer': np.concatenate(df['singer'].values),'language': np.concatenate(df['language'].values)})
print(res)
It should satisfy only 1NF not 3NF and so on.
Answers:
Split language
and singer
by values and use pd.explode
:
df['language']=df['language'].str.split(';')
df['singer']=df['singer'].str.split(";")
df.explode(['language','singer'])
Id
name
date
singer
language
phase
1
Yes or No
02.01.20
Benjamin Smith
en
1
2
Parabens
01.06.21
Rafael Galvao
pt
2
2
Parabens
01.06.21
Simon Murphy
en
2
3
Love
12.11.20
Michaela Condell
en
1
4
Paz
11.07.19
Ana Perez
es
3
4
Paz
11.07.19
Eduarda Pinto
pt
3
5
Stop
12.01.21
Michael Conway
en
1
5
Stop
12.01.21
Gabriel Lee
en
1
6
Shalom
18.06.21
Shimon Cohen
hebr
1
7
Habibi
22.12.19
Fuad Khoury
ar
3
8
viva
01.08.21
Veronica Barnes
en
1
9
Buznanna
23.09.20
Kurt Azzopardi
mt
1
10
Frieden
21.05.21
Gabriel Meier
dt
1
11
Uruguay
11.04.21
Julio Ramirez
es
1
12
Beautiful
17.03.21
Cameron Armstrong
en
3
13
Holiday
19.06.20
Bianca Watson
en
3
14
Kiwi
21.10.20
Lachlan McNamara
en
1
15
Amore
01.12.20
Vasco Grimaldi
it
1
16
La vie
28.04.20
Victor Dubois
fr
3
17
Yom
21.02.20
Ori Azerad
hebr
2
17
Yom
21.02.20
Naeem al-Hindi
ar
2
18
EleftherĂa
15.06.19
Nikolaos Gekas
gr
1
I have a dataset like this
import pandas as pd
df = pd.read_csv("music.csv")
df
name | date | singer | language | phase | |
---|---|---|---|---|---|
1 | Yes or No | 02.01.20 | Benjamin Smith | en | 1 |
2 | Parabens | 01.06.21 | Rafael Galvao;Simon Murphy | pt;en | 2 |
3 | Love | 12.11.20 | Michaela Condell | en | 1 |
4 | Paz | 11.07.19 | Ana Perez; Eduarda Pinto | es;pt | 3 |
5 | Stop | 12.01.21 | Michael Conway;Gabriel Lee | en;en | 1 |
6 | Shalom | 18.06.21 | Shimon Cohen | hebr | 1 |
7 | Habibi | 22.12.19 | Fuad Khoury | ar | 3 |
8 | viva | 01.08.21 | Veronica Barnes | en | 1 |
9 | Buznanna | 23.09.20 | Kurt Azzopardi | mt | 1 |
10 | Frieden | 21.05.21 | Gabriel Meier | dt | 1 |
11 | Uruguay | 11.04.21 | Julio Ramirez | es | 1 |
12 | Beautiful | 17.03.21 | Cameron Armstrong | en | 3 |
13 | Holiday | 19.06.20 | Bianca Watson | en | 3 |
14 | Kiwi | 21.10.20 | Lachlan McNamara | en | 1 |
15 | Amore | 01.12.20 | Vasco Grimaldi | it | 1 |
16 | La vie | 28.04.20 | Victor Dubois | fr | 3 |
17 | Yom | 21.02.20 | Ori Azerad; Naeem al-Hindi | hebr;ar | 2 |
18 | EleftherĂa | 15.06.19 | Nikolaos Gekas | gr | 1 |
This table is not in 1NF. I would like to convert in the form of pd.DataFrame, that satiesfy 1NF.
How can I do that?
I did this, but seems not work
import pandas as pd
import numpy as np
df = pd.read_csv("music.csv")
lens = list(map(len, df['singer','language].values))
res = pd.DataFrame({'name': np.repeat(
df['name'], lens), 'singer': np.concatenate(df['singer'].values),'language': np.concatenate(df['language'].values)})
print(res)
It should satisfy only 1NF not 3NF and so on.
Split language
and singer
by values and use pd.explode
:
df['language']=df['language'].str.split(';')
df['singer']=df['singer'].str.split(";")
df.explode(['language','singer'])
Id | name | date | singer | language | phase |
---|---|---|---|---|---|
1 | Yes or No | 02.01.20 | Benjamin Smith | en | 1 |
2 | Parabens | 01.06.21 | Rafael Galvao | pt | 2 |
2 | Parabens | 01.06.21 | Simon Murphy | en | 2 |
3 | Love | 12.11.20 | Michaela Condell | en | 1 |
4 | Paz | 11.07.19 | Ana Perez | es | 3 |
4 | Paz | 11.07.19 | Eduarda Pinto | pt | 3 |
5 | Stop | 12.01.21 | Michael Conway | en | 1 |
5 | Stop | 12.01.21 | Gabriel Lee | en | 1 |
6 | Shalom | 18.06.21 | Shimon Cohen | hebr | 1 |
7 | Habibi | 22.12.19 | Fuad Khoury | ar | 3 |
8 | viva | 01.08.21 | Veronica Barnes | en | 1 |
9 | Buznanna | 23.09.20 | Kurt Azzopardi | mt | 1 |
10 | Frieden | 21.05.21 | Gabriel Meier | dt | 1 |
11 | Uruguay | 11.04.21 | Julio Ramirez | es | 1 |
12 | Beautiful | 17.03.21 | Cameron Armstrong | en | 3 |
13 | Holiday | 19.06.20 | Bianca Watson | en | 3 |
14 | Kiwi | 21.10.20 | Lachlan McNamara | en | 1 |
15 | Amore | 01.12.20 | Vasco Grimaldi | it | 1 |
16 | La vie | 28.04.20 | Victor Dubois | fr | 3 |
17 | Yom | 21.02.20 | Ori Azerad | hebr | 2 |
17 | Yom | 21.02.20 | Naeem al-Hindi | ar | 2 |
18 | EleftherĂa | 15.06.19 | Nikolaos Gekas | gr | 1 |