Change Column Names using Dictionary (key value pair) in Databricks
Question:
I am new to Databricks and Python, and I want to know the best way to change column names in Databricks. For example, if the column name is 'ID', I want to change it to 'Patient_ID', and 'Name' to 'Patient_Name'. I thought I would use a dictionary, but I don't know how to apply it to the column names.
Please help, thanks in advance.
Note: the position of the columns can change, which is why I thought of using a dictionary.
Dictionary = {<ID> : <Patient_ID>, <Name> : <Patient_Name>,<Age> : <Patient_age>}
Example of what I am trying to achieve (picture attached)
I tried using a json file to do this but i ended up no wr
Answers:
Given the following dataset:
columns = ["ID", "Name", "Age", "Country"]
data = [(1, "John", "42", "Spain"), (2, "Jane", "24", "Norway"), (3, "Nohj", "38", "Iceland"), (4, "Fabrice", "65", "France")]
df = spark.createDataFrame(data, columns)
df.show()
+---+-------+---+-------+
| ID|   Name|Age|Country|
+---+-------+---+-------+
|  1|   John| 42|  Spain|
|  2|   Jane| 24| Norway|
|  3|   Nohj| 38|Iceland|
|  4|Fabrice| 65| France|
+---+-------+---+-------+
You could loop over your dictionary as follows:
dictionary = {"ID": "Patient_ID", "Name": "Patient_Name", "Age": "Patient_Age"}
# Rename each column listed in the dictionary; other columns are left untouched
for old_name, new_name in dictionary.items():
    df = df.withColumnRenamed(old_name, new_name)
df.show()
+----------+------------+-----------+-------+
|Patient_ID|Patient_Name|Patient_Age|Country|
+----------+------------+-----------+-------+
|         1|        John|         42|  Spain|
|         2|        Jane|         24| Norway|
|         3|        Nohj|         38|Iceland|
|         4|     Fabrice|         65| France|
+----------+------------+-----------+-------+
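Because the lookup is by name, the order of the columns does not matter, and withColumnRenamed does nothing when a column is missing. As a possible alternative to the loop, the new names can be computed in one pass and applied with toDF. This is only a sketch: the helper name renamed_columns is made up for illustration and is not part of PySpark.

```python
def renamed_columns(columns, mapping):
    """Return the column names in their original order, replacing any name
    found in `mapping` and keeping all other names unchanged."""
    return [mapping.get(name, name) for name in columns]


dictionary = {"ID": "Patient_ID", "Name": "Patient_Name", "Age": "Patient_Age"}

# Plain-Python check, using the column list from the example dataset
new_names = renamed_columns(["ID", "Name", "Age", "Country"], dictionary)
print(new_names)

# With a Spark DataFrame (assumes `df` already exists, as in the answer above):
# df = df.toDF(*renamed_columns(df.columns, dictionary))
```

Columns that are not in the dictionary, such as Country, keep their original name, so the same dictionary works whatever order the columns arrive in.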