How to select items inside a Python list and add them to a dataframe

Question:

I have a PySpark dataframe with the columns below:

Dataframe: httpClient
[capacity: string, version: string]

and I have a list of columns declared as
httpClient_fields = ["capacity", "`httpClient.install`", "date"]

I need to check whether the dataframe has each of the list items. If an item does not exist in the dataframe, I need to add it with empty values.
So, in the result, I need

Dataframe: httpClient
[capacity: string, version: string, `httpClient.install`: string, date: string]

This is my code now:

df_cols = httpClient.columns
for f in httpClient_fields:
    if f not in df_cols:
        httpClient= httpClient.withColumn(f, F.lit(''))
httpClient = httpClient.select(*httpClient_fields).dropDuplicates().repartition(1)
httpClient = httpClient.withColumnRenamed("httpClient.install","httpClient_install")

When I execute this, I'm getting:
cannot resolve '`httpClient.install`'

Please let me know how to solve this.

Asked By: user175025

||

Answers:

You’re almost there! Notice that inside your if-statement, you’re adding the missing columns to df_res rather than to httpClient.

df_res = df_res.withColumn(f, F.lit(''))

Use that instead of httpClient in the next line:

httpClient = df_res.select(*httpClient_fields).dropDuplicates().repartition(1)
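
The add-then-select pattern can be sketched as a small helper. This is a minimal sketch rather than the asker's exact code: `missing_columns` and `add_missing_columns` are hypothetical names, and the field list is assumed to hold plain column names without dots or backticks (hence `httpClient_install` in the example). pyspark is imported inside the function so the pure-Python part runs without a Spark session.

```python
def missing_columns(existing, wanted):
    """Return the names in `wanted` that are not in `existing`, keeping order."""
    present = set(existing)
    return [name for name in wanted if name not in present]


def add_missing_columns(df, wanted):
    """Add each missing column as an empty string, then keep only `wanted`."""
    from pyspark.sql import functions as F  # lazy import; requires pyspark

    for name in missing_columns(df.columns, wanted):
        df = df.withColumn(name, F.lit(""))
    # Assumes plain names: a name containing a dot would need
    # backtick quoting before it is passed to select().
    return df.select(*wanted).dropDuplicates()


# Pure-Python check of which columns would be added.
print(missing_columns(["capacity", "version"],
                      ["capacity", "httpClient_install", "date"]))
# ['httpClient_install', 'date']
```

With a real dataframe, the call would be `httpClient = add_missing_columns(httpClient, fields)`, so the loop and the select always act on the same variable.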
Answered By: steliosbl

I’m not sure how to get the dot (‘.’) parsed correctly there, since you seem to have used backticks already. In some cases, though, backticks might not work as expected due to parsing issues.

So, would it be possible for you to replace the ‘.’ with an underscore ‘_’ inside the loop itself?

Something like this:

for f in httpClient_fields:
    if f not in df_cols:
        if '.' in f:
            f = f.replace('.', '_')  # Replace dot with underscore
        df_res = df_res.withColumn(f, F.lit(''))

The above might not be what you are looking for. I also noticed something else: you could try adding backticks in the last line as well.

Replace this:

httpClient = httpClient.withColumnRenamed("httpClient.install","httpClient_install")

with this (just added backticks in the last line as well):

httpClient = httpClient.withColumnRenamed("`httpClient.install`", "httpClient_install")
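
Another possible reading of the error, offered as an assumption rather than a confirmed diagnosis: `withColumn` and `withColumnRenamed` appear to treat the name as a literal string, while `select` parses it as an expression. If so, a column created from a backticked list item would carry the backticks in its name, and `select` would then look for a name without them. The sketch below keeps plain names everywhere and quotes only for `select`. The helpers `plain`, `quoted` and `select_with_dotted_names` are hypothetical names, not part of pyspark.

```python
def plain(name):
    """Strip surrounding backticks: '`a.b`' becomes 'a.b'."""
    return name.strip("`")


def quoted(name):
    """Backtick-quote a name so select() treats a dot as part of the name."""
    return "`" + plain(name) + "`"


def select_with_dotted_names(df, fields):
    """Add missing fields, select them, then rename dots to underscores."""
    from pyspark.sql import functions as F  # lazy import; requires pyspark

    names = [plain(f) for f in fields]
    for name in names:
        if name not in df.columns:
            # Assumed literal: pass the name without backticks.
            df = df.withColumn(name, F.lit(""))
    # select() parses its arguments, so quote the names here.
    df = df.select(*[quoted(n) for n in names])
    for name in names:
        if "." in name:
            # Assumed literal again: no backticks in the old name.
            df = df.withColumnRenamed(name, name.replace(".", "_"))
    return df


fields = ["capacity", "`httpClient.install`", "date"]
print([plain(f) for f in fields])
# ['capacity', 'httpClient.install', 'date']
print([quoted(f) for f in fields])
# ['`capacity`', '`httpClient.install`', '`date`']
```

Under this reading, the original `withColumnRenamed("httpClient.install", "httpClient_install")` call is already fine once the column has been created under its plain name.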

Answered By: Yehan Wasura