Merge two different dataframes in pyspark

Question:

I have two different dataframes: one holds date combinations and the other holds city pairs.

df_date_combinations:

+-------------------+-------------------+
|            fs_date|            ss_date|
+-------------------+-------------------+
|2022-06-01T00:00:00|2022-06-02T00:00:00|
|2022-06-01T00:00:00|2022-06-03T00:00:00|
|2022-06-01T00:00:00|2022-06-04T00:00:00|
+-------------------+-------------------+

city pairs:

+---------+--------------+---------+--------------+
|fs_origin|fs_destination|ss_origin|ss_destination|
+---------+--------------+---------+--------------+
|      TLV|           NYC|      NYC|           TLV|
|      TLV|           ROM|      ROM|           TLV|
|      TLV|           BER|      BER|           TLV|
+---------+--------------+---------+--------------+

I want to combine them so that every date combination is paired with every city pair, giving the following dataframe:

+----------+----------+---------+--------------+---------+--------------+
|   fs_date|   ss_date|fs_origin|fs_destination|ss_origin|ss_destination|
+----------+----------+---------+--------------+---------+--------------+
|2022-06-01|2022-06-02|      TLV|           NYC|      NYC|           TLV|
|2022-06-01|2022-06-03|      TLV|           NYC|      NYC|           TLV|
|2022-06-01|2022-06-04|      TLV|           NYC|      NYC|           TLV|
|2022-06-01|2022-06-02|      TLV|           ROM|      ROM|           TLV|
|2022-06-01|2022-06-03|      TLV|           ROM|      ROM|           TLV|
|2022-06-01|2022-06-04|      TLV|           ROM|      ROM|           TLV|
|2022-06-01|2022-06-02|      TLV|           BER|      BER|           TLV|
|2022-06-01|2022-06-03|      TLV|           BER|      BER|           TLV|
|2022-06-01|2022-06-04|      TLV|           BER|      BER|           TLV|
+----------+----------+---------+--------------+---------+--------------+

Thanks!

Asked By: Daniel Avigdor


Answers:

Sounds like a cross join: every row of one dataframe is paired with every row of the other.

df1.crossJoin(df2)
Answered By: walking

Pandas has built-in methods for combining dataframes; here we use concat to concatenate them. You can read how to do this here:

The part that is pertinent to you would be:

pd.concat([df_date_combinations, city_pairs], axis = 1)
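
One thing to check before relying on this: concat with axis=1 lines rows up by index, so on the three-row frames from the question it gives three rows, not nine combinations. If the data really is in pandas rather than PySpark, merge with how="cross" (available since pandas 1.2) is one way to get every combination. A minimal sketch, assuming pandas dataframes built from the question's sample data:

```python
import pandas as pd

# Sample rows copied from the question.
df_date_combinations = pd.DataFrame(
    {
        "fs_date": ["2022-06-01", "2022-06-01", "2022-06-01"],
        "ss_date": ["2022-06-02", "2022-06-03", "2022-06-04"],
    }
)
city_pairs = pd.DataFrame(
    {
        "fs_origin": ["TLV", "TLV", "TLV"],
        "fs_destination": ["NYC", "ROM", "BER"],
        "ss_origin": ["NYC", "ROM", "BER"],
        "ss_destination": ["TLV", "TLV", "TLV"],
    }
)

# concat with axis=1 places the frames side by side, matching rows by index.
side_by_side = pd.concat([df_date_combinations, city_pairs], axis=1)

# merge with how="cross" builds the Cartesian product of the rows.
all_combinations = df_date_combinations.merge(city_pairs, how="cross")

print(len(side_by_side), len(all_combinations))  # prints: 3 9
```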

Hope this helps!

Answered By: Jacob Lloyd