I want to check if row value equals column name and change the values of the row that come after the intersecting value

Question:

I have a time series data, converted to a dataframe. I have multiple columns, where the first column is timestamps and rest of the column names are timestamps with prices as values.

Sample dataframe :

sample dataframe

The idea is to iterate over the rows and check if the row value in ‘date’ column matches with any column name (highlighted in blue), if it does then the value at the intersection (highlighted in yellow) should stay and all the values after it (highlighted in grey) should be replaced with null or 0’s.

For example : value in first column "2022-01-02 00:00:00+01:00" matches with column with the same name "2022-01-02 00:00:00+01:00". So the intersecting value i.e. "80.82" should stay and the rest of the values in that row (highlighted in grey) should replaced with null or 0’s.

I have tried the following but this replaces the intersecting value.

for i in df.columns:
     df.loc[df['date']==i,i]=None
Asked By: Newlearner

||

Answers:

"The idea is to iterate over the rows and check if the row value in ‘date’ column matches with any column name df['date'] == column, if it does then the value at the intersection should stay and all the values after it df.columns[(idx+1):] should be replaced with null or 0’s ."

for idx, column in enumerate(df.columns):
    df.loc[df['date'] == column, df.columns[(idx+1):]]=None

The enumerate() function adds a counter to an iterable.

Explanation:

df.columns = ["date", "2022-01-01 23:55:00+01:00", "2022-01-02 00:00:00+01:00", "2022-01-02 00:05:00+01:00", ...]

list(enumerate(df.columns))

# return
[(0, "date"), (1, "2022-01-01 23:55:00+01:00"), (2, "2022-01-02 00:00:00+01:00"), (3, "2022-01-02 00:05:00+01:00"), ...]

In the for loop, the first (0, "date") and second iteration (1, "2022-01-01 23:55:00+01:00") didn’t match with any row. The third iteration match with the first row. Using this as example

idx, column = 2, "2022-01-02 00:00:00+01:00"

df['date'] == column returns a list of boolean [True, False, False, False, ...]

df.columns[(idx+1):] is df.columns[3:] which returns a list of column names after "2022-01-02 00:00:00+01:00", which is ["2022-01-02 00:05:00+01:00", "2022-01-02 00:10:00+01:00", ...]

df.loc[df['date'] == column, df.columns[(idx+1):]]=None

is equivalent to

df.loc[[True, False, False, ...], ["2022-01-02 00:05:00+01:00", "2022-01-02 00:10:00+01:00", ...]]=None

which fill the first row and the column after "2022-01-02 00:00:00+01:00" as None.

Answered By: wavetitan
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.