Is there a more efficient way to apply this custom function to the entire dataset?

Question

I have a dataset of IP addresses that looks like this (for security’s sake, these are all made up):

0	1	2
100.0.200.0	160.60.30.0	NaN
NaN	101.60.10.0	10.0.0.1
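To reproduce the example, the sample frame above can be built like this. It is a sketch: the name `IPs` matches the question's code, and the default integer column labels 0, 1, 2 are assumed from the table header.

```python
import numpy as np
import pandas as pd

# The question's sample data; NaN marks a missing address.
IPs = pd.DataFrame(
    [["100.0.200.0", "160.60.30.0", np.nan],
     [np.nan, "101.60.10.0", "10.0.0.1"]]
)
print(IPs)
```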

I want to apply a function that takes these IP addresses (where they exist) and returns a truncated version of each, with the fourth octet removed, so the result looks like this:

0	1	2
100.0.200	160.60.30	NaN
NaN	101.60.10	10.0.0

I have written the code below, which does the job, but it is very slow because it calls a Python function on every cell one at a time, and I want to do this faster.

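def sliceip(row):
    row = str(row)  # note: str(NaN) is 'nan', so a missing cell comes back as the string 'nan'
    return row.rsplit(".", 1)[0]  # keep everything before the last '.'

def applysliceip(rowx):
    # IPs.apply passes one column at a time; this loops over its cells in Python
    for i, item in enumerate(rowx):
        rowx[i] = sliceip(item)
    return rowx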


And I apply this to the entire DataFrame like this:

split_IPs = IPs.apply(lambda row: applysliceip(row))

So my question is: is there a more Pythonic and faster way to accomplish the above and return the same output without using so much memory?
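One detail about the code above: `str(np.nan)` is the string `'nan'`, so missing cells come back as `'nan'` rather than NaN. A minimal sketch of a column-wise alternative using the pandas `.str` accessor, which passes NaN through, is below. It assumes every column holds strings (object dtype), since `.str` raises on an all-NaN float column. Pandas string methods still loop internally, so any speedup is something to measure rather than assume.

```python
import numpy as np
import pandas as pd

# Sample data from the question (made-up addresses).
IPs = pd.DataFrame(
    [["100.0.200.0", "160.60.30.0", np.nan],
     [np.nan, "101.60.10.0", "10.0.0.1"]]
)

# For each column, split once from the right on '.' and keep the left part.
# The .str accessor returns NaN for missing cells instead of the string 'nan'.
split_IPs = IPs.apply(lambda col: col.str.rsplit(".", n=1).str[0])
print(split_IPs)
```

This also avoids mutating each column in place, which `applysliceip` does.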

Asked By: CatDad



Answer 1

A possible solution, which uses pandas.DataFrame.applymap and a regular expression to replace the last . and the digits after it with an empty string:

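import re

# na_action='ignore' (pandas >= 1.2) skips NaN cells, which re.sub would reject
df.applymap(lambda x: re.sub(r'\.\d+$', '', x), na_action='ignore')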

Output:

           0          1       2
0  100.0.200  160.60.30     NaN
1        NaN  101.60.10  10.0.0
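`DataFrame.applymap` was renamed to `DataFrame.map` in pandas 2.1, where `applymap` is deprecated. A sketch that works on either side of that change, assuming pandas 1.2 or later for `na_action`; the helper name `drop_last_octet` is illustrative, not from the answer:

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["100.0.200.0", "160.60.30.0", np.nan],
     [np.nan, "101.60.10.0", "10.0.0.1"]]
)

def drop_last_octet(ip):
    # Remove the final '.' and the digits after it, e.g. '10.0.0.1' becomes '10.0.0'
    return re.sub(r"\.\d+$", "", ip)

# Prefer DataFrame.map (pandas >= 2.1); fall back to applymap on older versions.
# na_action="ignore" passes NaN through without calling the function.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
result = elementwise(drop_last_octet, na_action="ignore")
print(result)
```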

A solution based on NumPy, which may be faster:

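import re

import numpy as np
import pandas as pd

# the isinstance guard keeps NaN away from re.sub; otypes=[object] keeps Python objects
v = np.vectorize(lambda x: re.sub(r'\.\d+$', '', x) if isinstance(x, str) else x, otypes=[object])
pd.DataFrame(np.where(pd.notnull(df), v(df), df), index=df.index, columns=df.columns)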
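The NumPy documentation describes `np.vectorize` as provided mainly for convenience, not performance (it is essentially a for loop), so whether it beats `applymap` is worth measuring on real data. A rough sketch of such a comparison, assuming a hypothetical larger frame made by repeating the sample rows; it first checks that three approaches agree, then times them with `timeit`. No timings are claimed here.

```python
import re
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: the question's two rows repeated 10,000 times.
base = pd.DataFrame(
    [["100.0.200.0", "160.60.30.0", np.nan],
     [np.nan, "101.60.10.0", "10.0.0.1"]]
)
df = pd.concat([base] * 10_000, ignore_index=True)

pattern = re.compile(r"\.\d+$")

def with_vectorize(frame):
    # Element-wise regex via np.vectorize; non-strings (NaN) pass through unchanged.
    v = np.vectorize(lambda x: pattern.sub("", x) if isinstance(x, str) else x,
                     otypes=[object])
    return pd.DataFrame(v(frame), index=frame.index, columns=frame.columns)

def with_replace(frame):
    # Regex replace; only string cells are touched.
    return frame.replace(r"\.\d+$", "", regex=True)

def with_str(frame):
    # Column-wise .str accessor; NaN stays NaN.
    return frame.apply(lambda col: col.str.rsplit(".", n=1).str[0])

# Check that all three agree (compared as object dtype) before timing them.
reference = with_replace(df).astype(object)
for fn in (with_vectorize, with_str):
    pd.testing.assert_frame_equal(fn(df).astype(object), reference)

for fn in (with_vectorize, with_replace, with_str):
    seconds = timeit.timeit(lambda: fn(df), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```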

Answered By: PaulS

Answer 2

You can use a regular expression with DataFrame.replace to match and replace, instead of using a custom function.

IPs.replace(r"(\d+\.\d+\.\d+)\.\d+", r"\1", regex=True)
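In this pattern the backslashes matter: `\d` is a digit, `\.` a literal dot, and `\1` in the replacement refers back to the captured first three octets. A sketch on the question's sample data; the `^` and `$` anchors are an addition to the answer's pattern, assumed here so that only whole-cell IPv4-like strings are rewritten:

```python
import numpy as np
import pandas as pd

IPs = pd.DataFrame(
    [["100.0.200.0", "160.60.30.0", np.nan],
     [np.nan, "101.60.10.0", "10.0.0.1"]]
)

# Capture the first three octets and replace the whole cell with that group.
# With regex=True, replace() only rewrites string cells, so NaN is left as is.
split_IPs = IPs.replace(r"^(\d+\.\d+\.\d+)\.\d+$", r"\1", regex=True)
print(split_IPs)
```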

Answered By: tdelaney
