slicing strings in Series by a different Series of Ints

Question

Say we have this dict, loaded into a DataFrame with two columns:

import pandas as pd

data = {
  "slice_by" : [2, 2, 1],
  "string_to_slice" : ["one", "two", "three"]
}
df = pd.DataFrame(data)

The first line works fine; the second one doesn't:

df["string_to_slice"].str[:1]
df["string_to_slice"].str[:df["slice_by"]]

Output:

0        ne
1        wo
2        hree
Name: string_to_slice, Length: 3, dtype: object
0       NaN
1       NaN
2       NaN
Name: string_to_slice, Length: 3, dtype: float64

What would be the appropriate way to do this? I could probably put something together with df.iterrows(), but that's likely not the most efficient way.

Asked By: Violeta


Answer 1

Here is one way to do it, using apply:

df.apply(lambda x: x['string_to_slice'][x['slice_by']:], axis=1)

0       e
1       o
2    hree
dtype: object
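The apply call above keeps everything from index slice_by onward. If the prefix is what's wanted instead, as the str[:...] syntax in the question suggests, flipping the slice inside the lambda should be enough. A minimal sketch, assuming the same column names as the question; the variable names suffix and prefix are just for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "slice_by": [2, 2, 1],
    "string_to_slice": ["one", "two", "three"],
})

# axis=1 passes each row as a Series, so both columns are available per row.
# Keep everything from position slice_by onward (the answer's version).
suffix = df.apply(lambda x: x["string_to_slice"][x["slice_by"]:], axis=1)

# Keep only the first slice_by characters instead.
prefix = df.apply(lambda x: x["string_to_slice"][:x["slice_by"]], axis=1)

print(suffix.tolist())  # ['e', 'o', 'hree']
print(prefix.tolist())  # ['on', 'tw', 't']
```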

Answered By: Naveed

Answer 2

I am assuming you want str[slice_by:] and not str[:slice_by]. With that assumption you can do:

import numpy as np

np_slice_string = np.vectorize(lambda x, y: x[y:])
out = np_slice_string(df['string_to_slice'], df['slice_by'])
print(out)

['e' 'o' 'hree']
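The NumPy documentation describes np.vectorize as a convenience rather than a performance tool; internally it is essentially a Python for loop. A plain list comprehension over zip of the two columns does the same element-by-element slicing explicitly and, unlike apply(axis=1), does not build a Series for every row. A sketch, assuming the same column names and slicing from slice_by onward; the new column name sliced is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "slice_by": [2, 2, 1],
    "string_to_slice": ["one", "two", "three"],
})

# zip walks both columns in lockstep, pairing each string with its own bound.
# Change s[n:] to s[:n] to keep the prefix instead of the suffix.
df["sliced"] = [s[n:] for s, n in zip(df["string_to_slice"], df["slice_by"])]

print(df["sliced"].tolist())  # ['e', 'o', 'hree']
```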

Answered By: SomeDude
