Assign value to column and reset after nth row

Question:

I have a pandas dataframe that looks like this…

index my_column
0
1
2
3
4
5
6

What I need to do is conditionally assign values to ‘my_column’ depending on the index. The first three rows should have the values ‘dog’, ‘cat’, ‘bird’. Then, the next three rows should also have ‘dog’, ‘cat’, ‘bird’. That pattern should apply until the end of the dataset.

index my_column
0 dog
1 cat
2 bird
3 dog
4 cat
5 bird
6 dog

I’ve tried the following code to no avail.

for index, row in df.iterrows():
    counter=3
    my_column='dog'
    if counter>3
    break
    else 
    counter+=1
    my_column='cat'
    counter+=1
    if counter>3
    break
    else 
    counter+=1
    my_column='bird'
    if counter>3
    break  
Asked By: ealfons1

Answers:

Several problems:

  1. Your if syntax is incorrect: you are missing colons and proper indentation.
  2. You are breaking out of your loop, which terminates it early, instead of using an if, elif, else structure.
  3. You are trying to update your dataframe while iterating over it.

See this question about why you shouldn’t update while you iterate.
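To make the three points concrete, here is a minimal sketch of what a syntactically valid if, elif, else version could look like. It assumes a 7-row frame like the one in the question (the `df` built here is a stand-in), and it collects the labels in a list (a hypothetical name, `labels`) so the dataframe is only written to once, after the loop has finished:

```python
import pandas as pd

# Stand-in for the question's dataframe: 7 rows, empty "my_column".
df = pd.DataFrame({"my_column": [""] * 7})

labels = []  # hypothetical helper list, filled during the loop
for position in range(len(df)):
    remainder = position % 3  # cycles 0, 1, 2, 0, 1, 2, ...
    if remainder == 0:
        labels.append("dog")
    elif remainder == 1:
        labels.append("cat")
    else:
        labels.append("bird")

# Assign once, after iterating, instead of mutating df inside the loop.
df["my_column"] = labels
print(df)
```

Note the colon after every `if`, `elif`, and `else`, the indented bodies, and the absence of `break`, so the loop runs over every row.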

Instead, you could do

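values = ["dog", "cat", "bird"]

num_values = len(values)

# df.index is an attribute, not a method, so iterate over it without calling it
for index in df.index:
    # index % num_values cycles 0, 1, 2, 0, 1, 2, ... (assumes integer index labels)
    df.at[index, "my_column"] = values[index % num_values]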
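If you would rather avoid the explicit loop, one possible alternative is `numpy.resize`, which fills an array of the requested length with repeated copies of its input. This is a sketch only: the frame and its letter index are made up for illustration, and because the assignment is positional it does not depend on the index labels being integers.

```python
import numpy as np
import pandas as pd

values = ["dog", "cat", "bird"]

# Illustrative frame with a non-integer index (an assumption, not from the question).
df = pd.DataFrame({"my_column": [""] * 7}, index=list("abcdefg"))

# np.resize repeats `values` until the result has len(df) elements.
df["my_column"] = np.resize(values, len(df))
print(df)
```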
    
Answered By: Dash

Advanced indexing

One solution would be to turn dog-cat-bird into a pd.Series and use advanced indexing:

dcb = pd.Series(["dog", "cat", "bird"])

df["my_column"] = dcb[df.index % len(dcb)].reset_index(drop=True)

This works by first creating an index array from df.index % len(dcb):

In [8]: df.index % len(dcb)
Out[8]: Int64Index([0, 1, 2, 0, 1, 2, 0], dtype='int64')

Then, by using advanced indexing, you can select the elements from dcb with that index array:

In [9]: dcb[df.index % len(dcb)]
Out[9]:
0     dog
1     cat
2    bird
0     dog
1     cat
2    bird
0     dog
dtype: object

Finally, notice that the index of the resulting Series repeats. Reset it and drop the old index with .reset_index(drop=True), then assign the result to your dataframe.
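The steps above rely on the dataframe having the default 0..n-1 index, because assigning a Series to a column aligns on index labels. As a hedged sketch for frames with a different index, you can index by position and assign the raw values instead. The frame below, with labels 10 through 16, and the name `positions` are assumptions for illustration:

```python
import numpy as np
import pandas as pd

dcb = pd.Series(["dog", "cat", "bird"])

# Illustrative frame whose index does not start at 0.
df = pd.DataFrame({"my_column": [""] * 7}, index=range(10, 17))

# Row positions 0..6, wrapped around the length of dcb.
positions = np.arange(len(df)) % len(dcb)

# .iloc selects by position; .to_numpy() drops the Series index so the
# assignment is positional rather than label-aligned.
df["my_column"] = dcb.iloc[positions].to_numpy()
print(df)
```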

Using a generator

Here’s an alternate solution using an infinite dog-cat-bird generator:

In [2]: df
Out[2]:
  my_column
0
1
2
3
4
5
6

In [3]: def dog_cat_bird():
   ...:     while True:
   ...:         yield from ("dog", "cat", "bird")
   ...:

In [4]: dcb = dog_cat_bird()

In [5]: df["my_column"].apply(lambda _: next(dcb))
Out[5]:
0     dog
1     cat
2    bird
3     dog
4     cat
5    bird
6     dog
Name: my_column, dtype: object
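Note that `.apply` returns a new Series and leaves `df` unchanged, so the result still has to be assigned back. A minimal sketch of the same idea using the standard library's `itertools.cycle` in place of the hand-written generator, assuming a 7-row frame like the one shown above:

```python
from itertools import cycle, islice

import pandas as pd

# Stand-in for the dataframe shown above.
df = pd.DataFrame({"my_column": [""] * 7})

# cycle() repeats dog, cat, bird forever; islice() takes exactly len(df) items.
df["my_column"] = list(islice(cycle(["dog", "cat", "bird"]), len(df)))
print(df)
```

Taking exactly `len(df)` items up front avoids sharing generator state between calls, which can matter if the same generator object is reused for a second column.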
Answered By: ddejohn

Create a dictionary:

pet_dict = {0:'dog',
            1:'cat',
            2:'bird'}

You can look up each row's index label with .name, take it modulo 3 (%), and use the result as the dictionary key:

df.apply(lambda x: pet_dict[x.name % 3], axis=1)
0     dog
1     cat
2    bird
3     dog
4     cat
5    bird
6     dog
7     cat
8    bird
9     dog
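The same dictionary lookup can also be done without a row-wise `apply`. The sketch below assumes a 7-row frame with the default integer index and uses a hypothetical intermediate Series named `remainders`; `Series.map` accepts a dictionary and translates each value through it:

```python
import pandas as pd

pet_dict = {0: "dog", 1: "cat", 2: "bird"}

# Stand-in for the question's dataframe (default integer index assumed).
df = pd.DataFrame({"my_column": [""] * 7})

# Remainder of each index label divided by 3, keeping df's own index.
remainders = pd.Series(df.index % 3, index=df.index)

# Translate 0/1/2 into dog/cat/bird and store the result in the column.
df["my_column"] = remainders.map(pet_dict)
print(df)
```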
Answered By: gputrain