Getting the "ValueError: Columns must be same length as key" when splitting out the date column by "/"

Question:

I tried to split the date column into 3 separate columns

df[['date1', 'date2', 'date3']] = df['Date'].str.split('/')

Here’s the error I’m getting

ValueError: Columns must be same length as key

It works fine when I just type in df['Date'].str.split('/'), and I was able to confirm that each list has only 3 elements.

I think it might have to do with some of the dates not having the full year, so the length of the column varies but I’m not sure why this would matter here.

df['Date'].value_counts()

22/05/2022    10
26/12/05      10
26/12/08      10
11/05/14      10
12/05/2019    10
              ..
28/02/05       1
14/09/2015     1
27/09/2015     1
28/09/2015     1
17/08/2015     1

df['Date'].str.len().value_counts()

8.0     4850
10.0    2280
Name: Date, dtype: int64
Asked By: krhermit

Answers:

The dates in the 'Date' column do not all have the same length.

The first format, 22/05/2022, is 10 characters long, but the second format, 26/12/05, is only 8 characters.

This difference is not what raises the ValueError (every split still yields three parts), but it is worth normalizing. One way is to first convert all dates to a uniform length:

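def convert_date(date):
    # Expand a two-digit year (dd/mm/yy) to four digits, assuming the 2000s.
    # Non-string values such as NaN are returned unchanged.
    if isinstance(date, str) and len(date) == 8:
        date = date[:6] + "20" + date[6:]
    return date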

df['Date'] = df['Date'].apply(convert_date)

You can then split the dates into separate columns using:

df[['date1', 'date2', 'date3']] = df['Date'].str.split('/', expand=True)
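
The same year normalization can probably be done without a row-by-row apply. The following is a minimal sketch, not a definitive implementation: the sample frame is made up, and it assumes that every two-digit year belongs to the 2000s.

```python
import pandas as pd

# Hypothetical sample data, including a missing value.
df = pd.DataFrame({'Date': ['22/05/2022', '26/12/05', '11/5/2014', None]})

# Insert "20" before a trailing two-digit year (assumption: all such years are 20xx).
# Four-digit years do not match the pattern and are left untouched.
normalized = df['Date'].str.replace(r'/(\d{2})$', r'/20\1', regex=True)

# expand=True returns one column per part, so the assignment to three columns works.
parts = normalized.str.split('/', expand=True)
df[['date1', 'date2', 'date3']] = parts
```

Missing values pass through str.replace and str.split as missing, so no explicit check is needed here.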
Answered By: Andrey Kiselev

By default, Series.str.split will return a Series (i.e. a single "column") with lists that contain the split elements.

df['Date'].str.split('/')

0    [22, 05, 2022]
1      [26, 12, 05]
2      [26, 12, 08]
3     [11, 5, 2014]
4     [12, 5, 2019]
5      [28, 02, 05]
6    [14, 09, 2015]
7    [27, 09, 2015]
8    [28, 09, 2015]
9    [17, 08, 2015]
Name: Date, dtype: object

So, the error you are getting is the result of trying to assign this single "column" to three new columns. To fix this, you need to set the expand parameter to True. This will expand the result into three different columns, which we can then assign as intended:

df[['date1', 'date2', 'date3']] = df['Date'].str.split('/', expand=True)

df

         Date  Count date1 date2 date3
0  22/05/2022     10    22    05  2022
1    26/12/05     10    26    12    05
2    26/12/08     10    26    12    08
3   11/5/2014     10    11     5  2014
4   12/5/2019     10    12     5  2019
5    28/02/05      1    28    02    05
6  14/09/2015      1    14    09  2015
7  27/09/2015      1    27    09  2015
8  28/09/2015      1    28    09  2015
9  17/08/2015      1    17    08  2015
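
To make the difference concrete, here is a small sketch (using a shortened, made-up sample) that compares the type and shape of the two results. The variable names as_lists and as_columns are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['22/05/2022', '26/12/05', '11/5/2014']})

# Default: a single Series whose values are Python lists.
as_lists = df['Date'].str.split('/')

# expand=True: a DataFrame with one column per split part.
as_columns = df['Date'].str.split('/', expand=True)

print(type(as_lists).__name__, as_lists.shape)      # Series (3,)
print(type(as_columns).__name__, as_columns.shape)  # DataFrame (3, 3)
```

Only the second result has three columns, which is what an assignment to three column names needs.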

Data used

import pandas as pd

data = {'Date': {0: '22/05/2022', 1: '26/12/05', 2: '26/12/08', 3: '11/5/2014', 
                 4: '12/5/2019', 5: '28/02/05', 6: '14/09/2015', 7: '27/09/2015', 
                 8: '28/09/2015', 9: '17/08/2015'}, 
        'Count': {0: 10, 1: 10, 2: 10, 3: 10, 4: 10, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1}}

df = pd.DataFrame(data)

Incidentally, while the above should make the assignment work, if you are sure that the date strings always follow the order day / month / year (despite the differences in formatting), it is probably a better idea to rely on pd.to_datetime with the dayfirst parameter set to True, and then to use Series.dt.day, Series.dt.month and Series.dt.year:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

df['day'] = df['Date'].dt.day
df['month'] = df['Date'].dt.month
df['year'] = df['Date'].dt.year

df

        Date  Count  day  month  year
0 2022-05-22     10   22      5  2022
1 2005-12-26     10   26     12  2005
2 2008-12-26     10   26     12  2008
3 2014-05-11     10   11      5  2014
4 2019-05-12     10   12      5  2019
5 2005-02-28      1   28      2  2005
6 2015-09-14      1   14      9  2015
7 2015-09-27      1   27      9  2015
8 2015-09-28      1   28      9  2015
9 2015-08-17      1   17      8  2015

Note that you’ll end up with proper and consistent int values this way.
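
Depending on the pandas version, parsing mixed formats in a single call may warn or fail, so a more explicit variant can be useful. The sketch below is one possible approach under stated assumptions, not the method used above: the sample Series is made up, and it tries dd/mm/yyyy first and falls back to dd/mm/yy. Note that %y follows the usual strptime convention for two-digit years (00 to 68 map to 20xx, 69 to 99 to 19xx).

```python
import pandas as pd

# Hypothetical sample mixing four-digit years, two-digit years and one bad value.
s = pd.Series(['22/05/2022', '26/12/05', '11/5/2014', 'not a date'])

# First pass: strict dd/mm/yyyy; anything that does not match becomes NaT.
parsed = pd.to_datetime(s, format='%d/%m/%Y', errors='coerce')

# Second pass: fill the remaining gaps using dd/mm/yy.
parsed = parsed.fillna(pd.to_datetime(s, format='%d/%m/%y', errors='coerce'))

# Rows that matched neither format stay NaT and can be inspected separately.
unparsed = s[parsed.isna()]
```

If any value stays NaT, the .dt.day, .dt.month and .dt.year results will be floats rather than ints, so it may be worth checking unparsed before deriving those columns.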

Answered By: ouroboros1