Pandas: change data type of Series to String
Question:
I use Pandas ‘ver 0.12.0’ with Python 2.7 and have a dataframe as below:
import pandas as pd
df = pd.DataFrame({'id': [123, 512, 'zhub1', 12354.3, 129, 753, 295, 610],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'shape': ['round', 'triangular', 'triangular', 'triangular', 'square',
                             'triangular', 'round', 'triangular']
                   }, columns=['id', 'colour', 'shape'])
The id Series consists of some integers and strings. Its dtype by default is object. I want to convert all contents of id to strings. I tried astype(str), which produces the output below.
df['id'].astype(str)
0 1
1 5
2 z
3 1
4 1
5 7
6 2
7 6
1) How can I convert all elements of id to String?
2) I will eventually use id for indexing dataframes. Would having String indices in a dataframe slow things down, compared to having an integer index?
Answers:
You can convert all elements of id to str using apply:
df.id.apply(str)
0 123
1 512
2 zhub1
3 12354.3
4 129
5 753
6 295
7 610
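As a minimal sketch of the apply approach above (the sample data is copied from the question; the variable names ids_as_str and all_str are illustrative only), each element is passed through Python's built-in str, and the result has to be kept or assigned back for the change to persist:

```python
import pandas as pd

df = pd.DataFrame({"id": [123, 512, "zhub1", 12354.3, 129, 753, 295, 610]})

# apply(str) calls Python's built-in str() on each element in turn.
ids_as_str = df["id"].apply(str)

# Check that every element is now a Python str.
all_str = all(isinstance(v, str) for v in ids_as_str)

print(ids_as_str.tolist())
print(all_str)
```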
Edit by OP:
I think the issue was related to the Python version (2.7). This worked:
df['id'].astype(basestring)
0 123
1 512
2 zhub1
3 12354.3
4 129
5 753
6 295
7 610
Name: id, dtype: object
You must assign the result back, like this:
df['id']= df['id'].astype(str)
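A short sketch of why the assignment matters, assuming a recent pandas version: astype returns a new Series and does not modify the column in place. The names unchanged and changed are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"id": [123, 512, "zhub1", 12354.3]})

# Calling astype without assignment leaves the DataFrame untouched.
df["id"].astype(str)
unchanged = isinstance(df["id"].iloc[0], int)

# Assigning the result back replaces the column.
df["id"] = df["id"].astype(str)
changed = isinstance(df["id"].iloc[0], str)

print(unchanged, changed)
```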
Personally, none of the above worked for me.
What did work:
new_str = [str(x) for x in old_obj][0]
A new answer to reflect current practice: as of now (v1.2.4), neither astype('str') nor astype(str) produces the dedicated string dtype; both leave the Series with dtype object.
As per the documentation, a Series can be converted to the string datatype in the following ways:
df['id'] = df['id'].astype("string")
df['id'] = pandas.Series(df['id'], dtype="string")
df['id'] = pandas.Series(df['id'], dtype=pandas.StringDtype())
You can use:
df.loc[:,'id'] = df.loc[:, 'id'].astype(str)
This is why they recommend this solution: Pandas doc
TL;DR
To reflect some of the answers:
df['id'] = df['id'].astype("string")
This can break on the given example: depending on the pandas version, the conversion to StringArray fails because the column contains numbers rather than only strings.
df['id']= df['id'].astype(str)
For me, this solution threw a warning:
> SettingWithCopyWarning:
> A value is trying to be set on a copy of a
> slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
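This warning usually means the DataFrame being modified was itself created by slicing another DataFrame. A minimal sketch of one common way to avoid it, assuming that is the situation here (the subset variable and the colour filter are hypothetical, chosen only for illustration), is to take an explicit copy before assigning:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [123, 512, "zhub1", 12354.3],
    "colour": ["black", "white", "white", "white"],
})

# Taking an explicit copy makes it unambiguous that we are modifying
# a new, independent DataFrame rather than a view of df.
subset = df[df["colour"] == "white"].copy()
subset["id"] = subset["id"].astype(str)

print(subset["id"].tolist())
```

The original df is left untouched; only subset holds the converted column.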
Your problem can be solved by converting the column to object first. Once it is object, use astype to convert it to str.
df['id'] = df['id'].astype(object).astype('str')
This worked for me:
df['id'].convert_dtypes()
See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html
There are two possibilities:
- Use .astype("str").astype("string"). As seen here
- Use .astype(pd.StringDtype()). From the official documentation
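The first option above can be sketched as follows, assuming pandas 1.0 or later (where the "string" extension dtype exists). Converting to Python str first means the second step only ever sees strings; the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([123, 512, "zhub1", 12354.3], dtype=object)

# Step 1: turn every element into a Python str.
as_python_str = s.astype(str)

# Step 2: move to the dedicated pandas string extension dtype.
as_string = as_python_str.astype("string")

# The resulting dtype is an instance of pd.StringDtype.
is_string_dtype = isinstance(as_string.dtype, pd.StringDtype)

print(as_string.dtype)
print(is_string_dtype)
```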
For me, .to_string() worked:
df['id']=df['id'].to_string()
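One caveat worth checking before relying on this: Series.to_string() renders the whole Series (index included) as a single Python string, so it is not an element-wise conversion. A small sketch with made-up two-row data shows the difference from astype(str):

```python
import pandas as pd

s = pd.Series([123, "zhub1"], name="id")

# to_string() returns ONE str containing a printable rendering of the Series.
rendered = s.to_string()
is_single_string = isinstance(rendered, str)

# astype(str), by contrast, converts element by element.
per_element = s.astype(str)

print(repr(rendered))
print(per_element.tolist())
```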
Use pandas string methods, e.g. df['id'].str.cat()
If you want to do it dynamically:
df_obj = df.select_dtypes(include='object')
df[df_obj.columns] = df_obj.astype(str)
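A self-contained sketch of the dynamic approach above, using a cut-down version of the question's data plus a hypothetical numeric count column to show that non-object columns are left alone:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [123, 512, "zhub1", 12354.3],
    "colour": ["black", "white", "white", "white"],
    "count": [1, 2, 3, 4],  # hypothetical numeric column, not in the question
})

# Pick out the object-dtype columns and convert only those to str.
object_cols = df.select_dtypes(include="object").columns
df[object_cols] = df[object_cols].astype(str)

ids_are_str = all(isinstance(v, str) for v in df["id"])
print(ids_are_str)
print(df["count"].tolist())
```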