pandas converting floats to strings without decimals
Question:
I have a dataframe
df = pd.DataFrame([
['2', '3', 'nan'],
['0', '1', '4'],
['5', 'nan', '7']
])
print(df)
0 1 2
0 2 3 nan
1 0 1 4
2 5 nan 7
I want to convert these strings to numbers, sum the columns, and convert the results back to strings.
Using astype(float) seems to get me to the number part. Then summing is easy with sum(). Converting back to strings should be easy too, with astype(str):
df.astype(float).sum().astype(str)
0 7.0
1 4.0
2 11.0
dtype: object
That’s almost what I wanted. I wanted the string version of integers. But floats have decimals. How do I get rid of them?
I want this
0 7
1 4
2 11
dtype: object
Answers:
Add an astype(int) to the mix:
df.astype(float).sum().astype(int).astype(str)
0 7
1 4
2 11
dtype: object
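For anyone who wants to run this end to end, here is a minimal self-contained sketch of the chain above, using the question's own data. The variable names totals and result are just for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    ['2', '3', 'nan'],
    ['0', '1', '4'],
    ['5', 'nan', '7'],
])

# The string 'nan' parses to a float NaN, which sum() skips by default.
totals = df.astype(float).sum()

# The sums are whole numbers here, so the int cast only drops the ".0".
result = totals.astype(int).astype(str)
print(result.tolist())
```

Keep in mind that astype(int) truncates toward zero, so a sum like 2.7 would come out as '2'. If your sums may not be whole numbers, rounding first is probably what you want.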
A demonstration with empty cells. This was not a requirement from the OP, but it is included to satisfy the detractors:
df = pd.DataFrame([
['2', '3', 'nan', None],
[None, None, None, None],
['0', '1', '4', None],
['5', 'nan', '7', None]
])
df
0 1 2 3
0 2 3 nan None
1 None None None None
2 0 1 4 None
3 5 nan 7 None
Then
df.astype(float).sum().astype(int).astype(str)
0 7
1 4
2 11
3 0
dtype: object
Because the OP didn’t specify what should happen when a column is entirely missing, presenting zero is a reasonable option.
However, we could also drop those columns:
df.dropna(axis=1, how='all').astype(float).sum().astype(int).astype(str)
0 7
1 4
2 11
dtype: object
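As a runnable sketch of the two behaviours discussed above (zero for an all-missing column versus dropping it), assuming the same four-column frame. The names with_zero and dropped are illustrative only, and dropna is called with keyword arguments because recent pandas versions no longer accept them positionally.

```python
import pandas as pd

df = pd.DataFrame([
    ['2', '3', 'nan', None],
    [None, None, None, None],
    ['0', '1', '4', None],
    ['5', 'nan', '7', None],
])

# Option 1: keep every column; an all-missing column sums to 0.
with_zero = df.astype(float).sum().astype(int).astype(str)

# Option 2: drop columns in which every cell is missing, then sum.
dropped = (
    df.dropna(axis=1, how='all')
      .astype(float)
      .sum()
      .astype(int)
      .astype(str)
)

print(with_zero.tolist())
print(dropped.tolist())
```

Note that dropna only looks at real missing values (None/NaN); the literal string 'nan' does not count as missing until after astype(float).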
Add astype(int) right before the conversion to a string:
print(df.astype(float).sum().astype(int).astype(str))
This generates the desired result.
Converting to int (i.e. with .astype(int).astype(str)) won’t work if your column contains nulls. It’s often a better idea to use string formatting to specify the format explicitly. You can set this globally in pd.options; note that this changes only how floats are displayed, and the dtype stays float64:
>>> pd.options.display.float_format = '{:,.0f}'.format
>>> df.astype(float).sum()
0 7
1 4
2 11
dtype: float64
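If changing the option for the whole session feels too broad, one possible variation is to scope it with pd.option_context, so the setting applies only inside the with block. This is a sketch, not part of the answer above; totals and shown are illustrative names.

```python
import pandas as pd

totals = pd.Series([7.0, 4.0, 11.0])

# The float format applies only inside this block and is restored afterwards.
with pd.option_context('display.float_format', '{:,.0f}'.format):
    shown = totals.to_string()

print(shown)

# The underlying data is untouched: still floats, and the default
# rendering outside the block shows the decimals again.
print(totals.to_string())
```

Either way, this affects display only. If you need actual string values without decimals, you still have to convert the data itself.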
For pandas >= 1.0:
The pd.NA missing-value marker (displayed as <NA>) was introduced and is used by the nullable 'Int64' dtype. You can now do this:
df['your_column'].astype('Int64').astype('str')
This will properly convert 1.0 to 1.
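A small sketch of the nullable-integer route on a column that contains a missing value. The Series s here is made-up sample data standing in for df['your_column'].

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan])

# 'Int64' (capital I) is the nullable integer dtype; NaN becomes pd.NA
# instead of raising the way a plain astype(int) would.
as_int = s.astype('Int64')

# The non-missing values lose their ".0" when converted to strings.
as_text = as_int.astype(str)
print(as_text.tolist()[:2])
```

How the missing entry comes out after astype(str) depends on the pandas version, so check it on yours before relying on it. Also, as far as I know, astype('Int64') refuses floats with a fractional part (e.g. 1.5) rather than truncating them.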
Alternative:
If you do not want to change pandas’ global display options, as @maxymoo’s solution does, you can use apply:
df['your_column'].apply(lambda x: f'{x:.0f}')
Based on toto_tico’s alternative, with a minor change so that nulls become empty strings instead of 'nan':
df['your_column'].apply(lambda x: f'{x:.0f}' if not pd.isnull(x) else '')
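To make the difference between the two apply variants concrete, here is a short sketch on a sample Series (s is made-up data standing in for df['your_column']).

```python
import numpy as np
import pandas as pd

s = pd.Series([7.0, np.nan, 11.0])

# Plain formatting: NaN is rendered as the text 'nan'.
plain = s.apply(lambda x: f'{x:.0f}')

# Null-aware formatting: NaN is rendered as an empty string instead.
blanked = s.apply(lambda x: f'{x:.0f}' if not pd.isnull(x) else '')

print(plain.tolist())    # ['7', 'nan', '11']
print(blanked.tolist())  # ['7', '', '11']
```

One thing to be aware of: the .0f format rounds, whereas astype(int) truncates, so 2.7 becomes '3' here but 2 with the int cast.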
The solutions above turn NaN into a string as well when converting to strings. To get around that and retain NaN, use:
c = ... # your column
np.where(
df[c].isnull(), np.nan,
df[c].apply('{:.0f}'.format)
)
Retaining NaN lets you do things like convert a nullable column of integers such as 19991231, 20000101, np.nan, 20000102 into datetimes without triggering date parsing errors.
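Here is a sketch of that date use case, assuming a column named 'day' (a hypothetical name) holding the integers from the example. Wrapping the np.where result back into a Series and the %Y%m%d format string are my assumptions about how one would finish the conversion.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'day': [19991231, 20000101, np.nan, 20000102]})
c = 'day'  # hypothetical column name

# Format the non-null values as digit strings, but keep NaN as NaN
# rather than letting it turn into the text 'nan'.
text = np.where(
    df[c].isnull(), np.nan,
    df[c].apply('{:.0f}'.format)
)

# Real NaN values become NaT; the text 'nan' would not match the format.
dates = pd.to_datetime(pd.Series(text, index=df.index), format='%Y%m%d')
print(dates)
```

The column is float64 because of the NaN, which is why the values need the '{:.0f}' formatting before they look like 19991231 rather than 19991231.0.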