Convert floats to ints in Pandas?

Question:

I’ve been working with data imported from a CSV. Pandas changed some columns to float, so now the numbers in these columns get displayed as floating points! However, I need them to be displayed as integers or without comma. Is there a way to convert them to integers or not display the comma?

Asked By: MJP

||

Answers:

Use the pandas.DataFrame.astype(<type>) function to manipulate column dtypes.

>>> df = pd.DataFrame(np.random.rand(3,4), columns=list("ABCD"))
>>> df
          A         B         C         D
0  0.542447  0.949988  0.669239  0.879887
1  0.068542  0.757775  0.891903  0.384542
2  0.021274  0.587504  0.180426  0.574300
>>> df[list("ABCD")] = df[list("ABCD")].astype(int)
>>> df
   A  B  C  D
0  0  0  0  0
1  0  0  0  0
2  0  0  0  0

EDIT:

To handle missing values:

>>> df
          A         B     C         D
0  0.475103  0.355453  0.66  0.869336
1  0.260395  0.200287   NaN  0.617024
2  0.517692  0.735613  0.18  0.657106
>>> df[list("ABCD")] = df[list("ABCD")].fillna(0.0).astype(int)
>>> df
   A  B  C  D
0  0  0  0  0
1  0  0  0  0
2  0  0  0  0
Answered By: Ryan G

To modify the float output do this:

df= pd.DataFrame(range(5), columns=['a'])
df.a = df.a.astype(float)
df

Out[33]:

          a
0 0.0000000
1 1.0000000
2 2.0000000
3 3.0000000
4 4.0000000

pd.options.display.float_format = '{:,.0f}'.format
df

Out[35]:

   a
0  0
1  1
2  2
3  3
4  4
Answered By: EdChum

Considering the following data frame:

>>> df = pd.DataFrame(10*np.random.rand(3, 4), columns=list("ABCD"))
>>> print(df)
...           A         B         C         D
... 0  8.362940  0.354027  1.916283  6.226750
... 1  1.988232  9.003545  9.277504  8.522808
... 2  1.141432  4.935593  2.700118  7.739108

Using a list of column names, change the type for multiple columns with applymap():

>>> cols = ['A', 'B']
>>> df[cols] = df[cols].applymap(np.int64)
>>> print(df)
...    A  B         C         D
... 0  8  0  1.916283  6.226750
... 1  1  9  9.277504  8.522808
... 2  1  4  2.700118  7.739108

Or for a single column with apply():

>>> df['C'] = df['C'].apply(np.int64)
>>> print(df)
...    A  B  C         D
... 0  8  0  1  6.226750
... 1  1  9  9  8.522808
... 2  1  4  2  7.739108
Answered By: user4322543
>>> import pandas as pd
>>> right = pd.DataFrame({'C': [1.002, 2.003], 'D': [1.009, 4.55], 'key': ['K0', 'K1']})
>>> print(right)
           C      D key
    0  1.002  1.009  K0
    1  2.003  4.550  K1
>>> right['C'] = right.C.astype(int)
>>> print(right)
       C      D key
    0  1  1.009  K0
    1  2  4.550  K1
Answered By: user8051244

This is a quick solution in case you want to convert more columns of your pandas.DataFrame from float to integer considering also the case that you can have NaN values.

cols = ['col_1', 'col_2', 'col_3', 'col_4']
for col in cols:
   df[col] = df[col].apply(lambda x: int(x) if x == x else "")

I tried with else x) and else None), but the result is still having the float number, so I used else "".

Answered By: enri

To convert all float columns to int

>>> df = pd.DataFrame(np.random.rand(5, 4) * 10, columns=list('PQRS'))
>>> print(df)
...     P           Q           R           S
... 0   4.395994    0.844292    8.543430    1.933934
... 1   0.311974    9.519054    6.171577    3.859993
... 2   2.056797    0.836150    5.270513    3.224497
... 3   3.919300    8.562298    6.852941    1.415992
... 4   9.958550    9.013425    8.703142    3.588733

>>> float_col = df.select_dtypes(include=['float64']) # This will select float columns only
>>> # list(float_col.columns.values)

>>> for col in float_col.columns.values:
...     df[col] = df[col].astype('int64')

>>> print(df)
...     P   Q   R   S
... 0   4   0   8   1
... 1   0   9   6   3
... 2   2   0   5   3
... 3   3   8   6   1
... 4   9   9   8   3
Answered By: Suhas_Pote

Expanding on @Ryan G mentioned usage of the pandas.DataFrame.astype(<type>) method, one can use the errors=ignore argument to only convert those columns that do not produce an error, which notably simplifies the syntax. Obviously, caution should be applied when ignoring errors, but for this task it comes very handy.

>>> df = pd.DataFrame(np.random.rand(3, 4), columns=list('ABCD'))
>>> df *= 10
>>> print(df)
...           A       B       C       D
... 0   2.16861 8.34139 1.83434 6.91706
... 1   5.85938 9.71712 5.53371 4.26542
... 2   0.50112 4.06725 1.99795 4.75698

>>> df['E'] = list('XYZ')
>>> df.astype(int, errors='ignore')
>>> print(df)
...     A   B   C   D   E
... 0   2   8   1   6   X
... 1   5   9   5   4   Y
... 2   0   4   1   4   Z

From pandas.DataFrame.astype docs:

errors : {‘raise’, ‘ignore’}, default ‘raise’

Control raising of exceptions on invalid data for provided dtype.

  • raise : allow exceptions to be raised
  • ignore : suppress exceptions. On error return original object

New in version 0.20.0.

Answered By: aebmad

The columns that needs to be converted to int can be mentioned in a dictionary also as below

df = df.astype({'col1': 'int', 'col2': 'int', 'col3': 'int'})
Answered By: prashanth

In the text of the question is explained that the data comes from a csv. Só, I think that show options to make the conversion when the data is read and not after are relevant to the topic.

When importing spreadsheets or csv in a dataframe, "only integer columns" are commonly converted to float because excel stores all numerical values as floats and how the underlying libraries works.

When the file is read with read_excel or read_csv there are a couple of options avoid the after import conversion:

  • parameter dtype allows a pass a dictionary of column names and target types like dtype = {"my_column": "Int64"}
  • parameter converters can be used to pass a function that makes the conversion, for example changing NaN’s with 0. converters = {"my_column": lambda x: int(x) if x else 0}
  • parameter convert_float will convert "integral floats to int (i.e., 1.0 –> 1)", but take care with corner cases like NaN’s. This parameter is only available in read_excel

To make the conversion in an existing dataframe several alternatives have been given in other comments, but since v1.0.0 pandas has a interesting function for this cases: convert_dtypes, that "Convert columns to best possible dtypes using dtypes supporting pd.NA."

As example:

In [3]: import numpy as np                                                                                                                                                                                         

In [4]: import pandas as pd                                                                                                                                                                                        

In [5]: df = pd.DataFrame( 
   ...:     { 
   ...:         "a": pd.Series([1, 2, 3], dtype=np.dtype("int64")), 
   ...:         "b": pd.Series([1.0, 2.0, 3.0], dtype=np.dtype("float")), 
   ...:         "c": pd.Series([1.0, np.nan, 3.0]), 
   ...:         "d": pd.Series([1, np.nan, 3]), 
   ...:     } 
   ...: )                                                                                                                                                                                                          

In [6]: df                                                                                                                                                                                                         
Out[6]: 
   a    b    c    d
0  1  1.0  1.0  1.0
1  2  2.0  NaN  NaN
2  3  3.0  3.0  3.0

In [7]: df.dtypes                                                                                                                                                                                                  
Out[7]: 
a      int64
b    float64
c    float64
d    float64
dtype: object

In [8]: converted = df.convert_dtypes()                                                                                                                                                                            

In [9]: converted.dtypes                                                                                                                                                                                           
Out[9]: 
a    Int64
b    Int64
c    Int64
d    Int64
dtype: object

In [10]: converted                                                                                                                                                                                                 
Out[10]: 
   a  b     c     d
0  1  1     1     1
1  2  2  <NA>  <NA>
2  3  3     3     3

Answered By: Francisco Puga

Use 'Int64' for NaN support

  • astype(int) and astype('int64') cannot handle missing values (numpy int)
  • astype('Int64') (note the capital I) can handle missing values (pandas int)
df['A'] = df['A'].astype('Int64') # capital I

This assumes you want to keep missing values as NaN. If you plan to impute them, you could fillna first as Ryan suggested.


Examples of 'Int64' (capital I)

  1. If the floats are already rounded, just use astype:

    df = pd.DataFrame({'A': [99.0, np.nan, 42.0]})
    
    df['A'] = df['A'].astype('Int64')
    #       A
    # 0    99
    # 1  <NA>
    # 2    42
    
  2. If the floats are not rounded yet, round before astype:

    df = pd.DataFrame({'A': [3.14159, np.nan, 1.61803]})
    
    df['A'] = df['A'].round().astype('Int64')
    #       A
    # 0     3
    # 1  <NA>
    # 2     2
    
  3. To read int+NaN data from a file, use dtype='Int64' to avoid the need for converting at all:

    csv = io.StringIO('''
    id,rating
    foo,5
    bar,
    baz,2
    ''')
    
    df = pd.read_csv(csv, dtype={'rating': 'Int64'})
    #     id  rating
    # 0  foo       5
    # 1  bar    <NA>
    # 2  baz       2
    

Notes

  • 'Int64' is an alias for Int64Dtype:

    df['A'] = df['A'].astype(pd.Int64Dtype()) # same as astype('Int64')
    
  • Sized/signed aliases are available:

    lower bound upper bound
    'Int8' -128 127
    'Int16' -32,768 32,767
    'Int32' -2,147,483,648 2,147,483,647
    'Int64' -9,223,372,036,854,775,808 9,223,372,036,854,775,807
    'UInt8' 0 255
    'UInt16' 0 65,535
    'UInt32' 0 4,294,967,295
    'UInt64' 0 18,446,744,073,709,551,615
Answered By: tdy

Although there are many options here,
You can also convert the format of specific columns using a dictionary

Data = pd.read_csv('Your_Data.csv')

Data_2 = Data.astype({"Column a":"int32", "Column_b": "float64", "Column_c": "int32"})

print(Data_2 .dtypes) # Check the dtypes of the columns

This is an useful and very fast way to change the data format of specific columns for quick data analysis.

Answered By: Fellipe Alcantara