numpy append changes int to float and adds zeros

Question:

I need to transform ranges to consecutive numbers. The ranges are in ints and the result should be the same. This is what I have so far:

import numpy as np

mydata = np.array([
[49123400, 49123499],
[33554333, 33554337]])

numbers_list = np.empty((0))
base_dir = "/foo.csv"

for x in mydata:
    numbers = np.arange(x[0], x[1]+1)
    numbers_list = np.append(numbers_list, numbers, axis=0)
np.savetxt(base_dir, numbers_list, delimiter=";")

What I would like to see is a list like that:

49123400,
49123401,
49123402,...
49123499,
33554333,
33554334,...
33554337

But what I get is:

4.912340000000000000e+11 and so on...

Where am I going wrong? Why is there a change from int to float, when I am doing the append?

Asked By: SLglider

||

Answers:

In cases like this it helps to step through the code in an interactive session and look at the shape and dtype at each step.

In [254]: mydata = np.array( [
     ...: [49123400, 49123499],
     ...: [33554333, 33554337]])
In [255]: mydata
Out[255]: 
array([[49123400, 49123499],
       [33554333, 33554337]])
In [256]: mydata.shape
Out[256]: (2, 2)
In [257]: mydata.dtype
Out[257]: dtype('int32')
In [258]: numbers_list = np.empty((0))
In [259]: numbers_list
Out[259]: array([], dtype=float64)

Note that numbers_list is a float array. Look into providing np.empty with a dtype.

In [260]: x=mydata[0]
In [261]: numbers = np.arange(x[0],x[1]+1)
In [262]: numbers.dtype
Out[262]: dtype('int32')
In [263]: numbers.shape
Out[263]: (100,)
In [264]: numbers_list = np.append(numbers_list, numbers, axis=0)
In [265]: numbers_list.shape
Out[265]: (100,)
In [266]: numbers_list.dtype
Out[266]: dtype('float64')

After concatenating these two arrays, the result has the dtype of numbers_list (float64), because the integer values are promoted to float.

So giving that empty array an integer dtype should preserve the int dtype.
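As a minimal sketch of that fix, assuming the same mydata as in the question, the original loop can be kept and only the starting array changed. Passing dtype=mydata.dtype is one possible choice here; the exact integer width (int32 or int64) depends on the platform.

```python
import numpy as np

mydata = np.array([[49123400, 49123499],
                   [33554333, 33554337]])

# Start from an empty *integer* array instead of the default float64 one.
numbers_list = np.empty(0, dtype=mydata.dtype)

for start, stop in mydata:
    numbers = np.arange(start, stop + 1)   # inclusive range
    numbers_list = np.append(numbers_list, numbers)

print(numbers_list.dtype)   # an integer dtype, no promotion to float
print(numbers_list.shape)
```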

I have been on a crusade against np.append. This is another example of its misuse. It is just a form of np.concatenate, and is often a poor substitute for a list append.
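To illustrate the point with two small made-up arrays (a and b are placeholders, not from the question): np.append returns a new array rather than growing the first one in place, and for 1-d inputs it gives the same result as np.concatenate.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5])

appended = np.append(a, b)              # builds and returns a new array
concatenated = np.concatenate((a, b))   # same values, same dtype

print(np.array_equal(appended, concatenated))
print(a)   # a itself is unchanged; nothing was appended in place
```

Because every call copies all the data gathered so far, calling np.append inside a loop does repeated work that a single concatenate at the end avoids.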

I’d suggest building a list of arrays and calling concatenate once:

In [267]: numbers_list = [np.arange(x[0],x[1]+1) for x in mydata]
In [268]: len(numbers_list)
Out[268]: 2
In [269]: np.concatenate(numbers_list)
Out[269]: 
array([49123400, 49123401, 49123402, 49123403, 49123404, 49123405,
       49123406, 49123407, 49123408, 49123409, 49123410, 49123411,
       49123412, 49123413, 49123414, 49123415, 49123416, 49123417,
       49123418, 49123419, 49123420, 49123421, 49123422, 49123423,
       49123424, 49123425, 49123426, 49123427, 49123428, 49123429,
  ...
       49123496, 49123497, 49123498, 49123499, 33554333, 33554334,
       33554335, 33554336, 33554337])
In [270]: _.shape
Out[270]: (105,)

Since you are using savetxt to write the numbers, look at its fmt parameter. The default, '%.18e', is the scientific notation you are seeing.

With the correct fmt you will get integers:

In [272]: arr=np.concatenate(numbers_list)
In [273]: np.savetxt('test.txt',arr,fmt='%d',delimiter=',')
In [274]: cat test.txt
49123400
49123401
49123402
49123403
49123404
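A small side-by-side sketch of the two formats, assuming a throwaway temporary directory (the file names default.txt and ints.txt are made up for this illustration):

```python
import os
import tempfile
import numpy as np

arr = np.array([49123400, 49123401])

with tempfile.TemporaryDirectory() as tmp:
    default_path = os.path.join(tmp, 'default.txt')
    int_path = os.path.join(tmp, 'ints.txt')

    np.savetxt(default_path, arr)           # default fmt='%.18e'
    np.savetxt(int_path, arr, fmt='%d')     # plain integers

    with open(default_path) as f:
        default_text = f.read()
    with open(int_path) as f:
        int_text = f.read()

print(default_text)   # first line: 4.912340000000000000e+07
print(int_text)       # first line: 49123400
```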
Answered By: hpaulj

One important lesson to learn is that you should always choose the right data structure for your problem. In most cases, if you want to append or concatenate, a NumPy array is the wrong choice, unless you can trivially set up the final array (with its final shape) up front and fill it by assigning to slices of it.
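For the preallocation case mentioned above, a rough sketch could look like this. The names lengths, result and pos are chosen for this illustration, and np.int64 is an assumed dtype that is wide enough for the values in the question.

```python
import numpy as np

mydata = [[49123400, 49123499],
          [33554333, 33554337]]

# Each range is inclusive, so its length is stop - start + 1.
lengths = [stop - start + 1 for start, stop in mydata]

# Allocate the final array once, with its final shape and an integer dtype.
result = np.empty(sum(lengths), dtype=np.int64)

pos = 0
for (start, stop), n in zip(mydata, lengths):
    result[pos:pos + n] = np.arange(start, stop + 1)   # fill one slice
    pos += n

print(result.shape, result.dtype)
```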

In this case the obvious choice would be to use a normal list and range:

mydata = [[49123400, 49123499],
          [33554333, 33554337]]

mynewdata = []
for sublist in mydata:
    mynewdata.extend(range(sublist[0], sublist[1]+1))

>>> mynewdata
  [49123400, 49123401, 49123402, 49123403, 49123404, 49123405,
   49123406, 49123407, 49123408, 49123409, 49123410, 49123411,
   49123412, 49123413, 49123414, 49123415, 49123416, 49123417,
   49123418, 49123419, 49123420, 49123421, 49123422, 49123423,
   49123424, 49123425, 49123426, 49123427, 49123428, 49123429,
   49123430, 49123431, 49123432, 49123433, 49123434, 49123435,
   49123436, 49123437, 49123438, 49123439, 49123440, 49123441,
   49123442, 49123443, 49123444, 49123445, 49123446, 49123447,
   49123448, 49123449, 49123450, 49123451, 49123452, 49123453,
   49123454, 49123455, 49123456, 49123457, 49123458, 49123459,
   49123460, 49123461, 49123462, 49123463, 49123464, 49123465,
   49123466, 49123467, 49123468, 49123469, 49123470, 49123471,
   49123472, 49123473, 49123474, 49123475, 49123476, 49123477,
   49123478, 49123479, 49123480, 49123481, 49123482, 49123483,
   49123484, 49123485, 49123486, 49123487, 49123488, 49123489,
   49123490, 49123491, 49123492, 49123493, 49123494, 49123495,
   49123496, 49123497, 49123498, 49123499, 33554333, 33554334,
   33554335, 33554336, 33554337]

This can be trivially converted to a numpy.array:

>>> np.array(mynewdata)
array([49123400, 49123401, 49123402, 49123403, 49123404, 49123405,
       49123406, 49123407, 49123408, 49123409, 49123410, 49123411,
       49123412, 49123413, 49123414, 49123415, 49123416, 49123417,
       49123418, 49123419, 49123420, 49123421, 49123422, 49123423,
       49123424, 49123425, 49123426, 49123427, 49123428, 49123429,
       49123430, 49123431, 49123432, 49123433, 49123434, 49123435,
       49123436, 49123437, 49123438, 49123439, 49123440, 49123441,
       49123442, 49123443, 49123444, 49123445, 49123446, 49123447,
       49123448, 49123449, 49123450, 49123451, 49123452, 49123453,
       49123454, 49123455, 49123456, 49123457, 49123458, 49123459,
       49123460, 49123461, 49123462, 49123463, 49123464, 49123465,
       49123466, 49123467, 49123468, 49123469, 49123470, 49123471,
       49123472, 49123473, 49123474, 49123475, 49123476, 49123477,
       49123478, 49123479, 49123480, 49123481, 49123482, 49123483,
       49123484, 49123485, 49123486, 49123487, 49123488, 49123489,
       49123490, 49123491, 49123492, 49123493, 49123494, 49123495,
       49123496, 49123497, 49123498, 49123499, 33554333, 33554334,
       33554335, 33554336, 33554337])

or even simply written to a file without bothering about arrays:

with open('yourfile', 'w') as file:
    file.write(str(mynewdata).replace(',', ';'))
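Note that str(mynewdata) includes the surrounding square brackets and puts everything on one line. If one number per line is wanted, as in the question, a possible variant is sketched below; the shortened mynewdata and the temporary path are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

# Shortened stand-in for the full list built above.
mynewdata = list(range(49123400, 49123403)) + list(range(33554333, 33554336))

# Hypothetical output location; replace with your own path.
path = os.path.join(tempfile.mkdtemp(), 'numbers.csv')

with open(path, 'w') as file:
    # One integer per line, written as plain text.
    file.write('\n'.join(str(n) for n in mynewdata) + '\n')

with open(path) as file:
    lines = file.read().splitlines()

print(lines)
```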

And finally a note on why you converted your integers to floats:

>>> np.empty((0))
array([], dtype=float64)

By default np.empty creates a float array, so appending or concatenating integers to it always results in a float array. Use np.empty(0, int) if you want an integer array:

>>> np.empty(0, int)
array([], dtype=int64)
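The promotion can be checked directly with np.result_type, which reports the dtype NumPy picks when combining its arguments. This is a small sketch; the variable names are made up for the illustration.

```python
import numpy as np

empty_float = np.empty(0)             # float64 by default
empty_int = np.empty(0, dtype=int)    # platform default integer
ints = np.arange(3)

# Combining float64 with an integer array promotes to float64.
print(np.result_type(empty_float, ints))
float_result = np.concatenate((empty_float, ints))
print(float_result.dtype)

# Starting from an integer array keeps an integer dtype.
int_result = np.concatenate((empty_int, ints))
print(int_result.dtype)
```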
Answered By: MSeifert

I had the same issue when appending columns to a NumPy array. I was using np.arange() to make a sample array with one column and then appending columns to it, but the data came out messy, as you can see:

[[  0.00000000e+00  -1.56000000e+00]
[  1.00000000e+00   2.43000000e+00]
[  2.00000000e+00  -9.40000000e-01]
..., 
[  4.99700000e+03  -1.99000000e+00]
[  4.99800000e+03   4.10000000e-01]
[  4.99900000e+03  -7.00000000e-02]]

The problem didn’t go away even after making sure the dtypes were equal, but it was finally solved by using np.zeros() instead of np.arange().

Answered By: Sami S