Why is there a difference between n1 and n2?

Question:

I read data from a CSV file in two ways and get different results.
One way is to extract the 'value' column directly from the CSV in one step using pandas.
The other way is to extract 'value' class by class and append the pieces together.
Ideally, the two results should be the same, but I do see a difference.
The sequence of classes is U1 U2 U7 U8 U9 U10 U98 U5 U4 U3;
I'm not sure whether the order has an impact. Any ideas?

input.csv is available at https://drive.google.com/file/d/1qND1NM6BK3py2ZjYw294GjhJVDzIOlHj/view?usp=sharing

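import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

inputfilename = 'input.csv'
data = []
df = pd.read_csv(inputfilename)
classes = pd.unique(df['class'])
# Collect 'value' class by class, in order of first appearance
for c in classes:
    df2 = df[df['class'] == c]
    data += list(df2['value'].values)
n1 = np.array(data)
n2 = df['value']  # 'value' in original row order
plt.plot(n1 - n2)
plt.show()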
Asked By: GreatShark

||

Answers:

The two arrays will only be the same if all the rows with the same class are grouped together in the CSV.

n1 is created by grouping all the values with the same class together. So it contains all U1 values, then all U2 values, and so on.

n2 just has all the values in the order that they appear in the CSV.

The classes are contiguous for U1, U2, U7, U8, U9, U10, and U98. But U3, U4, and U5 are all mixed together. You have a sequence of rows starting like this:

U4,-0.6
U4,-0.8
U4,-0.1
U4,-0.6
U3,-0.2
U3,0.2
U5,-0.3
U5,0.1
U3,0
U5,0.2
U5,-0.2

These will be ordered differently in the two arrays.
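
To make the reordering concrete, here is a minimal sketch that rebuilds only the eleven rows quoted above (not the full input.csv) and compares the two orderings. The names `rows` and `mismatch` are just for illustration.

```python
import numpy as np
import pandas as pd

# The interleaved rows quoted above, in CSV order
rows = [("U4", -0.6), ("U4", -0.8), ("U4", -0.1), ("U4", -0.6),
        ("U3", -0.2), ("U3", 0.2), ("U5", -0.3), ("U5", 0.1),
        ("U3", 0.0), ("U5", 0.2), ("U5", -0.2)]
df = pd.DataFrame(rows, columns=["class", "value"])

# n1: values gathered class by class (classes in order of first appearance)
n1 = np.concatenate([df.loc[df["class"] == c, "value"].to_numpy()
                     for c in pd.unique(df["class"])])
# n2: values in original row order
n2 = df["value"].to_numpy()

# Positions where the two orderings disagree
mismatch = np.flatnonzero(n1 != n2)
print(mismatch)
```

For this excerpt the arrays hold the same values but differ at positions 6, 7, and 8, which is where the late U3 row gets pulled ahead of the first two U5 rows.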

You could solve this by sorting the dataframe by class first, using a stable sort so that rows within each class keep their original order.
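
A rough sketch of the sorting approach, again using only the rows quoted above rather than the real file. It assumes you are fine with the classes ending up in sorted label order (string sorting, so U10 would come before U2 in the full data) instead of the order they first appear in the CSV. `kind="stable"` is passed so that rows within a class keep their relative order.

```python
import numpy as np
import pandas as pd

# Same excerpt of interleaved rows as quoted in the answer
rows = [("U4", -0.6), ("U4", -0.8), ("U4", -0.1), ("U4", -0.6),
        ("U3", -0.2), ("U3", 0.2), ("U5", -0.3), ("U5", 0.1),
        ("U3", 0.0), ("U5", 0.2), ("U5", -0.2)]
df = pd.DataFrame(rows, columns=["class", "value"])

# Stable sort groups each class together and preserves within-class row order
df_sorted = df.sort_values("class", kind="stable").reset_index(drop=True)

# Rebuild both arrays from the sorted frame
n1 = np.concatenate([df_sorted.loc[df_sorted["class"] == c, "value"].to_numpy()
                     for c in pd.unique(df_sorted["class"])])
n2 = df_sorted["value"].to_numpy()

same = np.array_equal(n1, n2)
print(same)
```

Once every class is contiguous, the class-by-class array and the plain column come out in the same order, so n1 - n2 should be all zeros.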

Answered By: Barmar