Using pandas and NumPy to read a CSV and convert it from a 2D array to 1D, ignoring diagonal values

Question:

My csv file looks like this:

0  |0.1|0.2|0.4|
0.1|0  |0.5|0.6|
0.2|0.5|0  |0.9|
0.4|0.6|0.9|0  |

I am trying to read it row by row, ignoring the diagonal values, and write it out as one long column like this:

0.1
0.2
0.4
0.1
0.5
0.6
0.2
0.5
0.9
.... 

I use this method:

import numpy as np
import pandas as pd


data = pd.read_csv(r"C:Userssoso-DesktopSVMDataSetchem_Jacarrd_sim.csv")
row_vector = np.array(data)
result = row_vector.ravel()
result.reshape(299756,1)
df = pd.DataFrame({'chem':result})
df.to_csv("my2.csv")

However, the output skips the first row and still includes the zeros, as shown below. How can I fix it?

0.1
0
0.5
0.6
0.2
0.5
0
0.9
....
Asked By: Sara Almashharawi


Answers:

For the dataframe you have:

0  |0.1|0.2|0.4
0.1|0  |0.5|0.6
0.2|0.5|0  |0.9
0.4|0.6|0.9|0  

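which I saved as ffff.csv, you need to do the following. Passing header=None keeps the first row as data instead of treating it as column names, and the boolean mask removes the diagonal entries: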

import numpy as np
import pandas as pd

data = pd.read_csv("ffff.csv", sep="|", header=None)
print(data)
row_vector = np.array(data)

# Create a new mask with the correct shape
mask = np.zeros((row_vector.shape), dtype=bool)
mask[np.arange(row_vector.shape[0]), np.arange(row_vector.shape[0])] = True

result = np.ma.array(row_vector, mask=mask)
result = result.compressed()

df = pd.DataFrame({'chem':result})
df.to_csv("my2.csv", index=False)
print(df)

which returns:

    chem
0    0.1
1    0.2
2    0.4
3    0.1
4    0.5
5    0.6
6    0.2
7    0.5
8    0.9
9    0.4
10   0.6
11   0.9
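
To illustrate why the first row went missing in the question's version, here is a minimal sketch. It assumes a small in-memory stand-in for the file (the `csv_text` string is hypothetical and not the asker's real data) and compares the default header handling of `pd.read_csv` with `header=None`:

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: pipe-separated, no header row.
csv_text = "0|0.1|0.2|0.4\n0.1|0|0.5|0.6\n0.2|0.5|0|0.9\n0.4|0.6|0.9|0\n"

# Default behaviour: the first line is consumed as column names,
# so only three data rows remain.
with_header = pd.read_csv(io.StringIO(csv_text), sep="|")

# header=None: every line is treated as data, so all four rows are kept.
without_header = pd.read_csv(io.StringIO(csv_text), sep="|", header=None)

print(with_header.shape)     # (3, 4)
print(without_header.shape)  # (4, 4)
```

So the lost row is most likely a header issue, separate from the diagonal filtering.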

This one is a bit shorter, assuming you have a 2D NumPy array:

import numpy as np
arr = np.random.rand(3,3)

# array([[0.12964821, 0.92124532, 0.72456772],
#        [0.26063188, 0.1486612 , 0.45312145],
#        [0.04165099, 0.31071689, 0.26935581]])

arr_out = arr[np.where(~np.eye(arr.shape[0],dtype=bool))]

# array([0.92124532, 0.72456772, 0.26063188, 0.45312145, 0.04165099,
#        0.31071689])
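
As a quick check against the matrix from the question, the same idea can be sketched as follows. The names `sim`, `off_diagonal` and `flat` are made up for illustration, and the boolean mask is used to index directly, which should behave the same as wrapping it in `np.where`:

```python
import numpy as np

# The 4x4 similarity matrix from the question, typed in by hand.
sim = np.array([
    [0.0, 0.1, 0.2, 0.4],
    [0.1, 0.0, 0.5, 0.6],
    [0.2, 0.5, 0.0, 0.9],
    [0.4, 0.6, 0.9, 0.0],
])

# True everywhere except on the main diagonal.
off_diagonal = ~np.eye(sim.shape[0], dtype=bool)

# Boolean indexing returns the selected values in row-major order.
flat = sim[off_diagonal]

print(flat.tolist())
# [0.1, 0.2, 0.4, 0.1, 0.5, 0.6, 0.2, 0.5, 0.9, 0.4, 0.6, 0.9]
```

This matches the column the question asks for: each row is read left to right with its diagonal entry dropped.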
Answered By: TommyLeeJones