Make a 3D NumPy array using a for loop in Python
Question:
I have 2-dimensional training data (200 results of 4 features).
I tested 100 different applications with 10 repetitions each, resulting in 1000 CSV files.
I want to stack the results of each CSV file for machine learning, but I don't know how.
Each of my CSV files looks like the following (test1.csv converted to a NumPy array):
[[0 'crc32_pclmul' 445 0]
[0 'crc32_pclmul' 270 4096]
[0 'crc32_pclmul' 234 8192]
...
[249 'intel_pmt' 272 4096]
[249 'intel_pmt' 224 8192]
[249 'intel_pmt' 268 12288]]
I tried the following Python code:
import glob
import os

import numpy as np

path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))
cnt = 0
for f in csv_files:
    cnt += 1
    seperator = '_'
    app = os.path.basename(f).split(seperator, 1)[0]
    if cnt == 1:
        a = np.array(preprocess(f))
        b = np.array(app)
    else:
        a = np.vstack((a, np.array(preprocess(f))))
        b = np.append(b, app)
print(a)
print(b)
The preprocess function returns df.to_numpy() for each CSV file.
I expected a to have shape (1000, 200, 4), like this:
[[[0 'crc32_pclmul' 445 0]
[0 'crc32_pclmul' 270 4096]
[0 'crc32_pclmul' 234 8192]
...
[249 'intel_pmt' 272 4096]
[249 'intel_pmt' 224 8192]
[249 'intel_pmt' 268 12288]],
[[0 'crc32_pclmul' 445 0]
[0 'crc32_pclmul' 270 4096]
[0 'crc32_pclmul' 234 8192]
...
[249 'intel_pmt' 272 4096]
[249 'intel_pmt' 224 8192]
[249 'intel_pmt' 268 12288]],
...
[[0 'crc32_pclmul' 445 0]
[0 'crc32_pclmul' 270 4096]
[0 'crc32_pclmul' 234 8192]
...
[249 'intel_pmt' 272 4096]
[249 'intel_pmt' 224 8192]
[249 'intel_pmt' 268 12288]]]
However, I'm getting an array of shape (200000, 4) instead:
[[0 'crc32_pclmul' 445 0]
[0 'crc32_pclmul' 270 4096]
[0 'crc32_pclmul' 234 8192]
...
[249 'intel_pmt' 272 4096]
[249 'intel_pmt' 224 8192]
[249 'intel_pmt' 268 12288]]
I want to access each CSV file's results as a[0] to a[999], where each sub-array has shape (200, 4).
How can I solve this problem? I'm quite lost.
Answers:
Make a new list outside the loop and append each array to it after reading.
Then change vstack to stack:
la = []
lb = []
for f in csv_files:
    seperator = '_'
    app = os.path.basename(f).split(seperator, 1)[0]
    la.append(preprocess(f))
    lb.append(app)
a = np.stack(la, axis=0)
b = np.array(lb)
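To try the list-then-stack approach without the real data, here is a self-contained sketch. It assumes a hypothetical preprocess() that reads one CSV with the standard csv module (the question's version uses pandas and df.to_numpy()), and it writes a few tiny fake files with hypothetical names like appA_1.csv into a temporary directory.

```python
import csv
import glob
import os
import tempfile

import numpy as np


def preprocess(path):
    # Hypothetical stand-in for the question's preprocess(): reads one CSV
    # and returns its rows as a 2-D object array (ints mixed with strings).
    with open(path, newline="") as fh:
        rows = [[int(r[0]), r[1], int(r[2]), int(r[3])] for r in csv.reader(fh)]
    return np.array(rows, dtype=object)


def load_all(directory):
    # One (rows, 4) array per file, plus the app name taken from the file name.
    la, lb = [], []
    for f in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        app = os.path.basename(f).split("_", 1)[0]
        la.append(preprocess(f))
        lb.append(app)
    # Stack once, after the loop: shape (n_files, rows, 4).
    return np.stack(la, axis=0), np.array(lb)


# Demo with three tiny fake files (hypothetical names such as "appA_1.csv").
with tempfile.TemporaryDirectory() as d:
    for name in ["appA_1.csv", "appA_2.csv", "appB_1.csv"]:
        with open(os.path.join(d, name), "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow([0, "crc32_pclmul", 445, 0])
            writer.writerow([0, "crc32_pclmul", 270, 4096])
    a, b = load_all(d)

print(a.shape)  # (3, 2, 4)
print(b)        # ['appA' 'appA' 'appB']
```

Sorting the glob result is optional, but it makes the order of a[i] and b[i] predictable across runs.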
vstack can only join arrays along the existing row axis, whereas stack joins them along a new axis.
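A quick way to see the difference is to compare the resulting shapes. This sketch uses three hypothetical (200, 4) arrays of zeros in place of real CSV results:

```python
import numpy as np

# Three hypothetical "files", each a (200, 4) array of zeros.
files = [np.zeros((200, 4)) for _ in range(3)]

# vstack concatenates along the existing row axis: rows pile up.
flat = np.vstack(files)
print(flat.shape)  # (600, 4)

# stack inserts a new leading axis: one slice per file.
cube = np.stack(files, axis=0)
print(cube.shape)  # (3, 200, 4)

# The new axis can go elsewhere too; axis=1 puts the file index second.
print(np.stack(files, axis=1).shape)  # (200, 3, 4)

# stack requires identical shapes; a ragged input raises ValueError.
try:
    np.stack([np.zeros((200, 4)), np.zeros((199, 4))])
except ValueError as exc:
    print("ValueError:", exc)
```

The last case matters for real data: if any CSV file has a different number of rows, stack will refuse to build the 3D array.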
Well, yes, that is what vstack (and append) does: it joins arrays along an existing axis (the row axis).
import numpy as np

a1 = np.arange(10).reshape(2, 5)
# [[0, 1, 2, 3, 4],
#  [5, 6, 7, 8, 9]]
a2 = np.arange(10, 20).reshape(2, 5)
# [[10, 11, 12, 13, 14],
#  [15, 16, 17, 18, 19]]
np.vstack((a1, a2))
# [[ 0,  1,  2,  3,  4],
#  [ 5,  6,  7,  8,  9],
#  [10, 11, 12, 13, 14],
#  [15, 16, 17, 18, 19]]
b1 = np.arange(5)
b2 = np.arange(5, 10)
np.append(b1, b2)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
As those examples show, if you want to append along a new axis, you need to add that axis yourself (for example, by wrapping each array in a list), or use the more flexible stack:
np.vstack(([a1],[a2]))
#array([[[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9]],
#
# [[10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19]]])
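Wrapping each array in a list is one way to add the axis. As a sketch, np.newaxis and np.expand_dims do the same thing more explicitly, and all three routes give the same (2, 2, 5) result as stack:

```python
import numpy as np

a1 = np.arange(10).reshape(2, 5)
a2 = np.arange(10, 20).reshape(2, 5)

# Option 1: index with np.newaxis (an alias of None) to get shape (1, 2, 5).
via_newaxis = np.concatenate((a1[np.newaxis], a2[np.newaxis]), axis=0)

# Option 2: np.expand_dims inserts the same length-1 axis explicitly.
via_expand = np.concatenate(
    (np.expand_dims(a1, 0), np.expand_dims(a2, 0)), axis=0
)

# Option 3: np.stack adds the new axis for you.
via_stack = np.stack((a1, a2))

print(via_newaxis.shape)                       # (2, 2, 5)
print(np.array_equal(via_newaxis, via_stack))  # True
print(np.array_equal(via_expand, via_stack))   # True
```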
Or, for 1-D arrays, use vstack instead of append:
np.vstack((b1,b2))
#array([[0, 1, 2, 3, 4],
# [5, 6, 7, 8, 9]])
But more importantly, you shouldn't be doing this inside a loop in the first place. Each of those functions (stack, vstack, append) creates a new array and copies all of the existing data into it on every call.
It would probably be more efficient to append each np.array(preprocess(f)) and np.array(app) to a plain Python list, and call stack and vstack only once, after you've read them all.
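A different technique, not mentioned in this answer, also avoids repeated copying: if every file is known to have the same shape, you can allocate the full result once and fill one slice per file. This is a sketch under that assumption; fake_preprocess is a hypothetical stand-in that returns integers, whereas the question's mixed string/int rows would need dtype=object.

```python
import numpy as np

# Hypothetical sizes matching the question: 1000 files, 200 rows, 4 columns.
n_files, n_rows, n_cols = 1000, 200, 4


def fake_preprocess(i):
    # Hypothetical stand-in for preprocess(f): returns one (200, 4) array.
    return np.full((n_rows, n_cols), i, dtype=np.int64)


# Allocate the full result once (use dtype=object for mixed strings/ints).
a = np.empty((n_files, n_rows, n_cols), dtype=np.int64)
for i in range(n_files):
    # Write each file's block into its own slice; no new array is built.
    a[i] = fake_preprocess(i)

print(a.shape)        # (1000, 200, 4)
print(a[999, 0, 0])   # 999
```

Assigning a block with the wrong shape into a[i] raises an error, so this also acts as a cheap shape check.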
Or, even better, append preprocess(f) and app directly to Python lists, and call np.array once on each whole list after the loop.
So, something like:
la = []
lb = []
for f in csv_files:
    seperator = '_'
    app = os.path.basename(f).split(seperator, 1)[0]
    la.append(preprocess(f))
    lb.append(app)
a = np.array(la)
b = np.array(lb)
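Once a is built, a[i] gives the block read from the i-th file. Because the question's rows mix integers with a string column, the result has dtype object, and most machine learning libraries expect numbers. The sketch below uses two hypothetical tiny "files" and simply keeps the numeric columns (an assumption; the string column could be encoded instead), then turns the app names in b into integer labels with np.unique:

```python
import numpy as np

# Two hypothetical "files" with the question's mixed layout:
# column 1 is a string, columns 0, 2 and 3 are integers.
la = [
    np.array([[0, "crc32_pclmul", 445, 0],
              [0, "crc32_pclmul", 270, 4096]], dtype=object),
    np.array([[249, "intel_pmt", 272, 4096],
              [249, "intel_pmt", 224, 8192]], dtype=object),
]
lb = ["appA", "appB"]  # hypothetical app names

a = np.array(la)  # object array, shape (2, 2, 4)
b = np.array(lb)

# a[i] is the (rows, 4) block read from the i-th file.
first_file = a[0]

# Keep only the numeric columns and convert them to floats.
X = a[:, :, [0, 2, 3]].astype(np.float64)  # shape (2, 2, 3)

# Map each app name to an integer class label.
classes, y = np.unique(b, return_inverse=True)

print(a.shape, a.dtype)  # (2, 2, 4) object
print(first_file.shape)  # (2, 4)
print(X.shape)           # (2, 2, 3)
print(classes, y)        # ['appA' 'appB'] [0 1]
```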