How to left join NumPy arrays in Python
Question:
What’s the numpy "pythonic" way to left join arrays? Let’s say I have two 2-D arrays that share a key:
a.shape # (20, 2)
b.shape # (200, 3)
Both arrays share a common key in their first dimension:
a[:, 0] # values from 0..19
b[:, 0] # values from 0..19
How do I left join the values from a[:, 1] to b?
Answers:
It’s trivial with pandas:
import numpy as np
import pandas as pd
a = np.array([[4, 9], [5, 8], [6, 7]])
b = np.array([[5, 4, 1], [4, 6, 8], [5, 4, 8], [3, 8, 9]])
dfa = pd.DataFrame(a).add_prefix('a')
dfb = pd.DataFrame(b).add_prefix('b')
out = dfa.merge(dfb, left_on='a0', right_on='b0', how='left')
Output:
>>> a
array([[4, 9],
       [5, 8],
       [6, 7]])
>>> b
array([[5, 4, 1],
       [4, 6, 8],
       [5, 4, 8],
       [3, 8, 9]])
>>> out.values
array([[ 4.,  9.,  4.,  6.,  8.],
       [ 5.,  8.,  5.,  4.,  1.],
       [ 5.,  8.,  5.,  4.,  8.],
       [ 6.,  7., nan, nan, nan]])
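The result is a float array because the unmatched row is filled with nan. If you want to know which rows had no match, or prefer an integer sentinel, something like the following may help. It is only a sketch: it relies on the `indicator=True` option of `merge`, which adds a `_merge` column, and the choice of -1 as the fill value is an assumption that only works if -1 is not a legitimate value in your data.

```python
import numpy as np
import pandas as pd

a = np.array([[4, 9], [5, 8], [6, 7]])
b = np.array([[5, 4, 1], [4, 6, 8], [5, 4, 8], [3, 8, 9]])
dfa = pd.DataFrame(a).add_prefix('a')
dfb = pd.DataFrame(b).add_prefix('b')

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
out = dfa.merge(dfb, left_on='a0', right_on='b0', how='left', indicator=True)

# Boolean mask of rows of a that found no partner in b
unmatched = out['_merge'] == 'left_only'

# Replace nan with an integer sentinel (assumed here: -1) and restore int dtype
filled = out.drop(columns='_merge').fillna(-1).astype(int)
```

Here `filled.to_numpy()` is an integer array, and `unmatched` marks the row with key 6.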
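It’s more complicated to do a left join with numpy alone. Here the unmatched rows of a are padded with -1 instead of nan, so the result stays an integer array: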
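import numpy as np
a = np.array([[4, 9], [5, 8], [6, 7]])
b = np.array([[5, 4, 1], [4, 6, 8], [5, 4, 8], [3, 8, 9]])
# Compare every key of a with every key of b; i, j index the matching pairs
i, j = np.where(a[:, 0, None] == b[:, 0])
# Rows of a whose key never appears in b
k = np.setdiff1d(np.arange(len(a)), i)
# Pad those rows on the right with -1, one column per column of b
c = np.pad(a[k], [(0, 0), (0, b.shape[1])], constant_values=-1)
# Matched pairs first, then the unmatched rows of a
out = np.vstack([np.hstack([a[i], b[j]]), c])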
Output:
>>> out
array([[ 4,  9,  4,  6,  8],
       [ 5,  8,  5,  4,  1],
       [ 5,  8,  5,  4,  8],
       [ 6,  7, -1, -1, -1]])
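Note that the comparison `a[:, 0, None] == b[:, 0]` builds a len(a) × len(b) boolean array, which can get large. For the case described in the question, where each key occurs once in a and you only want to attach a[:, 1] to every row of b, a lookup with `np.searchsorted` may be enough. The following is a sketch under those assumptions (unique keys in a, a is non-empty); the function name `left_join_lookup` and the fill value -1 are made up for illustration.

```python
import numpy as np

def left_join_lookup(b, a, fill=-1):
    """Append a[:, 1] to each row of b, matching b[:, 0] against a[:, 0].

    Assumes the keys in a[:, 0] are unique and a is non-empty.
    Rows of b without a matching key get `fill` in the new column.
    """
    # Sort a by key so searchsorted can be used
    order = np.argsort(a[:, 0], kind="stable")
    keys = a[order, 0]
    vals = a[order, 1]

    # Position where each key of b would be inserted into the sorted keys
    pos = np.searchsorted(keys, b[:, 0])
    # Clip so that keys larger than every key in a do not index out of bounds
    pos = np.minimum(pos, len(keys) - 1)

    # A key of b is matched only if the key at that position is equal to it
    found = keys[pos] == b[:, 0]
    joined = np.where(found, vals[pos], fill)
    return np.column_stack([b, joined])

a = np.array([[4, 9], [5, 8], [6, 7]])
b = np.array([[5, 4, 1], [4, 6, 8], [5, 4, 8], [3, 8, 9]])
out = left_join_lookup(b, a)
```

With the example arrays, every row of b is kept in its original order and gains one column: 8 for key 5, 9 for key 4, and -1 for key 3, which does not occur in a.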