How can I select the minimum y-value for x-y pairs?
Question:
I have the following two arrays:
x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]
The x-array contains repeated values. If I were to plot these values as (x, y) pairs, the graph would look like columns of dots.
I want to make a script that chooses the lowest y-value for a given x-value, joins that y-value with its associated x-value, and then plots that pair. The other y-values for each value of x are ignored.
I’ve tried using nested for loops, slicing, if/else statements, etc., but I can’t make anything work. The most common result I get is the lowest y-value of the entire array joined with all the x-values.
Answers:
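This might work for you. Note that itertools.groupby only groups consecutive equal keys, so the pairs must be sorted by x first (the sample data happens to be sorted already):
from itertools import groupby
from operator import itemgetter
x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]
# sort the (x, y) pairs by x so that equal x values are adjacent
sorted_pairs = sorted(zip(x, y), key=itemgetter(0))
pairs = [(key, min(map(itemgetter(1), group))) for key, group in groupby(sorted_pairs, itemgetter(0))]
print(pairs)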
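To see why the groupby answer depends on sorted input, here is a small sketch that runs the same grouping on the question's pairs in a reordered (hypothetical) order. groupby starts a new group every time the key changes, so without sorting the same x shows up more than once:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical reordering of the question's (x, y) pairs
x = [10, 20, 10, 20, 20]
y = [0.194, 0.202, 0.183, 0.169, 0.417]
pairs = list(zip(x, y))

# Unsorted: groupby only merges runs of adjacent equal keys
unsorted_result = [(key, min(v for _, v in group))
                   for key, group in groupby(pairs, key=itemgetter(0))]
print(unsorted_result)  # [(10, 0.194), (20, 0.202), (10, 0.183), (20, 0.169)]

# Sorted by x first: each x forms exactly one group
sorted_result = [(key, min(v for _, v in group))
                 for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))]
print(sorted_result)  # [(10, 0.183), (20, 0.169)]
```

The sort makes this approach O(n log n) overall, which is the trade-off against the dict-based answers.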
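This runs in linear time, O(n): it makes a single pass over the data, and dict lookups are O(1) on average.
x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]
lowestDict = dict()
for i in range(len(x)):
    currentY = lowestDict.get(x[i], None)
    # store y[i] if this x is new or y[i] beats the lowest y seen so far
    if currentY is None or y[i] < currentY:
        lowestDict[x[i]] = y[i]
pairs = list(lowestDict.items())
print(pairs)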
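The same single-pass idea can be wrapped in a reusable function. This is a sketch: the name min_y_per_x is hypothetical, it iterates with zip instead of indexing, and it sorts the result by x so the points come out in plotting order even when the input is not sorted:

```python
def min_y_per_x(xs, ys):
    """Return (x, smallest y) pairs sorted by x, from two parallel sequences."""
    # min_y_per_x is a hypothetical name chosen for this sketch
    lowest = {}
    for xv, yv in zip(xs, ys):
        # store yv if this x is new or yv beats the smallest y seen so far
        if xv not in lowest or yv < lowest[xv]:
            lowest[xv] = yv
    # the loop is O(n); sorting the k distinct x values adds O(k log k)
    return sorted(lowest.items())


# the question's pairs in a shuffled (hypothetical) order give the same answer
print(min_y_per_x([20, 10, 20, 10, 20], [0.202, 0.194, 0.169, 0.183, 0.417]))
# [(10, 0.183), (20, 0.169)]
```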
You can create a dict keyed on the values of x, with each value initialized to float('inf'), then use zip to walk the (x, y) pairs and save the minimum y for each key. At the end, convert the dict to a list of tuples.
x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]
# create a 'dict' that uses the values of 'x' as keys, each starting at infinity
dct = {i: float('inf') for i in set(x)}
# for every (x, y) pair, keep the smaller of the stored value and y
for i, j in zip(x, y):
    dct[i] = min(dct[i], j)
# convert 'dct' to a 'list' of 'tuple's
res = list(dct.items())
print(res)
Output:
[(10, 0.183), (20, 0.169)]
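Since the question ultimately wants to plot the pairs, and plotting functions usually take separate sequences of x and y values, the list of tuples can be "unzipped" with zip(*...). A minimal sketch, sorting first because a dict built from set(x) is not guaranteed to be in x order:

```python
x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]

dct = {i: float('inf') for i in set(x)}
for i, j in zip(x, y):
    dct[i] = min(dct[i], j)

# sort by x so the points run left to right, then split into two tuples
res = sorted(dct.items())
xs, ys = zip(*res)
print(xs)  # (10, 20)
print(ys)  # (0.183, 0.169)
```

With matplotlib, for example, xs and ys could then be passed to plt.scatter(xs, ys). Note that zip(*res) raises a ValueError on unpacking if res is empty.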
As demonstrated by the other answers on this page, it’s possible to solve this using only Python’s built-in libraries, albeit verbosely.
However, if you’ll be doing this sort of thing often, or with large datasets, it may be worth learning to use the pandas package, which is loaded with convenient tools for common data manipulation tasks like this.
Once your data is loaded into a DataFrame, your answer can be computed in one line:
import pandas as pd
x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]
df = pd.DataFrame({'x': x, 'y': y})
mins = df.groupby('x')['y'].min()
print(mins.index.tolist())
print(mins.values.tolist())
Output:
[10, 20]
[0.183, 0.169]
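If you want the same list-of-tuples shape as the pure-Python answers, the Series returned by the groupby can be zipped with its own index. A short sketch; calling tolist() converts the NumPy scalars to plain Python ints and floats:

```python
import pandas as pd

x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]
df = pd.DataFrame({'x': x, 'y': y})
mins = df.groupby('x')['y'].min()

# mins is a Series indexed by x; pair each index label with its minimum y
pairs = list(zip(mins.index.tolist(), mins.tolist()))
print(pairs)  # [(10, 0.183), (20, 0.169)]
```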
Or here’s another option, which keeps the entire row (including any extra columns such as z) for the smallest y of each x:
import pandas as pd
x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]
z = [1, 2, 0, 1, 2]
df = pd.DataFrame({'x': x, 'y': y, 'z': z})
filtered = df.sort_values(['x', 'y']).drop_duplicates('x')
print(filtered)
Output:
    x      y  z
1  10  0.183  2
3  20  0.169  1
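A variant of the row-keeping approach that avoids sorting the whole frame is to ask the groupby for the index label of each group's minimum with idxmin, then select those rows with .loc. This is a sketch on the same data; on ties, idxmin returns the first occurrence of the minimum:

```python
import pandas as pd

x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]
z = [1, 2, 0, 1, 2]
df = pd.DataFrame({'x': x, 'y': y, 'z': z})

# for each x, the index label of the row holding the smallest y
best_rows = df.groupby('x')['y'].idxmin()
# pull those complete rows (x, y and z) out of the original frame
filtered = df.loc[best_rows]
print(filtered)
```

On this data it selects the same rows (index labels 1 and 3) as the sort_values/drop_duplicates version.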