How can I select the minimum y-value for x-y pairs?

Question:

I have the following two arrays:

x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]

The x-array features duplicates of the same value. If I were to plot these values as x,y pairs, the graph would look like columns of dots.

I want to make a script that chooses the lowest y-value for a given x-value, joins that y-value with its associated x-value, and then plots that pair. The other y-values for each value of x are ignored.

I’ve tried using nested for loops, slicing, if/else statements, etc., but I can’t make anything work. The most common result I get is the lowest y-value of the entire array joined with all the x-values.

Asked By: cat_herder

Answers:

This might work for you:

from itertools import groupby
from operator import itemgetter

x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]
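# groupby only merges consecutive equal keys, so sort the pairs by x first
pairs = [(key, min(map(itemgetter(1), group))) for key, group in groupby(sorted(zip(x, y)), key=itemgetter(0))]
print(pairs)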
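
Note that groupby only groups runs of consecutive equal keys, so the pairs need to be ordered by x before grouping. The sample data already is, but other data may not be. A small sketch of the difference, using a hypothetical shuffled version of the data and a helper name (min_pairs) that is not part of the original answer:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input where equal x-values are not adjacent
x = [20, 10, 20, 10, 20]
y = [0.202, 0.194, 0.169, 0.183, 0.417]

def min_pairs(points):
    """Return (x, min y) for each run of consecutive equal x-values."""
    return [(key, min(map(itemgetter(1), group)))
            for key, group in groupby(points, key=itemgetter(0))]

# Without sorting, every run of equal x-values becomes its own group
unsorted_result = min_pairs(zip(x, y))

# Sorting the pairs first puts equal x-values next to each other
sorted_result = min_pairs(sorted(zip(x, y)))

print(unsorted_result)
print(sorted_result)
```

Sorting makes this approach O(n log n) rather than linear, which is unlikely to matter for small inputs.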

Answered By: Simon Ward-Jones

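This should run in linear time, O(n): it makes a single pass over the pairs, and each dictionary lookup is constant time on average.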

x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]

# map each x-value to the lowest y-value seen so far
lowestDict = dict()
for i in range(len(x)):
  currentY = lowestDict.get(x[i], None)
  if currentY is None or y[i] < currentY:
    lowestDict[x[i]] = y[i]

pairs = list(lowestDict.items())
print(pairs)
Answered By: Chris

You can create a dict keyed by the unique values of x, with every value initialized to float('inf'). Then iterate over zip(x, y) and keep the smaller y-value for each key. At the end, convert the dict to a list of tuples.

x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]

# create a 'dict' and use values of 'x' as 'key'
dct = {i: float('inf') for i in set(x)}

# find the min value for each 'key' from the pairs in (x, y)
for i, j in zip(x, y):
    dct[i] = min(dct[i], j)

# convert 'dct' to 'list' of 'tuple's
res = list(dct.items())
print(res)

Output:

[(10, 0.183), (20, 0.169)]
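
Since the question ultimately wants to plot these pairs, the list can be split back into separate x and y sequences. A minimal sketch; the commented-out plotting calls assume matplotlib is the plotting library, which the question does not state:

```python
# Pairs as produced above
res = [(10, 0.183), (20, 0.169)]

# Sort by x so points are in left-to-right order, then split into two tuples
x_min, y_min = zip(*sorted(res))

print(x_min)
print(y_min)

# Assumption: matplotlib is the plotting library in use
# import matplotlib.pyplot as plt
# plt.plot(x_min, y_min, "o")
# plt.show()
```
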
Answered By: I'mahdi

As demonstrated by the other answers on this page, it’s possible to solve this using only Python’s built-in libraries, albeit verbosely.

However, if you’ll be doing this sort of thing often — or with large datasets — it may be worth learning to use the pandas package, which is loaded with convenient tools for common data manipulation tasks like this.

Once your data is loaded into a DataFrame, your answer can be computed in one line:

import pandas as pd

x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]

df = pd.DataFrame({'x': x, 'y': y})
mins = df.groupby('x')['y'].min()

print(mins.index.tolist())
print(mins.values.tolist())

Output:

[10, 20]
[0.183, 0.169]

Or here’s another option, which keeps the whole row for each minimum, including any other columns (such as z below):

import pandas as pd

x = [10, 10, 20, 20, 20]
y = [0.194, 0.183, 0.202, 0.169, 0.417]
z = [1, 2, 0, 1, 2]

df = pd.DataFrame({'x': x, 'y': y, 'z': z})
filtered = df.sort_values(['x', 'y']).drop_duplicates('x')
print(filtered)

Output:

    x      y  z
1  10  0.183  2
3  20  0.169  1
Answered By: Stuart Berg