Pass the values from a dataset with indexes and values to a sparse NumPy array

Question:

I want to make a sparse NumPy array using the indexes and values stored in a pandas DataFrame.

The DataFrame has the columns ‘userIndex’, ‘movieIndex’ and ‘rating’, with a million rows.

For example:

   movieIndex  userIndex  rating
0           0          4     2.5
1           2          2     3.0
2           1          1     4.0
3           2          0     4.0
4           4          2     3.0

This would be transformed to a NumPy array like this (rows are movieIndex, columns are userIndex):

[[0 0 0 0 2.5],
[0 4.0 0 0 0],
[4.0 0 3.0 0 0],
[0 0 0 0 0],
[0 0 3.0 0 0]]

So, first I’m making a np.zeros array with the correct size:

Y = np.zeros([nm,nu])

And for now, I’m passing the information as:

for i in range(len(ratings)):
    row = ratings.iloc[i]
    Y[int(row.movieIndex), int(row.userIndex)] = row.rating

It works, and it is O(n), so it’s not really bad, but it takes 3 minutes to run.
I know it’s not a good idea to use a "for" loop over a DataFrame, and that I should use vectorized operations instead, but I can’t find a way to make this work. Any ideas?

Asked By: Tadashi Mori


Answers:

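This may be faster, since it assigns all the values in a single vectorized indexing operation instead of looping over the rows: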

Y[ratings["movieIndex"].values, ratings["userIndex"].values] = ratings["rating"].values
Answered By: MaryRa
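
For anyone who wants to try the vectorized assignment end to end, here is a minimal sketch using the five example rows from the question. It is not part of the answer above, only an illustration. The explicit cast to int is an assumption on my part: the loop in the question wraps the indexes in int(), which suggests the columns may be stored as floats, and NumPy does not accept float arrays as indexes. Deriving nm and nu from the largest index is also an assumption; in the real code the sizes from the question would be used.

```python
import numpy as np
import pandas as pd

# Example rows copied from the question.
ratings = pd.DataFrame({
    "movieIndex": [0, 2, 1, 2, 4],
    "userIndex": [4, 2, 1, 0, 2],
    "rating": [2.5, 3.0, 4.0, 4.0, 3.0],
})

# Assumption: cast to int in case the index columns are stored as floats.
rows = ratings["movieIndex"].to_numpy().astype(int)
cols = ratings["userIndex"].to_numpy().astype(int)

# Assumption: matrix size taken from the largest index seen in the data;
# replace with the known number of movies (nm) and users (nu) if available.
nm, nu = rows.max() + 1, cols.max() + 1

Y = np.zeros((nm, nu))

# One vectorized assignment: Y[rows[k], cols[k]] = rating[k] for every k.
Y[rows, cols] = ratings["rating"].to_numpy()

print(Y)
```

One thing to keep in mind: if the same (movieIndex, userIndex) pair appears more than once, this kind of assignment keeps only one of the values rather than summing them, so duplicates should be handled beforehand if they matter.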