How to return the number of films for each rating interval? (Hadoop MapReduce, Python 3)
Question:
I am new to Hadoop MapReduce and I am trying to return the number of films for each rating interval from the "title.ratings.tsv" dataset. My problem is that I cannot get the output to show the intervals. Here is my code:
`
from mrjob.job import MRJob
from mrjob.step import MRStep

class MovieCountPerRating(MRJob):
    def steps(self):
        return [
            MRStep(mapper=self.mapper_get_ratings,
                   reducer=self.reducer_count_ratings)
        ]

    # Mapper
    def mapper_get_ratings(self, _, line):
        (tconst, averageRating, numVotes) = line.split('\t')
        yield averageRating, 1

    # Reducer
    def reducer_count_ratings(self, key, values):
        yield key, sum(values)

if __name__ == '__main__':
    MovieCountPerRating.run()
`
Do you have any idea how I can return the number of movies for each rating interval ([0,1], ]1,2], ]2,3], ]3,4], ]4,5], ]5,6], ]6,7], ]7,8], ]8,9], ]9,10])?
I wrote a map and a reduce function that work well on Ubuntu, but I can't find a way to show the rating intervals.
Answers:
To show the intervals, convert each rating to the interval that contains it before emitting it as the key. Rounding to the nearest half with round(x * 2) / 2 does not do this: it turns 4.6 into the number 4.5, which is not one of the requested intervals. Because the intervals are open on the left and closed on the right, the upper bound of a rating's interval is the ceiling of the rating. For example, 4.6 falls in ]4,5], and so does exactly 5.0. Ratings between 0 and 1 inclusive all go to [0,1]. Here is an example of how you can modify your code to show the intervals:

    from math import ceil

    from mrjob.job import MRJob
    from mrjob.step import MRStep

    class MovieCountPerRating(MRJob):
        def steps(self):
            return [
                MRStep(mapper=self.mapper_get_ratings,
                       reducer=self.reducer_count_ratings)
            ]

        # Mapper: emit (interval label, 1) for each film
        def mapper_get_ratings(self, _, line):
            (tconst, averageRating, numVotes) = line.split('\t')
            if averageRating == 'averageRating':
                return  # skip the header row, if the file has one
            # ceil() gives the upper bound; ratings in [0, 1] share the first interval
            upper = max(1, ceil(float(averageRating)))
            lower = upper - 1
            opening = '[' if lower == 0 else ']'
            yield '%s%d,%d]' % (opening, lower, upper), 1

        # Reducer: sum the counts for each interval
        def reducer_count_ratings(self, key, values):
            yield key, sum(values)

    if __name__ == '__main__':
        MovieCountPerRating.run()

This code maps each rating to its interval, then emits the interval label as the key and 1 as the value. In the reducer, the values for each interval are summed to give the total number of movies in that interval.
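To check how ratings are assigned to intervals without running a Hadoop job, the bucketing step can be pulled into a plain function and exercised locally. The following is a minimal sketch under stated assumptions: the helper names `rating_interval` and `count_by_interval` and the sample ratings are hypothetical, and it assumes ceiling-based bucketing so that a rating on a boundary (such as 5.0) lands in the lower interval ]4,5].

```python
from collections import Counter
from math import ceil


def rating_interval(rating):
    """Return the label of the interval (]n-1,n], or [0,1]) that contains rating."""
    upper = max(1, ceil(rating))  # ratings in [0, 1] all map to the first interval
    lower = upper - 1
    opening = '[' if lower == 0 else ']'
    return '%s%d,%d]' % (opening, lower, upper)


def count_by_interval(ratings):
    """Simulate the map and reduce steps locally: label each rating, then count."""
    return Counter(rating_interval(r) for r in ratings)


if __name__ == '__main__':
    sample = [0.0, 1.0, 4.6, 5.0, 5.1, 9.9, 10.0]  # made-up ratings for illustration
    for label, count in sorted(count_by_interval(sample).items()):
        print(label, count)
```

With this helper, 4.6 and 5.0 are both labelled ]4,5], while 5.1 moves to ]5,6]. The same function body could be called from the mapper so that the local check and the MapReduce job share one definition.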