I am trying to create new files and store data from a JSON Lines file depending on a specific value
Question:
I have a JSON Lines file that looks like this:
{"reviewerID": "A11N155CW1UV02", "asin": "B000H00VBQ", "reviewerName": "AdrianaM", "helpful": [0, 0], "reviewText": "I had big expectations because I love English TV, in particular Investigative and detective stuff but this guy is really boring. It didn't appeal to me at all.", "overall": 2.0, "summary": "A little bit boring for me", "unixReviewTime": 1399075200, "reviewTime": "05 3, 2014"}
{"reviewerID": "A3BC8O2KCL29V2", "asin": "B000H00VBQ", "reviewerName": "Carol T", "helpful": [0, 0], "reviewText": "I highly recommend this series. It is a must for anyone who is yearning to watch \"grown up\" television. Complex characters and plots to keep one totally involved. Thank you Amazin Prime.", "overall": 5.0, "summary": "Excellent Grown Up TV", "unixReviewTime": 1346630400, "reviewTime": "09 3, 2012"}
{"reviewerID": "A60D5HQFOTSOM", "asin": "B000H00VBQ", "reviewerName": "Daniel Cooper \"dancoopermedia\"", "helpful": [0, 1], "reviewText": "This one is a real snoozer. Don't believe anything you read or hear, it's awful. I had no idea what the title means. Neither will you.", "overall": 1.0, "summary": "Way too boring for me", "unixReviewTime": 1381881600, "reviewTime": "10 16, 2013"}
I need to extract the reviewText of every review, grouped by its overall rating (e.g., overall rating: 1.0), with one output file per rating.
I have this:
    f1 = open("ranking1.txt", "a")
    f2 = open("ranking2.txt", "a")
    f3 = open("ranking3.txt", "a")
    f4 = open("ranking4.txt", "a")
    f5 = open("ranking5.txt", "a")
    with open("/content/Amazon_Instant_Video_5.json") as json_file:
        for line in json_file:  # runs the loop to extract info
            data = json.loads(line)
            if data['overall'] == 1:
                f1.write(data['reviewText'] + '\n')
            if data['overall'] == 2:
                f2.write(data['reviewText'] + '\n')
            if data['overall'] == 3:
                f3.write(data['reviewText'] + '\n')
            if data['overall'] == 4:
                f4.write(data['reviewText'] + '\n')
            if data['overall'] == 5:
                f5.write(data['reviewText'] + '\n')
The code works, but I don't think this is the most optimal way to do it. What can I change to make it more efficient?
Answers:
The code can be simplified:
- Use a with context manager to open and write each file
- Only open a file for append when it is actually needed
- Use a dynamic filename (based on the ranking) to determine which file to open
    import json

    with open("/content/Amazon_Instant_Video_5.json") as json_file:
        for line in json_file:  # runs the loop to extract info
            data = json.loads(line)
            rank = int(data['overall'])  # 2.0 -> 2, used in the filename
            txt = data['reviewText']
            # open the matching file in append mode just for this write
            with open(f'ranking{rank}.txt', 'a') as f:
                f.write(txt + '\n')
For a slight performance improvement, you may be able to get away with only one open call (in append, or 'a', mode) for each file, instead of reopening the file for every line. A mapping of filename to file (IO) object could be a good idea:
    import json

    filename_to_file = {}

    with open("/content/Amazon_Instant_Video_5.json") as json_file:
        for line in json_file:  # runs the loop to extract info
            data = json.loads(line)
            rank = int(data['overall'])
            txt = data['reviewText']
            filename = f'ranking{rank}.txt'
            file = filename_to_file.get(filename)
            if not file:
                # first time this ranking is seen: open its file once and cache it
                filename_to_file[filename] = file = open(filename, 'a')
            file.write(txt + '\n')

    # close all files opened in `append` mode
    for file in filename_to_file.values():
        file.close()
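One thing the mapping version leaves open is what happens if json.loads raises partway through: the final loop never runs, so the cached files are not closed explicitly. As a rough sketch (not part of the original answer), the standard library's contextlib.ExitStack can own the cached handles so they are closed on both normal and error exit. The function name split_reviews_by_rating and the out_dir parameter are made up for illustration, and the encoding is assumed to be UTF-8:

```python
import json
import os
from contextlib import ExitStack


def split_reviews_by_rating(src_path, out_dir="."):
    """Append each reviewText to ranking<N>.txt, opening each output file once.

    Hypothetical helper: the name and the out_dir parameter are assumptions.
    Returns the sorted list of ratings that were seen.
    """
    handles = {}  # rating (int) -> open file object
    # ExitStack closes every file registered with it when the block exits,
    # including when an exception is raised inside the loop.
    with ExitStack() as stack, open(src_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue  # tolerate blank lines in the input
            data = json.loads(line)
            rank = int(data["overall"])
            if rank not in handles:
                path = os.path.join(out_dir, f"ranking{rank}.txt")
                handles[rank] = stack.enter_context(
                    open(path, "a", encoding="utf-8")
                )
            handles[rank].write(data["reviewText"] + "\n")
    return sorted(handles)
```

With the file from the question, this would be called as split_reviews_by_rating("/content/Amazon_Instant_Video_5.json"). Whether it is measurably faster than the mapping version is not something I have benchmarked; the main gain is the guaranteed cleanup.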