How to sort csv data by changing data value in rows to columns?
Question:
There are a total of 25 LOC_CODE
, which means locations.
There are a total of 6 ITEM_CODE
, which corresponds to CO2 level, CO level…etc.
The item_codes are: 1,3,5,6,8, and 9
The problem:
I want to sort this dataset and overwrite the same csv such that there are only 25 rows where each row is a unique location LOC_CODE
.
And I want to display values of all six item_codes per location so it’s not one item_code
per row like in the screenshot. Everything else stays the same. I just want to display values of all six ITEM_CODE
for a unique location on a single row.
Answers:
This solution assumes that the response from the API is already saved into a CSV file in the format given in the first screenshot. I’m using csv.DictReader
and csv.DictWriter
from the csv
module.
Before beginning, let’s just import csv
using:
import csv
Let’s first create a function that’ll process the DATA_DT
into a desirable format
def get_datetime(value: str):
# returns year, month, day, time (hh:mm:ss), in that order
# assumes string length is 14 and has format 'YYYYMMDDhhmmss'
y, m, d = value[0:4], value[4:6], value[6:8]
t = ':'.join([value[8:10], value[10:12], value[12:14]])
return y, m, d, t
a dictionary for ITEM_CODE
:
item_dict = {'1': 'SO2', '3': ...} # please fill this yourself
and the headers list needed for the CSV DictWriter:
headers = ['Location', 'Year', 'Month', 'Day', 'Time (24h)', 'Station No.',
'SO2', 'NO2', 'CO', 'O3', 'PM10', 'PM2.5', 'Meter Status']
We open the CSV file and read from it into a list raw_data
(fill the filename, please). Each element of raw_data
is a dict:
with open(r'filepathfilename.csv') as file:
raw_data = list(csv.DictReader(file))
We now create an empty dict data
, and then iterate over raw_data
, processing its data and writing it to the dict (comments added at necessary places):
data = {}
for rec in raw_data:
loc = rec['LOC_CODE']
if loc not in data:
data[loc] = dict.fromkeys(headers, '')
# rec is from old data, record is for the new data
record = data[loc]
if not record['Year']:
# assumed that date & time for a location is same for all ITEM_CODE
(record['Year'],
record['Month'],
record['Day'],
record['Time (24h)']
) = get_datetime(rec['DATA_DT'])
record['Station No.'] = rec['DATA_STATE']
record['Meter Status'] = rec['DATA_NOVER']
# for the readings we get the apt key using item_dict
record[item_dict[rec['ITEM_CODE']]] = rec['DATA_VALUE']
Finally, we arrange all the records in data
into a list of dict
s the way csv.DictWriter
would expect and write it into the output CSV file (please fill in the filename yourself):
records = [{**v, 'Location': k} for k, v in data.items()]
with open(r'filepathnewfilename.csv', 'w') as file:
writer = csv.DictWriter(file, fieldnames=headers, lineterminator='n')
writer.writeheader()
writer.writerows(records)
(All the ITEM_CODE
s that do not have a value in your table will display an empty cell in the created CSV)
You must, of course, tune this code to your requirements – if you want it to not delete existing data from the CSV please change the mode from 'w'
to 'a'
or 'r+'
and modify the data writing part of the code accordingly. And similarly, if you wanna sort the data by date, or whatever, descending, do the same before beginning.
Should I combine all the code into one or leave it to the reader, comment below… ;P
There are a total of 25 LOC_CODE
, which means locations.
There are a total of 6 ITEM_CODE
, which corresponds to CO2 level, CO level…etc.
The item_codes are: 1,3,5,6,8, and 9
The problem:
I want to sort this dataset and overwrite the same csv such that there are only 25 rows where each row is a unique location LOC_CODE
.
And I want to display values of all six item_codes per location so it’s not one item_code
per row like in the screenshot. Everything else stays the same. I just want to display values of all six ITEM_CODE
for a unique location on a single row.
This solution assumes that the response from the API is already saved into a CSV file in the format given in the first screenshot. I’m using csv.DictReader
and csv.DictWriter
from the csv
module.
Before beginning, let’s just import csv
using:
import csv
Let’s first create a function that’ll process the DATA_DT
into a desirable format
def get_datetime(value: str):
# returns year, month, day, time (hh:mm:ss), in that order
# assumes string length is 14 and has format 'YYYYMMDDhhmmss'
y, m, d = value[0:4], value[4:6], value[6:8]
t = ':'.join([value[8:10], value[10:12], value[12:14]])
return y, m, d, t
a dictionary for ITEM_CODE
:
item_dict = {'1': 'SO2', '3': ...} # please fill this yourself
and the headers list needed for the CSV DictWriter:
headers = ['Location', 'Year', 'Month', 'Day', 'Time (24h)', 'Station No.',
'SO2', 'NO2', 'CO', 'O3', 'PM10', 'PM2.5', 'Meter Status']
We open the CSV file and read from it into a list raw_data
(fill the filename, please). Each element of raw_data
is a dict:
with open(r'filepathfilename.csv') as file:
raw_data = list(csv.DictReader(file))
We now create an empty dict data
, and then iterate over raw_data
, processing its data and writing it to the dict (comments added at necessary places):
data = {}
for rec in raw_data:
loc = rec['LOC_CODE']
if loc not in data:
data[loc] = dict.fromkeys(headers, '')
# rec is from old data, record is for the new data
record = data[loc]
if not record['Year']:
# assumed that date & time for a location is same for all ITEM_CODE
(record['Year'],
record['Month'],
record['Day'],
record['Time (24h)']
) = get_datetime(rec['DATA_DT'])
record['Station No.'] = rec['DATA_STATE']
record['Meter Status'] = rec['DATA_NOVER']
# for the readings we get the apt key using item_dict
record[item_dict[rec['ITEM_CODE']]] = rec['DATA_VALUE']
Finally, we arrange all the records in data
into a list of dict
s the way csv.DictWriter
would expect and write it into the output CSV file (please fill in the filename yourself):
records = [{**v, 'Location': k} for k, v in data.items()]
with open(r'filepathnewfilename.csv', 'w') as file:
writer = csv.DictWriter(file, fieldnames=headers, lineterminator='n')
writer.writeheader()
writer.writerows(records)
(All the ITEM_CODE
s that do not have a value in your table will display an empty cell in the created CSV)
You must, of course, tune this code to your requirements – if you want it to not delete existing data from the CSV please change the mode from 'w'
to 'a'
or 'r+'
and modify the data writing part of the code accordingly. And similarly, if you wanna sort the data by date, or whatever, descending, do the same before beginning.
Should I combine all the code into one or leave it to the reader, comment below… ;P