How to sort csv data by changing data value in rows to columns?

Question:

There are 25 distinct LOC_CODE values, one per location.
There are 6 distinct ITEM_CODE values, each corresponding to a measurement such as CO2 level, CO level, etc.

The item_codes are: 1,3,5,6,8, and 9

The problem:
I want to sort this dataset and overwrite the same csv so that there are only 25 rows, one for each unique location LOC_CODE.
Instead of one item_code per row, as in the screenshot, each row should show the values of all six item_codes for that location. Everything else stays the same.

Asked By: confused_puggo


Answers:

This solution assumes that the response from the API is already saved into a CSV file in the format given in the first screenshot. I’m using csv.DictReader and csv.DictWriter from the csv module.

Before beginning, let’s just import csv using:

import csv

Let’s first create a function that splits DATA_DT into the parts we need (year, month, day, and time):

def get_datetime(value: str):
    # returns year, month, day, time (hh:mm:ss), in that order
    # assumes string length is 14 and has format 'YYYYMMDDhhmmss'
    y, m, d = value[0:4], value[4:6], value[6:8]
    t = ':'.join([value[8:10], value[10:12], value[12:14]])
    return y, m, d, t
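
As a quick check of the slicing above, here is a minimal sketch that repeats the function and runs it on a made-up timestamp. The sample value is an assumption for illustration, not taken from the question's data:

```python
def get_datetime(value: str):
    # Same slicing as above: 'YYYYMMDDhhmmss' -> year, month, day, 'hh:mm:ss'
    y, m, d = value[0:4], value[4:6], value[6:8]
    t = ':'.join([value[8:10], value[10:12], value[12:14]])
    return y, m, d, t

# Hypothetical sample timestamp, for illustration only
sample = '20230115093000'
parts = get_datetime(sample)
print(parts)  # ('2023', '01', '15', '09:30:00')
```

Note that every part stays a string, so months and days keep their leading zeros.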

Next, a dictionary that maps each ITEM_CODE to its column name:

item_dict = {'1': 'SO2', '3': ...}  # please fill this yourself

and the headers list needed for the CSV DictWriter:

headers = ['Location', 'Year', 'Month', 'Day', 'Time (24h)', 'Station No.',
           'SO2', 'NO2', 'CO', 'O3', 'PM10', 'PM2.5', 'Meter Status']

We open the CSV file and read from it into a list raw_data (fill the filename, please). Each element of raw_data is a dict:

with open(r'filepath\filename.csv') as file:
    raw_data = list(csv.DictReader(file))

We now create an empty dict data, and then iterate over raw_data, processing its data and writing it to the dict (comments added at necessary places):

data = {}

for rec in raw_data:
    loc = rec['LOC_CODE']
    if loc not in data:
        data[loc] = dict.fromkeys(headers, '')
    
    # rec is from old data, record is for the new data
    record = data[loc]
    
    if not record['Year']:
        # assumed that date & time for a location is same for all ITEM_CODE
        (record['Year'],
        record['Month'],
        record['Day'],
        record['Time (24h)']
        ) = get_datetime(rec['DATA_DT'])
    
    record['Station No.'] = rec['DATA_STATE']
    record['Meter Status'] = rec['DATA_NOVER']
    # for the readings we get the apt key using item_dict
    record[item_dict[rec['ITEM_CODE']]] = rec['DATA_VALUE']
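
To see what the loop does without touching a file, here is a reduced sketch of the same idea on made-up rows. The sample values, the shortened headers list, and the placeholder column name 'ITEM_3' are all assumptions for illustration; only the '1' to 'SO2' pairing comes from the answer:

```python
# Shortened header list for this sketch; the real one has more columns
headers = ['Location', 'SO2', 'ITEM_3']
# '1' -> 'SO2' is from the answer; 'ITEM_3' is a placeholder name
item_dict = {'1': 'SO2', '3': 'ITEM_3'}

# Made-up input: one row per (location, item) pair
raw_data = [
    {'LOC_CODE': '101', 'ITEM_CODE': '1', 'DATA_VALUE': '0.004'},
    {'LOC_CODE': '101', 'ITEM_CODE': '3', 'DATA_VALUE': '0.031'},
    {'LOC_CODE': '102', 'ITEM_CODE': '1', 'DATA_VALUE': '0.003'},
]

data = {}
for rec in raw_data:
    # setdefault returns the existing row for this location,
    # or stores and returns a blank one the first time it is seen
    record = data.setdefault(rec['LOC_CODE'], dict.fromkeys(headers, ''))
    # put the reading into the column named after its item code
    record[item_dict[rec['ITEM_CODE']]] = rec['DATA_VALUE']

print(len(data))    # 2 locations, one row each
print(data['101'])
print(data['102'])  # 'ITEM_3' stays '' because no reading was given
```

Three input rows collapse into two output rows, one per location, which is the same reshaping the full loop performs on the real file.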

Finally, we arrange all the records in data into a list of dicts the way csv.DictWriter would expect and write it into the output CSV file (please fill in the filename yourself):

records = [{**v, 'Location': k} for k, v in data.items()]

with open(r'filepath\newfilename.csv', 'w') as file:
    writer = csv.DictWriter(file, fieldnames=headers, lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)
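
If you would like to inspect the output before overwriting anything, one option is to write to an in-memory buffer first. This is only a sketch: the two-column headers list and the sample rows are made up, and io.StringIO stands in for the real file:

```python
import csv
import io

# Made-up, shortened data for illustration only
headers = ['Location', 'SO2']
data = {'101': {'Location': '', 'SO2': '0.004'},
        '102': {'Location': '', 'SO2': '0.003'}}

# Copy each row and fill in its location code
records = [{**v, 'Location': k} for k, v in data.items()]

# io.StringIO behaves like an open text file but lives in memory
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=headers, lineterminator='\n')
writer.writeheader()
writer.writerows(records)

output = buffer.getvalue()
print(output)
```

Once the printed text looks right, the same writer code can be pointed at the real file.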

(Any ITEM_CODE that has no value for a location in your table will appear as an empty cell in the created CSV.)


You must, of course, tune this code to your requirements: if you do not want to overwrite the existing data in the CSV, change the mode from 'w' to 'a' or 'r+' and modify the data writing part of the code accordingly. Similarly, if you want the rows sorted by date or anything else (for example, in descending order), sort the records before writing them.
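
For the sorting part, a minimal sketch using the built-in sorted could look like this. The sample records are made up, and it assumes that LOC_CODE values are numeric strings and that the date parts are zero-padded strings as produced by get_datetime:

```python
# Made-up records in the shape produced by the code above
records = [
    {'Location': '103', 'Year': '2023', 'Month': '01', 'Day': '15', 'Time (24h)': '09:00:00'},
    {'Location': '101', 'Year': '2023', 'Month': '01', 'Day': '16', 'Time (24h)': '08:00:00'},
    {'Location': '102', 'Year': '2022', 'Month': '12', 'Day': '31', 'Time (24h)': '23:00:00'},
]

# Ascending by location code (assumes the codes are numeric)
by_location = sorted(records, key=lambda r: int(r['Location']))

# Newest first: zero-padded strings compare in chronological order
by_date_desc = sorted(
    records,
    key=lambda r: (r['Year'], r['Month'], r['Day'], r['Time (24h)']),
    reverse=True,
)

print([r['Location'] for r in by_location])   # ['101', '102', '103']
print([r['Location'] for r in by_date_desc])  # ['101', '103', '102']
```

Pass whichever sorted list you prefer to writer.writerows in place of records.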

Should I combine all the code into one or leave it to the reader, comment below… ;P

Answered By: a_n