How to store data: list in dict vs dict in list vs dataframe

Question:

I have data in Python where each row basically looks something like this:

{'touchdowns': 3, 'sport': 'Football', 'team': 'Cincinnati Bengals'}

I’m wondering whether the best way to store the data is with a list inside a dictionary, like this:

{
    "points": [16, 2, 104],
    "sport": ["Football", "Baseball", "Basketball"],
    "team": ["Cincinnati Bengals", "New York Yankees", "LA Lakers"]
}

Or with a dictionary inside a list, like this:

[
    {"points": 16, "sport": "Football", "team": "Cincinnati Bengals"},
    {"points": 2, "sport": "Baseball", "team": "New York Yankees"},
    {"points": 104, "sport": "Basketball", "team": "LA Lakers"}
]

A third option would be a dataframe in Pandas.

I’m going to use the data in something like this:

data = []  # either the list, dict, or df
new_data = get_new_data(x, y, z)
for row in new_data:
    if row not in data:
        # the row is new: add it to the dataset
        data.append(row)
        # do other stuff to the row

So, what I’m trying to do is:

  1. Get a new row of data
  2. Check if the data is already in the dataset
  3. If it is, do nothing
  4. If it isn’t, add it to the dataset and do other stuff to the row

Thanks for any and all help in advance!

Asked By: ejn

||

Answers:

There are two common practices for this.

If you do not need to look up rows quickly by an ID or team name, a list of dicts will work fine.

You can check whether a row is already present with:

if my_dict in my_list:
 …
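
To make the list approach concrete, here is a minimal sketch of the "add only if missing" loop from the question. The helper name `add_if_new` is hypothetical, not something from the original post. Note that `in` compares dicts by value and scans the list, so each check gets slower as the list grows.

```python
# Hypothetical helper for illustration; assumes each row is a plain dict.
def add_if_new(rows, row):
    """Append row to rows unless an equal dict is already present.

    Returns True when the row was added, False when it was a duplicate.
    """
    if row in rows:  # value comparison against every element in the list
        return False
    rows.append(row)
    return True


my_list = [
    {"points": 16, "sport": "Football", "team": "Cincinnati Bengals"},
    {"points": 2, "sport": "Baseball", "team": "New York Yankees"},
]

# A row that is not in the list yet gets appended.
added_new = add_if_new(
    my_list, {"points": 104, "sport": "Basketball", "team": "LA Lakers"}
)

# Key order does not matter for dict equality, so this counts as a duplicate.
added_dup = add_if_new(
    my_list, {"team": "New York Yankees", "sport": "Baseball", "points": 2}
)
```

This is fine for small datasets; for many rows, a keyed lookup avoids the repeated scans.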

If you have a unique ID, for example the team name, you can use a dictionary whose keys are the team names and whose values are the row dicts.

E.g.:

my_dict = {
    "bengals": {
        "points": 16,
        "sport": "Football",
        "team": "Cincinnati Bengals"
    },
    "yankees": {
        "points": 2,
        "sport": "Baseball",
        "team": "New York Yankees"
    },
    "lakers": {
        "points": 104,
        "sport": "Basketball",
        "team": "LA Lakers"
    }
}

if "lakers" not in my_dict:
  …

Just make sure that your keys are unique.
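
As a rough sketch of how the keyed approach fits the question's loop, assuming the short team name is a usable unique key (the helper `add_team_if_new` and the example keys are hypothetical):

```python
# Hypothetical helper for illustration; assumes `key` uniquely identifies a row.
def add_team_if_new(teams, key, row):
    """Store row under key unless the key already exists.

    Returns True when the row was added, False when the key was taken.
    """
    if key in teams:  # hash lookup, no scan over the stored rows
        return False
    teams[key] = row
    return True


my_dict = {
    "bengals": {"points": 16, "sport": "Football", "team": "Cincinnati Bengals"},
}

added_new = add_team_if_new(
    my_dict, "lakers", {"points": 104, "sport": "Basketball", "team": "LA Lakers"}
)

# Same key again: the existing entry is kept and nothing is overwritten.
added_dup = add_team_if_new(
    my_dict, "lakers", {"points": 0, "sport": "Basketball", "team": "LA Lakers"}
)
```

The check here only looks at the key, so two rows with the same key but different values are treated as the same row.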

Answered By: Rafael Zasas

It really depends on what you’re planning to do with the data.

If you’re only ever going to retrieve individual triples by the combination of all the inner values, and you don’t intend to change the data after it’s added to the collection, then I’d consider using a set of tuples (or NamedTuples) for your repository, e.g.:

data = {(16, "Football", "Cincinnati Bengals"), 
        (2, "Baseball", "New York Yankees"), 
        (104, "Basketball", "LA Lakers")}
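
A minimal sketch of the NamedTuple variant, assuming the three fields from the question. The class name `Row` and the helper `add_if_new` are made up for illustration:

```python
from typing import NamedTuple


# Hypothetical row type; field names follow the question's example data.
class Row(NamedTuple):
    points: int
    sport: str
    team: str


def add_if_new(data, row):
    """Add row to the set unless it is already there.

    Returns True when the row was added, False when it was a duplicate.
    """
    if row in data:  # hash lookup on the whole tuple
        return False
    data.add(row)
    return True


data = {
    Row(16, "Football", "Cincinnati Bengals"),
    Row(2, "Baseball", "New York Yankees"),
}

added_new = add_if_new(data, Row(104, "Basketball", "LA Lakers"))
added_dup = add_if_new(data, Row(2, "Baseball", "New York Yankees"))

# A NamedTuple compares equal to a plain tuple holding the same values.
plain_tuple_found = (16, "Football", "Cincinnati Bengals") in data
```

Because tuples are immutable, changing a row's points means removing the old tuple and adding a new one.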

Answered By: hhimko