How to groupby a column which contains a list

Question:

The following code takes the average of the sentiment scores for all news headlines collected during each date and plots it on a bar chart. My issue is that I have a list in the ‘tickers’ column and I don’t know how to deal with it since the code

This is the code:

plt.rcParams['figure.figsize'] = [10, 6]

# Group by date and ticker columns from sentiment_reddit_data and calculate the mean
mean_scores = sentiment_reddit_data.groupby(['tickers','time']).mean()

# Unstack the column ticker
mean_scores = mean_scores.unstack()

# Get the cross-section of compound in the 'columns' axis
mean_scores = mean_scores.xs('compound', axis="columns").transpose()

# Plot a bar chart with pandas
mean_scores.plot(kind = 'bar')
plt.grid()

Here is a piece of the dataset
https://filebin.net/inbvkjis2hk05crh


Example of expected output: (image not included)

Data Sample:

,redditor,karma,time,comment,upvotes,gilded,interaction,rank,post_id,tickers,compound
10,deleted,deleted,2020-01-01 01:13:01,ohhh something like tegridy farm christmas snow dd,5,,0,11,eib3w8,['SNOW'],0.3612
29,deleted,deleted,2020-01-01 01:57:50,amzn 1950,2,,0,14,eibdob,['AMZN'],0.0
44,deleted,deleted,2020-01-01 00:36:55,according sub right nio acb call lol,2,,1,2,eibfcr,['NIO'],0.6486
47,deleted,deleted,2020-01-01 00:39:14,amzn,1,,0,5,eibfcr,['AMZN'],0.0
49,deleted,deleted,2020-01-01 00:47:04,mu call,1,,0,7,eibfcr,['MU'],0.3612
50,deleted,deleted,2020-01-02 10:50:41,im ball deep msft,1,,0,8,eibfcr,['MSFT'],0.0
52,deleted,deleted,2020-01-01 00:40:12,perfect got nio put feeling good wsb bullish,2,,0,10,eibfcr,['NIO'],0.7964
67,deleted,deleted,2020-01-01 03:45:08,one day 5 drop ppl gonna get wrecked tsla call tsla beta spy like 350,3,,0,13,eic7kt,"['TSLA', 'TSLA']",0.4404
68,deleted,deleted,2020-01-01 03:33:09,ba,1,,0,14,eic7kt,['BA'],0.0
101,deleted,deleted,2020-01-01 07:07:05,thousand island dressing russian dressing basically fat sugar dump mcd,1,,0,47,eic7kt,['MCD'],-0.3612
125,deleted,deleted,2020-01-01 01:51:01,nobody forced buy fds,2,,1,71,eic7kt,['FDS'],-0.128
155,deleted,deleted,2020-01-01 14:37:59,famed autist michael burry fund owns shit ton gme per last 13f blow good company,1,,1,11,eick65,['GME'],-0.1779
170,deleted,deleted,2020-01-01 04:29:49,would gme go whole business risk bankruptcy point,3,,4,26,eick65,['GME'],-0.2732
179,deleted,deleted,2020-01-01 05:00:35,nah theyre cutting cost firing employee left right news policy wont pay unemployment anything gme going next earnings,0,,0,35,eick65,['GME'],-0.1526
190,deleted,deleted,2020-01-01 19:59:49,obviously common equity last line august 2019 gme 424mm cash equivalent 122mm receivables 950mm inventory 140mm prepaid expense add 1b real estate 22b liability youre left 810mm equity around 9sh tangible equity obviously cash equivalent assumed full carrying value prepaid expense since safe assumption business would overnight fire sale operation liquidating receivables probably sold quickly 80 dollar put discounted current asset excluding inventory 661mm real estate carried 1b seen fairly desirable fair assume liquidated somewhere 75 dollar 750800mm inventory discounted asset 14b 22b liability inventory need sold 80 dollar break even equity little high impossible since theyre mostly used game therefore carrying value already steeply discounted believe brand value left operating business could sold run year extra profit easily 45b sale 33 gross margin credible investment banker could fetch valuation equity midhundred million much higher gme could easily liquidated acquired around share trade today especially given recent ebitda breakeven cost cutting done competent manager could result annual ebitda excess 150200mm give pretty favorable evebitda acquirer bought roughly 710x shouldnt difficult find buyer,1,,1,46,eick65,"['GME', 'GME']",0.9772
209,deleted,deleted,2020-01-01 05:54:57,couldnt put fuckin pton,37,,1,7,eid2mc,['PTON'],0.2755
212,deleted,deleted,2020-01-01 09:06:55,fucking lost took retard strength 100 men also inglorious basterds fantastic movie christoph waltz amazing actor autists owe 100 tsla put scalp,15,,0,10,eid2mc,['TSLA'],0.4297
213,deleted,deleted,2020-01-01 03:49:46,hahahahah nio bit got also hurt cuz like 1000 share 16,11,,0,11,eid2mc,['NIO'],0.0772
214,deleted,deleted,2020-01-01 06:19:19,gold fuck,11,,1,12,eid2mc,['GOLD'],-0.5423
229,deleted,deleted,2020-01-01 13:19:35,still nio 40 call bought 30 nazi coming def feel like retard selling 48 1600 tho shitty 400 hoping new year pop otherwise im headed concentration camp tendies,1,,0,27,eid2mc,['NIO'],-0.4019
246,deleted,deleted,2020-01-01 11:08:42,real even sub arent making fun pton also wtf clb whatd miss e lmfao clb degenerate bio shit,31,,0,44,eid2mc,['PTON'],-0.8021
358,deleted,deleted,2020-01-01 05:47:59,,2,,1,17,eieo9n,['ARE'],0.0
370,deleted,deleted,2020-01-01 07:55:46,ulta call 359 earnings lost 3500 16 minute,24,,2,1,eiftuc,['ULTA'],0.0516
371,deleted,deleted,2020-01-01 09:39:15,bought fds blind man told,16,,1,2,eiftuc,['FDS'],-0.4019
372,deleted,deleted,2020-01-01 08:35:16,averaging amd 9 breaking even 12 selling 14 cry sleep ever since especially knowing chip market like 100 leap contract id bank rolling guh,12,,0,3,eiftuc,['AMD'],-0.4767
376,deleted,deleted,2020-01-01 08:31:18,aapl put guh,3,,0,7,eiftuc,['AAPL'],-0.3612
377,deleted,deleted,2020-01-01 15:14:49,selling shop ttd call trump announced bullshit knew going rise knew going rise stonks knew going rise sold missed run take loss lost 3000 last week december,2,,0,8,eiftuc,"['SHOP', 'TTD']",-0.8689
378,deleted,deleted,2020-01-01 20:00:38,wont really loss porn definitely fucking stupidest trade also literally happened yesterday sitting backseat uber driven guy claimed former fbi agent conceal carry permit uber shit got real comey shenanigan also sort stoned anyway told trump god gift america soon would see federal reserve scam enter depression followed return gold standard also told never give away gun realized probably right decided smart thing find high iv penny stock sell premium make free money still could decided queue order selling 3 itm cash covered put highest iv stock moment fcel say queued market close early new year eve right gonna queue could reevaluate trade thursday morning open sober could see pm price anyway order fill dont even realize morning short itm put penny stock went 50 day 200 2 day httpsimgurcomplzsolgjpg fuck yeah bring 2020,2,,0,9,eiftuc,"['UBER', 'UBER', 'PM']",-0.9092
379,deleted,deleted,2020-01-01 20:14:56,let tell time shorted tsla,2,,0,10,eiftuc,['TSLA'],0.0
381,deleted,deleted,2020-01-01 19:14:23,dumbest trade typically binary earnings play selling loss early turn gain worst trade buying spy dip early august basically buying power buy real dip end trade went back even expiry took loss selling loss early order get dip closer strike around time finally bought put day went back biggest regret buying put ulta even though remember checking expensive chain 355pm 1k nothing compared 50k,1,,0,12,eiftuc,['ULTA'],-0.9231
385,deleted,deleted,2020-01-01 20:31:12,sold 2000 share amd 15,1,,1,16,eiftuc,['AMD'],0.296
401,deleted,deleted,2020-01-01 19:39:27,dd become linking yahoo finance article without reading dont really follow azn used work mrk following 15 year approval work mrk year approval expected literally even say fucking article decision expected fda oncologic drug advisory committee recommended approval earlier month mrk killer two year wont make budge anymore usual wave market finally released yesterday 9 market open yesterday come bro least mark shitpost,0,,0,8,eifu11,"['AZN', 'MRK', 'MRK', 'MRK']",0.7003
417,deleted,deleted,2020-01-01 08:38:41,schwab merge td took td,13,,2,1,eig6iz,"['TD', 'TD']",0.0
421,deleted,deleted,2020-01-01 20:24:07,hilarious meme ipo wework peloton lyft beyond chewy etc,1,,0,5,eig6iz,['LYFT'],0.4019
431,deleted,deleted,2020-01-01 22:30:17,230 minute delay fetching comment messaging 6 year 20260101 220611 utchttpwwwwolframalphacominputi202601012022061120utc20to20local20time remind linkhttpsnpredditcomrwallstreetbetscommentseig6iz2019inreviewfcsgxgccontext3 click linkhttpsnpredditcommessagecomposetoremindmebotsubjectremindermessage5bhttps3a2f2fwwwredditcom2fr2fwallstreetbets2fcomments2feig6iz2f2019inreview2ffcsgxgc2f5d0a0aremindme21202026010120223a063a1120utc send pm also reminded reduce spam parent commenter delete message hide othershttpsnpredditcommessagecomposetoremindmebotsubjectdelete20commentmessagedelete2120eig6iz infohttpsnpredditcomrremindmebotcommentse1bko7remindmebotinfov21customhttpsnpredditcommessagecomposetoremindmebotsubjectremindermessage5blink20or20message20inside20square20brackets5d0a0aremindme2120time20period20hereyour remindershttpsnpredditcommessagecomposetoremindmebotsubjectlist20of20remindersmessagemyreminders21feedbackhttpsnpredditcommessagecomposetowatchful1subjectremindmebot20feedback,2,,0,15,eig6iz,['PM'],-0.6705
488,deleted,deleted,2020-01-01 17:04:50,cancelled disney today im sure new year resolution million others short dis,7,,2,54,eih9st,"['DIS', 'DIS']",-0.296
489,deleted,deleted,2020-01-01 19:18:28,biotech look jpm health conference runup,7,,1,55,eih9st,['JPM'],0.0
495,deleted,deleted,2020-01-01 16:15:13,seen fb meme said saddle boy bout play cowboy democrat virginia comment saying something along line legislation start civil war 1 fuck talking 2 lose money,6,,1,61,eih9st,['FB'],-0.8271
499,deleted,deleted,2020-01-01 18:16:26,ba gonna rally hard dude tax selling done,7,,2,65,eih9st,['BA'],-0.4404
514,deleted,deleted,2020-01-01 19:23:19,dude watching ben mallah real estate guy youtube fat italian fuck interesting guy man,4,,2,80,eih9st,['BEN'],-0.2023
519,deleted,deleted,2020-01-01 21:07:04,spce 2020s bynd discus,6,,0,85,eih9st,"['SPCE', 'BYND']",0.0
523,deleted,deleted,2020-01-01 15:32:37,nio moon,4,,1,89,eih9st,['NIO'],0.3612
526,deleted,deleted,2020-01-01 16:09:08,got couple feb 21 290 aapl put ama,5,,3,92,eih9st,['AAPL'],-0.3612
539,deleted,deleted,2020-01-01 15:51:56,serious question rather buying 10 atm contract 500 premium aapl would lucrative buy say 100 contract otm option 50 premium downside increased theta decay,3,,5,105,eih9st,['AAPL'],0.2732
545,deleted,deleted,2020-01-01 18:56:41,ba gonna fuk short eso hole,4,,0,111,eih9st,['BA'],-0.3612
549,deleted,deleted,2020-01-01 20:09:50,httpswwwreuterscomarticleussamsungelecplantsamsungelectronicschipoutputatsouthkoreaplantpartlyhaltedduetoshortblackoutiduskbn1z01k3il0httpswwwreuterscomarticleussamsungelecplantsamsungelectronicschipoutputatsouthkoreaplantpartlyhaltedduetoshortblackoutiduskbn1z01k3il0 good mu,3,,1,115,eih9st,['MU'],0.4404
556,deleted,deleted,2020-01-01 13:09:45,desperately need gap fill twtr successfully rolled back spy call 331 121 hoping good news acb 3 call nabbed 2 make money,2,,1,122,eih9st,['TWTR'],0.8834
570,deleted,deleted,2020-01-01 20:08:40,aapl 400 500 eoy,2,,1,136,eih9st,['AAPL'],0.0
575,deleted,deleted,2020-01-01 14:07:17,someone please explain iron butterfly example onehttpsimgurcomgalleryywgio8s 221 iron butterfly tsla selling atm 420 put call covering 5 wide get 480 premium 500 collateral though enter lower limit price order fill likely max loss really 20 retarded ideal situation stay around 420 right,1,,4,141,eih9st,['TSLA'],-0.6387
578,deleted,deleted,2020-01-01 20:05:58,buy ba dip,1,,1,144,eih9st,['BA'],0.3612
580,deleted,deleted,2020-01-01 20:26:11,su bae 2020 buy amd bitch,1,,0,146,eih9st,['SU'],-0.3182
582,deleted,deleted,2020-01-01 20:32:35,would let john stamos make love significant exchange amd call,1,,2,148,eih9st,['AMD'],0.8176
629,deleted,deleted,2020-01-01 14:50:24,feeling meet daughter discovered time travel show picture buying pton call,3,,0,195,eih9st,['PTON'],0.6705
690,deleted,deleted,2020-01-01 18:53:56,next selling ba buy stock actually go,7,,0,256,eih9st,['BA'],0.0
Asked By: Damien Borowski


Answers:

  • The 'tickers' column is read from the CSV as str, not list, so each value must be converted to a list. This can be done while reading the file by passing ast.literal_eval to the converters parameter of pd.read_csv.
  • The lists in the 'tickers' column can then be expanded into one row per ticker with the .explode method.
  • To .groupby the date properly, the 'time' column must be converted to a datetime dtype, so that the date part can be extracted with .dt.date.
  • Tested in python 3.10, pandas 1.4.3, matplotlib 3.5.1
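
The first two points can be checked on a few rows without the file. The following is a minimal sketch, assuming a three-column extract of the sample data held in memory (the csv_text string and the raw, parsed and exploded names are only for illustration and are not part of the original answer):

```python
import io
from ast import literal_eval

import pandas as pd

# three rows copied from the sample data, reduced to the relevant columns
csv_text = '''time,tickers,compound
2020-01-01 03:45:08,"['TSLA', 'TSLA']",0.4404
2020-01-01 15:14:49,"['SHOP', 'TTD']",-0.8689
2020-01-01 20:14:56,['TSLA'],0.0
'''

# a plain read leaves every value in 'tickers' as a str
raw = pd.read_csv(io.StringIO(csv_text))
print(type(raw.loc[0, 'tickers']))

# with the converter, every value is parsed into a real list
parsed = pd.read_csv(io.StringIO(csv_text), converters={'tickers': literal_eval})
print(type(parsed.loc[0, 'tickers']))

# explode produces one row per list element and repeats the other columns
exploded = parsed.explode('tickers', ignore_index=True)
print(exploded)
```

Calling .explode on the raw frame would leave it unchanged, because a str is treated as a single scalar value rather than as a list.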
import pandas as pd
from ast import literal_eval
from datetime import date

# create dataframe
file = 'Example.csv'
df = pd.read_csv(file, converters={'tickers': literal_eval})

# convert the time column to a datetime dtype
df.time = pd.to_datetime(df.time)

# expand each list in the tickers column into one row per ticker
df = df.explode('tickers', ignore_index=True)

# display(df.head())
redditor   karma                time                                            comment  upvotes gilded  interaction  rank post_id tickers  compound
 deleted deleted 2020-01-01 01:13:01 ohhh something like tegridy farm christmas snow dd        5    NaN            0    11  eib3w8    SNOW    0.3612
 deleted deleted 2020-01-01 01:57:50                                          amzn 1950        2    NaN            0    14  eibdob    AMZN    0.0000
 deleted deleted 2020-01-01 00:36:55               according sub right nio acb call lol        2    NaN            1     2  eibfcr     NIO    0.6486
 deleted deleted 2020-01-01 00:39:14                                               amzn        1    NaN            0     5  eibfcr    AMZN    0.0000
 deleted deleted 2020-01-01 00:47:04                                            mu call        1    NaN            0     7  eibfcr      MU    0.3612

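# groupby the ticker and date columns of df and calculate the mean of the numeric columns
# numeric_only=True is needed in pandas >= 2.0, where non-numeric columns such as 'comment' raise a TypeError
mean_scores = df.groupby(['tickers', df.time.dt.date]).mean(numeric_only=True)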

# Unstack the column ticker
mean_scores = mean_scores.unstack()

# Get the cross-section of compound in the 'columns' axis
mean_scores = mean_scores.xs('compound', axis="columns").transpose()
# select some columns and rows to plot
selected = mean_scores.loc[date(2020, 1, 1):date(2020, 2, 1), ['AAPL', 'GOOG', 'TSLA']]

# Plot a horizontal bar chart with pandas (the figure was made from the full downloaded data, not just the sample in the question)
ax = selected.plot.barh(figsize=(7, 15))
ax.set_ylabel('Date')
ax.set_xlabel('Compound')
ax.grid()

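
One thing to watch for in the sample data is that some lists repeat a ticker, for example ['TSLA', 'TSLA']. After .explode, such a comment contributes to the daily mean once per repetition. Whether that weighting is wanted is a modelling choice that the answer above does not address. The following sketch, which uses a small hand-built frame and a hypothetical daily_mean helper, shows one possible way to compare both variants:

```python
import pandas as pd

# small hand-built frame; values are taken from the sample data
df = pd.DataFrame({
    'time': pd.to_datetime(['2020-01-01 03:45:08',
                            '2020-01-01 20:14:56',
                            '2020-01-02 10:50:41']),
    'tickers': [['TSLA', 'TSLA'], ['TSLA'], ['MSFT']],
    'compound': [0.4404, 0.0, 0.0],
})

# drop repeated tickers inside each list while keeping their order
deduped = df.assign(tickers=df['tickers'].map(lambda t: list(dict.fromkeys(t))))


def daily_mean(frame):
    """Hypothetical helper: mean compound score per date (rows) and ticker (columns)."""
    long = frame.explode('tickers', ignore_index=True)
    grouped = long.groupby([long['time'].dt.date, 'tickers'])['compound'].mean()
    return grouped.unstack('tickers')


with_repeats = daily_mean(df)          # the repeated TSLA comment counts twice
without_repeats = daily_mean(deduped)  # every comment counts once per ticker

print(with_repeats)
print(without_repeats)
```

Selecting the 'compound' column before calling .mean() also removes the need for the .xs('compound', axis="columns") step, since only one value column is left to unstack.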

Answered By: Trenton McKinney