Regular expression for stock tickers – Python

Question:

I have a list of tweets. They look like this:

data = [['trading $aa $BB stock market info'],
        ['$aa is $116 market is doing well $cc $ABC']]

I want to extract stock tickers:

['$aa', '$BB']
['$aa', '$cc', '$ABC']

I have tried this:

import re

for i in data:
    print(re.findall(r'[$]\S*', str(i)))

However, the output also contains $116:

['$aa', '$BB']
['$aa', '$116', '$cc', '$ABC']

Any suggestions?

Asked By: kevin


Answers:

Match the dollar sign, one letter, and then any run of non-whitespace characters:

re.findall(r'[$][A-Za-z]\S*', str(i))
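A self-contained sketch of this approach. One assumption differs from the question's loop: it searches each tweet string directly (unpacking the one-element inner list) instead of calling str() on the list. With str(i), the brackets and quotes become part of the searched text, so the last ticker in a tweet can come back as `$ABC']`.

```python
import re

# Sample tweets from the question: each inner list holds one tweet string.
data = [['trading $aa $BB stock market info'],
        ['$aa is $116 market is doing well $cc $ABC']]

# A '$', then one letter (so '$116' is rejected), then any non-whitespace characters.
TICKER_RE = re.compile(r'\$[A-Za-z]\S*')

results = []
for (tweet,) in data:  # unpack the single string rather than calling str() on the list
    tickers = TICKER_RE.findall(tweet)
    results.append(tickers)
    print(tickers)
```

Because `\S*` accepts any non-whitespace character, trailing punctuation such as the comma in "$aa," is kept. Replacing `\S*` with `[A-Za-z]*` restricts the match to letters, at the cost of dropping symbols like BRK.B.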
Answered By: Harald Nordgren

The reticker package does this by building a custom regular expression from its configuration and using that pattern to extract tickers from text. The compiled pattern is also exposed, so it can be used on its own.
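To illustrate the general idea of a configurable ticker regex, here is a minimal sketch that uses only the standard `re` module. It is not reticker's implementation or API. `build_ticker_pattern`, `extract_tickers`, and the `max_len` setting are hypothetical names, and the rules (any case after `$`, uppercase only without `$`) are assumptions chosen for the example.

```python
import re

def build_ticker_pattern(max_len: int = 5) -> re.Pattern:
    """Hypothetical helper (not part of reticker): build a ticker regex from settings."""
    # '$'-prefixed tickers may use any letter case, e.g. $aa or $Skyu.
    prefixed = rf'\$([A-Za-z]{{1,{max_len}}})'
    # Bare tickers must be all uppercase and at least 2 letters long.
    bare = rf'\b([A-Z]{{2,{max_len}}})'
    # The trailing \b rejects matches that run on into a longer word.
    return re.compile(rf'(?:{prefixed}|{bare})\b')

def extract_tickers(text: str, pattern: re.Pattern) -> list[str]:
    # findall() returns one tuple per match; exactly one group is non-empty.
    return [(prefixed or bare).upper() for prefixed, bare in pattern.findall(text)]

pattern = build_ticker_pattern()
print(extract_tickers("trading $aa $BB and NVDA, not $116 or Apple", pattern))
```

Uppercase acronyms such as IMHO or BTW would also be reported as bare tickers under these rules, which is one reason a library with tunable settings can be more convenient than a single hand-written pattern.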

>>> import reticker

>>> extractor = reticker.TickerExtractor()
>>> type(extractor.pattern)
<class 're.Pattern'>

>>> reticker.TickerExtractor().extract("Comparing FNGU vs $WEBL vs SOXL- who wins? And what about $cldl vs $Skyu? BTW, will the $w+Z pair still grow? IMHO, SOXL is king! [V]isa is A-okay!")
["FNGU", "WEBL", "SOXL", "CLDL", "SKYU", "W", "Z", "V", "A"]

>>> reticker.TickerExtractor().extract("Which of BTC-USD, $ETH-USD and $ada-usd is best?nWhat about $Brk.a and $Brk.B? Compare futures MGC=F and SIL=F.")
['BTC-USD', 'ETH-USD', 'ADA-USD', 'BRK.A', 'BRK.B', 'MGC=F', 'SIL=F']
Answered By: Asclepius

I’ll leave this here for anyone looking for a regex that matches a single stock ticker symbol:

re.fullmatch('([A-Za-z]{1,5})(-[A-Za-z]{1,2})?', symbol)
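A short usage sketch of this pattern. `is_ticker` is a hypothetical wrapper name, and the second half is one possible way (an assumption, not part of the answer) to combine it with the question's tweets: split on whitespace and keep `$` words whose remainder is a valid symbol.

```python
import re

# The pattern from the answer: 1-5 letters, optionally '-' plus 1-2 letters (e.g. BRK-B).
SYMBOL_RE = re.compile(r'([A-Za-z]{1,5})(-[A-Za-z]{1,2})?')

def is_ticker(symbol: str) -> bool:
    # fullmatch() succeeds only if the entire string matches, so no ^...$ anchors are needed.
    return SYMBOL_RE.fullmatch(symbol) is not None

candidates = ['AAPL', 'BRK-B', 'BRK.B', 'GOOGLE', '116']
valid = [s for s in candidates if is_ticker(s)]
print(valid)  # ['AAPL', 'BRK-B']

# Applied to a tweet from the question: keep $-words whose remainder is a valid symbol.
tweet = '$aa is $116 market is doing well $cc $ABC'
from_tweet = [w for w in tweet.split() if w.startswith('$') and is_ticker(w[1:])]
print(from_tweet)  # ['$aa', '$cc', '$ABC']
```

Note that dotted class-share symbols such as BRK.B are rejected, since the pattern only allows a hyphen as a separator.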
Answered By: Tom Sawyer