How do you pull specific information out of a text file? Python

Question:

Here is an example of some of the information in the text file:

Ticker : Ticker representing the company | Company: Name | Title: Position of trader | Trade Type: Buy or sell | Value: Monetary value
Ticker :  AKUS | Company: Akouos, Inc. | Title: 10% | Trade Type: P - Purchase | Value: +$374,908,350
Ticker :  HHC | Company: Howard Hughes Corp | Title: Dir, 10% | Trade Type: P - Purchase | Value: +$109,214,243

Each line that starts with "Ticker" is a new record. Is there a way to pull out specific fields and store them in a dictionary? For example, could I get a dictionary holding all the tickers, all the positions, and all the monetary values?

Asked By: Daniel


Answers:

The best way I can think of is to import into a dataframe (df),
and then convert to a dictionary (if that is what you really want).

First, import the data into a pandas dataframe:

import pandas as pd

filename = 'file1.txt'

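# The separator is a regex: a colon followed by whitespace, or a space followed by a pipe.
# The first line of the file (the field descriptions) is consumed as the header row.
df = pd.read_csv(filename,
                 sep=r':\s+|\s\|',
                 engine='python',
                 usecols=[1, 3, 5, 7, 9]  # odd positions hold the values, even positions the labels
                 )
df.columns = ['Ticker', 'Company', 'Title', 'Trade Type', 'Value']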

print(df)

Printing df shows the dataframe, with one row per trade under the five column names set above.
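
To see why only columns 1, 3, 5, 7 and 9 are kept, it can help to try the separator on a single line with the standard re module. This is only a rough illustration of how a regex separator splits a row, assuming the pattern is r':\s+|\s\|'; the names SEP, line, parts and values are made up for this sketch.

```python
import re

# Assumed separator: a colon followed by whitespace, or a space followed by a pipe
SEP = r':\s+|\s\|'

line = ('Ticker :  AKUS | Company: Akouos, Inc. | Title: 10% '
        '| Trade Type: P - Purchase | Value: +$374,908,350')

parts = re.split(SEP, line)

# Even positions hold the labels ('Ticker ', ' Company', ...),
# odd positions hold the values we actually want
values = parts[1::2]
print(values)
```

The labels land at positions 0, 2, 4, 6 and 8, which is why they are dropped and the column names are assigned by hand afterwards.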

You can then convert this into a dictionary using the following code:

data_dictionary = df.to_dict()
print(data_dictionary)

OUTPUT:

{'Ticker': {0: 'AKUS', 1: 'HHC'}, 'Company': {0: 'Akouos, Inc.', 1: 'Howard Hughes Corp'}, 'Title': {0: '10%', 1: 'Dir, 10%'}, 'Trade Type': {0: 'P - Purchase', 1: 'P - Purchase'}, 'Value': {0: '+$374,908,350', 1: '+$109,214,243'}}
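
By default the dictionary is nested by row index. If what you want is simply "all the tickers" or "all the values" as plain lists, the inner dictionaries can be flattened. The sketch below works on a literal copy of the output above, so it runs without pandas; columns_as_lists is a name chosen for this example. pandas can also produce this shape directly with df.to_dict(orient='list').

```python
# Literal copy of the dictionary printed above
data_dictionary = {
    'Ticker': {0: 'AKUS', 1: 'HHC'},
    'Company': {0: 'Akouos, Inc.', 1: 'Howard Hughes Corp'},
    'Title': {0: '10%', 1: 'Dir, 10%'},
    'Trade Type': {0: 'P - Purchase', 1: 'P - Purchase'},
    'Value': {0: '+$374,908,350', 1: '+$109,214,243'},
}

# Turn each {row_index: value} mapping into a list ordered by row index
columns_as_lists = {
    name: [column[i] for i in sorted(column)]
    for name, column in data_dictionary.items()
}

print(columns_as_lists['Ticker'])
print(columns_as_lists['Value'])
```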

Dict Usage

If you actually want to use the dictionary for retrieval (instead of the dataframe) and search by ticker symbol, this is how I would approach it:

Search for Value of ticker symbol 'AKUS'

tickers = {v:k for k,v in data_dictionary.get('Ticker').items()}

print('AKUS Value:', data_dictionary['Value'][tickers.get('AKUS')])

Output:

AKUS Value: +$374,908,350
Answered By: ScottC

My suggestion would be that the dictionary should be keyed on ticker name. Each value for the ticker is itself a dictionary which makes access to the data very easy. Something like this:

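ticker = {}

with open('ticker.txt') as tdata:
    next(tdata)  # skip the first line, which only describes the fields
    for row in tdata:
        if not row.strip():
            continue  # ignore blank lines
        first, *rest = row.split('|')
        # split on the first colon only, so a value containing ':' stays intact
        _, t = first.split(':', 1)
        ticker[t.strip()] = {k.strip(): v.strip() for k, v in (column.split(':', 1) for column in rest)}

print(ticker)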

Output:

{'AKUS': {'Company': 'Akouos, Inc.', 'Title': '10%', 'Trade Type': 'P - Purchase', 'Value': '+$374,908,350'}, 'HHC': {'Company': 'Howard Hughes Corp', 'Title': 'Dir, 10%', 'Trade Type': 'P - Purchase', 'Value': '+$109,214,243'}}

Usage:

For example, to get the value associated with HHC:

ticker['HHC']['Value']
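
Note that the stored values are still strings such as '+$109,214,243'. If you need to sort or sum them, they have to be converted to numbers first. A minimal sketch, assuming every value is a whole-dollar amount with an optional sign, a dollar symbol and comma separators; parse_value is a hypothetical helper, not part of any library.

```python
def parse_value(text):
    """Convert a string such as '+$374,908,350' to an int.

    Assumes whole dollars with an optional leading sign.
    """
    cleaned = text.strip().replace('$', '').replace(',', '')
    return int(cleaned)


amount = parse_value('+$374,908,350')
print(amount)
```

With the dictionary above this would be called as parse_value(ticker['HHC']['Value']).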
Answered By: Cobra