Reading Exponents of Scientific Notation

Question:

I’m trying to generate some summary data on a set. I don’t care about the numbers themselves, only their exponents: the goal is to find the total number of 7-digit numbers (e.g. phone numbers). The way I’m currently handling this is pretty simplistic.

I have a data set in CSV that looks something like this:

“1.108941100000000000e+07,
4.867837000000000000e+06,
… “

import numpy as np

# numlist is the dataset
x = np.trunc(np.log10(numlist))  # base-10 exponent of each value
total = (x == 6).sum()           # an exponent of 6 means 7 digits

And that gives me the number of 7-digit numbers. When I chose that approach I assumed the inputs would be a list of integers, but now I see the data could actually be given/stored in scientific notation. If it is given in scientific notation, is there a faster way to achieve the same result? Is there a way to load only the exponents from the CSV file and skip the log10 step entirely?

Also, I’m not limited to using numpy arrays but after some experimentation they were the fastest implementation for my purpose.

Asked By: smpat04


Answers:

You may want to write a custom parser to use when reading the file rather than reading in all of the data just to toss it away later.

Count of exponents of size n

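def count_exponents(path, n):
    # Build the exponent suffix as it appears in the file, e.g. 6 -> 'e+06'.
    # The signed, zero-padded format also covers negative and two-digit exponents.
    n_str = f'e{n:+03d}'
    out = 0
    with open(path) as fp:
        for line in fp:
            out += line.count(n_str)
    return out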
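
As a sanity check on why counting the exponent text can stand in for the log10 approach in the question, here is a minimal sketch. It assumes positive values written with a lowercase `e` and 18 decimals, as in the sample data; `exponent_from_text` and `samples` are hypothetical names used only for illustration.

```python
import math

def exponent_from_text(value):
    # Format the value the way the sample CSV looks (18 decimals),
    # then read the signed exponent that follows the 'e'.
    text = format(value, ".18e")  # e.g. '4.867837000000000000e+06'
    return int(text.split("e")[1])

# Hypothetical sample values, two of them taken from the question.
samples = [11089411.0, 4867837.0, 1000000.0, 9999999.0]

# The printed exponent agrees with floor(log10(x)) for these values.
for value in samples:
    assert exponent_from_text(value) == math.floor(math.log10(value))

# A 7-digit number is one whose exponent is 6.
seven_digit = sum(exponent_from_text(v) == 6 for v in samples)
```

Because the exponent is already written out in the file, matching it as text avoids both the float conversion and the logarithm.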

Return exponents

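import re
# Raw string so that \d means "digit"; captures the signed exponent, e.g. '+06'.
pattern = re.compile(r'e([+-]\d+)')

def get_exponents(path):
    # Returns one list of exponent strings per line of the file.
    with open(path) as fp:
        out = [pattern.findall(line) for line in fp]
    return out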
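
If counts for several exponents are needed at once, the two ideas above can be combined into a single pass. This is only a sketch under the same assumptions (lowercase `e`, signed exponent); `exponent_histogram` and `sample` are hypothetical names, and the function takes any iterable of lines so an open file object can be passed in directly.

```python
import re
from collections import Counter

# Same idea as the pattern above: capture the signed exponent after 'e'.
EXPONENT = re.compile(r"e([+-]\d+)")

def exponent_histogram(lines):
    # Tally how often each integer exponent occurs across all lines.
    counts = Counter()
    for line in lines:
        counts.update(int(m) for m in EXPONENT.findall(line))
    return counts

# Hypothetical sample lines in the format shown in the question.
sample = [
    "1.108941100000000000e+07,",
    "4.867837000000000000e+06,",
    "9.999999000000000000e+06,",
]
hist = exponent_histogram(sample)
seven_digit_total = hist[6]  # exponent 6 corresponds to 7-digit numbers
```

With a real file, `with open(path) as fp: hist = exponent_histogram(fp)` would stream it line by line without building an array of floats.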
Answered By: James