Create a data frame using lists
Question:
I have the following code, which reads data from a CSV file and manipulates some of it. I want to put the modified data into a data frame. I was thinking of storing it in a dictionary, but I am not sure how I should really do this.
I use the following CSV file as the data source:
"17",20.2147418139502,20,20,20.8652568117822
"6",19.9412500131875,13,19,20.4982216893409
"4",16.3402085164562,6,18,16.729284141648
"11",15.9562389152125,11,17,16.4769352577916
"19",13.2889788383618,12,16,13.8285694613856
"15",11.7133173411712,1,15,11.7133173411712
I ignore the first column of the CSV data set.
After recalculation, my data looks like this (see the code below to understand the data manipulation):
15.9562389152125 11 12.0 16.4769352577916 16.4958295382
13.2889788383618 12 13.0 13.8285694613856 13.8459505145
11.7133173411712 1 2.0 11.7133173411712 11.863832339
9.68207331560552 14 15.0 10.2551373334446 10.2701189451
9.56895540188998 19 20.0 10.2083322023664 10.2194703997
7.30124705657363 2 3.0 7.45176205440562 7.53980768393
6.83169608190656 5 6.0 7.18118108407457 7.2207717071
6.40446470770985 4 5.0 6.70549470337383 6.75394970988
Then I reverse sort by column 4, column 3, and column 5.
Then I add a rank column at the end based on column 5 as shown below.
15.9562389152125 11 12.0 16.4769352577916 16.4958295382 1
13.2889788383618 12 13.0 13.8285694613856 13.8459505145 2
11.7133173411712 1 2.0 11.7133173411712 11.863832339 3
9.68207331560552 14 15.0 10.2551373334446 10.2701189451 4
9.56895540188998 19 20.0 10.2083322023664 10.2194703997 5
7.30124705657363 2 3.0 7.45176205440562 7.53980768393 6
6.83169608190656 5 6.0 7.18118108407457 7.2207717071 7
6.40446470770985 4 5.0 6.70549470337383 6.75394970988 8
I am not sure what sort of data structure to use in order to achieve this.
I have tried the following code:
def increaseQuantityByOne(self, fileLocation):
    rows = csv.reader(open(fileLocation))
    rows.next()
    print "PricePercentage\t" + "OldQuantity\t" + "newQuantity\t" + "oldCompScore\t" + "newCompScore"
    priceCompValue = []
    priceCompRank = []
    newPriceCompValue = []
    newPriceCompRank = []
    for row in rows:
        newQuantity = float(row[2]) + 1.0
        newCompetitiveScore = float(row[1]) + float(math.log(float(newQuantity), 100))
        print row[1] + "\t", str(row[2])+"\t", str(newQuantity) + "\t", str(row[4]) + "\t", newCompetitiveScore
        priceCompValue.append(float(row[4]))
        priceCompRank.append(int(row[3]))
        newPriceCompValue.append(newCompetitiveScore)
    priceCompValue.sort(reverse=True)
    priceCompRank.sort(reverse=True)
    newPriceCompValue.sort(reverse=True)
    for item in priceCompValue:
        print item
    for item in priceCompRank:
        print item
    for item in newPriceCompValue:
        print item
Answers:
I'm not sure I understand what you want to do, but I use Python's NumPy for all my table and sorting work.
Link: http://docs.scipy.org/doc/numpy/reference/generated/numpy.lexsort.html#numpy.lexsort
Kegan
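A minimal sketch of how the numpy.lexsort suggestion could apply to the question, assuming the recalculated rows are already in a 2-D array. The name `table`, the three sample rows, and the column layout (price percentage, old quantity, new quantity, old score, new score) are assumptions for illustration. Note that lexsort sorts ascending and treats the last key as the primary one, so the index array is reversed to get a descending order.

```python
import numpy as np

# Hypothetical recalculated rows:
# price %, old qty, new qty, old score, new score
table = np.array([
    [13.2889788383618, 12.0, 13.0, 13.8285694613856, 13.8459505145],
    [15.9562389152125, 11.0, 12.0, 16.4769352577916, 16.4958295382],
    [11.7133173411712, 1.0, 2.0, 11.7133173411712, 11.863832339],
])

# lexsort sorts ascending and uses the LAST key as the primary key,
# so the keys are listed from least to most significant.
order = np.lexsort((table[:, 3], table[:, 2], table[:, 4]))

# Reverse the index array to get descending order.
ranked = table[order[::-1]]

# Append a 1-based rank as a final column (stored as floats,
# because a NumPy array has a single dtype).
ranks = np.arange(1, len(ranked) + 1).reshape(-1, 1)
ranked = np.hstack([ranked, ranks])
print(ranked)
```

Because the array holds a single dtype, the rank column comes out as floats; a structured array or a separate integer array would avoid that.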
Assuming your data is in a CSV file (called data.csv) in the same directory as this script:
from pprint import pprint
import math
import csv

# function to massage each row into desired values
def calc_new_vals(row):
    newQuantity = float(row[2]) + 1.0
    newCompetitiveScore = float(row[1]) + math.log(newQuantity, 100)
    return [float(row[1]),
            float(row[2]),
            newQuantity,
            float(row[4]),
            newCompetitiveScore]

# read data from file and recalculate each row
with open('./data.csv', 'r') as f:
    reader = csv.reader(f)
    records = [calc_new_vals(record) for record in reader]

# This sorts by the three columns in reverse
# see this page for more: http://wiki.python.org/moin/HowTo/Sorting/
records = sorted(records, key=lambda record: record[3], reverse=True)
records = sorted(records, key=lambda record: record[2], reverse=True)
records = sorted(records, key=lambda record: record[4], reverse=True)

# append a 1-based rank to each row, following the sorted order
new_records = []
rank = 1
for row in records:
    row.append(rank)
    new_records.append(row)
    rank += 1

pprint(new_records)
This produces a list of lists thus:
[[20.2147418139502, 20.0, 21.0, 20.8652568117822, 20.87585146131716, 1],
[19.9412500131875, 13.0, 14.0, 20.4982216893409, 20.51431403102662, 2],
[16.3402085164562, 6.0, 7.0, 16.729284141648, 16.76275753646333, 3],
[15.9562389152125, 11.0, 12.0, 16.4769352577916, 16.49582953823631, 4],
[13.2889788383618, 12.0, 13.0, 13.8285694613856, 13.845950514515218, 5],
[11.7133173411712, 1.0, 2.0, 11.7133173411712, 11.86383233900319, 6]]
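The three chained sorted() calls work because Python's sort is stable, so the last call (column index 4) ends up as the primary key. As a sketch of an equivalent form, the same ordering can be requested in one pass with a tuple key, and enumerate can supply the rank. The names `rows`, `chained`, `single`, and `ranked` are made up for illustration; the three sample rows are copied from the output above.

```python
# Sample recalculated rows:
# [price %, old qty, new qty, old score, new score]
rows = [
    [13.2889788383618, 12.0, 13.0, 13.8285694613856, 13.845950514515218],
    [20.2147418139502, 20.0, 21.0, 20.8652568117822, 20.87585146131716],
    [11.7133173411712, 1.0, 2.0, 11.7133173411712, 11.86383233900319],
]

# Chained form: sort by the least significant key first.
chained = sorted(rows, key=lambda r: r[3], reverse=True)
chained = sorted(chained, key=lambda r: r[2], reverse=True)
chained = sorted(chained, key=lambda r: r[4], reverse=True)

# Single-pass form: tuple keys compare element by element,
# so index 4 is the primary key, then 2, then 3.
single = sorted(rows, key=lambda r: (r[4], r[2], r[3]), reverse=True)

# enumerate(..., start=1) yields the rank without a manual counter.
# row + [rank] builds a new list, so the rows in `rows` are not modified.
ranked = [row + [rank] for rank, row in enumerate(single, start=1)]

for row in ranked:
    print(row)
```

The single-pass form states the key priority in one place, which tends to be easier to read than working out the order of the chained calls.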
I hope this gets you started.
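If "data frame" in the question means a pandas DataFrame, which is an assumption since the question does not name a library, a list of lists like the one above converts directly. A minimal sketch, with invented column names and three sample rows:

```python
import pandas as pd

# Column names are invented for illustration; the source data has no header.
columns = ["price_pct", "old_qty", "new_qty", "old_score", "new_score"]

# Three recalculated rows in the same layout as the list of lists above.
records = [
    [13.2889788383618, 12.0, 13.0, 13.8285694613856, 13.845950514515218],
    [20.2147418139502, 20.0, 21.0, 20.8652568117822, 20.87585146131716],
    [11.7133173411712, 1.0, 2.0, 11.7133173411712, 11.86383233900319],
]

df = pd.DataFrame(records, columns=columns)

# Sort descending; the first name in `by` is the primary key.
df = df.sort_values(by=["new_score", "new_qty", "old_score"], ascending=False)
df = df.reset_index(drop=True)

# The rank follows the sorted row order, starting at 1.
df["rank"] = range(1, len(df) + 1)
print(df)
```

Named columns replace the positional indexes used in the list version, which may make later manipulation less error-prone.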