Create txt file from Pandas Dataframe
Question:
I would like to save my dataframe in a way that matches an existing txt file (I have a trained model based on this txt file, and I now want to predict on new data, which needs to match this format).
The target txt file looks like this (first 3 rows):
2 qid:0 0:0.4967141530112327 1:-0.1382643011711847 2:0.6476885381006925 3:1.523029856408025 4:-0.234153374723336
1 qid:2 0:1.465648768921554 1:-0.2257763004865357 2:0.06752820468792384 3:-1.424748186213457 4:-0.5443827245251827
2 qid:0 0:0.7384665799954104 1:0.1713682811899705 2:-0.1156482823882405 3:-0.3011036955892888 4:-1.478521990367427
The first column is just a random integer (here 2 and 1).
The qid is always joined to an integer by a colon.
For the remaining columns, an integer index is followed by a colon and a float.
My dataframe looks like this:
import pandas as pd

data = {'label': [2, 3, 2],
        'qid': ['qid:0', 'qid:1', 'qid:0'],
        '0': [0.4967, 0.4967, 0.4967],
        '1': [0.4967, 0.4967, 0.4967],
        '2': [0.4967, 0.4967, 0.4967],
        '3': [0.4967, 0.4967, 0.4967],
        '4': [0.4967, 0.4967, 0.4967]}
df = pd.DataFrame(data)
Answers:
Try this and let us know if it works for your case:
data = pd.read_csv('output_list.txt', sep=" ", header=None)
# Placeholder names: the list must contain exactly one name per column in the file
data.columns = ["a", "b", "c", "etc."]
Updated code. It is messy, but if it solves your problem it can be rewritten to handle large amounts of data using numpy array methods:
# Prefix every feature value with its column name, e.g. 0.4967 -> "0:0.4967"
for i in list(data.keys()):
    if i == "label" or i == "qid":
        pass
    else:
        data[i] = [str(i) + ":" + str(j) for j in list(data[i])]
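The loop above only prefixes the values; it does not write anything to disk. A minimal sketch of the remaining step, assuming the dataframe from the question (where qid already holds strings such as 'qid:0') and a hypothetical output file name new_data.txt. It works on a copy so the numeric dataframe stays intact, and uses to_csv with a space separator and no header or index:

```python
import pandas as pd

# Dataframe from the question
df = pd.DataFrame({
    "label": [2, 3, 2],
    "qid": ["qid:0", "qid:1", "qid:0"],
    "0": [0.4967, 0.4967, 0.4967],
    "1": [0.4967, 0.4967, 0.4967],
    "2": [0.4967, 0.4967, 0.4967],
    "3": [0.4967, 0.4967, 0.4967],
    "4": [0.4967, 0.4967, 0.4967],
})

out = df.copy()  # keep the original numeric columns untouched
feature_cols = [c for c in out.columns if c not in ("label", "qid")]
for col in feature_cols:
    # repr(float(...)) keeps the full precision of each value
    out[col] = [f"{col}:{float(v)!r}" for v in out[col]]

# "new_data.txt" is a hypothetical file name chosen for this example
out.to_csv("new_data.txt", sep=" ", header=False, index=False)
```

None of the written values contain spaces or quote characters, so to_csv should not add quoting here. If qid were stored as a plain integer instead, it would need the same kind of prefixing before writing.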
Since your data appears to be structured, you can process it manually:
import pandas as pd

data = []
with open('file.txt') as fp:
    for row in fp:
        # First token is the leading integer; the rest are "key:value" pairs
        arg0, *args = row.strip().split()
        d = {'rand': arg0}
        d.update(dict([arg.split(':') for arg in args]))
        data.append(d)
# You can use .apply(pd.to_numeric) if all of your columns are numeric
df = pd.DataFrame(data).apply(pd.to_numeric)
Output:
>>> df
   rand  qid         0         1         2         3         4
0     2    0  0.496714 -0.138264  0.647689  1.523030 -0.234153
1     1    2  1.465649 -0.225776  0.067528 -1.424748 -0.544383
2     2    0  0.738467  0.171368 -0.115648 -0.301104 -1.478522
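Going the other way, a frame parsed like this (numeric rand and qid columns, one column per feature index) can be turned back into lines of the target format by building each line row by row. This is only a sketch under those assumptions: the helper to_line and the two-row sample frame are made up for illustration, with values taken from the question's target file.

```python
import pandas as pd

# Small sample shaped like the parsed frame (values from the question's file)
df = pd.DataFrame({
    "rand": [2, 1],
    "qid": [0, 2],
    "0": [0.4967141530112327, 1.465648768921554],
    "1": [-0.1382643011711847, -0.2257763004865357],
})

feature_cols = [c for c in df.columns if c not in ("rand", "qid")]

def to_line(row):
    """Format one row as '<rand> qid:<qid> <index>:<value> ...'."""
    feats = " ".join(f"{name}:{float(row[name])!r}" for name in feature_cols)
    # iterrows() upcasts mixed int/float rows to float, so cast back to int
    return f"{int(row['rand'])} qid:{int(row['qid'])} {feats}"

lines = [to_line(row) for _, row in df.iterrows()]
text = "\n".join(lines) + "\n"  # ready to be written with open(..., "w")
```

Row-wise formatting is slower than column-wise string operations on large frames, but it keeps the layout of each line explicit.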