Create txt file from Pandas Dataframe

Question:

I would like to save my DataFrame in a way that matches an existing txt file (I have a model trained on this txt file and now want to predict on new data, which needs to match this format).

The target txt file looks like this (first 3 rows):

2 qid:0 0:0.4967141530112327 1:-0.1382643011711847 2:0.6476885381006925 3:1.523029856408025 4:-0.234153374723336 
1 qid:2 0:1.465648768921554 1:-0.2257763004865357 2:0.06752820468792384 3:-1.424748186213457 4:-0.5443827245251827 
2 qid:0 0:0.7384665799954104 1:0.1713682811899705 2:-0.1156482823882405 3:-0.3011036955892888 4:-1.478521990367427 

The first column is just a random integer (here the 2 and the 1).
The qid is always joined to an integer by a colon.
Each of the remaining columns is an integer index followed by a colon and a float.

My dataframe looks like this:

import pandas as pd

data = {'label': [2, 3, 2],
        'qid': ['qid:0', 'qid:1', 'qid:0'],
        '0': [0.4967, 0.4967, 0.4967],
        '1': [0.4967, 0.4967, 0.4967],
        '2': [0.4967, 0.4967, 0.4967],
        '3': [0.4967, 0.4967, 0.4967],
        '4': [0.4967, 0.4967, 0.4967]}

df = pd.DataFrame(data)
Asked By: Tartaglia

||

Answers:

Try this and let us know if it works for your case:

data = pd.read_csv('output_list.txt', sep=" ", header=None)

data.columns = ["a", "b", "c", "etc."]


Updated code (very messy). If this solves your problem, it can be adapted to handle large amounts of data using NumPy array methods:

# Prefix every feature value with its column name, e.g. 0.4967 -> "0:0.4967"
for col in list(data):
    if col not in ("label", "qid"):
        data[col] = [str(col) + ":" + str(value) for value in data[col]]
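
The loop above only builds the prefixed strings; it does not write anything to disk. One possible way to finish the job is sketched below, assuming the DataFrame from the question and a made-up output name `output.txt`. It prefixes the feature columns the same way and then lets `to_csv` write space-separated rows with no header and no index. The sample rows in the question end with a trailing space, which this sketch does not reproduce.

```python
import pandas as pd

# DataFrame from the question
df = pd.DataFrame({
    'label': [2, 3, 2],
    'qid': ['qid:0', 'qid:1', 'qid:0'],
    '0': [0.4967, 0.4967, 0.4967],
    '1': [0.4967, 0.4967, 0.4967],
    '2': [0.4967, 0.4967, 0.4967],
    '3': [0.4967, 0.4967, 0.4967],
    '4': [0.4967, 0.4967, 0.4967],
})

out = df.copy()
for col in out.columns:
    if col not in ('label', 'qid'):
        # Turn each value into "<column name>:<value>", e.g. "0:0.4967"
        out[col] = col + ':' + out[col].astype(str)

# 'output.txt' is a hypothetical file name; one space-separated row per line
out.to_csv('output.txt', sep=' ', header=False, index=False)
```

With the sample data, each line of the file should look like `2 qid:0 0:0.4967 1:0.4967 ...`.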


Answered By: Somen Das

Since your data appears to be structured, you can process it manually:

import pandas as pd

data = []
with open('file.txt') as fp:
    for row in fp:
        # First token is the leading integer, the rest are "key:value" pairs
        arg0, *args = row.strip().split()
        d = {'rand': arg0}
        d.update(dict([arg.split(':') for arg in args]))
        data.append(d)

# You can use .apply(pd.to_numeric) if all of your columns are numeric
df = pd.DataFrame(data).apply(pd.to_numeric)

Output:

>>> df
   rand  qid         0         1         2         3         4
0     2    0  0.496714 -0.138264  0.647689  1.523030 -0.234153
1     1    2  1.465649 -0.225776  0.067528 -1.424748 -0.544383
2     2    0  0.738467  0.171368 -0.115648 -0.301104 -1.478522
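
Going the other way, a frame of this shape can be turned back into lines with the same layout as the target file. This is a minimal sketch and not part of the original answer: the small frame below copies only the first two rows and first two feature columns of the output above (with the rounded values as displayed), and the variable names are illustrative.

```python
import pandas as pd

# Small frame shaped like the parsed output above (values as displayed, rounded)
df = pd.DataFrame({
    'rand': [2, 1],
    'qid': [0, 2],
    '0': [0.496714, 1.465649],
    '1': [-0.138264, -0.225776],
})

feature_cols = df.columns[2:]  # everything after 'rand' and 'qid'

lines = []
for rand, qid, *values in df.itertuples(index=False, name=None):
    # Rebuild the "index:value" pairs from column names and row values
    feats = [f"{col}:{val}" for col, val in zip(feature_cols, values)]
    lines.append(' '.join([str(rand), f"qid:{qid}", *feats]))

text = '\n'.join(lines) + '\n'  # ready to be written with open(..., 'w')
```

Here `lines[0]` comes out as `2 qid:0 0:0.496714 1:-0.138264`.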
Answered By: Corralien