Is there a function to write certain values of a dataframe to a .txt file in Python?
Question:
I have a dataframe as follows:
Index A B C D E F
1 0 0 C 0 E 0
2 A 0 0 0 0 F
3 0 0 0 0 E 0
4 0 0 C D 0 0
5 A B 0 0 0 0
Basically, I would like to write the dataframe to a txt file such that each line consists of the index followed only by the names of the columns whose entries are not zero.
For example:
txt file
1 C E
2 A F
3 E
4 C D
5 A B
The dataset is quite big, about 1k rows, 16k columns. Is there any way I can do this using a function in Pandas?
Answers:
Take the matrix-vector product of the boolean matrix that answers "is this entry different from "0"?" with the column names of the dataframe, and write the result to a text file with to_csv
(thanks to @Andreas’ answer!):
df.ne("0").dot(df.columns + " ").str.rstrip().to_csv("text_file.txt")
where rstrip removes the trailing space left over from the " " appended to each column name.
If you don’t want the name Index appearing in the text file, you can chain rename_axis(index=None) to get rid of it, i.e.,
df.ne("0").dot(df.columns + " ").str.rstrip().rename_axis(index=None)
and then call to_csv as above.
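To make the dot-product idea concrete, here is a minimal, self-contained sketch on the sample data from the question. It assumes the zeroes are the string "0" and that Index is the dataframe's index. The lines are written by hand instead of with to_csv, because to_csv with its default settings uses a comma separator and adds a header row; the output path is a hypothetical temporary location.

```python
import os
import tempfile

import pandas as pd

# Sample frame from the question; zeroes are assumed to be the string "0".
df = pd.DataFrame(
    [list("00C0E0"), list("A0000F"), list("0000E0"), list("00CD00"), list("AB0000")],
    columns=list("ABCDEF"),
    index=pd.Index(range(1, 6), name="Index"),
)

# True where the entry is not "0". The dot product with the column names
# (each followed by a space) concatenates the names of the True columns.
names = df.ne("0").dot(df.columns + " ").str.rstrip()

# One "index names" line per row, e.g. "1 C E".
lines = [f"{idx} {cols}" for idx, cols in names.items()]

# Hypothetical output location (a temporary directory, for illustration only).
out_path = os.path.join(tempfile.mkdtemp(), "text_file.txt")
with open(out_path, "w") as fh:
    fh.write("\n".join(lines) + "\n")
```

Writing the lines manually keeps a single space between the index and the column names, which matches the layout asked for in the question.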
You can try this (replace '0' with 0 if the zeroes are numeric rather than strings):
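# Credits to Pygirl, made the code even better.
import numpy as np

df.set_index('Index', inplace=True)
# Turn the '0' entries into NaN; whether stack() then drops them depends on the pandas version
df = df.replace('0', np.nan)
df.stack().groupby(level=0).apply(list)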
# Out[79]:
# variable
# 0 [C, E]
# 1 [A, F]
# 2 [E]
# 3 [C, D]
# 4 [A, B]
# Name: value, dtype: object
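To write the result to a text file, you can use pandas as well. Call to_csv on the grouped Series rather than on df itself, which still holds the NaN-filled table:
df.stack().groupby(level=0).apply(list).to_csv('your_text_file.txt')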
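As a sketch of how the stack-and-group idea can produce space-joined names instead of Python lists, the variant below joins each group with " ".join. It assumes string "0" entries and uses where in place of replace to blank them out. The explicit dropna() is there because whether stack() drops NaN by itself depends on the pandas version. The tab separator and the in-memory buffer are choices made for illustration: with a space as separator, to_csv would quote values such as "C E" that contain the separator.

```python
import io

import pandas as pd

# Sample frame from the question; zeroes are assumed to be the string "0".
df = pd.DataFrame(
    [list("00C0E0"), list("A0000F"), list("0000E0"), list("00CD00"), list("AB0000")],
    columns=list("ABCDEF"),
    index=pd.Index(range(1, 6), name="Index"),
)

names = (
    df.where(df.ne("0"))   # keep non-"0" entries, everything else becomes NaN
      .stack()             # long format: one entry per (row, column) pair
      .dropna()            # drop the NaN entries explicitly
      .groupby(level=0)    # regroup by the original row index
      .agg(" ".join)       # "C E", "A F", ...
)

# Write to an in-memory buffer; pass a file path instead to write to disk.
buf = io.StringIO()
names.to_csv(buf, sep="\t", header=False)
text = buf.getvalue()
```

Rows that contain only zeroes do not appear in the result at all, since every one of their entries is dropped before grouping.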
You could replace the string '0' with the empty string '', then join the remaining values of each row into one line. Finally, append each line to a text file. See code:
import pandas as pd

df = pd.DataFrame([
    ['0','0','C','0','E','0'],
    ['A','0','0','0','0','F'],
    ['0','0','0','0','E','0'],
    ['0','0','C','D','0','0'],
    ['A','B','0','0','0','0']], columns=['A','B','C','D','E','F']
)
df = df.replace('0', '')

# Append one line per row: the 1-based row number, then the non-empty values
with open('test.txt', 'a') as logfile:
    for i in range(len(df)):
        values = [v for v in df.loc[i, :] if v]
        logfile.write(str(i + 1) + ' ' + ' '.join(values) + '\n')
Output test.txt
1 C E
2 A F
3 E
4 C D
5 A B
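A variation on the row-by-row idea above takes the names from the column labels rather than from the cell values, so it also covers frames where a non-zero cell holds something other than its column name. This is only a sketch: the frame, its multi-character column names, and the helper rows_to_lines are hypothetical, and the zeroes are again assumed to be the string "0".

```python
import pandas as pd

# Hypothetical frame with multi-character column names.
df = pd.DataFrame(
    [["0", "beta", "0"], ["alpha", "0", "gamma"], ["0", "0", "0"]],
    columns=["alpha", "beta", "gamma"],
    index=pd.Index([1, 2, 3], name="Index"),
)


def rows_to_lines(frame, zero="0"):
    """Return one 'index name name ...' string per row of frame."""
    mask = frame.ne(zero).to_numpy()   # 2-D boolean array, True for non-zero cells
    cols = frame.columns.to_numpy()    # 1-D array of column labels
    # cols[row] selects the labels whose mask entry is True for that row.
    return [" ".join([str(idx), *cols[row]]) for idx, row in zip(frame.index, mask)]


lines = rows_to_lines(df)
```

Each string in lines can then be written to a file followed by a newline. A row with no non-zero entries yields just its index.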