I have a text file with key values separated by commas. I want to use those key values to extract the complete row for each key from another .csv file

Question:

I want to use the key values (basically person IDs) in file1.txt to extract the complete row of information for each of those IDs from file2.csv and store it in file3.csv, with rows for the IDs and columns for the information such as age, height, weight, etc.

I tried the following and it gives an error:

import pandas as pd
df1 = pd.read_csv("file1.txt", sep = ',', header = None)
df2 = pd.read_csv("file2.csv")
wanted_ids = df1[]
wanted_rows = df2.loc[wanted_ids,:]
wanted_rows.to_csv("file3.csv", header = False)

line 4
    wanted_ids = df1[]
                     ^
SyntaxError: invalid syntax

file1.txt has no row names, column names, or header. It holds thousands of key values enclosed in a single pair of square brackets and separated by commas, like this:
[25536,17381,384973,2783249,36323….n]

I also want to retain all the column names from file2.csv in file3.csv.
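Two details in the attempt above work against this: `df2.loc[wanted_ids, :]` looks the IDs up in df2's row index labels (0, 1, 2, ... by default), not in an ID column, and `header=False` drops the column names. A minimal sketch that filters on a column instead, assuming file2.csv has a column literally named `ID` (a hypothetical name; substitute the real one). The in-memory rows below are made up and only stand in for file2.csv:

```python
import io

import pandas as pd

# Hypothetical stand-in for file2.csv; the "ID" column name is an assumption
file2 = io.StringIO(
    "ID,age,height,weight\n"
    "25536,34,180,75\n"
    "11111,29,165,60\n"
    "17381,41,172,82\n"
)
df2 = pd.read_csv(file2)

# IDs as they might be parsed from file1.txt (sample values)
wanted_ids = [25536, 17381, 384973]

# Boolean mask: keep rows whose ID appears in the wanted list;
# IDs not present in file2 (384973 here) are simply skipped
wanted_rows = df2[df2["ID"].isin(wanted_ids)]

# header=True is the default, so file2's column names are kept;
# index=False avoids writing pandas' row numbers as an extra column
wanted_rows.to_csv("file3.csv", index=False)
print(wanted_rows)
```

Rows come out in file2's order, not in the order the IDs appear in file1.txt.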

Asked By: Shourya


Answers:

Let me try to understand:

 .txt has only one ID column
 csv2 has more columns, but no ID column
 You want to merge them into csv3

As you describe the .txt, it doesn't seem to share any information with the CSV, so the only way this can work is if the .txt follows the same order as your CSV rows; in that case, the indexes must match.


Reading your comments, I understand the .txt contains IDs that also appear in csv2.

In that case, you can open the .txt in a text editor, select all (Ctrl+A), and copy it.
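Instead of copying by hand, the list can also be read from the file directly. A bracketed, comma-separated list of plain integers is valid JSON, so the standard `json` module can parse it. A minimal sketch, assuming file1.txt contains nothing but that list (no trailing comma); `read_ids` is a hypothetical helper, and the temporary file only stands in for the real file1.txt, filled with the sample IDs from the question:

```python
import json
import os
import tempfile


def read_ids(path):
    """Read a file holding one bracketed list such as [25536,17381,...]
    and return it as a list of ints. Assumes the list is valid JSON."""
    with open(path) as f:
        return json.load(f)


# Demo on a temporary stand-in for file1.txt (sample IDs from the question)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("[25536,17381,384973,2783249,36323]\n")

wanted_ids = read_ids(tmp.name)
os.remove(tmp.name)
print(wanted_ids)  # [25536, 17381, 384973, 2783249, 36323]
```

With the real file, `data = {'ID': read_ids('file1.txt')}` would take the place of a pasted list.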

import pandas as pd

# Create a DataFrame from your .txt, which holds a list like [1,2,345,2345,22341,...]

data = {'ID': [1, 2, 345, 2345, 22341]}  # paste the full ID list from file1.txt on this line
df_text=pd.DataFrame(data)
df_text.head()

Output:
     ID
0    1
1    2
2    345
3    2345
n    ...

# Now set ID as the index in both DataFrames (including the single-column one) so they can be concatenated

# For the single-column DataFrame
df_text.set_index('ID', inplace=True, drop=True)

# For the other CSV
df_csv=pd.read_csv('/ / /csv2.csv')
df_csv.set_index('ID', inplace=True)



# Both DataFrames are now indexed by ID, so an inner concat along the columns
# creates a third DataFrame with only the IDs (index labels) present in both

result = pd.concat([df_text, df_csv], axis=1, join="inner")

# Finally, turn ID back from the index into a regular column
result = result.reset_index(level=0)


Output:

     ID  a   b   c   d
0    1   4   6   7   3
1    2   5   9   0   0 
..   ..  ..  ..  ..  ..
17   345 5   2   8   7 
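A self-contained run of the same set_index/concat steps on small made-up data (the columns `a` and `b` and all values are hypothetical), ending with the file3.csv write that the question asks for:

```python
import pandas as pd

# Single-column DataFrame of wanted IDs (sample values)
df_text = pd.DataFrame({"ID": [1, 2, 345]})
df_text.set_index("ID", inplace=True)

# Stand-in for csv2: an ID column plus information columns (hypothetical)
df_csv = pd.DataFrame({
    "ID": [1, 2, 99, 345],
    "a": [4, 5, 0, 5],
    "b": [6, 9, 1, 2],
})
df_csv.set_index("ID", inplace=True)

# join="inner" keeps only index labels (IDs) present in both frames;
# df_text has no columns, so only df_csv's columns appear in the result
result = pd.concat([df_text, df_csv], axis=1, join="inner")
result = result.reset_index()

# Keep the column names, drop the positional index
result.to_csv("file3.csv", index=False)
print(result)
```

ID 99 exists only in csv2, so it is left out of the result.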

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
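The linked guide also covers `merge`, which does the same inner match on an ID column without moving it into the index first. A minimal sketch on small made-up data (the columns `a` and `b` and all values are hypothetical):

```python
import pandas as pd

# Wanted IDs, e.g. taken from file1.txt (sample values)
ids = pd.DataFrame({"ID": [1, 2, 345]})

# Stand-in for csv2 (hypothetical columns and values)
df_csv = pd.DataFrame({
    "ID": [1, 2, 99, 345],
    "a": [4, 5, 0, 5],
    "b": [6, 9, 1, 2],
})

# how="inner" keeps only IDs found in both frames; ID stays a normal column
result = ids.merge(df_csv, on="ID", how="inner")
result.to_csv("file3.csv", index=False)
print(result)
```

Because ID never becomes the index, no `reset_index` step is needed afterwards.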

Answered By: SERGIO