How can I concat multiple dataframes in Python?
Question:
I have multiple (more than 100) dataframes. How can I concat all of them?
The problem is that I have too many dataframes to write them out manually in a list, like this:
>>> cluster_1 = pd.DataFrame([['a', 1], ['b', 2]],
... columns=['letter', 'number'])
>>> cluster_1
  letter  number
0      a       1
1      b       2
>>> cluster_2 = pd.DataFrame([['c', 3], ['d', 4]],
... columns=['letter', 'number'])
>>> cluster_2
  letter  number
0      c       3
1      d       4
>>> pd.concat([cluster_1, cluster_2])
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4
The names of my N dataframes are cluster_1, cluster_2, cluster_3,…, cluster_N. The number N can be very high.
How can I concat N dataframes?
Answers:
You can put them into a list and then concat the list. In pandas, reading a file in chunks (e.g. read_csv with chunksize) already gives you an iterable of dataframes, and I combine those chunks exactly this way.
pdList = [df1, df2, ...] # List of your dataframes
new_df = pd.concat(pdList)
To create pdList automatically, assuming your dataframe names always start with 'cluster_' (note that scanning locals() like this is fragile and only works at the scope where the variables were defined):
pdList = []
pdList.extend(value for name, value in locals().items() if name.startswith('cluster_'))
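If you control how the dataframes are created in the first place, a dict keyed by name is a more robust alternative to scanning locals(). A minimal sketch (the cluster names and contents here are made up):

```python
import pandas as pd

# Hypothetical example: build the dataframes in a dict instead of
# N separate variables, so no name-scanning is needed later.
clusters = {}
for i, rows in enumerate([[['a', 1], ['b', 2]], [['c', 3], ['d', 4]]], start=1):
    clusters[f'cluster_{i}'] = pd.DataFrame(rows, columns=['letter', 'number'])

# Concatenate every collected dataframe in one call.
combined = pd.concat(clusters.values(), ignore_index=True)
print(combined)
```

This scales to any N without ever naming the dataframes individually.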
Use:
pd.concat(list_of_dataframes)
And if you want a fresh, sequential index:
pd.concat(list_of_dataframes, ignore_index=True)
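A small sketch of the difference (the dataframe contents are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'letter': ['a', 'b'], 'number': [1, 2]})
df2 = pd.DataFrame({'letter': ['c', 'd'], 'number': [3, 4]})

# Default: each dataframe's original index is kept, so labels repeat.
kept = pd.concat([df1, df2])
print(list(kept.index))   # → [0, 1, 0, 1]

# ignore_index=True: a fresh 0..n-1 index for the result.
fresh = pd.concat([df1, df2], ignore_index=True)
print(list(fresh.index))  # → [0, 1, 2, 3]
```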
Generally it goes like:
frames = [df1, df2, df3]
result = pd.concat(frames)
Note: by default pd.concat keeps each dataframe's original index, so labels may repeat; pass ignore_index=True if you want the index reset.
Read more details on the different types of merging in the pandas merging documentation.
For a large number of data frames:
If you have hundreds of dataframes, you can still build the list ("frames" in the snippet above) with a for loop, whether the dataframes live on disk or in memory. If they are on disk, the easiest approach is to save them all in a single folder and then read every file from that folder. If you are generating them in memory, consider saving each one to a .pkl file first.
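For the on-disk case, a minimal sketch (the folder layout, file names, and pickle format are assumptions for illustration):

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical setup: save each cluster dataframe as a pickle in one folder.
folder = tempfile.mkdtemp()
pd.DataFrame({'letter': ['a', 'b'], 'number': [1, 2]}).to_pickle(
    os.path.join(folder, 'cluster_1.pkl'))
pd.DataFrame({'letter': ['c', 'd'], 'number': [3, 4]}).to_pickle(
    os.path.join(folder, 'cluster_2.pkl'))

# Later: read every pickle back from the folder and concatenate them all.
paths = sorted(glob.glob(os.path.join(folder, '*.pkl')))
frames = [pd.read_pickle(p) for p in paths]
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

The same pattern works with to_csv/read_csv if you prefer a text format on disk.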