Data Frame, how to save data extracted with webscraping?

Question:

I have a spreadsheet where columns "A", "B" and "C" have data, I need to save the webscraping data in column "D".

Currently, it deletes the contents of all columns and saves the information in column "B", this is my problem, as it should save in column "D" and keep the data from the other columns

I tried the command below and I had no success, because it just created a column named "c".
pd.DataFrame(data, columns=["c"])

enter image description here

just below the command I use to save all my webscraping data

data = {'Video': links} 
df = pd.DataFrame(data)
df.to_excel(r"C:__Imagens e Planilhas PythonFacebookYoutube.xlsx", engine='xlsxwriter')   
print(df)
Asked By: Felipe

||

Answers:

You should have included what the data in "Youtube.xlsx" and data look like. The answer below is suggested with the assumption that they’re the same length and that "Youtube.xlsx" has no index column and exactly 3 columns so any added column will be the 4th by default.

I don’t know what’s in "Youtube.xlsx" or in data, but the way it’s coded df will have only one column (Video), and .to_excel uses write mode by default, so

Currently, it deletes the contents of all columns and saves the information in column "B"

[I expect it uses column the first column as index, so Video ends up as the second column.] If you don’t want to write over the previous contents, the usual approach is to append with mode='a'; but that will append rows, and [afaik] there is no way to append columns directly to a file.


You could read the file into a DataFrame, then add the column and save.

filepath = r"C:__Imagens e Planilhas PythonFacebookYoutube.xlsx"
df = df.read_excel(filepath) #, header=None) # if there is no header row
df['Video'] = links
df.to_excel(filepath, engine='xlsxwriter') #, index=False)

[Use a different column name if the file already had a Video column that you want to preserve.]

Answered By: Driftr95
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.