ValueError: [E1041] Expected a string, Doc, or bytes as input, but got: <class 'pandas.core.series.Series'>

Question:

import pandas
df['findings'] = df['findings'].astype(str)
#df['findings'] = df['findings'].astype('string')
df["new_column"] = GPT2_model(df['findings'], min_length=60) 

After running this I get the following error, even after converting the column's values to strings with astype(str).

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-37-1225bf7a7a14> in <module>
----> 1 df["new_column"] = GPT2_model(df['findings'], min_length=60)

5 frames
/usr/local/lib/python3.7/dist-packages/spacy/language.py in _ensure_doc(self, doc_like)
  1106         if isinstance(doc_like, bytes):
  1107             return Doc(self.vocab).from_bytes(doc_like)
-> 1108         raise ValueError(Errors.E1041.format(type=type(doc_like)))
  1109 
  1110     def _ensure_doc_with_context(

ValueError: [E1041] Expected a string, Doc, or bytes as input, but got: <class 'pandas.core.series.Series'>
Asked By: Jacob


Answers:

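GPT2_model doesn't accept a pandas Series, and that's what the error is complaining about. The traceback shows the input reaching spaCy's _ensure_doc, which accepts only a string, Doc, or bytes. Calling astype(str) converts each value in the column to a string, but df['findings'] itself is still a Series. Instead, apply the model to each value in the findings column, so it receives one string at a time: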

df['new_column'] = df['findings'].apply(GPT2_model, min_length=60)
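
To see the difference without loading the real model, here is a minimal sketch. It assumes a hypothetical stand-in, fake_summarizer, that only mimics the type check seen in the traceback; it is not GPT2_model, and the sample data is made up for illustration.

```python
import pandas as pd


def fake_summarizer(text, min_length=60):
    """Hypothetical stand-in for GPT2_model: accepts one string at a time.

    min_length is accepted only to mirror the call in the question;
    this stand-in ignores it and simply upper-cases the text.
    """
    if not isinstance(text, str):
        raise ValueError(f"Expected a string as input, but got: {type(text)}")
    return text.upper()


# Made-up sample data for illustration
df = pd.DataFrame({"findings": ["first report text", "second report text"]})

# astype(str) converts each element, but the column is still a Series
df["findings"] = df["findings"].astype(str)

# Passing the whole Series fails the type check
try:
    fake_summarizer(df["findings"], min_length=60)
    series_rejected = False
except ValueError as err:
    series_rejected = True
    print("Whole Series rejected:", err)

# apply() calls the function once per element and forwards min_length
df["new_column"] = df["findings"].apply(fake_summarizer, min_length=60)
print(df)
```

Series.apply forwards extra keyword arguments such as min_length to the function, so each call receives a single string plus the same options. With a real model this runs one row at a time, so it may be slow on large dataframes.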
Answered By: Abirbhav G.