Python joining strings fails

Question:

I have an odd problem with my code and cannot find the mistake.

I am trying to convert names from the "SURNAME, First Names, Religious Title" format to "First Names Surname" and have written the following lines of code:

    try:
      for x in range(0, df_length):
          print(df_length - x)
          e_df=df.iloc[[x]].fillna("@") # virtual value to avoid issues with empty data frames

          # CLEAN PERSON NAMES

          pers_name=e_df['pers_name'].values[0]
          print(pers_name)
          if "," in pers_name:
            pers_new=pers_name.split(",")
            first_name=pers_new[1].strip()
            print(first_name)
            last_name=pers_new[0].title().strip() # change "all caps" to sentence case
            print(last_name)
            rel_title=pers_new[2].strip()
            name_list=(first_name, last_name)

            try:
              name_reversed=" ".join(name_list) # religious titles are being ignored
              print("This is the new name: ", name_reversed)
            except Exception as e:
              print(e)
          else:
            name_reversed=re.sub(r"\s\s+" , " ", pers_name) # collapse runs of whitespace
          print(name_reversed)
          person_cleaned.append(name_reversed)
    
    except IndexError:
      print("No more names found.")

    # add column with cleaned person names

    print(person_cleaned)
    print(len(person_cleaned))
    df['pers_cleaned']=person_cleaned

    frame_list.append(df)

The input data look like this:

factoid_ID pers_ID pers_name alternative_name additional_info source event_type place_name pers_title pers_function event_date date_before_date event_after_date inst_name rel_pers source_quotations comment info_dump source_site
5637 OCR  DORSCH, Anton Joseph  
 am 16.1.1798 traf er in Aachen ein und wurde Direktorialkommissar für das Roer-Departement
 Praetorius, Professoren, S.139;
 Hans-Joachim Philipp, Anton Joseph Dorsch. Materialsammlung, -Manuskript im Universitätsarchiv Mainz-;
 Helmut Mathy, Anton Joseph Dorsch, 1758-1819, in: Mainzer Zeitschrift 62, 1967, S. 1-55 hier: * 13.6.1758;
 R. Schmitt, S. 56 Anm.101 hier: * 13.6.1758;
 RPh 294v, 296v,321r,321v, 322r,322v;
 Prot. phil. Fak. , S.109;
 Prot, theol. Fak.1779, Bl.81b;
 1780, B1.83a;
 RTh 68v, 71r;
 Hansen I, S.48 Nr.44 hier: * 13.6.1758;
 NDB 4, S.85 f.)
Funktionsausübung Aachen Direktor 1798-01-16
5638 OCR  DORSCH, Anton Joseph  
 1799 wurde er Mitglied der Freimaurer-Johannisloge in Aachen
 Praetorius, Professoren, S.139;
 Hans-Joachim Philipp, Anton Joseph Dorsch. Materialsammlung, -Manuskript im Universitätsarchiv Mainz-;
 Helmut Mathy, Anton Joseph Dorsch, 1758-1819, in: Mainzer Zeitschrift 62, 1967, S. 1-55 hier: * 13.6.1758;
 R. Schmitt, S. 56 Anm.101 hier: * 13.6.1758;
 RPh 294v, 296v,321r,321v, 322r,322v;
 Prot. phil. Fak. , S.109;
 Prot, theol. Fak.1779, Bl.81b;
 1780, B1.83a;
 RTh 68v, 71r;
 Hansen I, S.48 Nr.44 hier: * 13.6.1758;
 NDB 4, S.85 f.)
Funktionsausübung Aachen Mitglied 1799
2006 OCR  SCHEUICHAVIUS (Schevichavius), Gisbert, SJ  
 im Sommer 1600 ging er nach Aachen und im November 1600 nach Graz als Rektor des Jesuitenkollegs
 Nom. rev., S.8;
 Cat. Jes. 1597, S.l;
 De Back.-Som. VII, Sp.776;
 Duhr I, S. 417;
 Duhr 11,1, S. 337 Anm. 1;
 Verzeichnis theol. Fak., S.l)
Amtsantritt Aachen, Graz Rektor 1600 Jesuitenkolleg Erfurt, Jesuiten ###

The output I get is this:

7155
 DORSCH, Anton Joseph
Anton Joseph
Dorsch
No more names found.
[]

So I can successfully extract the name components I need and convert the all caps string in the surname to sentence case, but merging the strings to form a new name fails. And then the whole script breaks before moving on to the next name. Did I use the join() function in a wrong (deprecated?) manner?

Asked By: OnceUponATime


Answers:

What is failing in your code is this specific line:

    rel_title=pers_new[2].strip()

In your example, pers_new has a length of 2:

    ['DORSCH', ' Anton Joseph']

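So pers_new[2] raises an IndexError, and the code never reaches the join(). The outer except IndexError block then catches that exception, which is why your output shows "No more names found." followed by an empty list.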
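A minimal reproduction may make this easier to see. It uses the name string from the question's output; the `reached_join` flag is only an illustrative helper and is not part of the original code.

```python
# Sample value copied from the question's printed output.
pers_name = " DORSCH, Anton Joseph"

pers_new = pers_name.split(",")
print(pers_new)       # [' DORSCH', ' Anton Joseph']
print(len(pers_new))  # 2

try:
    # Only indices 0 and 1 exist, so this lookup raises IndexError.
    rel_title = pers_new[2].strip()
    reached_join = True   # hypothetical flag: never set for this input
except IndexError as e:
    reached_join = False
    print("IndexError:", e)  # IndexError: list index out of range
```

The join() call itself is fine; the exception is raised one statement earlier.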

I guess you forgot to remove that line, since the rel_title variable is not used anywhere.
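If you want to keep supporting names both with and without a religious title, one option is to never index the third element directly. The following is only a sketch under that assumption: `reverse_name` is a hypothetical helper name, and the no-comma branch collapses whitespace with `str.split()` instead of the regular expression used in the question.

```python
def reverse_name(pers_name):
    """Turn 'SURNAME, First Names[, Title]' into 'First Names Surname'.

    Hypothetical helper, not taken from the original code.
    """
    parts = [p.strip() for p in pers_name.split(",")]
    if len(parts) < 2:
        # No comma at all: just collapse runs of whitespace.
        return " ".join(pers_name.split())
    last_name = parts[0].title()  # "DORSCH" -> "Dorsch"
    first_name = parts[1]
    # parts[2:] would hold an optional religious title; it is ignored here.
    return " ".join(p for p in (first_name, last_name) if p)


print(reverse_name(" DORSCH, Anton Joseph"))
# Anton Joseph Dorsch
print(reverse_name("SCHEUICHAVIUS (Schevichavius), Gisbert, SJ"))
# Gisbert Scheuichavius (Schevichavius)
```

Because the function checks the number of parts instead of assuming three, rows without a title no longer raise IndexError, and the loop can continue to the next name.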

Answered By: Asi