Python joining strings fails
Question:
I have an odd problem with my code and cannot find the mistake.
I am trying to reverse names in "SURNAME, First Names, Religious Title" format to "First Names Surname" and have written the following lines of code:
try:
    for x in range(0, df_length):
        print(df_length - x)
        e_df=df.iloc[[x]].fillna("@") # virtual value to avoid issues with empty data frames
        # CLEAN PERSON NAMES
        pers_name=e_df['pers_name'].values[0]
        print(pers_name)
        if "," in pers_name:
            pers_new=pers_name.split(",")
            first_name=pers_new[1].strip()
            print(first_name)
            last_name=pers_new[0].title().strip() # change "all caps" to sentence case
            print(last_name)
            rel_title=pers_new[2].strip()
            name_list=(first_name, last_name)
            try:
                name_reversed=" ".join(name_list) # religious titles are being ignored
                print("This is the new name: ", name_reversed)
            except Exception as e:
                print(e)
        else:
            name_reversed=re.sub("\s\s+" , " ", pers_name)
            print(name_reversed)
        person_cleaned.append(name_reversed)
except IndexError:
    print("No more names found.")
# add column with cleaned person names
print(person_cleaned)
print(len(person_cleaned))
df['pers_cleaned']=person_cleaned
frame_list.append(df)
The input data look like this:
factoid_ID | pers_ID | pers_name | alternative_name | additional_info | source | event_type | place_name | pers_title | pers_function | event_date | date_before_date | event_after_date | inst_name | rel_pers | source_quotations | comment | info_dump | source_site |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5637 | OCR | DORSCH, Anton Joseph | am 16.1.1798 traf er in Aachen ein und wurde Direktorialkommissar für das Roer-Departement | Praetorius, Professoren, S.139; Hans-Joachim Philipp, Anton Joseph Dorsch. Materialsammlung, -Manuskript im Universitätsarchiv Mainz-; Helmut Mathy, Anton Joseph Dorsch, 1758-1819, in: Mainzer Zeitschrift 62, 1967, S. 1-55 hier: * 13.6.1758; R. Schmitt, S. 56 Anm.101 hier: * 13.6.1758; RPh 294v, 296v,321r,321v, 322r,322v; Prot. phil. Fak. , S.109; Prot, theol. Fak.1779, Bl.81b; 1780, B1.83a; RTh 68v, 71r; Hansen I, S.48 Nr.44 hier: * 13.6.1758; NDB 4, S.85 f.) | Funktionsausübung | Aachen | Direktor | 1798-01-16 | | | | | | | | | | |
5638 | OCR | DORSCH, Anton Joseph | 1799 wurde er Mitglied der Freimaurer-Johannisloge in Aachen | Praetorius, Professoren, S.139; Hans-Joachim Philipp, Anton Joseph Dorsch. Materialsammlung, -Manuskript im Universitätsarchiv Mainz-; Helmut Mathy, Anton Joseph Dorsch, 1758-1819, in: Mainzer Zeitschrift 62, 1967, S. 1-55 hier: * 13.6.1758; R. Schmitt, S. 56 Anm.101 hier: * 13.6.1758; RPh 294v, 296v,321r,321v, 322r,322v; Prot. phil. Fak. , S.109; Prot, theol. Fak.1779, Bl.81b; 1780, B1.83a; RTh 68v, 71r; Hansen I, S.48 Nr.44 hier: * 13.6.1758; NDB 4, S.85 f.) | Funktionsausübung | Aachen | Mitglied | 1799 | | | | | | | | | | |
2006 | OCR | SCHEUICHAVIUS (Schevichavius), Gisbert, SJ | im Sommer 1600 ging er nach Aachen und im November 1600 nach Graz als Rektor des Jesuitenkollegs | Nom. rev., S.8; Cat. Jes. 1597, S.l; De Back.-Som. VII, Sp.776; Duhr I, S. 417; Duhr 11,1, S. 337 Anm. 1; Verzeichnis theol. Fak., S.l) | Amtsantritt | Aachen, Graz | Rektor | 1600 | Jesuitenkolleg Erfurt, Jesuiten ### | | | | | | | | | |
The output I get is this:
7155
DORSCH, Anton Joseph
Anton Joseph
Dorsch
No more names found.
[]
So I can successfully extract the name components I need and convert the all caps string in the surname to sentence case, but merging the strings to form a new name fails. And then the whole script breaks before moving on to the next name. Did I use the join() function in a wrong (deprecated?) manner?
Answers:
What is failing in your code is this specific line:
rel_title=pers_new[2].strip()
In your example, pers_new has length 2:
['DORSCH', ' Anton Joseph']
so pers_new[2] raises an IndexError, and the code never reaches join(). Because the for loop sits inside the outer try block, that exception is caught by except IndexError, which prints "No more names found." and ends the loop before anything is appended to person_cleaned.
I guess you forgot to remove that line, because the rel_title variable is not used anywhere.
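If it helps, here is a minimal sketch of one way to make the split tolerant of a missing religious title. It assumes every name with a comma follows the "SURNAME, First Names[, Title]" pattern; reverse_name is a hypothetical helper name, not something from your code, and the title is still ignored as in your version.

```python
import re


def reverse_name(pers_name):
    """Turn "SURNAME, First Names[, Title]" into "First Names Surname".

    Hypothetical helper for illustration; a trailing title, if any, is ignored.
    """
    if "," not in pers_name:
        # No comma: only collapse runs of whitespace into a single space.
        return re.sub(r"\s\s+", " ", pers_name)
    parts = [p.strip() for p in pers_name.split(",")]
    last_name = parts[0].title()  # "DORSCH" -> "Dorsch"
    first_name = parts[1]
    # parts[2:] may be empty, so parts[2] is never indexed directly.
    return " ".join(p for p in (first_name, last_name) if p)


names = ["DORSCH, Anton Joseph", "SCHEUICHAVIUS (Schevichavius), Gisbert, SJ"]
person_cleaned = [reverse_name(n) for n in names]
print(person_cleaned)
```

Inside your loop, person_cleaned.append(reverse_name(pers_name)) could then stand in for the whole if/else block. With the split guarded like this, the outer try/except IndexError should no longer be needed to end the loop.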