Get value from next row into the previous row as a separate column

Question:

import pandas as pd
url = r'https://www.geonames.org/postal-codes/DE/BE/berlin.html'
table = pd.read_html(url)
table[2].to_excel('berlin_zipcodes.xlsx')

table[2] looks like this:
[screenshot of table[2]]

Expected output:
[screenshot of the expected Excel sheet]

Take for example the first 2 rows:
52.517 is supposedly the longitude
13.387 is supposedly the latitude.

row[0] should have 52.517 as the value of the column "Longitude" and 13.387 as the value of the column "Latitude".

I created the expected-output screenshot by hand in Excel, but I would like to automate the process with Python.

Asked By: luc


Answers:

You can try:

import pandas as pd

url = r'https://www.geonames.org/postal-codes/DE/BE/berlin.html'
table = pd.read_html(url)[2]

# identify rows with coordinates
m = table.pop('Unnamed: 0').isna()

# keep the other rows (copy so that adding columns does not trigger SettingWithCopyWarning)
out = table[~m].copy()

# backfill the coordinates and split to new columns
out[['Longitude', 'Latitude']] = table['Place'].where(m).bfill()[~m].str.split('/', n=1, expand=True)

out.to_excel('berlin_zipcodes.xlsx')

Output:

             Place   Code  Country  Admin1 Admin2         Admin3  Admin4 Longitude Latitude
0           Berlin  10117  Germany  Berlin    NaN  Berlin, Stadt  Berlin    52.517   13.387
2           Berlin  10115  Germany  Berlin    NaN  Berlin, Stadt  Berlin    52.532   13.385
4           Berlin  10119  Germany  Berlin    NaN  Berlin, Stadt  Berlin     52.53   13.405
6           Berlin  10178  Germany  Berlin    NaN  Berlin, Stadt  Berlin    52.521    13.41
8           Berlin  10179  Germany  Berlin    NaN  Berlin, Stadt  Berlin    52.512   13.416
..             ...    ...      ...     ...    ...            ...     ...       ...      ...
378         Berlin  13583  Germany  Berlin    NaN  Berlin, Stadt  Berlin    52.544   13.182
380         Berlin  13589  Germany  Berlin    NaN  Berlin, Stadt  Berlin    52.557   13.168
382         Berlin  13159  Germany  Berlin    NaN  Berlin, Stadt  Berlin    52.623   13.398
384         Berlin  14131  Germany  Berlin    NaN  Berlin, Stadt  Berlin    52.517     13.4
386  Reinickendorf  13047  Germany  Berlin    NaN  Berlin, Stadt  Berlin    52.567   13.333

[194 rows x 9 columns]
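
Because str.split returns text, the two new columns hold strings rather than numbers. If you want numeric cells in the Excel file, a conversion step along these lines may help. This is only a sketch on a hypothetical two-row excerpt (the `sample` frame and `coord_cols` name are made up for illustration), not part of the code above:

```python
import pandas as pd

# Hypothetical two-row excerpt of the result, with the coordinates still stored as text
sample = pd.DataFrame({
    'Code': [10119, 10178],
    'Longitude': ['52.53', '52.521'],
    'Latitude': ['13.405', '13.41'],
})

# Convert both coordinate columns from strings to floats
coord_cols = ['Longitude', 'Latitude']
sample[coord_cols] = sample[coord_cols].apply(pd.to_numeric)

print(sample.dtypes)
```

On the real data the same two lines would be applied to `out` before calling to_excel.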

Intermediate steps:

# this computes a boolean Series that is True on the rows with coordinates
# (pop() also removes the column from the table, so run this line only once)
m = table.pop('Unnamed: 0').isna()

# this masks the non coordinates from the "Place" column
# and backfills the coordinates to the previous row
table['Place'].where(m).bfill()

# then we select the other rows
table['Place'].where(m).bfill()[~m]

# and split on "/" to get 2 new columns
table['Place'].where(m).bfill()[~m].str.split('/', n=1, expand=True)
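
To try the steps without downloading the page, here is a minimal sketch on a hand-made frame. It assumes the scraped table has the shape described above: data rows carry a value in 'Unnamed: 0', and each following row has NaN there and the two coordinates joined by "/" in 'Place'. The `toy`, `coords` and `toy_out` names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for table[2]: every data row is followed by a coordinate row
toy = pd.DataFrame({
    'Unnamed: 0': [1.0, np.nan, 2.0, np.nan],
    'Place': ['Berlin', '52.517/13.387', 'Berlin', '52.532/13.385'],
    'Code': [10117.0, np.nan, 10115.0, np.nan],
})

m = toy.pop('Unnamed: 0').isna()        # True on the coordinate rows
coords = toy['Place'].where(m).bfill()  # coordinates copied up onto the data rows
toy_out = toy[~m].copy()                # keep only the data rows

# Split "a/b" into two columns; assignment aligns on the row index (0 and 2)
toy_out[['Longitude', 'Latitude']] = coords[~m].str.split('/', n=1, expand=True)

print(toy_out)
```

The column names follow the question. Geographically, values around 52.5 are Berlin's latitude and values around 13.4 its longitude, so you may prefer to swap the two labels.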

Answered By: mozway