PySpark / Python Slicing and Indexing Issue

Question:

Can someone tell me how to extract certain values from a Python output?

I would like to retrieve the value ‘ocweeklyreports’ from the following output using either indexing or slicing:

'config': '{"hiveView":"ocweeklycur.ocweeklyreports"}'
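The value of 'config' shown above appears to be a JSON-encoded string rather than a Python dict, while the working slice in the question indexes it as config['hiveView']. Assuming the string is exactly as printed, a minimal sketch that parses it with the standard json module (config_str is an illustrative name):

```python
import json

# The 'config' value as printed in the question: a JSON-encoded string, not a dict
config_str = '{"hiveView":"ocweeklycur.ocweeklyreports"}'

# json.loads turns the string into a dict, so config["hiveView"] becomes a normal lookup
config = json.loads(config_str)
print(config["hiveView"])  # ocweeklycur.ocweeklyreports
```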

This should be relatively easy; however, I’m having trouble defining the slicing/indexing configuration.

The following successfully gives me ‘ocweeklyreports’:

myslice = config['hiveView'][12:30]

However, I need the indexing or slicing modified so that I get any value that comes after ‘ocweeklycur’.
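One way to keep a pure slicing approach while avoiding the hard-coded 12 and 30 is to compute the start index from the prefix and leave the end of the slice open. A minimal sketch, assuming the prefix is always ‘ocweeklycur.’ (hive_view, prefix and table_name are illustrative names):

```python
hive_view = "ocweeklycur.ocweeklyreports"
prefix = "ocweeklycur."

# Compute the start index from the prefix instead of hard-coding 12,
# and leave the end of the slice open so it runs to the end of the string
start = hive_view.index(prefix) + len(prefix)
table_name = hive_view[start:]
print(table_name)  # ocweeklyreports
```

Because the slice end is omitted, this works for any table name of any length after the prefix.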

Asked By: Patterson


Answers:

I’m not sure exactly what output you’re dealing with or how robust you need this to be, but if it’s just a string, you can do something like this (a quick and dirty solution):

text = '{"hiveView":"ocweeklycur.ocweeklyreports"}'  # your input string (named text to avoid shadowing the built-in input)
indexStart = text.index('.') + 1  # index just after the '.', which is where you want to start collecting
finalResponse = text[indexStart:-2]  # -2 drops the trailing '"}'
print(finalResponse)  # Prints ocweeklyreports

Again, this is not the most elegant solution, but hopefully it helps or at least offers a starting point. A more robust solution would be to use regex, but I’m not that skilled in regex at the moment.
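A slightly less brittle variant of the same slicing idea, assuming the input is the raw JSON string from the question, looks for the closing double quote instead of hard-coding -2:

```python
text = '{"hiveView":"ocweeklycur.ocweeklyreports"}'

# Start one character after the first '.'
start = text.index('.') + 1
# Stop at the next double quote rather than assuming the string ends with '"}'
end = text.index('"', start)
print(text[start:end])  # ocweeklyreports
```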

Answered By: Chris

You could do almost all of it using regex.
See if this helps:

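import re

def search_word(di):
    st = di["config"]["hiveView"]
    # \. matches a literal dot; \w+ captures the word characters that follow it
    p = re.compile(r'^ocweeklycur\.(?P<word>\w+)')
    m = p.search(st)
    if m is None:  # no match: avoid an AttributeError on m.group
        return None
    return m.group('word')

if __name__ == "__main__":
    d = {'config': {"hiveView": "ocweeklycur.ocweeklyreports"}}
    print(search_word(d))  # ocweeklyreports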
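A variation on the same regex idea, sketched under the assumption that the prefix may change: table_after_prefix is a hypothetical helper that takes the prefix as a parameter and uses re.fullmatch, so the whole string must match.

```python
import re

def table_after_prefix(hive_view, prefix="ocweeklycur"):
    """Return the part after '<prefix>.', or None if the string does not match (hypothetical helper)."""
    # re.escape treats the prefix literally; \. matches the dot itself, not any character
    m = re.fullmatch(re.escape(prefix) + r"\.(\w+)", hive_view)
    return m.group(1) if m else None

print(table_after_prefix("ocweeklycur.ocweeklyreports"))  # ocweeklyreports
print(table_after_prefix("otherdb.sometable"))  # None
```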
Answered By: teedak8s

The following worked best for me:

# Example value; in the question, config is the dict holding the "hiveView" key
config = {"hiveView": "ocweeklycur.ocweeklyreports"}

# Extract the value of the "hiveView" key
hive_view = config['hiveView']

# Split the string on the '.' character
parts = hive_view.split('.')

# The value you want is the second part of the split string
desired_value = parts[1]

print(desired_value)  # Output: ocweeklyreports
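If the string might not contain a '.', parts[1] raises an IndexError. A minimal sketch using str.partition, which always returns three parts, assuming an empty result is acceptable when there is no dot:

```python
hive_view = "ocweeklycur.ocweeklyreports"

# partition splits at the first '.' and always returns a 3-tuple (head, sep, tail);
# tail is '' when there is no '.', so there is no IndexError as with parts[1]
_, sep, table_name = hive_view.partition(".")
print(table_name)  # ocweeklyreports

# Edge case: no dot at all
_, sep_missing, tail_missing = "ocweeklyreports".partition(".")
print(repr(tail_missing))  # ''
```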
Answered By: Patterson