How to chain attribute lookups that might return None in Python?
Question:
My problem is a general one: how to chain a series of attribute lookups when one of the intermediate ones might return None. Since I ran into this problem while trying to use Beautiful Soup, I'm going to ask it in that context.
Beautiful Soup parses an HTML document and returns an object that can be used to access the structured content of that document. For example, if the parsed document is in the variable soup, I can get its title with:
title = soup.head.title.string
My problem is that if the document doesn't have a title, then soup.head.title returns None and the subsequent string lookup throws an exception. I could break up the chain as:
x = soup.head
x = x.title if x else None
title = x.string if x else None
but this, to my eye, is verbose and hard to read.
I could write:
title = soup.head and soup.head.title and soup.head.title.string
but that is verbose and inefficient.
One solution I've thought of, which I think is possible, would be to create an object (call it nil) that would return None for any attribute lookup. This would allow me to write:
title = ((soup.head or nil).title or nil).string
but this is pretty ugly. Is there a better way?
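For reference, the nil object described above could be sketched roughly as follows. The class name Nil and the SimpleNamespace stand-in for a parsed document are assumptions made for illustration, not part of Beautiful Soup:

```python
from types import SimpleNamespace


class Nil:
    """Hypothetical null object: any attribute lookup returns None."""

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails.
        return None

    def __bool__(self):
        # Falsy, like None itself.
        return False


nil = Nil()

# Stand-in for a parsed document whose <head> has no <title>.
soup = SimpleNamespace(head=SimpleNamespace(title=None))

title = ((soup.head or nil).title or nil).string
print(title)  # None
```

This works, but every link in the chain still needs its own "or nil", which is the ugliness the question is about.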
Answers:
You might be able to use reduce for this:
>>> class Foo(object): pass
...
>>> a = Foo()
>>> a.foo = Foo()
>>> a.foo.bar = Foo()
>>> a.foo.bar.baz = Foo()
>>> a.foo.bar.baz.qux = Foo()
>>>
>>> reduce(lambda x,y:getattr(x,y,''),['foo','bar','baz','qux'],a)
<__main__.Foo object at 0xec2f0>
>>> reduce(lambda x,y:getattr(x,y,''),['foo','bar','baz','qux','quince'],a)
''
In Python 3.x, reduce was moved to the functools module, so it has to be imported first (from functools import reduce).
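A Python 3 version of the same idea might look like the sketch below. The helper name get_chain and the SimpleNamespace test objects are made up for this example, and it uses None rather than '' as the fallback so that a None link in the middle of the chain also ends up as None:

```python
from functools import reduce
from types import SimpleNamespace


def get_chain(obj, *names, default=None):
    """Follow attribute names left to right; return `default` if a link is missing or None."""
    # getattr(None, name, None) is simply None, so a None link propagates to the end.
    result = reduce(lambda item, name: getattr(item, name, None), names, obj)
    return default if result is None else result


doc = SimpleNamespace(head=SimpleNamespace(title=SimpleNamespace(string="Foo")))
empty = SimpleNamespace(head=SimpleNamespace(title=None))

print(get_chain(doc, "head", "title", "string"))    # Foo
print(get_chain(empty, "head", "title", "string"))  # None
```

Note that reduce does not stop early: once a link is None, the remaining names are still looked up on None.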
I suppose you could also do this with a simpler function:
def attr_getter(item, attributes):
    # Walk the attribute names in order; give up on the first missing one.
    for a in attributes:
        try:
            item = getattr(item, a)
        except AttributeError:
            return None  # or whatever on error
    return item
Finally, I suppose the nicest way to do this is something like:
try:
    title = foo.bar.baz.qux
except AttributeError:
    title = None
The most straightforward way is to wrap the lookup in a try/except block.
try:
    title = soup.head.title.string
except AttributeError:
    print("Title doesn't exist!")
There’s really no reason to test at each level when removing each test would raise the same exception in the failure case. I would consider this idiomatic in Python.
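If the same pattern comes up repeatedly, the try/except can be folded into a small helper that takes the lookup as a callable. This is only a sketch; the name attempt and the SimpleNamespace stand-in for soup are assumptions for illustration:

```python
from types import SimpleNamespace


def attempt(lookup, default=None):
    """Call `lookup()` and return `default` if it raises AttributeError."""
    try:
        return lookup()
    except AttributeError:
        return default


# Stand-in for a parsed document without a <title>.
soup = SimpleNamespace(head=SimpleNamespace(title=None))

title = attempt(lambda: soup.head.title.string, default="(untitled)")
print(title)  # (untitled)
```

One caveat with any except AttributeError approach: it also hides an AttributeError raised for an unrelated reason inside the chain, for example by a buggy property.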
One solution would be to wrap the outer object inside a Proxy that handles None values for you. See below for a beginning implementation.
import unittest
class SafeProxy(object):
    def __init__(self, instance):
        # Write to __dict__ directly to bypass our own __setattr__.
        self.__dict__["instance"] = instance
    def __eq__(self, other):
        return self.instance == other
    def __call__(self, *args, **kwargs):
        return self.instance(*args, **kwargs)
    # TODO: Implement other special members
    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for anything but "instance".
        if hasattr(self.__dict__["instance"], name):
            return SafeProxy(getattr(self.instance, name))
        if name == "val":
            # val() unwraps the proxy and returns the underlying object.
            return lambda: self.instance
        return SafeProxy(None)
    def __setattr__(self, name, value):
        setattr(self.instance, name, value)
# Simple stub for creating objects for testing
class Dynamic(object):
    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            self.__setattr__(name, value)
    def __setattr__(self, name, value):
        self.__dict__[name] = value
class Test(unittest.TestCase):
    def test_nestedObject(self):
        inner = Dynamic(value="value")
        middle = Dynamic(child=inner)
        outer = Dynamic(child=middle)
        wrapper = SafeProxy(outer)
        self.assertEqual("value", wrapper.child.child.value)
        self.assertEqual(None, wrapper.child.child.child.value)
    def test_NoneObject(self):
        self.assertEqual(None, SafeProxy(None))
    def test_stringOperations(self):
        s = SafeProxy("string")
        self.assertEqual("String", s.title())
        self.assertEqual(type(""), type(s.val()))
if __name__ == "__main__":
    unittest.main()
NOTE: I am personally not sure whether I would use this in an actual project, but it makes an interesting experiment and I put it here to get people's thoughts on it.
Here is another potential technique, which hides the assignment of the intermediate value in a method call. First we define a class to hold the intermediate value:
class DataHolder(object):
    def __init__(self, value=None):
        self.v = value
    def g(self):
        return self.v
    def s(self, value):
        self.v = value
        return value
x = DataHolder(None)
Then we can use it to store the result of each link in the chain of calls:
import bs4
for html in ('<html><head></head><body></body></html>',
             '<html><head><title>Foo</title></head><body></body></html>'):
    # Name the parser explicitly to avoid the "no parser specified" warning.
    soup = bs4.BeautifulSoup(html, 'html.parser')
    print(x.s(soup.head) and x.s(x.g().title) and x.s(x.g().string))
    # or
    print(x.s(soup.head) and x.s(x.v.title) and x.v.string)
I don’t consider this a good solution, but I’m including it here for completeness.
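On Python 3.8 and later, the assignment expression operator (:=) does the same job as the holder object without any helper class. The following is a sketch that uses SimpleNamespace objects as stand-ins for parsed documents, so the names docs, head and t are illustrative only:

```python
from types import SimpleNamespace

# Stand-ins for one document without a title and one with a title.
docs = [
    SimpleNamespace(head=SimpleNamespace(title=None)),
    SimpleNamespace(head=SimpleNamespace(title=SimpleNamespace(string="Foo"))),
]

titles = []
for soup in docs:
    # Each link is evaluated once, stored, and only used if it is truthy.
    title = (head := soup.head) and (t := head.title) and t.string
    titles.append(title)

print(titles)  # [None, 'Foo']
```

As with the holder object, this relies on truthiness, so it assumes a present link is never falsy.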
This is how I handled it, with inspiration from @TAS and from the question "Is there a Python library (or pattern) like Ruby's andand?":
class Andand(object):
    def __init__(self, item=None):
        self.item = item
    def __getattr__(self, name):
        # Wrap each looked-up attribute; a failed lookup yields an empty wrapper.
        try:
            item = getattr(self.item, name)
            return item if name == 'item' else Andand(item)
        except AttributeError:
            return Andand()
    def __call__(self):
        # Calling the wrapper unwraps the final value (or None).
        return self.item
title = Andand(soup).head.title.string()
I'm running Python 3.9:
Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64 bit (AMD64)]
and the and keyword solves my problem:
memo[v] = short_combo and short_combo.copy()
From what I gather, this is not considered Pythonic and you should handle the exception instead.
However, in my solution the None ambiguity exists within the function, and in this scenario I would consider it poor practice to handle exceptions that occur ~50% of the time.
Were I outside of the function and calling it, I would handle the exception.
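One thing to keep in mind with the and idiom is that it tests truthiness, not "is None". A small sketch, with function names invented for this example and assuming short_combo is either None or a list:

```python
def copy_or_none(short_combo):
    # `and` returns the left operand when it is falsy, otherwise the right one.
    return short_combo and short_combo.copy()


def copy_unless_none(short_combo):
    # Explicit variant: only None skips the copy.
    return None if short_combo is None else short_combo.copy()


empty = []
print(copy_or_none(None))                # None
print(copy_or_none([1, 2]))              # [1, 2]
print(copy_or_none(empty) is empty)      # True: the original list, not a copy
print(copy_unless_none(empty) is empty)  # False: a fresh copy
```

If an empty list can never occur, or sharing it is harmless, the shorter and form is fine.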