Ray monitoring fails when binding to empty address

Question:

I’m learning to use RLlib. I’ve been running it in my debugger on an example script, and it works, but for some reason I get an error message about the monitoring service failing. This is the traceback:

File "/home/ramrachum/.venvs/ray_env/lib/python3.10/site-packages/ray/autoscaler/_private/monitor.py", line 600, in <module>
  monitor = Monitor(
File "/home/ramrachum/.venvs/ray_env/lib/python3.10/site-packages/ray/autoscaler/_private/monitor.py", line 205, in __init__
  logger.exception(
File "/usr/lib/python3.10/logging/__init__.py", line 1512, in exception
  self.error(msg, *args, exc_info=exc_info, **kwargs)
File "/usr/lib/python3.10/logging/__init__.py", line 70, in error
File "/usr/lib/python3.10/logging/__init__.py", line 1911, in _LogErrorReplacement
  msg,
File "/home/ramrachum/.venvs/ray_env/lib/python3.10/site-packages/ray/autoscaler/_private/monitor.py", line 199, in __init__
  prometheus_client.start_http_server(
File "/home/ramrachum/.venvs/ray_env/lib/python3.10/site-packages/prometheus_client/exposition.py", line 168, in start_wsgi_server
  TmpServer.address_family, addr = _get_best_family(addr, port)
File "/home/ramrachum/.venvs/ray_env/lib/python3.10/site-packages/prometheus_client/exposition.py", line 157, in _get_best_family
  infos = socket.getaddrinfo(address, port)
File "/usr/lib/python3.10/socket.py", line 955, in getaddrinfo
  for res in _socket.getaddrinfo(host, port, family, type, proto, flags):

socket.gaierror: [Errno -5] No address associated with hostname

I’m trying to understand why this bug is happening and how I can fix it. The hostname it’s trying to use is '', which sounds like something that shouldn’t work. Working my way up the traceback, I see that in ray/autoscaler/_private/monitor.py line 201, there’s this logic:

addr="127.0.0.1" if head_node_ip == "127.0.0.1" else "",

Since in my case head_node_ip is equal to '192.168.1.116', the else clause is used and an empty address is passed on to getaddrinfo.

I’m not sure what the logic of this code is. Can getaddrinfo even work with an empty string? How does this service work for people normally? How do I make it not fail?

Asked By: Ram Rachum

Answers:

This is a known bug with prometheus-client==0.14.
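
To check how your own machine treats the empty address from the traceback, here is a minimal sketch that calls socket.getaddrinfo directly, the same call that fails in the traceback. The helper name try_resolve and port 8000 are made up for illustration and are not part of Ray or prometheus_client. The result for '' depends on the platform's resolver, so no particular output is assumed; passing None together with AI_PASSIVE is the standard way to ask for the wildcard ("all interfaces") address.

```python
import socket


def try_resolve(host, port=8000, flags=0):
    """Return the resolved IP addresses, or the gaierror if resolution fails.

    `try_resolve` and the default port are hypothetical, for illustration only.
    """
    try:
        infos = socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM, flags=flags
        )
    except socket.gaierror as exc:
        return exc
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


# An empty host string: on some resolvers this raises gaierror
# (as in the traceback above), on others it may succeed.
print("'' ->", try_resolve(""))

# None plus AI_PASSIVE asks for the wildcard address to bind on.
print("None + AI_PASSIVE ->", try_resolve(None, flags=socket.AI_PASSIVE))
```

If the first line prints a gaierror while the second prints wildcard addresses, the empty string is what your resolver is rejecting.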

Answered By: Ram Rachum