Elasticsearch / Python / Proxy

Question:

I'm new to Stack Overflow, so if I make a mistake I'm sorry.

I have to write a Python script which collects some data with Elasticsearch and then writes the data to a database. I am struggling to collect the data with Elasticsearch, because the company I work for is behind a proxy.

The script works without a proxy, but I don't know how to pass the proxy settings to Elasticsearch.

The following code works without a proxy:

es = Elasticsearch(['https://user:[email protected]/elasticsearch'])
res = es.search(index=index, body=request, search_type="count")

I tried the following when behind the proxy:

es = Elasticsearch(['https://user:[email protected]/elasticsearch'], _proxy = 'http://proxy.org', _proxy_headers = {'basic_auth': 'user:pw'})
res = es.search(index=index, body=request, search_type="count")
return res

Does anyone know the keywords I have to pass to Elasticsearch so it uses the proxy?

Any help would be nice.

Thanks.

Asked By: meulth


Answers:

I got an answer on GitHub:

https://github.com/elastic/elasticsearch-py/issues/275#issuecomment-143781969

Thanks a ton again!

from elasticsearch import Elasticsearch, RequestsHttpConnection

class MyConnection(RequestsHttpConnection):
    def __init__(self, *args, **kwargs):
        # Pull the custom 'proxies' kwarg out before the base class sees it
        proxies = kwargs.pop('proxies', {})
        super(MyConnection, self).__init__(*args, **kwargs)
        # Apply the proxies to the underlying requests session
        self.session.proxies = proxies

es = Elasticsearch([es_url], connection_class=MyConnection,
                   proxies={'https': 'http://user:[email protected]:port'})


print(es.info())
Answered By: meulth

Generally, you don't need to add extra code for a proxy; the low-level Python HTTP module should be able to use the system proxy (i.e. http_proxy) directly.

In later releases (at least 6.x) you can use the requests module instead of urllib3 to solve this problem nicely, see https://elasticsearch-py.readthedocs.io/en/master/transports.html

# make sure the http_proxy is in system env
from elasticsearch import Elasticsearch, RequestsHttpConnection
es = Elasticsearch([es_url], connection_class=RequestsHttpConnection)
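
If you cannot (or prefer not to) export the variable in your shell, a minimal sketch is to set it from Python before creating the client, since requests reads HTTP_PROXY/HTTPS_PROXY from the environment (the proxy URL below is a placeholder, and es_url is the same variable as above):

import os
from elasticsearch import Elasticsearch, RequestsHttpConnection

# Placeholder proxy URL; requests picks these variables up automatically
os.environ["HTTP_PROXY"] = "http://user:pw@proxy.example.com:3128"
os.environ["HTTPS_PROXY"] = "http://user:pw@proxy.example.com:3128"

es = Elasticsearch([es_url], connection_class=RequestsHttpConnection)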

Another possible problem is that search uses the GET method by default, which was rejected by my old cache server (Squid 3.19); the extra parameter send_get_body_as should be added, see https://elasticsearch-py.readthedocs.io/en/master/#environment-considerations

from elasticsearch import Elasticsearch
es = Elasticsearch(send_get_body_as='POST')
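
As a sketch under the same assumptions, both parameters can be combined, so the requests-based transport honors the system proxy while search bodies are sent via POST:

from elasticsearch import Elasticsearch, RequestsHttpConnection

# Assumes http_proxy/https_proxy are exported as described above
es = Elasticsearch(
    [es_url],
    connection_class=RequestsHttpConnection,
    send_get_body_as='POST',
)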
Answered By: Larry Cai

For people who need to use Elasticsearch 8.x and do not want to export the http_proxy and no_proxy environment variables, you can do this:

from elastic_transport import RequestsHttpNode
from elasticsearch import Elasticsearch


class CustomHttpNode(RequestsHttpNode):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Route HTTPS traffic through the proxy on the underlying requests session
        self.session.proxies = {"https": "http://localhost:8888"}


es = Elasticsearch(
    "https://localhost:9200",        
    basic_auth=("user", "password"),
    node_class=CustomHttpNode,
)
print(es.info())

https://github.com/elastic/elastic-transport-python/issues/53#issuecomment-1447903214
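
A small variation on the same idea, if you would rather keep the proxy URL out of the node class body (PROXY_URL here is just an illustrative module-level constant, not part of the elastic_transport API):

from elastic_transport import RequestsHttpNode
from elasticsearch import Elasticsearch

# Illustrative constant; replace with your own proxy URL
PROXY_URL = "http://localhost:8888"

class ConfigurableProxyNode(RequestsHttpNode):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Send both http and https traffic through the same proxy
        self.session.proxies = {"http": PROXY_URL, "https": PROXY_URL}

es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("user", "password"),
    node_class=ConfigurableProxyNode,
)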

Answered By: BeGreen