Python communicating with clangd in a non-blocking way

Question:

I’m implementing a plugin for one of my hobby projects, and I want to use an LSP server to highlight code shown on a website.

In my case, I want to use clangd to gather information about pre-written C and C++ code. The problem I’m facing is how exactly to communicate with clangd: I have trouble sending and receiving JSON messages over piped stdin and stdout.

The biggest problems are:

  • the protocol has no strict delimiters, only the Content-Length header
  • pipes are blocking by default, so my Python code hangs because it doesn’t know when to stop reading: reading too much blocks until clangd produces more output, and clangd won’t produce more until I make the next LSP call, so the program deadlocks
  • there is no alternative way of communication, only I/O streams. I haven’t found any such option in clangd’s options or sources.

Here is my code so far – it’s enough to start clangd and send the first initialization JSON but then I don’t know how to proceed with JSON exchanges without deadlocks or hangs.

import os
import json
import subprocess
import shutil
from typing import Union, List, Dict, Tuple


def make_json_rpc_request(id: Union[str, int], method: str, params: Union[Tuple, List, Dict]):
    if not isinstance(id, (str, int)):
        raise RuntimeError(f"id should be a number or a string: {id}")

    request = {
        "jsonrpc": "2.0",
        "id": id,
        "method": method
    }

    if params is not None:
        if isinstance(params, (list, tuple, dict)):
            request["params"] = params
        else:
            raise RuntimeError(f"params is not a structured type: {params}")

    return request


def make_lsp_request(json_rpc_request):
    string = json.dumps(json_rpc_request, indent=None)
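    # Content-Length counts bytes; json.dumps escapes non-ASCII by default, so len(string) matches.
    string = f"Content-Length: {len(string)}\r\n\r\n{string}"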
    return string

def get_clangd_path():
    result = shutil.which("clangd")
    if result:
        return result

    env_name = os.environ.get("CLANGD")
    if env_name:
        result = shutil.which(env_name)
        if result:
            return result

    raise RuntimeError("clangd not found. Specify env variable CLANGD that points to the executable or to a name searchable in PATH")


class Connection:
    def __init__(self):
        self.clangd_path = get_clangd_path()
        self.p = subprocess.Popen([self.clangd_path], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        self.id = 1

    def send(self, method: str, params):
        request = make_lsp_request(make_json_rpc_request(self.id, method, params))
        self.p.stdin.write(request.encode())
        self.p.stdin.flush()
        self.id += 1


if __name__ == "__main__":
    conn = Connection()
    # send() already places this dict under the "params" key of the request.
    conn.send("initialize", {
        "processId": None,
        "rootUri": None,
        "capabilities": {},
    })
    print(conn.p.stdout.read1())

I tried the solutions proposed in “A non-blocking read on a subprocess.PIPE in Python” but couldn’t get anything to work.

The goal: have something like make_lsp_call(self, method, params, timeout) that returns either:

  • the received JSON from clangd
  • received error messages
  • None if timeout

Edit: working solution:

    HEADER_CONTENT_LENGTH = "Content-Length: "

    def receive(self, id: int):
        headers = []

        while True:
            line = self.p.stdout.readline()
            if line != b"\r\n":
                headers.append(line)
            else:
                break

        length = 0
        for hdr in headers:
            hdr = hdr.decode()
            if HEADER_CONTENT_LENGTH in hdr:
                length = int(hdr.removeprefix(HEADER_CONTENT_LENGTH))
                break

        if length == 0:
            raise RuntimeError(f"invalid or missing '{HEADER_CONTENT_LENGTH}' header")

        return self.p.stdout.read(length).decode()
Asked By: Xeverous


Answers:

the protocol has no strict delimiters, only the Content-Length header

According to the LSP docs the header part is separated from the content with CRLFCRLF, and each header is separated by CRLF, just like in HTTP.

In other words, you don’t want to use .read() to read everything there is in the pipe; you want to read one message at a time:

  1. Call readline() until you get an empty line, storing the contents of each header line in e.g. a dict
  2. If you didn’t get a Content-Length header, the other end is in violation of the spec and that’s a fatal error
  3. Read exactly Content-Length bytes with .read(n).
  4. Repeat – the next thing you get should be yet another header.
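
The steps above can be sketched as a single function. This is a minimal version under a few assumptions: the stream is a buffered binary file object (such as the stdout of a subprocess.Popen with stdout=subprocess.PIPE), header names are compared case-insensitively, and read_message is an illustrative name rather than anything from an existing library.

```python
import json
from typing import BinaryIO, Optional


def read_message(stream: BinaryIO) -> Optional[dict]:
    """Read exactly one LSP message; return None if the stream is at EOF."""
    headers = {}
    while True:
        line = stream.readline()
        if not line:
            # readline() returns b"" only at EOF.
            if headers:
                raise EOFError("stream closed in the middle of a header block")
            return None
        if line == b"\r\n":
            break  # empty line: end of the header block
        name, _, value = line.decode("ascii").partition(":")
        headers[name.strip().lower()] = value.strip()

    if "content-length" not in headers:
        raise ValueError("missing Content-Length header")

    length = int(headers["content-length"])
    # On a buffered stream, read(n) blocks until n bytes arrive or EOF is hit.
    body = stream.read(length)
    if len(body) != length:
        raise EOFError("stream closed in the middle of a message body")
    return json.loads(body)
```

Calling it in a loop yields one decoded message per call, so nothing ever reads past the end of the current message and waits for output that clangd has not been asked to produce.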

Opening an async can of worms for this doesn’t seem necessary.
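
If the timeout from the question is still wanted, one option that avoids asyncio is to keep the blocking reads on a background thread and hand messages over through a queue. The sketch below is only one possible shape: MessageReader and wait_for are hypothetical names, and dropping every message that is not the awaited response is a simplification that a real client would probably not want.

```python
import json
import queue
import threading
import time
from typing import BinaryIO, Optional


class MessageReader:
    """Hypothetical helper: reads framed messages on a daemon thread."""

    def __init__(self, stream: BinaryIO):
        self._stream = stream
        self._messages: "queue.Queue[dict]" = queue.Queue()
        threading.Thread(target=self._pump, daemon=True).start()

    def _read_one(self) -> Optional[dict]:
        length = None
        while True:
            line = self._stream.readline()
            if not line:
                return None  # EOF: the server closed its stdout
            if line == b"\r\n":
                break  # end of headers
            name, _, value = line.decode("ascii").partition(":")
            if name.strip().lower() == "content-length":
                length = int(value)
        if length is None:
            raise ValueError("missing Content-Length header")
        return json.loads(self._stream.read(length))

    def _pump(self) -> None:
        # Blocking here is harmless: only this thread waits on the pipe.
        while True:
            message = self._read_one()
            if message is None:
                return
            self._messages.put(message)

    def wait_for(self, request_id, timeout: float) -> Optional[dict]:
        """Return the response with the given id, or None on timeout."""
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None
            try:
                message = self._messages.get(timeout=remaining)
            except queue.Empty:
                return None
            # A response carries the request's id and has no "method" key.
            if message.get("id") == request_id and "method" not in message:
                return message
            # Anything else (e.g. notifications) is dropped in this sketch.
```

With the question's Connection class, this would be constructed once as MessageReader(self.p.stdout), and a make_lsp_call could send the request and then return wait_for(request_id, timeout). The returned dict holds either a "result" or an "error" member, which covers the three outcomes listed in the question.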

Answered By: AKX