Unable to create URI with whitespace in MarkLogic

Question:

I have created a MarkLogic transform that tries to convert some URL-encoded characters ([, ], and whitespace) when ingesting data into the database. This is the XQuery code:

xquery version "1.0-ml";

module namespace space = "http://marklogic.com/rest-api/transform/space-to-space";

declare function space:transform(
    $context    as map:map,
    $params     as map:map,
    $content    as document-node()
  ) as document-node()
{

    let $puts := (
        xdmp:log($params),
        xdmp:log($context),
        map:put($context, "uri", fn:replace(map:get($context, "uri"), "%5B+", "[")),
        map:put($context, "uri", fn:replace(map:get($context, "uri"), "%5D+", "]")),
        map:put($context, "uri", fn:replace(map:get($context, "uri"), "%20+", " ")),
        xdmp:log($context)
    )
    
    return $content
    
};

I tried it with the Python code below:

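import urllib.parse

import requests
from requests.auth import HTTPDigestAuth
from requests.models import PreparedRequest

def upload_document(self, inputContent, uri, fileType, database, collection):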
        if fileType == 'XML':
            headers = {'Content-type': 'application/xml'}
            fileBytes = str.encode(inputContent)
        elif fileType == 'TXT':
            headers = {'Content-type': 'text/*'}
            fileBytes = str.encode(inputContent)
        else:
            headers = {'Content-type': 'application/octet-stream'}
            fileBytes = inputContent

        endpoint = ML_DOCUMENTS_ENDPOINT
        params = {}

        if uri is not None:
            encodedUri = urllib.parse.quote(uri)
            endpoint = endpoint + "?uri=" + encodedUri

        if database is not None:
            params['database'] = database

        if collection is not None:
            params['collection'] = collection

        params['transform'] = 'space-to-space'

        req = PreparedRequest()
        req.prepare_url(endpoint, params)

        response = requests.put(req.url, data=fileBytes, headers=headers, auth=HTTPDigestAuth(ML_USER_NAME, ML_PASSWORD))
        print('upload_document result: ' + str(response.status_code))

        if response.status_code == 400:
            print(response.text)

The following lines are from the XQuery log output:

  1. 2023-02-13 16:59:00.067 Info: {}

  2. 2023-02-13 16:59:00.067 Info:
    {"input-type":"application/octet-stream",
    "uri":"/Judgment/26856/supportingfiles/[TEST] 57_image1.PNG", "output-type":"application/octet-stream"}

  3. 2023-02-13 16:59:00.067 Info:
    {"input-type":"application/octet-stream",
    "uri":"/Judgment/26856/supportingfiles/[TEST] 57_image1.PNG", "output-type":"application/octet-stream"}

  4. 2023-02-13 16:59:00.653 Info: Status 500: REST-INVALIDPARAM: (err:FOER0000)
    Invalid parameter: invalid uri:
    /Judgment/26856/supportingfiles/[TEST] 57_image1.PNG

Asked By: Eugene


Answers:

The MarkLogic REST API is very opinionated about what a valid URI is, and it doesn’t allow you to insert documents that have spaces in the URI. If you have an existing URI with a space in it, the REST API will retrieve or update it for you. However, it won’t allow you to create a new document with such a URI.
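The logs in the question fit this: the URI is already decoded when the transform first logs $context, so the fn:replace calls on %5B, %5D and %20 have nothing to match, and the space reaches the REST API's URI check either way. A minimal Python sketch of the client-side half, using only urllib.parse with the URI from the question (that the server decodes the query parameter the same way is an inference from the log output, not something verified here):

```python
import urllib.parse

# URI taken from the question's log output.
uri = "/Judgment/26856/supportingfiles/[TEST] 57_image1.PNG"

# quote() leaves "/" alone by default and percent-encodes "[", "]" and the space.
encoded = urllib.parse.quote(uri)
print(encoded)

# Decoding restores the original string, space included, which matches
# what the transform logs before any of its replace calls run.
decoded = urllib.parse.unquote(encoded)
print(decoded == uri)
```

So encoding the URI on the client does not change what the server ends up validating; it only makes the request URL well-formed.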

If you need to create documents with spaces in the URI, then you will need to use lower-level APIs. xdmp:document-insert() will let you.
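One way to reach xdmp:document-insert() from a Python client might be the REST API's /v1/eval endpoint, which runs ad hoc XQuery with external variables. The following is only a sketch under assumptions: the function names, base_url, user and password are hypothetical placeholders, the user is assumed to hold the privileges that eval requires, and the binary content is passed as base64 because the variables travel as JSON.

```python
import base64
import json
import urllib.parse
import urllib.request

# XQuery run on the server; $uri and $b64 are supplied as external variables.
INSERT_XQUERY = """
xquery version "1.0-ml";
declare variable $uri as xs:string external;
declare variable $b64 as xs:string external;
xdmp:document-insert($uri, binary { xs:hexBinary(xs:base64Binary($b64)) })
"""


def build_eval_form(uri, content):
    """Build the form-encoded body for POST /v1/eval (hypothetical helper)."""
    variables = {
        "uri": uri,
        "b64": base64.b64encode(content).decode("ascii"),
    }
    form = {"xquery": INSERT_XQUERY, "vars": json.dumps(variables)}
    return urllib.parse.urlencode(form).encode("ascii")


def insert_via_eval(base_url, user, password, uri, content):
    """Send the eval request with digest auth; returns the HTTP status.

    base_url, user and password are placeholders for your own setup.
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, base_url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(password_mgr)
    )
    request = urllib.request.Request(
        base_url + "/v1/eval",
        data=build_eval_form(uri, content),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with opener.open(request) as response:
        return response.status
```

Because the URI is passed as a variable value rather than as the uri parameter of /v1/documents, it is handed to xdmp:document-insert() as-is. For XML or text documents the binary { ... } constructor would need to be swapped for the matching node type.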

Answered By: Mads Hansen

The lower-level APIs have allowed it, but the newer REST API tightened this up.

See https://help.marklogic.com/knowledgebase/article/View/valid-characters-in-a-marklogic-document-uri

Answered By: asusu