Strips leading/trailing whitespace before parsing xml (#12297)

**Description:** When llms output leading or trailing whitespace for xml
(when using XMLOutputParser) the parser would raise a `ValueError: Could
not parse output: ...`. However, leading or trailing whitespace are
"ignorable" in the sense of XML standard.

**Issue:** I did not find an issue related.

**Dependencies:** None

**Tag maintainer:**

**Twitter handle:** donatoaz

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

Done, updated unit test and ran `make docker_test`.
pull/12308/head
Donato Azevedo 9 months ago committed by GitHub
parent 3da1a65fa0
commit d9f1bcf366
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -22,6 +22,8 @@ class XMLOutputParser(BaseOutputParser):
encoding_match = self.encoding_matcher.search(text)
if encoding_match:
text = encoding_match.group(2)
text = text.strip()
if (text.startswith("<") or text.startswith("\n<")) and (
text.endswith(">") or text.endswith(">\n")
):

@ -4,7 +4,7 @@ import pytest
from langchain.output_parsers.xml import XMLOutputParser
DEF_RESULT_ENCODING = """<?xml version="1.0" encoding="UTF-8"?>
<foo>
<foo>
<bar>
<baz></baz>
<baz>slim.shady</baz>

Loading…
Cancel
Save