Strips leading/trailing whitespace before parsing xml (#12297)

**Description:** When llms output leading or trailing whitespace for xml
(when using XMLOutputParser) the parser would raise a `ValueError: Could
not parse output: ...`. However, leading or trailing whitespace are
"ignorable" in the sense of XML standard.

**Issue:** I did not find an issue related.

**Dependencies:** None

**Tag maintainer:**

**Twitter handle:** donatoaz

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

Done, updated unit test and ran `make docker_test`.
pull/12308/head
Donato Azevedo 10 months ago committed by GitHub
parent 3da1a65fa0
commit d9f1bcf366
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -22,6 +22,8 @@ class XMLOutputParser(BaseOutputParser):
encoding_match = self.encoding_matcher.search(text) encoding_match = self.encoding_matcher.search(text)
if encoding_match: if encoding_match:
text = encoding_match.group(2) text = encoding_match.group(2)
text = text.strip()
if (text.startswith("<") or text.startswith("\n<")) and ( if (text.startswith("<") or text.startswith("\n<")) and (
text.endswith(">") or text.endswith(">\n") text.endswith(">") or text.endswith(">\n")
): ):

@ -4,7 +4,7 @@ import pytest
from langchain.output_parsers.xml import XMLOutputParser from langchain.output_parsers.xml import XMLOutputParser
DEF_RESULT_ENCODING = """<?xml version="1.0" encoding="UTF-8"?> DEF_RESULT_ENCODING = """<?xml version="1.0" encoding="UTF-8"?>
<foo> <foo>
<bar> <bar>
<baz></baz> <baz></baz>
<baz>slim.shady</baz> <baz>slim.shady</baz>

Loading…
Cancel
Save