langchain/libs
Harsimran-19 c1d8c33df6
core: JsonOutputParser UTF characters bug (#27306)
**Description:**
This PR fixes an issue where non-ASCII characters in Pydantic field
descriptions were escaped to `\uXXXX` sequences in the format
instructions generated by `JsonOutputParser`. With this change, non-ASCII
characters are preserved in the output, which matters for multilingual
support and for prompts written in non-English languages.

**Issue:** Fixes #27256

**Example Code:**
```python
from pydantic import BaseModel, Field
from langchain_core.output_parsers import JsonOutputParser

class Article(BaseModel):
    title: str = Field(description="科学文章的标题")

output_data_structure = Article
parser = JsonOutputParser(pydantic_object=output_data_structure)
print(parser.get_format_instructions())
```
**Previous Output**:
```... "title": {"description": "\\u79d1\\u5b66\\u6587\\u7ae0\\u7684\\u6807\\u9898", "title": "Title", "type": "string"}} ...```

**Current Output**:
```... "title": {"description": "科学文章的标题", "title": "Title", "type": "string"}} ...```

**Changes made**:
- Modified `json.dumps()` call in
`langchain_core/output_parsers/json.py` to use `ensure_ascii=False`
- Added a unit test to verify Unicode handling
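
The effect of the flag can be reproduced with the standard library alone. This is a minimal sketch, not the parser's actual code: the `schema` dict below is a hypothetical stand-in for the JSON schema that the parser serializes, and only `json.dumps` and its `ensure_ascii` parameter come from the change described above.

```python
import json

# Hypothetical schema fragment, modeled on the Article example above.
schema = {
    "title": {"description": "科学文章的标题", "title": "Title", "type": "string"}
}

# Default behaviour (ensure_ascii=True): non-ASCII characters are
# written as \uXXXX escape sequences.
escaped = json.dumps(schema)

# With ensure_ascii=False the characters are emitted as-is.
preserved = json.dumps(schema, ensure_ascii=False)

print(escaped)
print(preserved)

# Both strings decode to the same data; only the text form differs.
assert json.loads(escaped) == json.loads(preserved)
```

Both forms are valid JSON, so the change affects only how readable the serialized schema is to a person or a model, not what it decodes to.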

Co-authored-by: Harsimran-19 <harsimran1869@gmail.com>
2024-10-29 14:48:53 +00:00
..
| Name | Last commit | Date |
| --- | --- | --- |
| cli | multiple: update docs urls to latest 2 (#26837) | 2024-09-30 17:37:07 -07:00 |
| community | community: Fix closed session in Infinity (#26933) | 2024-10-27 11:37:21 -04:00 |
| core | core: JsonOutputParser UTF characters bug (#27306) | 2024-10-29 14:48:53 +00:00 |
| experimental | experimental: migrate to external repo (#26879) | 2024-09-25 19:02:19 -07:00 |
| langchain | docs: Fix typo in _action_agent docs section (#27698) | 2024-10-29 14:16:42 +00:00 |
| partners | partners/huggingface[patch]: fix HuggingFacePipeline model_id parameter (#27514) | 2024-10-29 14:34:46 +00:00 |
| standard-tests | standard-tests: test that only one chunk sets input_tokens (#27177) | 2024-10-08 11:35:32 -07:00 |
| text-splitters | all: test 3.13 ci (#27197) | 2024-10-25 12:56:58 -07:00 |