community: Fix all page numbers were the same for _BaseGoogleVertexAISearchRetriever (#19175)

- Description:
  - This pull request fixes a bug where page numbers were not set
    correctly. In the current code, all chunks share the same metadata
    object doc_metadata, so every document ends up with the same page
    number (the one written for the last chunk). To fix this, I changed
    the code to use a separate metadata object for each chunk.
- Issue:
  - None
- Dependencies:
  - No additional dependencies are required for this change.
- Twitter handle:
  - @eycjur

- Test:
  - Even without this bug, there are cases where every result
    legitimately has the same page number, so it's very difficult for me
    to write integration tests for this change.
k.muto 3 months ago committed by GitHub
parent 160a7077b0
commit 8d2c34e655

@@ -137,14 +137,15 @@ class _BaseGoogleVertexAISearchRetriever(BaseModel):
                     continue
                 for chunk in derived_struct_data[chunk_type]:
-                    doc_metadata["source"] = derived_struct_data.get("link", "")
+                    chunk_metadata = doc_metadata.copy()
+                    chunk_metadata["source"] = derived_struct_data.get("link", "")
                     if chunk_type == "extractive_answers":
-                        doc_metadata["source"] += f":{chunk.get('pageNumber', '')}"
+                        chunk_metadata["source"] += f":{chunk.get('pageNumber', '')}"
                     documents.append(
                         Document(
-                            page_content=chunk.get("content", ""), metadata=doc_metadata
+                            page_content=chunk.get("content", ""), metadata=chunk_metadata
                         )
                     )
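
The aliasing problem described in the commit message can be reproduced without the retriever. The following is a minimal sketch, not the library code: plain dicts stand in for Document objects, and the function names, the link value, and the sample chunks are all made up for illustration. It assumes every chunk is an extractive answer with a pageNumber.

```python
# Hypothetical stand-ins: plain dicts replace Document objects and the
# search response. Only the shared-versus-copied metadata behaviour is shown.


def build_documents_shared(link, chunks, doc_metadata):
    """Pre-fix pattern: every document references the same metadata dict."""
    documents = []
    for chunk in chunks:
        # Mutates the one shared dict, overwriting the previous chunk's value.
        doc_metadata["source"] = f"{link}:{chunk.get('pageNumber', '')}"
        documents.append(
            {"page_content": chunk.get("content", ""), "metadata": doc_metadata}
        )
    return documents


def build_documents_copied(link, chunks, doc_metadata):
    """Post-fix pattern: each chunk gets its own shallow copy."""
    documents = []
    for chunk in chunks:
        chunk_metadata = doc_metadata.copy()
        chunk_metadata["source"] = f"{link}:{chunk.get('pageNumber', '')}"
        documents.append(
            {"page_content": chunk.get("content", ""), "metadata": chunk_metadata}
        )
    return documents


# Made-up sample data for the demonstration.
link = "gs://example-bucket/report.pdf"
chunks = [
    {"content": "first answer", "pageNumber": 1},
    {"content": "second answer", "pageNumber": 2},
]

shared = build_documents_shared(link, chunks, {"id": "doc-1"})
copied = build_documents_copied(link, chunks, {"id": "doc-1"})

shared_sources = [d["metadata"]["source"] for d in shared]
copied_sources = [d["metadata"]["source"] for d in copied]

print(shared_sources)  # both entries end in ":2", the last chunk's page
print(copied_sources)  # entries end in ":1" and ":2"
```

A shallow copy is enough here because "source" is a top-level string key. Nested mutable values inside the metadata would still be shared between copies.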
