fix: more robust check whether the HF model is quantized (#11891)

Removes the `model.is_quantized` check and adds a more robust way of
checking for 4-bit and 8-bit quantization in the `huggingface_pipeline.py`
script. The original change was made against an outdated version of
`transformers`, where models still exposed the `is_quantized` property;
that check is redundant now.

Fixes: https://github.com/langchain-ai/langchain/issues/11809 and
https://github.com/langchain-ai/langchain/issues/11759
eryk-dsai committed via GitHub
commit 5019f59724 (parent efa9ef75c0)

@@ -109,9 +109,8 @@ class HuggingFacePipeline(BaseLLM):
                 ) from e
             if (
-                model.is_quantized
-                or model.model.is_loaded_in_4bit
-                or model.model.is_loaded_in_8bit
+                getattr(model, "is_loaded_in_4bit", False)
+                or getattr(model, "is_loaded_in_8bit", False)
             ) and device is not None:
                 logger.warning(
                     f"Setting the `device` argument to None from {device} to avoid "
