mirror of https://github.com/hwchase17/langchain
fix: more robust check whether the HF model is quantized (#11891)
Removes the `model.is_quantized` check and adds a more robust way of detecting 4-bit and 8-bit quantization in the `huggingface_pipeline.py` script. The original change was made against an outdated version of `transformers`, when models still exposed this property; it is redundant now. Fixes: https://github.com/langchain-ai/langchain/issues/11809 and https://github.com/langchain-ai/langchain/issues/11759
pull/11789/head
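The check described above can be sketched as follows: rather than relying on a single `is_quantized` attribute (which is not present on all `transformers` versions), fall back to the 4-bit/8-bit flags that bitsandbytes-loaded models carry. This is a minimal illustration, not the exact code from the PR; the attribute names `is_loaded_in_4bit` and `is_loaded_in_8bit` are assumptions based on common `transformers` conventions.

```python
def model_is_quantized(model) -> bool:
    """Hedged sketch of a robust quantization check.

    Uses getattr with a False default so that models lacking the
    attributes entirely (e.g. non-quantized or older models) are
    treated as not quantized instead of raising AttributeError.
    """
    return getattr(model, "is_loaded_in_4bit", False) or getattr(
        model, "is_loaded_in_8bit", False
    )
```

Because `getattr` supplies a default, this works whether or not the model object defines the flags at all, which is the failure mode the original `model.is_quantized` check ran into.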
parent: efa9ef75c0
commit: 5019f59724