From 1b48d6cb8c06784d0173c2bb31d23345c154f35c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Cl=C3=A9ment=20Sicard?= <33360172+ClementSicard@users.noreply.github.com> Date: Thu, 28 Sep 2023 21:37:51 -0400 Subject: [PATCH] `LlamaCppEmbeddings`: adds `verbose` parameter, similar to `llms.LlamaCpp` class (#11038) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ## Description As of now, when instantiating and during inference, `LlamaCppEmbeddings` outputs (a lot of) verbose logging when controlled from the LangChain binding - it is a bit annoying when computing the embeddings of long documents, for instance. This PR adds a `verbose` parameter to `LlamaCppEmbeddings` objects to be able **not** to print the verbose output of the model to `stderr`. It is natively supported by `llama-cpp-python` and directly passed to the library – the PR is hence very small. The value of `verbose` is `True` by default, following the way it is defined in [`LlamaCpp` (`llamacpp.py` #L136-L137)](https://github.com/langchain-ai/langchain/blob/c87e9fb2ce0ae617e3b2edde52421c80adef54cc/libs/langchain/langchain/llms/llamacpp.py#L136-L137) ## Issue _No issue linked_ ## Dependencies _No additional dependency needed_ ## To see it in action ```python from langchain.embeddings import LlamaCppEmbeddings MODEL_PATH = "" if __name__ == "__main__": llm_embeddings = LlamaCppEmbeddings( model_path=MODEL_PATH, n_gpu_layers=1, n_batch=512, n_ctx=2048, f16_kv=True, verbose=False, ) ``` Co-authored-by: Bagatur --- libs/langchain/langchain/embeddings/llamacpp.py | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/libs/langchain/langchain/embeddings/llamacpp.py b/libs/langchain/langchain/embeddings/llamacpp.py index db3ec98093..5da4999132 100644 --- a/libs/langchain/langchain/embeddings/llamacpp.py +++ b/libs/langchain/langchain/embeddings/llamacpp.py @@ -54,6 +54,9 @@ class LlamaCppEmbeddings(BaseModel, Embeddings): n_gpu_layers: Optional[int] = Field(None, alias="n_gpu_layers") """Number of layers to be 
loaded into gpu memory. Default None.""" + verbose: bool = Field(True, alias="verbose") + """Print verbose output to stderr.""" + class Config: """Configuration for this pydantic object.""" @@ -73,6 +76,7 @@ class LlamaCppEmbeddings(BaseModel, Embeddings): "use_mlock", "n_threads", "n_batch", + "verbose", ] model_params = {k: values[k] for k in model_param_names} # For backwards compatibility, only include if non-null.