Issue: When the third-party package is not installed, whenever we need
to `pip install <package>` the ImportError is raised.
But sometimes, the `ValueError` or `ModuleNotFoundError` is raised. It
is bad for consistency.
Change: replaced the `ValueError` or `ModuleNotFoundError` with
`ImportError` when we raise an error with the `pip install <package>`
message.
Note: Ideally, we replace all `try: import... except... raise ... `with
helper functions like `import_aim` or just use the existing
[langchain_core.utils.utils.guard_import](https://api.python.langchain.com/en/latest/utils/langchain_core.utils.utils.guard_import.html#langchain_core.utils.utils.guard_import)
But it would be much bigger refactoring. @baskaryan Please, advice on
this.
When testing Nomic embeddings --
```
from langchain_community.embeddings import LlamaCppEmbeddings
embd_model_path = "/Users/rlm/Desktop/Code/llama.cpp/models/nomic-embd/nomic-embed-text-v1.Q4_K_S.gguf"
embd_lc = LlamaCppEmbeddings(model_path=embd_model_path)
embedding_lc = embd_lc.embed_query(query)
```
We were seeing this error for strings > a certain size --
```
File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/llama.py:827, in Llama.embed(self, input, normalize, truncate, return_count)
824 s_sizes = []
826 # add to batch
--> 827 self._batch.add_sequence(tokens, len(s_sizes), False)
828 t_batch += n_tokens
829 s_sizes.append(n_tokens)
File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/_internals.py:542, in _LlamaBatch.add_sequence(self, batch, seq_id, logits_all)
540 self.batch.token[j] = batch[i]
541 self.batch.pos[j] = i
--> 542 self.batch.seq_id[j][0] = seq_id
543 self.batch.n_seq_id[j] = 1
544 self.batch.logits[j] = logits_all
ValueError: NULL pointer access
```
The default `n_batch` of llama-cpp-python's Llama is `512` but we were
explicitly setting it to `8`.
These need to be set to equal for embedding models.
* The embedding.cpp example has an assertion to make sure these are
always equal.
* Apparently this is not being done properly in llama-cpp-python.
With `n_batch` set to 8, if more than 8 tokens are passed the batch runs
out of space and it crashes.
This also explains why the CPU compute buffer size was small:
raw client with default `n_batch=512`
```
llama_new_context_with_model: CPU input buffer size = 3.51 MiB
llama_new_context_with_model: CPU compute buffer size = 21.00 MiB
```
langchain with `n_batch=8`
```
llama_new_context_with_model: CPU input buffer size = 0.04 MiB
llama_new_context_with_model: CPU compute buffer size = 0.33 MiB
```
We can work around this by passing `n_batch=512`, but this will not be
obvious to some users:
```
embedding = LlamaCppEmbeddings(model_path=embd_model_path,
n_batch=512)
```
From discussion w/ @cebtenzzre. Related:
https://github.com/abetlen/llama-cpp-python/issues/1189
Co-authored-by: Bagatur <baskaryan@gmail.com>