Commit Graph

4 Commits

Author SHA1 Message Date
os1ma
2667ddc686
Fix make docs_build and related scripts (#7276)
**Description: a description of the change**

Fixed `make docs_build` and related scripts which caused errors. There
are several changes.

First, I made the build of the documentation and the API Reference into
two separate commands. This is because it takes less time to build. The
commands for documents are `make docs_build`, `make docs_clean`, and
`make docs_linkcheck`. The commands for API Reference are `make
api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`.

It looked like `docs/.local_build.sh` could be used to build the
documentation, so I used that. Since `.local_build.sh` was also building
API Rerefence internally, I removed that process. `.local_build.sh` also
added some Bash options to stop in error or so. Futher more added `cd
"${SCRIPT_DIR}"` at the beginning so that the script will work no matter
which directory it is executed in.

`docs/api_reference/api_reference.rst` is removed, because which is
generated by `docs/api_reference/create_api_rst.py`, and added it to
.gitignore.

Finally, the description of CONTRIBUTING.md was modified.

**Issue: the issue # it fixes (if applicable)**

https://github.com/hwchase17/langchain/issues/6413

**Dependencies: any dependencies required for this change**

`nbdoc` was missing in group docs so it was added. I installed it with
the `poetry add --group docs nbdoc` command. I am concerned if any
modifications are needed to poetry.lock. I would greatly appreciate it
if you could pay close attention to this file during the review.

**Tag maintainer**
- General / Misc / if you don't know who to tag: @baskaryan

If this PR needs any additional changes, I'll be happy to make them!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 22:05:14 -04:00
German Martin
3ce4e46c8c
The Fellowship of the Vectors: New Embeddings Filter using clustering. (#7015)
Continuing with Tolkien inspired series of langchain tools. I bring to
you:
**The Fellowship of the Vectors**, AKA EmbeddingsClusteringFilter.
This document filter uses embeddings to group vectors together into
clusters, then allows you to pick an arbitrary number of documents
vector based on proximity to the cluster centers. That's a
representative sample of the cluster.

The original idea is from [Greg Kamradt](https://github.com/gkamradt)
from this video (Level4):
https://www.youtube.com/watch?v=qaPMdcCqtWk&t=365s

I added few tricks to make it a bit more versatile, so you can
parametrize what to do with duplicate documents in case of cluster
overlap: replace the duplicates with the next closest document or remove
it. This allow you to use it as an special kind of redundant filter too.
Additionally you can choose 2 diff orders: grouped by cluster or
respecting the original retriever scores.
In my use case I was using the docs grouped by cluster to run refine
chains per cluster to generate summarization over a large corpus of
documents.
Let me know if you want to change anything!

@rlancemartin, @eyurtsev, @hwchase17,

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-07-07 10:28:17 -07:00
Leonid Ganeline
03b16ed2b1
docs retrievers fixes (#6299)
Fixed several inconsistencies:
- file names and notebook titles should be similar otherwise ToC on the
[retrievers
page](https://python.langchain.com/en/latest/modules/indexes/retrievers.html)
and on the left ToC tab are different. For example, now, `Self-querying
with Chroma` is not correctly alphabetically sorted because its file
named `chroma_self_query.ipynb`
- `Stringing compressors and document transformers...` demoted from `#`
to `##`. Otherwise, it appears in Toc.
- several formatting problems

#### Who can review?

@hwchase17 
@dev2049

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-19 22:04:35 -07:00
Davis Chase
87e502c6bc
Doc refactor (#6300)
Co-authored-by: jacoblee93 <jacoblee93@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-16 11:52:56 -07:00