langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Leonid Ganeline	9544b30821	added `Wikipedia` document loader (#4141 ) - Added the `Wikipedia` document loader. It is based on the existing `utilities/WikipediaAPIWrapper` - Added the respective unit tests and an example notebook - Sorted list of classes in __init__	2023-05-06 09:32:45 -07:00
Davis Chase	5ca13cc1f0	Dev2049/pypdfium2 (#4209 ) thanks @jerrytigerxu for the addition! --------- Co-authored-by: Jere Xu <jtxu2008@gmail.com> Co-authored-by: jerrytigerxu <jere.tiger.xu@gmailc.om>	2023-05-05 17:55:31 -07:00
Leonid Ganeline	59204a5033	docs: `document_loaders` improvements (#4200 ) - made notebooks consistent: titles, service/format descriptions. - corrected short names to full names, for example, `Word` -> `Microsoft Word` - added missed descriptions - renamed notebook files to make ToC correctly sorted	2023-05-05 17:44:54 -07:00
Aivin V. Solatorio	6567b73e1a	JSON loader (#4067 ) This implements a loader of text passages in JSON format. The `jq` syntax is used to define a schema for accessing the relevant contents from the JSON file. This requires dependency on the `jq` package: https://pypi.org/project/jq/. --------- Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>	2023-05-05 14:48:13 -07:00
Harrison Chase	26534457f5	simplify csv args (#4182 )	2023-05-05 09:22:08 -07:00
Davis Chase	d84bb02881	Add Chroma self query (#4149 ) Add internal query language -> chroma metadata filter translator	2023-05-05 08:43:08 -07:00
Harrison Chase	a9c2450330	Harrison/toml loader (#4090 ) Co-authored-by: Mika Ayenson <Mikaayenson@users.noreply.github.com>	2023-05-03 23:14:39 -07:00
Harrison Chase	fba6921b50	Harrison/one drive loader (#4081 ) Co-authored-by: José Ferraz Neto <netoferraz@gmail.com>	2023-05-03 22:55:34 -07:00
Davis Chase	7f8727bbcd	Router chains (#4019 ) Unpolished router examples to help flesh out abstractions and use cases ![Screenshot 2023-05-02 at 7 02 58 PM](https://user-images.githubusercontent.com/130488702/235820394-389e5584-db0b-415e-a260-2824b5555167.png) --------- Co-authored-by: Shreya Rajpal <shreya.rajpal@gmail.com>	2023-05-03 22:02:55 -07:00
Harrison Chase	5f30cc8713	Harrison/knn retriever (#4083 ) Co-authored-by: Yuichi Tateno (secon) <hotchpotch@users.noreply.github.com>	2023-05-03 21:21:58 -07:00
Harrison Chase	5a269d3175	Harrison/media wiki xml (#4072 ) Co-authored-by: Géraud de Drouas <gdedrouas@users.noreply.github.com>	2023-05-03 20:45:33 -07:00
Ivo Stranic	3b556eae44	Update deeplake example (#4055 )	2023-05-03 18:03:51 -07:00
Steve Kim	9b830f437c	Deleted importing Document from document_loaders.base because Documen… (#4068 ) Hi, - Modification: https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/arxiv.html - Reason: In this example, the first line is unnecessary because the Document class does not exist in the base. - Resolves: Issue #4052 -------- P.S: This pull-request is my first time, so please let me know if I need to correct or write more explanation.	2023-05-03 17:54:30 -07:00
Akash Sharma	525db1b6cb	Fixed typo leading to broken link (#4034 )	2023-05-03 14:45:54 -07:00
Zander Chase	aa38355999	Vwp/docs improved document loaders (#4006 ) Huge thanks to @leo-gan for improving the document loaders notebooks --------- Co-authored-by: Leonid Ganeline <leo.gan.57@gmail.com>	2023-05-02 15:24:53 -07:00
Jinto Jose	013208cce6	Fix Documentation - Nomic - Atlas Jupyter Notebook (#3987 ) Correction to Nomic-Atlas Jupyter Notebook Docs	2023-05-02 14:20:01 -07:00
Harrison Chase	f04faf8496	Harrison/spreedly (#3937 ) Co-authored-by: Esmit Pérez <esmitperez@users.noreply.github.com>	2023-05-01 20:56:56 -07:00
Davis Chase	e7e29f9937	Dev2049/add modern treasury (#3924 ) Modified Modern Treasury and Stripe slightly so credentials don't have to be passed in explicitly. Thanks @mattgmarcus for adding Modern Treasury! --------- Co-authored-by: Matt Marcus <matt.g.marcus@gmail.com>	2023-05-01 20:28:02 -07:00
Harrison Chase	e28c6403aa	Harrison/cohere reranker (#3904 )	2023-05-01 15:40:16 -07:00
Nikolas Garske	c4d3d74148	Fix typos in arxiv.ipynb (#3887 ) Several minor typos in the doc for the arxiv document loaders were fixed.	2023-05-01 09:17:37 -07:00
Harrison Chase	20aad0bed1	stripe docs	2023-04-29 08:16:37 -07:00
Sheldon	399065e858	update zilliz example (#3578 ) 1. Now the Zilliz example can't connect to Zilliz Cloud, fixed Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-28 22:10:13 -07:00
Harrison Chase	c494ca3ad2	Harrison/doc2txt (#3772 ) Co-authored-by: rishni ratnam <rishniratnam@gmail.com>	2023-04-28 21:54:16 -07:00
Harrison Chase	0c0f14407c	Harrison/tair (#3770 ) Co-authored-by: Seth Huang <848849+seth-hg@users.noreply.github.com>	2023-04-28 21:25:33 -07:00
Harrison Chase	b7ae9f715d	Langchain with reddit (#3661 ) (#3768 ) I have added a reddit document loader which fetches the text from the Posts of Subreddits or Reddit users, using the `praw` Python package. I have also added an example notebook reddit.ipynb in order to guide users to use this dataloader. This code was written in a format similar to the Twitter document loader. I have run code formatting and linting, and also checked the code myself for different scenarios. This is my first contribution to an open source project and I am really excited about this. If you want to suggest some improvements in my code, I will be happy to do it. :) Co-authored-by: Taaha Bajwa <taaha.s.bajwa@gmail.com>	2023-04-28 20:59:56 -07:00
Jon Saginaw	f8d69e4e52	Enhancement: Blockchain Document Loader with better Metadata support (#3710 ) This PR includes some minor alignment updates, including: - metadata object extended to support contractAddress, blockchainType, and tokenId - notebook doc better aligned to standard langchain format - startToken changed from int to str to support multiple hex value types on the Alchemy API The updated metadata will look like the below. It's possible for a single contractAddress to exist across multiple blockchains (e.g. Ethereum, Polygon, etc.) so it's important to include the blockchainType. ``` metadata = {"source": self.contract_address, "blockchain": self.blockchainType, "tokenId": tokenId} ```	2023-04-28 20:13:05 -07:00
Davis Chase	220a7076ac	Add Mathpix pdf loader (#3727 ) Inspo https://twitter.com/danielgross/status/1651695062307274754?s=46&t=1zHLap5WG4I_kQPPjfW9fA Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-28 20:11:22 -07:00
Harrison Chase	40f6e60e68	Harrison/stripe (#3762 ) Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>	2023-04-28 20:03:21 -07:00
Harrison Chase	7a129ac043	Harrison/pypdf loader (#3764 ) Co-authored-by: Felipe Meres <felipe@felipemeres.com>	2023-04-28 19:56:21 -07:00
Harrison Chase	c55ba43093	Harrison/vespa (#3761 ) Co-authored-by: Lester Solbakken <lesters@users.noreply.github.com>	2023-04-28 19:48:43 -07:00
leo-gan	e510732ad2	docs: improved `vectorstore` notebooks (#3724 ) - Added links to the vectorstore providers - Added installation code (it is not clear that we have to go to the `LangChain Ecosystem` page to get installation instructions.)	2023-04-28 19:26:50 -07:00
Alan Cha	e3b7a20454	Fix typo (#3728 )	2023-04-28 13:01:09 -07:00
Davis Chase	3b609642ae	Self-query with generic query constructor (#3607 ) Alternate implementation of #3452 that relies on a generic query constructor chain and language and then has a vector store-specific translation layer. Still refactoring and updating examples, but the general structure is there and seems to work as well as #3452 on examples --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-27 08:36:00 -07:00
Harrison Chase	a35bbbfa9e	Harrison/lancedb (#3634 ) Co-authored-by: Minh Le <minhle@canva.com>	2023-04-27 08:14:36 -07:00
Shukri	fac4f36a87	Update models used for embeddings in the weaviate example (#3594 ) Use text-embedding-ada-002 because it [outperforms all other models](https://openai.com/blog/new-and-improved-embedding-model).	2023-04-26 21:48:08 -07:00
leo-gan	36c59e0c25	`Arxiv` document loader (#3627 ) It makes sense to use `arxiv` as another source of the documents for downloading. - Added the `arxiv` document_loader, based on the `utilities/arxiv.py:ArxivAPIWrapper` - added tests - added an example notebook - sorted `__all__` in `__init__.py` (otherwise it is hard to find a class in the very long list)	2023-04-26 21:04:56 -07:00
Eric Peter	603ea75bcd	Fix docs error for google drive loader (#3574 )	2023-04-25 22:52:59 -07:00
apurvsibal	af7906f100	Update Alchemy Key URL (#3559 ) Update Alchemy Key URL in Blockchain Document Loader. I want to say thank you for the incredible work the LangChain library creators have done. I am amazed at how seamlessly the Loader integrates with Ethereum Mainnet, Ethereum Testnet, Polygon Mainnet, and Polygon Testnet, and I am excited to see how this technology can be extended in the future. @hwchase17 - Please let me know if I can improve or if I have missed any community guidelines in making the edit? Thank you again for your hard work and dedication to the open source community.	2023-04-25 16:08:42 -07:00
Harrison Chase	0fc0aa62f2	Harrison/blockchain docloader (#3491 ) Co-authored-by: Jon Saginaw <saginawj@users.noreply.github.com>	2023-04-25 08:07:06 -07:00
Maxwell Mullin	696f840426	GuessedAtParserWarning from RTD document loader documentation example (#3397 ) Addresses #3396 by adding `features='html.parser'` in example	2023-04-24 21:54:39 -07:00
jrhe	980cc41709	Adds progress bar using tqdm to directory_loader (#3349 ) Approach copied from `WebBaseLoader`. Assumes the user doesn't have `tqdm` installed.	2023-04-24 21:42:42 -07:00
Davit Buniatyan	2c0023393b	Deep Lake mini upgrades (#3375 ) Improvements * set default num_workers for ingestion to 0 * upgraded notebooks for avoiding dataset creation ambiguity * added `force_delete_dataset_by_path` * bumped deeplake to 3.3.0 * creds arg passing to deeplake object that would allow custom S3 Notes * please double check if poetry is not messed up (thanks!) Asks * Would be great to create a shared slack channel for quick questions --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-23 21:23:54 -07:00
Haste171	93d53e417a	Update unstructured_file.ipynb (#3377 ) Fix typo in docs	2023-04-23 21:22:38 -07:00
Harrison Chase	e5ffbee5eb	Harrison/hf document loader (#3394 ) Co-authored-by: Azam Iftikhar <azamiftikhar1000@gmail.com>	2023-04-23 10:17:43 -07:00
Harrison Chase	a6664be79c	Harrison/myscale (#3352 ) Co-authored-by: Fangrui Liu <fangruil@moqi.ai> Co-authored-by: 刘方瑞 <fangrui.liu@outlook.com> Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>	2023-04-22 09:17:38 -07:00
Honkware	a5ad1c270f	Add ChatGPT Data Loader (#3336 ) This pull request adds a ChatGPT document loader to the document loaders module in `langchain/document_loaders/chatgpt.py`. Additionally, it includes an example Jupyter notebook in `docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb` which uses fake sample data based on the original structure of the `conversations.json` file. The following files were added/modified: - `langchain/document_loaders/__init__.py` - `langchain/document_loaders/chatgpt.py` - `docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb` - `docs/modules/indexes/document_loaders/examples/example_data/fake_conversations.json` This pull request was made in response to the recent release of ChatGPT data exports by email: https://help.openai.com/en/articles/7260999-how-do-i-export-my-chatgpt-history	2023-04-22 09:06:24 -07:00
Richy Wang	88a8f59aa7	Add a full PostgreSQL syntax database 'AnalyticDB' as vector store. (#3135 ) Hi there! I'm excited to open this PR to add support for using a fully Postgres syntax compatible database 'AnalyticDB' as a vector store. As AnalyticDB has been shown to work with AutoGPT, ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is also a good fit here. AnalyticDB is a distributed Alibaba Cloud-Native vector database. It works better when data grows to large scale. The PR includes: - [x] A new memory: AnalyticDBVector - [x] A suite of integration tests that verifies the AnalyticDB integration I have read your [contributing guidelines](`72b7d76d79/.github/CONTRIBUTING.md`). And I have passed the tests below - [x] make format - [x] make lint - [x] make coverage - [x] make test	2023-04-22 08:25:41 -07:00
Paul Garner	aa9d5707e0	Add PythonLoader which auto-detects encoding of Python files (#3311 ) This PR contributes a `PythonLoader`, which inherits from `TextLoader` but detects and sets the encoding automatically.	2023-04-21 10:47:57 -07:00
Daniel Chalef	1ecbeec24e	Fix example match_documents fn table name, grammar (#3294 ) ref https://github.com/hwchase17/langchain/pull/3100#issuecomment-1517086472 Co-authored-by: Daniel Chalef <daniel.chalef@private.org>	2023-04-21 10:21:23 -07:00
Davis Chase	46542dc774	Contextual compression retriever (#2915 ) Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-20 17:01:14 -07:00
Harrison Chase	8f22949dc4	update notebook title	2023-04-20 11:53:23 -07:00
Harrison Chase	36c10f8a52	nits (#3203 )	2023-04-19 21:14:46 -07:00
Daniel Chalef	27cdf8d675	supabase vectorstore - first cut (#3100 ) First cut of a supabase vectorstore loosely patterned on the langchainjs equivalent. Doesn't support async operations which is a limitation of the supabase python client. --------- Co-authored-by: Daniel Chalef <daniel.chalef@private.org>	2023-04-19 21:06:44 -07:00
Harrison Chase	96809b5794	Harrison/discord loader (#3200 ) Co-authored-by: Rajtilak Bhattacharjee <rajtilak.blog@gmail.com>	2023-04-19 21:04:12 -07:00
Pranabendra Prasad Chandra	7b1f0656b8	Fix typo in ElasticSearch sample notebook (#3171 ) Added missing parenthesis in example notebook [elasticsearch.ipynb](https://github.com/hwchase17/langchain/blob/master/docs/modules/indexes/vectorstores/examples/elasticsearch.ipynb)	2023-04-19 16:06:31 -07:00
Harrison Chase	aad0a498ac	Harrison/output error (#3094 ) Co-authored-by: yummydum <sumita@nowcast.co.jp>	2023-04-18 08:59:56 -07:00
Harrison Chase	1c1b77bbfe	Harrison/discord (#3092 ) Co-authored-by: Rajtilak Bhattacharjee <rajtilak.blog@gmail.com>	2023-04-18 08:19:23 -07:00
James O'Dwyer	0257829776	Bump Metal to use index_id (#3089 ) ## Use `index_id` over `app_id` We made a major update to index + retrieve based on Metal Indexes (instead of apps). With this change, we accept an index instead of an app in each of our respective core apis. [More details here](https://docs.getmetal.io/api-reference/core/indexing).	2023-04-18 07:28:13 -07:00
Hamza Kyamanywa	064a1db2b2	[Documentation] Show how to initiate pinecone from an existing index (#3070 ) ## What is this PR for: * This PR adds a commented line of code in the documentation that shows how someone can use the Pinecone client with an already existing Pinecone index * The documentation currently only shows how to create a pinecone index from langchain documents but not how to load one that already exists	2023-04-18 07:27:46 -07:00
Harrison Chase	1920536d99	Harrison/obsidian (#3060 ) Co-authored-by: Ben Hofferber <hofferber.ben@gmail.com>	2023-04-17 21:57:32 -07:00
Zander Chase	93c0514105	Add Twitter Tweet Loader (#3050 ) Reformatted version of #3022 --------- Co-authored-by: LiaoKong <568250549@qq.com>	2023-04-17 21:44:54 -07:00
Harrison Chase	5107fac656	Harrison/rec gd (#3054 ) Co-authored-by: Benjamin Scholtz <BenSchZA@users.noreply.github.com>	2023-04-17 21:02:35 -07:00
Harrison Chase	db7106cb79	Harrison/image caption loader (#3051 ) Co-authored-by: Sean Saito <saitosean@ymail.com>	2023-04-17 20:49:10 -07:00
Harrison Chase	afd3e70ae5	Harrison/confluent loader (#2994 ) Co-authored-by: Justin Flick <Justinjayflick@gmail.com>	2023-04-17 20:23:45 -07:00
Harrison Chase	f1d15b4a75	update nb	2023-04-16 22:09:31 -07:00
vowelparrot	99c0382209	Generative Characters (#2859 ) Add a time-weighted memory retriever and a notebook that approximates a Generative Agent from https://arxiv.org/pdf/2304.03442.pdf The "daily plan" components are removed for now since they are less useful without a virtual world, but the memory is an interesting component to build off. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-16 21:41:00 -07:00
Jan Backes	a9310a3e8b	Add Annoy as VectorStore (#2939 ) Adds Annoy (https://github.com/spotify/annoy) as vector Store. RESOLVES hwchase17/langchain#2842 discord ref: https://discord.com/channels/1038097195422978059/1051632794427723827/1096089994168377354 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>	2023-04-16 13:44:04 -07:00
Harrison Chase	88d3ce12b8	Harrison/diffbot (#2984 ) Co-authored-by: Manuel Saelices <msaelices@gmail.com>	2023-04-16 09:11:24 -07:00
Chetanya Rastogi	aead062a70	Add an example tutorial for using PDFMinerPDFasHTMLLoader (#2960 ) Last week I added the `PDFMinerPDFasHTMLLoader`. I am adding some example code in the notebook to serve as a tutorial for how that loader can be used to create snippets of a pdf that are structured within sections. All the other loaders only provide the `Document` objects segmented by pages, but that's pretty loose given the amount of other metadata that can be extracted. With the new loader, one can leverage the font size of the text to decide when a new section starts and can segment the text more semantically, as shown in the tutorial notebook. The cell shows that we are able to find the content of the entire section under Related Work for the example pdf, which is spread across 2 pages and hence is stored as two separate documents by other loaders	2023-04-16 08:34:39 -07:00
Harrison Chase	274b25c010	SVM retriever (#2947 ) (#2949 ) Add SVM retriever class, based on https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.ipynb. Testing still WIP, but the logic is correct (I have a local implementation outside of Langchain working). --------- Co-authored-by: Lance Martin <122662504+PineappleExpress808@users.noreply.github.com> Co-authored-by: rlm <31treehaus@31s-MacBook-Pro.local>	2023-04-15 12:49:59 -07:00
Davit Buniatyan	b3a5b51728	[minor] Deep Lake auth improvements in docs, kwargs pass, faster tests (#2927 ) Minor cosmetic changes - Activeloop environment cred authentication in notebooks with `getpass.getpass` (instead of the CLI, which does not always work) - much faster tests with Deep Lake pytest mode on - Deep Lake kwargs pass Notes - I put pytest environment creds inside `vectorstores/conftest.py`, but feel free to suggest a better location. For context, if I put them in `test_deeplake.py`, `ruff` doesn't let me set them before import deeplake --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-15 10:49:16 -07:00
Akash NP	13a0ed064b	add encoding to avoid UnicodeDecodeError (#2908 ) About Specify encoding to avoid UnicodeDecodeError when reading .txt for users who are following the tutorial. Reference ``` return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1205: character maps to <undefined> ``` Environment OS: Win 11 Python: 3.8	2023-04-14 16:36:03 -07:00
Kwuang Tang	a508afa91c	Add file filter param to Git loader (#2904 ) Allows users to specify what files should be loaded instead of indiscriminately loading the entire repo. extends #2851 NOTE: for reviewers, `hide whitespace` option recommended since I changed the indentation of an if-block to use `continue` instead so it looks less like a Christmas tree :)	2023-04-14 10:45:54 -07:00
Harrison Chase	8fef69296d	nits (#2873 )	2023-04-14 07:55:12 -07:00
Harrison Chase	07d7096de6	Harrison/playwright (#2871 ) Co-authored-by: Manuel Saelices <msaelices@gmail.com>	2023-04-13 22:15:03 -07:00
ecneladis	74abeb8c53	Update output in Git notebook (#2868 ) Supplemental to https://github.com/hwchase17/langchain/pull/2851. Updates one notebook cell that I forgot to commit before.	2023-04-13 21:56:17 -07:00
ecneladis	016738e676	Add GitLoader (#2851 )	2023-04-13 21:39:20 -07:00
vowelparrot	bf0887c486	Add Slack Directory Loader (#2841 ) Fixes linting issue from #2835 Adds a loader for Slack Exports which can be a very valuable source of knowledge to use for internal QA bots and other use cases. ```py # Export data from your Slack Workspace first. from langchain.document_loaders import SlackDirectoryLoader SLACK_WORKSPACE_URL = "https://awesome.slack.com" loader = SlackDirectoryLoader("Slack_Exports", SLACK_WORKSPACE_URL) docs = loader.load() ```	2023-04-13 21:31:59 -07:00
Tim Asp	53dc157145	[Docs] minor fixes to loaders links and rst warnings (#2846 ) The doc loaders index was picking up a bunch of subheadings because I mistakenly made the MD titles H1s. Fixed that. also the easy minor warnings from docs_build	2023-04-13 10:54:40 -07:00
Rounak Datta	7688bf9182	WhatsApp document loader - update regex (#2776 ) I was testing out the WhatsApp Document loader, and noticed that sometimes the date is of the following format (notice the additional underscore): ``` 3/24/23, 1:54_PM - +91 99999 99999 joined using this group's invite link 3/24/23, 6:29_PM - +91 99999 99999: When are we starting then? ``` Weirdly, the underscore is visible in Vim, but not in editors like VSCode. I presume it is some unusual character/line terminator. Nevertheless, I think handling this edge case will make the document loader more robust.	2023-04-13 09:48:32 -07:00
vowelparrot	2db9b7a45d	Revert "Add Slack Directory Loader (#2835 )" (#2839 ) This reverts commit `a6f767ae7a`. To fix the linting error.	2023-04-13 09:42:54 -07:00
vowelparrot	a6f767ae7a	Add Slack Directory Loader (#2835 ) Adds a loader for Slack Exports which can be a very valuable source of knowledge to use for internal QA bots and other use cases. ```py # Export data from your Slack Workspace first. from langchain.document_loaders import SlackDirectoryLoader SLACK_WORKSPACE_URL = "https://awesome.slack.com" loader = SlackDirectoryLoader("Slack_Exports", SLACK_WORKSPACE_URL) docs = loader.load() ``` --------- Co-authored-by: Mikhail Dubov <mikhail@chattermill.io>	2023-04-13 08:39:07 -07:00
Harrison Chase	507cee5ee5	Harrison/pinecone hybrid update (#2742 ) Co-authored-by: acatav <39461369+acatav@users.noreply.github.com> Co-authored-by: Amnon Catav <catav.amnon1@gmail.com>	2023-04-11 21:32:17 -07:00
vowelparrot	709f26b69e	Added bilibili loader (#2673 ) (#2724 ) I've added a bilibili loader, bilibili is a very active video site in China and I think we need this loader. Example: ```python from langchain.document_loaders.bilibili import BiliBiliLoader loader = BiliBiliLoader( ["https://www.bilibili.com/video/BV1xt411o7Xu/", "https://www.bilibili.com/video/av330407025/"] ) docs = loader.load() ``` Co-authored-by: 了空 <568250549@qq.com>	2023-04-11 10:40:32 -07:00
Naveen Tatikonda	4364d3316e	Add custom vector fields and text fields for OpenSearch (#2652 ) Description Add custom vector field name and text field name while indexing and querying for OpenSearch Issues https://github.com/hwchase17/langchain/issues/2500 Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-10 21:02:02 -07:00
Harrison Chase	ad3c5dd186	Harrison/databerry (#2688 ) Co-authored-by: Georges Petrov <georgesm.petrov@gmail.com>	2023-04-10 18:49:47 -07:00
Chetanya Rastogi	50c511d75f	Add new loader to load pdf as html content (#2607 ) Adds a new pdf loader using the existing dependency on PDFMiner. The new loader can be helpful for chunking texts semantically into sections as the output html content can be parsed via `BeautifulSoup` to get more structured and rich information about font size, page numbers, pdf headers/footers, etc. which may not be available otherwise with other pdf loaders	2023-04-09 17:57:25 -07:00
Harrison Chase	7aba18ea77	Harrison/docs cleanup (#2633 )	2023-04-09 12:55:22 -07:00
Davit Buniatyan	aaac7071a3	Deep Lake retriever example analyzing Twitter the-algorithm source code (#2602 ) Improvements to Deep Lake Vector Store - much faster view loading of embeddings after filters with `fetch_chunks=True` - 2x faster ingestion - use np.float32 for embeddings to save 2x storage, LZ4 compression for text and metadata storage (saves up to 4x storage for text data) - user defined functions as filters Docs - Added retriever full example for analyzing twitter the-algorithm source code with GPT4 - Added a use case for code analysis (please let us know your thoughts how we can improve it) --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-09 12:29:47 -07:00
Harrison Chase	a31c9511e8	Harrison/redis improvements (#2528 ) Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>	2023-04-06 23:21:22 -07:00
Harrison Chase	5c64b86ba3	Harrison/weaviate retriever (#2524 ) Co-authored-by: Erika Cardenas <110841617+erika-cardenas@users.noreply.github.com>	2023-04-06 22:27:37 -07:00
Timon Ruban	f0926bad9f	Fix docstring in indexes/getting-started (#2452 ) Fixed a letter. That's all.	2023-04-06 12:48:08 -07:00
Davit Buniatyan	b4914888a7	Deep Lake upgrade to include attribute search, distance metrics, returning scores and MMR (#2455 ) ### Features include - Metadata based embedding search - Choice of distance metric function (`L2` for Euclidean, `L1` for Nuclear, `max` L-infinity distance, `cos` for cosine similarity, `dot` for dot product). Defaults to `L2` - Returning scores - Max Marginal Relevance Search - Deleting samples from the dataset ### Notes - Added numerous tests, let me know if you would like to shorten them or make them smarter --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-06 12:47:33 -07:00
Sam Weaver	2ffb90b161	Extend opensearch to better support existing instances (#2500 ) (#2509 ) Closes #2500.	2023-04-06 12:45:56 -07:00
Harrison Chase	00bc8df640	Harrison/tfidf retriever (#2440 )	2023-04-05 07:36:49 -07:00
Harrison Chase	af7f20fa42	Harrison/elastic search (#2419 )	2023-04-04 21:29:06 -07:00
Harrison Chase	41832042cc	Harrison/pinecone hybrid (#2405 )	2023-04-04 14:09:57 -07:00
Harrison Chase	2b975de94d	add metal retriever (#2244 )	2023-04-04 12:17:13 -07:00
Harrison Chase	e90d007db3	Harrison/msg files (#2375 ) Co-authored-by: Sahil Masand <masand.sahil@gmail.com> Co-authored-by: Sahil Masand <masands@cbh.com.au>	2023-04-04 06:48:34 -07:00
Kacper Łukawski	585f60a5aa	Qdrant update to 1.1.1 & docs polishing (#2388 ) This PR updates Qdrant to 1.1.1 and introduces local mode, so there is no need to spin up the Qdrant server. By that occasion, the Qdrant example notebooks also got updated, covering more cases and answering some commonly asked questions. All the Qdrant's integration tests were switched to local mode, so no Docker container is required to launch them.	2023-04-04 06:48:21 -07:00


210 Commits