langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-02 09:40:22 +00:00

Author	SHA1	Message	Date
Leonid Ganeline	95dc90609e	experimental[patch]: `prompts` import fix (#20534 ) Replaced `from langchain.prompts` with `from langchain_core.prompts` where it is appropriate. Most of the changes go to `langchain_experimental` Similar to #20348	2024-04-18 16:09:11 -04:00
ccurme	38faa74c23	community[patch]: update use of deprecated llm methods (#20393 ) .predict and .predict_messages for BaseLanguageModel and BaseChatModel	2024-04-12 17:28:23 -04:00
Leonid Ganeline	e512d3c6a6	langchain: `callbacks` imports fix (#20348 ) Replaced all `from langchain.callbacks` with `from langchain_core.callbacks`. Changes are in `langchain` and `langchain_experimental` --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-04-12 20:13:14 +00:00
Tomaz Bratanic	a1b105ac00	experimental[patch]: Skip pydantic validation for llm graph transformer and fix JSON response where possible (#19915 ) LLMs might sometimes return an invalid response for the LLM graph transformer. Instead of failing due to pydantic validation, we skip it and manually check and, where we can, fix the error, so that more information gets extracted	2024-04-12 11:29:25 -07:00
Bagatur	2d83505be9	experimental[patch]: Release 0.0.57 (#20243 )	2024-04-09 17:08:01 -05:00
Erick Friis	f0d5b59962	core[patch]: remove requests (#19891 ) Removes required usage of `requests` from `langchain-core`, all of which has been deprecated. - removes Tracer V1 implementations - removes old `try_load_from_hub` github-based hub implementations Removal done in a way where imports will still succeed, and usage will fail with a `RuntimeError`.	2024-04-02 20:28:10 +00:00
Bagatur	003c98e5b4	experimental[patch]: Release 0.0.56 (#19840 )	2024-03-31 22:00:59 -07:00
LunarECL	b7d180a70d	experimental[minor]: Create Closed Captioning Chain for .mp4 videos (#14059 ) Description: Video imagery to text (Closed Captioning) This pull request introduces the VideoCaptioningChain, a tool for automated video captioning. It processes audio and video to generate subtitles and closed captions, merging them into a single SRT output. Issue: https://github.com/langchain-ai/langchain/issues/11770 Dependencies: opencv-python, ffmpeg-python, assemblyai, transformers, pillow, torch, openai Tag maintainer: @baskaryan @hwchase17 Hello! We are a group of students from the University of Toronto (@LunarECL, @TomSadan, @nicoledroi1, @A2113S) who want to make a contribution to the LangChain community! We have run make format, make lint and make test locally before submitting the PR. To our knowledge, our changes do not introduce any new errors. Thank you for taking the time to review our PR! --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-30 01:57:53 +00:00
Kirushikesh DB	12861273e1	experimental[patch]: Removed 'SQLResults:' from the LLMResponse in SQLDatabaseChain (#17104 ) Description: When using the SQLDatabaseChain with Llama2-70b LLM and, SQLite database. I was getting `Warning: You can only execute one statement at a time.`. ``` from langchain.sql_database import SQLDatabase from langchain_experimental.sql import SQLDatabaseChain sql_database_path = '/dccstor/mmdataretrieval/mm_dataset/swimming_record/rag_data/swimmingdataset.db' sql_db = get_database(sql_database_path) db_chain = SQLDatabaseChain.from_llm(mistral, sql_db, verbose=True, callbacks = [callback_obj]) db_chain.invoke({ "query": "What is the best time of Lance Larson in men's 100 meter butterfly competition?" }) ``` Error: ``` Warning Traceback (most recent call last) Cell In[31], line 3 1 import langchain 2 langchain.debug=False ----> 3 db_chain.invoke({ 4 "query": "What is the best time of Lance Larson in men's 100 meter butterfly competition?" 5 }) File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain/chains/base.py:162, in Chain.invoke(self, input, config, kwargs) 160 except BaseException as e: 161 run_manager.on_chain_error(e) --> 162 raise e 163 run_manager.on_chain_end(outputs) 164 final_outputs: Dict[str, Any] = self.prep_outputs( 165 inputs, outputs, return_only_outputs 166 ) File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain/chains/base.py:156, in Chain.invoke(self, input, config, kwargs) 149 run_manager = callback_manager.on_chain_start( 150 dumpd(self), 151 inputs, 152 name=run_name, 153 ) 154 try: 155 outputs = ( --> 156 self._call(inputs, run_manager=run_manager) 157 if new_arg_supported 158 else self._call(inputs) 159 ) 160 except BaseException as e: 161 run_manager.on_chain_error(e) File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain_experimental/sql/base.py:198, in SQLDatabaseChain._call(self, inputs, run_manager) 194 except Exception as exc: 195 # Append intermediate steps to 
exception, to aid in logging and later 196 # improvement of few shot prompt seeds 197 exc.intermediate_steps = intermediate_steps # type: ignore --> 198 raise exc File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain_experimental/sql/base.py:143, in SQLDatabaseChain._call(self, inputs, run_manager) 139 intermediate_steps.append( 140 sql_cmd 141 ) # output: sql generation (no checker) 142 intermediate_steps.append({"sql_cmd": sql_cmd}) # input: sql exec --> 143 result = self.database.run(sql_cmd) 144 intermediate_steps.append(str(result)) # output: sql exec 145 else: File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain_community/utilities/sql_database.py:436, in SQLDatabase.run(self, command, fetch, include_columns) 425 def run( 426 self, 427 command: str, 428 fetch: Literal["all", "one"] = "all", 429 include_columns: bool = False, 430 ) -> str: 431 """Execute a SQL command and return a string representing the results. 432 433 If the statement returns rows, a string of the results is returned. 434 If the statement returns no rows, an empty string is returned. 435 """ --> 436 result = self._execute(command, fetch) 438 res = [ 439 { 440 column: truncate_word(value, length=self._max_string_length) (...) 
443 for r in result 444 ] 446 if not include_columns: File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain_community/utilities/sql_database.py:413, in SQLDatabase._execute(self, command, fetch) 410 elif self.dialect == "postgresql": # postgresql 411 connection.exec_driver_sql("SET search_path TO %s", (self._schema,)) --> 413 cursor = connection.execute(text(command)) 414 if cursor.returns_rows: 415 if fetch == "all": File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1416, in Connection.execute(self, statement, parameters, execution_options) 1414 raise exc.ObjectNotExecutableError(statement) from err 1415 else: -> 1416 return meth( 1417 self, 1418 distilled_parameters, 1419 execution_options or NO_OPTIONS, 1420 ) File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/sql/elements.py:516, in ClauseElement._execute_on_connection(self, connection, distilled_params, execution_options) 514 if TYPE_CHECKING: 515 assert isinstance(self, Executable) --> 516 return connection._execute_clauseelement( 517 self, distilled_params, execution_options 518 ) 519 else: 520 raise exc.ObjectNotExecutableError(self) File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1639, in Connection._execute_clauseelement(self, elem, distilled_parameters, execution_options) 1627 compiled_cache: Optional[CompiledCacheType] = execution_options.get( 1628 "compiled_cache", self.engine._compiled_cache 1629 ) 1631 compiled_sql, extracted_params, cache_hit = elem._compile_w_cache( 1632 dialect=dialect, 1633 compiled_cache=compiled_cache, (...) 
1637 linting=self.dialect.compiler_linting \| compiler.WARN_LINTING, 1638 ) -> 1639 ret = self._execute_context( 1640 dialect, 1641 dialect.execution_ctx_cls._init_compiled, 1642 compiled_sql, 1643 distilled_parameters, 1644 execution_options, 1645 compiled_sql, 1646 distilled_parameters, 1647 elem, 1648 extracted_params, 1649 cache_hit=cache_hit, 1650 ) 1651 if has_events: 1652 self.dispatch.after_execute( 1653 self, 1654 elem, (...) 1658 ret, 1659 ) File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1848, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, args, kw) 1843 return self._exec_insertmany_context( 1844 dialect, 1845 context, 1846 ) 1847 else: -> 1848 return self._exec_single_context( 1849 dialect, context, statement, parameters 1850 ) File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1988, in Connection._exec_single_context(self, dialect, context, statement, parameters) 1985 result = context._setup_result_proxy() 1987 except BaseException as e: -> 1988 self._handle_dbapi_exception( 1989 e, str_statement, effective_parameters, cursor, context 1990 ) 1992 return result File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:2346, in Connection._handle_dbapi_exception(self, e, statement, parameters, cursor, context, is_sub_exec) 2344 else: 2345 assert exc_info[1] is not None -> 2346 raise exc_info[1].with_traceback(exc_info[2]) 2347 finally: 2348 del self._reentrant_error File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1969, in Connection._exec_single_context(self, dialect, context, statement, parameters) 1967 break 1968 if not evt_handled: -> 1969 self.dialect.do_execute( 1970 cursor, str_statement, effective_parameters, context 1971 ) 1973 if self._has_events or self.engine._has_events: 1974 self.dispatch.after_cursor_execute( 1975 self, 1976 cursor, (...) 
1980 context.executemany, 1981 ) File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/default.py:922, in DefaultDialect.do_execute(self, cursor, statement, parameters, context) 921 def do_execute(self, cursor, statement, parameters, context=None): --> 922 cursor.execute(statement, parameters) Warning: You can only execute one statement at a time. ``` Issue: The error occurs because, when generating the SQL query, the llm_inputs include the stop sequence "\nSQLResult:". For this user query the LLM-generated response is SELECT Time FROM men_butterfly_100m WHERE Swimmer = 'Lance Larson';\nSQLResult: so the SQLResult suffix must be removed from the LLM response before it is executed on the database. ``` llm_inputs = { "input": input_text, "top_k": str(self.top_k), "dialect": self.database.dialect, "table_info": table_info, "stop": ["\nSQLResult:"], } sql_cmd = self.llm_chain.predict( callbacks=_run_manager.get_child(), **llm_inputs, ).strip() if SQL_RESULT in sql_cmd: sql_cmd = sql_cmd.split(SQL_RESULT)[0].strip() result = self.database.run(sql_cmd) ``` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 01:22:35 -07:00
T Cramer	540ebf35a9	community[patch]: Add explicit error message to Bedrock error output. (#17328 ) - Description: Propagate Bedrock errors into Langchain explicitly. Use-case: unset region error is hidden behind 'Could not load credentials...' message - Issue: [17654](https://github.com/langchain-ai/langchain/issues/17654) - Dependencies: None --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 03:07:33 +00:00
Luca Dorigo	f19229c564	core[patch]: fix beta, deprecated typing (#18877 ) Description: While not technically incorrect, the TypeVar used for the `@beta` decorator prevented pyright (and thus most vscode users) from correctly seeing the types of functions/classes decorated with `@beta`. This is in part due to a small bug in pyright (https://github.com/microsoft/pyright/issues/7448 ) - however, the `Type` bound in the typevar `C = TypeVar("C", Type, Callable)` is not doing anything - classes are `Callables` by default, so by my understanding binding to `Type` does not actually provide any more safety - the modified annotation still works correctly for functions, properties, and classes. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 22:33:43 +00:00
Tomaz Bratanic	b04e663426	experimental[patch]: Flatten relationships in LLM graph transformer (#19642 )	2024-03-27 19:35:34 -07:00
Juan Jose Miguel Ovalle Villamil	1fe10a3e3d	experimental[patch]: Enhance LLMGraphTransformer with async processing and improved readability (#19205 ) - [x] PR title: "experimental: Enhance LLMGraphTransformer with async processing and improved readability" - [x] PR message: - Description: This pull request refactors the `process_response` and `convert_to_graph_documents` methods in the LLMGraphTransformer class to improve code readability and adds async versions of these methods for concurrent processing. The main changes include: - Simplifying list comprehensions and conditional logic in the process_response method for better readability. - Adding async versions aprocess_response and aconvert_to_graph_documents to enable concurrent processing of documents. These enhancements aim to improve the overall efficiency and maintainability of the `LLMGraphTransformer` class. - Issue: N/A - Dependencies: No additional dependencies required. - Twitter handle: @jjovalle99 - [x] Add tests and docs: N/A (This PR does not introduce a new integration) - [x] Lint and test: Ran make format, make lint, and make test from the root of the modified package(s). All tests pass successfully. Additional notes: - The changes made in this PR are backwards compatible and do not introduce any breaking changes. - The PR touches only the `LLMGraphTransformer` class within the experimental package. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 23:40:21 -07:00
Leonid Ganeline	3dc0f3c371	experimental[patch]: `PromptTemplate` import fix (#19617 ) Changed import of `PromptTemplate` from `langchain` to `langchain_core` in `langchain_experimental`	2024-03-26 17:03:13 -07:00
Leonid Ganeline	4159a4723c	experimental[patch]: update module doc strings (#19539 ) Added missing module descriptions. Fixed format.	2024-03-26 10:38:10 -04:00
Bagatur	3fa711dce0	experimental[patch]: Release 0.0.55 (#19353 )	2024-03-20 13:06:39 -07:00
Zihong	ff31cc1648	experimental: update the notebook link of semantic chunk. (#19253 ) update the notebook link of semantic chunk.	2024-03-19 07:24:51 -04:00
Cycle	77868b1974	experimental: add buffer_size hyperparameter to SemanticChunker as in source video (#19208 ) Add a buffer_size hyperparameter, which is used in the combine_sentences function	2024-03-19 03:54:20 +00:00
Erick Friis	781aee0068	community, langchain, infra: revert store extended test deps outside of poetry (#19153 ) Reverts langchain-ai/langchain#18995 Because it makes installing dependencies in python 3.11 extended testing take 80 minutes	2024-03-15 17:10:47 +00:00
Erick Friis	9e569d85a4	community, langchain, infra: store extended test deps outside of poetry (#18995 ) poetry can't reliably handle resolving the number of optional "extended test" dependencies we have. If we instead just rely on pip to install extended test deps in CI, this isn't an issue.	2024-03-15 05:55:30 +00:00
Erick Friis	2ffb2144a6	experimental[patch]: release 0.0.54 (#19000 )	2024-03-13 00:38:46 +00:00
Tomaz Bratanic	cda43c5a11	experimental[patch]: Fix LLM graph transformer default prompt (#18856 ) Some LLMs do not allow multiple user messages in sequence.	2024-03-11 20:11:52 -07:00
Tomaz Bratanic	246724faab	LLM graph transformer prompt engineering (#18843 ) A bit of prompt engineering to improve results	2024-03-09 11:27:16 -08:00
Alexander Dicke	66576948e0	experimental[minor]: adds mixtral wrapper (#17423 ) Description: Adds a chat wrapper for Mixtral models using the [prompt template](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1#instruction-format). --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-08 17:14:23 -08:00
Tomaz Bratanic	c8c592d3f1	experimental[minor]: Add LLM graph transformer (#18733 ) Add a class that constructs knowledge graphs based on text using an LLM.	2024-03-07 20:52:53 -08:00
Tomaz Bratanic	010a234f1e	docs: Fix diffbot graph transformer description (#18736 ) The previous docstring was invalid	2024-03-07 19:25:41 -08:00
Massimiliano Pronesti	3b975c6ebe	experimental[minor]: add support for modin in pandas agent (#18749 ) Added support for Intel's [modin](https://github.com/modin-project/modin) in `create_pandas_dataframe_agent`.	2024-03-07 19:23:07 -08:00
Erick Friis	4ac2cb4adc	anthropic[minor]: add tool calling (#18554 )	2024-03-05 08:30:16 -08:00
Bagatur	5efb5c099f	text-splitters[minor], langchain[minor], community[patch], templates, docs: langchain-text-splitters 0.0.1 (#18346 )	2024-02-29 18:33:21 -08:00
Bagatur	68ad3414a2	experimental[patch]: Release 0.0.53 (#18330 )	2024-02-29 09:13:21 -08:00
matt haigh	a4896da2a0	Experimental: Add other threshold types to SemanticChunker (#16807 ) Description Adding different threshold types to the semantic chunker. I’ve had much better and more predictable performance when using standard deviations instead of percentiles. ![image](https://github.com/langchain-ai/langchain/assets/44395485/066e84a8-460e-4da5-9fa1-4ff79a1941c5) For all the documents I’ve tried, the distribution of distances looks similar to the above: a positively skewed normal distribution. All skews I’ve seen are less than 1, which explains why standard deviations perform well, but I’ve included IQR if anyone wants something more robust. Also, using the percentile method backwards, you can declare the number of clusters and use semantic chunking to get an ‘optimal’ splitting. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2024-02-26 13:50:48 -08:00
Leonid Ganeline	3f6bf852ea	experimental: docstrings update (#18048 ) Added missing docstrings. Formatted docstrings to a consistent format.	2024-02-23 21:24:16 -05:00
Erick Friis	ed789be8f4	docs, templates: update schema imports to core (#17885 ) - chat models, messages - documents - agentaction/finish - baseretriever,document - stroutputparser - more messages - basemessage - format_document - baseoutputparser --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-02-22 15:58:44 -08:00
Bagatur	5ed16adbde	experimental[patch]: Release 0.0.52 (#17763 )	2024-02-19 13:12:22 -08:00
Pranav Agarwal	86ae48b781	experimental[minor]: Amazon Personalize support (#17436 ) ## Amazon Personalize support on Langchain This PR is a successor to this PR - https://github.com/langchain-ai/langchain/pull/13216 This PR introduces an integration with [Amazon Personalize](https://aws.amazon.com/personalize/) to help you to retrieve recommendations and use them in your natural language applications. This integration provides two new components: 1. An `AmazonPersonalize` client, that provides a wrapper around the Amazon Personalize API. 2. An `AmazonPersonalizeChain`, that provides a chain to pull in recommendations using the client, and then generating the response in natural language. We have added this to langchain_experimental since there was feedback from the previous PR about having this support in experimental rather than the core or community extensions. Here is some sample code to explain the usage. ```python from langchain_experimental.recommenders import AmazonPersonalize from langchain_experimental.recommenders import AmazonPersonalizeChain from langchain.llms.bedrock import Bedrock recommender_arn = "<insert_arn>" client=AmazonPersonalize( credentials_profile_name="default", region_name="us-west-2", recommender_arn=recommender_arn ) bedrock_llm = Bedrock( model_id="anthropic.claude-v2", region_name="us-west-2" ) chain = AmazonPersonalizeChain.from_llm( llm=bedrock_llm, client=client ) response = chain({'user_id': '1'}) ``` Reviewer: @3coins	2024-02-19 10:36:37 -08:00
William FH	64743dea14	core[patch], community[patch], langchain[patch], experimental[patch], robocorp[patch]: bump LangSmith 0.1.* (#17567 )	2024-02-15 23:17:59 -07:00
Mattt394	7c6009b76f	experimental[patch]: Fixed typos in SmartLLMChain ideation and critique prompts (#11507 ) Noticed and fixed a few typos in the SmartLLMChain default ideation and critique prompts --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2024-02-14 13:20:10 -08:00
DanisJiang	de9a6cdf16	experimental[patch]: Enhance protection against arbitrary code execution in PALChain (#17091 ) - Description: Block some ways to trigger arbitrary code execution bug in PALChain. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-02-14 11:44:07 -08:00
Bagatur	b5d3416563	experimental[patch]: Release 0.0.51 (#17484 )	2024-02-13 13:14:38 -08:00
Bagatur	c0ce93236a	experimental[patch]: fix zero-shot pandas agent (#17442 )	2024-02-12 21:58:35 -08:00
Theo / Taeyoon Kang	1987f905ed	core[patch]: Support .yml extension for YAML (#16783 ) - Description: [AS-IS] When dealing with a YAML file, the extension must be .yaml. [TO-BE] Now that operating systems no longer constrain extension length, .yaml is the usual extension for YAML files, but the .yml extension must still be handled. Rejecting it is like treating a .jpg extension as an error in JPEG support. - Issue: - - Dependencies: no dependencies required for this change	2024-02-12 19:57:20 -08:00
Erick Friis	3a2eb6e12b	infra: add print rule to ruff (#16221 ) Added noqa for existing prints. Can slowly remove / will prevent more being intro'd	2024-02-09 16:13:30 -08:00
Charlie Marsh	24c0bab57b	infra, multiple: Upgrade configuration for Ruff v0.2.0 (#16905 ) ## Summary This PR upgrades LangChain's Ruff configuration in preparation for Ruff's v0.2.0 release. (The changes are compatible with Ruff v0.1.5, which LangChain uses today.) Specifically, we're now warning when linter-only options are specified under `[tool.ruff]` instead of `[tool.ruff.lint]`. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-02-09 14:28:02 -08:00
Eugene Yurtsev	780e84ae79	community[minor]: SQLDatabase Add fetch mode `cursor`, query parameters, query by selectable, expose execution options, and documentation (#17191 ) - Description: Improve `SQLDatabase` adapter component to promote code re-use, see [suggestion](https://github.com/langchain-ai/langchain/pull/16246#pullrequestreview-1846590962). - Needed by: GH-16246 - Addressed to: @baskaryan, @cbornet ## Details - Add `cursor` fetch mode - Accept SQL query parameters - Accept both `str` and SQLAlchemy selectables as query expression - Expose `execution_options` - Documentation page (notebook) about `SQLDatabase` [^1] See [About SQLDatabase](https://github.com/langchain-ai/langchain/blob/c1c7b763/docs/docs/integrations/tools/sql_database.ipynb). [^1]: Apparently there hasn't been any yet? --------- Co-authored-by: Andreas Motl <andreas.motl@crate.io>	2024-02-07 22:23:43 -05:00
Erick Friis	22b6a03a28	infra: read min versions (#17135 )	2024-02-06 16:05:11 -08:00
Leonid Ganeline	563f325034	experimental[patch]: fixed import in `experimental` (#17078 )	2024-02-05 17:47:13 -08:00
Giulio Zani	9f0b63dba0	experimental[patch]: Fixes issue #17060 (#17062 ) As described in issue #17060, in the case in which text has only one sentence the following function fails. Checking for that and adding a return case fixed the issue. ```python def split_text(self, text: str) -> List[str]: """Split text into multiple components.""" # Splitting the essay on '.', '?', and '!' single_sentences_list = re.split(r"(?<=[.?!])\s+", text) sentences = [ {"sentence": x, "index": i} for i, x in enumerate(single_sentences_list) ] sentences = combine_sentences(sentences) embeddings = self.embeddings.embed_documents( [x["combined_sentence"] for x in sentences] ) for i, sentence in enumerate(sentences): sentence["combined_sentence_embedding"] = embeddings[i] distances, sentences = calculate_cosine_distances(sentences) start_index = 0 # Create a list to hold the grouped sentences chunks = [] breakpoint_percentile_threshold = 95 breakpoint_distance_threshold = np.percentile( distances, breakpoint_percentile_threshold ) # If you want more chunks, lower the percentile cutoff indices_above_thresh = [ i for i, x in enumerate(distances) if x > breakpoint_distance_threshold ] # The indices of those breakpoints on your list # Iterate through the breakpoints to slice the sentences for index in indices_above_thresh: # The end index is the current breakpoint end_index = index # Slice the sentence_dicts from the current start index to the end index group = sentences[start_index : end_index + 1] combined_text = " ".join([d["sentence"] for d in group]) chunks.append(combined_text) # Update the start index for the next group start_index = index + 1 # The last group, if any sentences remain if start_index < len(sentences): combined_text = " ".join([d["sentence"] for d in sentences[start_index:]]) chunks.append(combined_text) return chunks ``` Co-authored-by: Giulio Zani <salamanderxing@Giulios-MBP.homenet.telecomitalia.it>	2024-02-05 16:18:57 -08:00
Bagatur	7d03d8f586	docs: fix docstring examples (#16889 )	2024-02-01 10:17:26 -08:00
Bagatur	c2d09fb151	infra: bump exp min test reqs (#16884 )	2024-02-01 08:35:21 -08:00
Bagatur	65ba5c220b	experimental[patch]: Release 0.0.50 (#16883 )	2024-02-01 08:27:39 -08:00
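
The fix described in commit 12861273e1 above (removing a trailing `SQLResult:` marker from the generated SQL before running it) can be sketched in isolation as follows. This is a minimal illustration and not the library's code: the helper name `strip_sql_result` is hypothetical, and the value of `SQL_RESULT` is assumed to match the stop sequence quoted in that commit message.

```python
# Assumed marker value, mirroring the stop sequence quoted in the commit message.
SQL_RESULT = "\nSQLResult:"


def strip_sql_result(llm_output: str) -> str:
    """Return the SQL command with everything from the marker onward removed.

    Hypothetical helper, for illustration only.
    """
    sql_cmd = llm_output.strip()
    if SQL_RESULT in sql_cmd:
        # Keep only the text before the first occurrence of the marker.
        sql_cmd = sql_cmd.split(SQL_RESULT)[0].strip()
    return sql_cmd


raw = "SELECT Time FROM men_butterfly_100m WHERE Swimmer = 'Lance Larson';\nSQLResult:"
print(strip_sql_result(raw))
```

Output that does not contain the marker passes through unchanged apart from whitespace trimming, so the helper is safe to apply to every generated command.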


247 Commits