langchain/libs/community/tests/unit_tests
Ian Gregory e5472b5eb8
Framework for supporting more languages in LanguageParser (#13318)
## Description

I am submitting this for a school project as part of a team of 5. Other
team members are @LeilaChr, @maazh10, @Megabear137, @jelalalamy. This PR
also has contributions from community members @Harrolee and @Mario928.

Initial context is in the issue we opened (#11229).

This pull request adds:

- Generic framework for expanding the languages that `LanguageParser`
can handle, using the
[tree-sitter](https://github.com/tree-sitter/py-tree-sitter#py-tree-sitter)
parsing library and existing language-specific parsers written for it
- Support for the following additional languages in `LanguageParser`:
  - C
  - C++
  - C#
  - Go
- Java (contributed by @Mario928
https://github.com/ThatsJustCheesy/langchain/pull/2)
  - Kotlin
  - Lua
  - Perl
  - Ruby
  - Rust
  - Scala
- TypeScript (contributed by @Harrolee
https://github.com/ThatsJustCheesy/langchain/pull/1)

Here is the [design
document](https://docs.google.com/document/d/17dB14cKCWAaiTeSeBtxHpoVPGKrsPye8W0o_WClz2kk)
if curious, but no need to read it.

## Issues

- Closes #11229
- Closes #10996
- Closes #8405

## Dependencies

`tree_sitter` and `tree_sitter_languages` on PyPI. We have tried to add
these as optional dependencies.

## Documentation

We have updated the list of supported languages, and also added a
section to `source_code.ipynb` detailing how to add support for
additional languages using our framework.

## Maintainer

- @hwchase17 (previously reviewed
https://github.com/langchain-ai/langchain/pull/6486)

Thanks!!

## Git commits

We will gladly squash any/all of our commits (esp merge commits) if
necessary. Let us know if this is desirable, or if you will be
squash-merging anyway.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Maaz Hashmi <mhashmi373@gmail.com>
Co-authored-by: LeilaChr <87657694+LeilaChr@users.noreply.github.com>
Co-authored-by: Jeremy La <jeremylai511@gmail.com>
Co-authored-by: Megabear137 <zubair.alnoor27@gmail.com>
Co-authored-by: Lee Harrold <lhharrold@sep.com>
Co-authored-by: Mario928 <88029051+Mario928@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-02-13 08:45:49 -08:00
..
agent_toolkits community[minor]: Connery Tool and Toolkit (#14506) 2024-01-29 12:45:03 -08:00
callbacks community[patch]: Release 0.0.14 (#16384) 2024-01-22 08:50:19 -08:00
chat_loaders Restore self message sent before OSX 12 Monterey (#14818) 2024-01-01 16:04:14 -08:00
chat_message_histories community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
chat_models infra: add print rule to ruff (#16221) 2024-02-09 16:13:30 -08:00
docstore community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
document_loaders Framework for supporting more languages in LanguageParser (#13318) 2024-02-13 08:45:49 -08:00
document_transformers community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
embeddings infra: add print rule to ruff (#16221) 2024-02-09 16:13:30 -08:00
examples community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
graphs community[minor], langchian[minor]: Add Neptune Rdf graph and chain (#16650) 2024-02-12 21:30:20 -08:00
indexes community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
llms infra: add print rule to ruff (#16221) 2024-02-09 16:13:30 -08:00
retrievers community: Add you.com utility, update you retriever integration docs (#17014) 2024-02-08 13:47:50 -08:00
storage community: revert SQL Stores (#16912) 2024-02-01 16:37:40 -08:00
tools community[minor]: Connery Tool and Toolkit (#14506) 2024-01-29 12:45:03 -08:00
utilities community: Add you.com utility, update you retriever integration docs (#17014) 2024-02-08 13:47:50 -08:00
utils community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
vectorstores community: add delete method to rocksetdb vectorstore to support recordmanager (#17030) 2024-02-12 19:50:20 -08:00
__init__.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
conftest.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_dependencies.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_imports.py infra: Fix test filesystem paths incompatible with windows (#14388) 2023-12-21 13:45:42 -08:00
test_sql_database_schema.py infra: mv SQLDatabase tests to community (#17276) 2024-02-08 17:05:43 -08:00
test_sql_database.py community/SQLDatabase: Generalize and trim software tests (#16659) 2024-02-12 22:58:34 -05:00
test_sqlalchemy.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00