* Talk-codebase is a tool that allows you to converse with your codebase using LLMs (Large Language Models) to answer your queries.
* It supports offline code processing using [GPT4All](https://github.com/nomic-ai/gpt4all) without sharing your code with third parties, or you can use OpenAI if privacy is not a concern for you.
* Talk-codebase is still under development. It can help you improve your code, but it is recommended only for educational purposes, not for production use.
If you want to use a GPT4All model, download the [ggml-gpt4all-j-v1.3-groovy.bin](https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin) model. If you prefer a different model, download it from [GPT4All](https://gpt4all.io) and specify its path in the configuration. If you want some files to be ignored, add them to `.gitignore`.
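The download and ignore steps above can be scripted. The following is a minimal shell sketch, assuming `curl` is installed. `fetch_model` and `ignore_path` are hypothetical helpers, not part of talk-codebase. The `models/` destination mirrors the example `model_path` in the advanced configuration and is not a required location.

```bash
# Hypothetical helper: download a model file to a destination path,
# creating the directory and skipping the download if the file already exists.
fetch_model() {
  local url="$1"
  local dest="$2"
  mkdir -p "$(dirname "$dest")"
  if [ -f "$dest" ]; then
    echo "Model already present at $dest"
    return 0
  fi
  # -f: fail on HTTP errors, -L: follow redirects, -o: write to the given file
  curl -fL -o "$dest" "$url"
}

# Hypothetical helper: add a pattern to .gitignore only once,
# so talk-codebase skips the matching files.
ignore_path() {
  local pattern="$1"
  local gitignore="${2:-.gitignore}"
  # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
  grep -qxF -- "$pattern" "$gitignore" 2>/dev/null || printf '%s\n' "$pattern" >> "$gitignore"
}

# Example usage (the URL is the one linked above):
#   fetch_model https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin models/ggml-gpt4all-j-v1.3-groovy.bin
#   ignore_path 'models/'
```

Ignoring `models/` keeps the multi-gigabyte model file out of version control. It also keeps the file out of the set of files talk-codebase indexes.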
To install talk-codebase, you need to have:
* Python 3.9
* An OpenAI [API key](https://platform.openai.com/account/api-keys), if you plan to use OpenAI models
* (Optional) a [GPT4All](https://gpt4all.io) model

To install `talk-codebase`, run the following command in your terminal:
```bash
# Install talk-codebase
pip install talk-codebase
```
Once `talk-codebase` is installed, you can use it to chat with your codebase by running the following command:
```bash
# Chat with the codebase at a given path
talk-codebase chat <path-to-your-codebase>
# Chat with the codebase in the current directory
talk-codebase chat .
```
If you need to configure or edit the configuration, you can run:
```bash
talk-codebase configure
```
## Advanced configuration
You can also edit the configuration manually in the `~/.config.yaml` file.
If you cannot find the configuration file, run the tool: at startup it prints the path to the configuration file.
```yaml
# The OpenAI API key. You can get it from https://beta.openai.com/account/api-keys
api_key: sk-xxx
# Configuration for chunking
# maximum overlap between chunks. Some overlap helps maintain continuity between chunks
chunk_overlap: 50
# maximum size of a chunk
chunk_size: 500
# Configuration for sampling
# number of samples to generate for each prompt
k: 4
# maximum number of tokens for the LLM
max_tokens: 1048
# Configuration for the LLM model
# name of the OpenAI model (OpenAI only)
model_name: gpt-3.5-turbo
# path to the LLM file on disk
model_path: models/ggml-gpt4all-j-v1.3-groovy.bin
# type of the LLM model. It can be either local or openai
model_type: openai
```
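The configuration uses flat `key: value` lines. Switching from OpenAI to a local GPT4All model therefore comes down to changing `model_type` and `model_path`. The following is a minimal shell sketch, assuming the layout shown above. `use_local_model` is a hypothetical helper, not part of talk-codebase.

```bash
# Hypothetical helper: point an existing talk-codebase config at a local GPT4All model.
# Assumes one "key: value" pair per line, as in the example configuration.
# Paths containing "|" or "&" would need escaping for sed.
use_local_model() {
  local config="$1"
  local model_path="${2:-models/ggml-gpt4all-j-v1.3-groovy.bin}"
  # Keep a backup of the original file, then rewrite both keys from it
  cp "$config" "$config.bak" || return 1
  sed -e "s|^model_type:.*|model_type: local|" \
      -e "s|^model_path:.*|model_path: $model_path|" \
      "$config.bak" > "$config"
}

# Example usage:
#   use_local_model ~/.config.yaml models/ggml-gpt4all-j-v1.3-groovy.bin
```

Running `talk-codebase configure` remains the documented way to change these values. The helper only saves retyping if you switch between models often.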
## Supported extensions
- [x] `.csv`
- [x] `.doc`
## Contributing
Contributions are always welcome!
* If you find a bug in talk-codebase, please report it on the project's issue tracker. When reporting a bug, please include as much information as possible, such as the steps to reproduce the bug, the expected behavior, and the actual behavior.
* If you have an idea for a new feature for talk-codebase, please open an issue on the project's issue tracker. When suggesting a feature, please include a brief description of the feature and explain why it would be useful.
* You can contribute to talk-codebase by writing code. The project is always looking for help with improving the codebase, adding new features, and fixing bugs.