Limitations:
1) Context is not restored for gpt-j models
2) When you switch between different model types in an existing chat
the context and all the conversation is lost
3) The settings are not chat or conversation specific
4) The sizes of the chat persisted files are very large due to how much
data the llama.cpp backend tries to persist. Need to investigate how
we can shrink this.