AI models are not yet good at processing a whole repository. There are a few reasons for this:

  • First, they have a context window (the maximum amount of text they can process at once), which is often smaller than the repository. The chat and assistant limit uploads to 20 files to increase the chance that the files fit into the context window, but a repository oftentimes contains more than 20 files.

  • To handle documents or document batches that are larger than the context window, we built the knowledge folder. Because the context window is a technical limitation of the model, not everything can be sent to it. Instead, an embedding search (a semantic pre-selection) identifies the relevant sections of the documents, and only those sections are sent to the model. For coding, however, it is important to consider the entire document, not just selected sections, so the context window limits this approach as well.

  • Lastly, even if the repository fits into the context window, a large repository may still be hard for the model to understand, since answer quality decreases as the context window fills up.
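
To make the second point more concrete, here is a minimal sketch of how an embedding search pre-selects sections. It is a toy illustration under stated assumptions: a bag-of-words vector stands in for a real embedding model, and the function names are made up for this example.

```python
from collections import Counter
import math

def embed(text):
    # Toy "embedding": a bag-of-words vector.
    # A real system would use a neural embedding model instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_sections(sections, query, top_k=2):
    # Rank document sections by similarity to the query and keep
    # only the top_k best matches -- the "semantic pre-selection".
    # Only these selected sections would then be sent to the model.
    q = embed(query)
    ranked = sorted(sections, key=lambda s: cosine(embed(s), q), reverse=True)
    return ranked[:top_k]

sections = [
    "invoice totals and billing addresses",
    "user authentication and login flow",
    "database schema migrations",
]
print(select_sections(sections, "how does login work", top_k=1))
# → ['user authentication and login flow']
```

The trade-off mentioned above follows directly from this: the model only ever sees the top-ranked sections, which works well for question answering but poorly for coding, where an unselected section may contain code the answer depends on.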

In our experience, the best approach is to work with individual files, smaller sections, or screenshots. Sorry, that was a long message, but I hope it helps!