Why is there a limit of 20 files in the chat and in assistants?

Large language models have a context window, which is the maximum amount of text they can process at once. This includes the prompt you send to the model, the previous chat history, and any attached files (this is explained in this guide). To work efficiently with files, attach as few documents as possible.

The limit of 20 files increases the likelihood that all of the content fits into the context window. To work with larger documents or more files, you can use a knowledge folder and attach it to an assistant.
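
As a rough illustration of why such a limit exists (this is not Langdock's actual accounting), you can estimate whether a set of attachments is likely to fit into a context window. The characters-per-token ratio and the window size below are assumptions chosen only for the example:

```python
# Rough sketch: estimate whether a prompt, the chat history and the attached
# files are likely to fit into a context window. The numbers below
# (4 characters per token, a 128,000-token window) are illustrative
# assumptions, not Langdock's actual limits.

CHARS_PER_TOKEN = 4
CONTEXT_WINDOW_TOKENS = 128_000


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def fits_in_context(prompt: str, history: str, files: list[str]) -> bool:
    """Check whether prompt + history + all attached file contents stay
    below the assumed context window."""
    total = estimate_tokens(prompt) + estimate_tokens(history)
    total += sum(estimate_tokens(f) for f in files)
    return total <= CONTEXT_WINDOW_TOKENS
```

The fewer and shorter the attached files, the more of the window remains available for your prompt and the chat history.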

When do I use which integration?

There are different integration types in Langdock: search integrations, chatbots, action integrations and APIs. You can find out more about the integration types in this guide.

When do I attach a file to a chat, when to an assistant, and when do I use a knowledge folder?

When to attach a file to a chat or an assistant chat:

  • There is a small number of files

  • The files are relatively short

  • You only need the file once or in a single chat

When to add a file in the assistant knowledge:

  • There is a small number of files

  • The files are relatively short

  • You want to use the file regularly in the assistant

  • The file does not change often (maybe every few days), so it is not too much effort to re-attach it to the assistant

When to use knowledge folders:

  • There is a large number of files

  • The files are very long

  • You only need specific sections of the files for a prompt, not the entire files

    • For example: You have built an FAQ assistant and attached documentation to it. For each prompt, only some topics are needed and only the relevant sections are used to answer the request.

To find out more about the functionality of the different features, please refer to the next section.

How does a file attachment work and how is it different from a file in a knowledge folder?

There are two ways in which the contents of a file can be processed to generate an answer:

  • One is that the entire document is sent to the model together with your prompt (see this guide). This is the default in chats and assistants.

  • The other is used for long documents or a large number of documents: because the context window limits how much text can be processed at once, the documents are summarized and only the relevant sections are sent to the model. This is how knowledge folders work.
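
As a conceptual illustration of this second approach (not Langdock's actual implementation), the following sketch splits documents into chunks, scores each chunk against the question, and keeps only the best-matching sections. A simple word-overlap score stands in for a real embedding search:

```python
# Conceptual sketch of the "relevant sections only" approach: split documents
# into chunks, score each chunk against the question, and pass only the
# top-scoring chunks to the model. A bag-of-words cosine similarity is used
# here purely as a stand-in for a real embedding search.

from collections import Counter
import math


def split_into_chunks(text: str, chunk_size: int = 500) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_relevant_chunks(question: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Return only the chunks most similar to the question."""
    q_vec = vectorize(question)
    chunks = [c for doc in documents for c in split_into_chunks(doc)]
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q_vec, vectorize(c)), reverse=True)
    return ranked[:top_k]  # only these sections are included in the prompt
```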

Attaching the files directly to an assistant or to a chat leads to the best results. If possible, you should use this option. We recommend attaching as few documents as possible to an assistant or chat.

In some use cases, for example when working with large documentation or building an FAQ assistant, attaching the documents directly to an assistant or to a chat is not possible. Here, you can use the knowledge folder feature, which works well for use cases where only specific sections of the documents are relevant, not the entire documents.

When are newly released models available in Langdock?

Models are usually available in the US first. It takes a few weeks until they are launched in the EU in a GDPR-compliant way. We add the models as soon as they are available in the EU.

Which file types does Langdock support?

Langdock supports the following file types:

Text-based files:

  • PDF

  • Markdown (.md)

  • Text (.txt)

  • Word (.docx) / Google Docs

  • PowerPoint (.ppt) / Google Slides

  • JSON

  • XML (.xml)

  • EML (.eml)

  • VTT (.vtt)

Tabular files:

  • Excel (.xlsx, .xls) / Google Sheets

  • CSV

Images:

  • JPG

  • PNG

Because of how knowledge folders work, only text-based files can be uploaded to them. Also, images can only be analyzed directly in a chat, not in assistants or knowledge folders.

How long are files saved in Langdock?

Files are connected to either a chat, an assistant or a knowledge folder. To remove a file, delete the corresponding entity. When a chat, an assistant or a knowledge folder is deleted, the connected files are deleted immediately and cannot be retrieved.

To manage how long chats are saved, Langdock has a data retention period that can be set by admins. Chats that have not been used within the defined period are deleted automatically. You can read more about this here.

Why can I not add a repository to a chat / an assistant / a knowledge folder?

AI models are not yet good at processing an entire repository. There are a few reasons for this:

  • First, they have a context window (the maximum amount of text they can process at once), which is often smaller than the repository. Chats and assistants have a limit of 20 files to increase the chance that the files fit into the context window, and a repository often contains more than 20 files.

  • To handle documents or document batches that are larger than the context window, we have built the knowledge folder. Since the context window is a technical limitation of the model, not everything can be sent to it. An embedding search, i.e. a semantic pre-selection, identifies the relevant sections of the documents, and only these sections are sent to the model. For coding, however, it is important to consider the entire document, not only selected sections, so the context window limits this approach as well.

  • Lastly, even if the repository fits into the context window, the model might still struggle to understand a large repository, since answer quality decreases as the context window fills up.

In our experience, the best approach is to work with individual files, smaller sections or screenshots only.
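
If you work with a repository, a small pre-selection step like the following can help you attach only the individual files that matter for your question. This is purely illustrative; the file extensions, the size limit and the helper name are assumptions, not part of Langdock:

```python
# Sketch: instead of attaching a whole repository, collect only the
# individual source files you actually need and check that each one
# stays reasonably small. The extensions and size limit are illustrative.

from pathlib import Path

MAX_CHARS_PER_FILE = 50_000          # illustrative cut-off for "relatively short"
RELEVANT_SUFFIXES = {".py", ".md"}   # pick only the file types you care about


def collect_files_to_attach(repo_root: str, wanted_names: set[str]) -> list[Path]:
    """Walk the repository and return only the named, reasonably small files."""
    selected = []
    for path in Path(repo_root).rglob("*"):
        if path.is_file() and path.suffix in RELEVANT_SUFFIXES and path.name in wanted_names:
            if len(path.read_text(errors="ignore")) <= MAX_CHARS_PER_FILE:
                selected.append(path)
    return selected
```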