The life cycle of an AI model

A Large Language Model (LLM) undergoes two main phases:
  1. The training phase
    • The model is trained on large datasets
  2. The usage phase
    • The model can be used to generate answers
    • The model cannot learn anymore

Training an LLM

During training, the model processes vast amounts of text data using a technique called “next token prediction.” The model learns statistical relationships between words and concepts by repeatedly predicting what word should come next in a sequence. For example, given the text “The capital of Germany is ___”, the model learns that “Berlin” has a high probability of being the next token. Through billions of these predictions across diverse text, the model builds a sophisticated understanding of language patterns, facts, and reasoning. Once training completes, the model’s parameters are frozen. The “knowledge cutoff date” marks when training data collection stopped, meaning the model has no knowledge of events after this date.
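To make this concrete, here is a toy sketch of next-token prediction in Python. The probability table is invented for illustration; a real model computes a distribution over its entire vocabulary with a neural network rather than a lookup table.

```python
import random

# Hypothetical learned probabilities for the token following
# "The capital of Germany is" (invented numbers, for illustration only).
next_token_probs = {
    "Berlin": 0.92,
    "Munich": 0.03,
    "a": 0.02,
    "the": 0.01,
    # ... remaining probability mass is spread over the rest of the vocabulary
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Sample one token according to the learned probability distribution."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_next_token(next_token_probs))  # almost always prints "Berlin"
```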

Using an LLM

During the usage phase (also known as inference), the model generates responses by sampling from the probability distributions it learned during training. When you ask about artificial intelligence, the model assigns much higher probability to related terms like “machine learning” than to unrelated ones like “banana cake.” When a user sends a prompt, the model chooses the next word or word-piece (token) based on these probabilities. For example, if the user writes “Hi”, the model will most likely answer with a greeting such as “Hello”. It then generates the next most likely token based on “Hi” and “Hello”, and repeats this until it produces an end-of-sequence token. The generation process works token by token (a minimal sketch follows the list):
  1. User sends: Hi
  2. Model predicts high probability for greeting tokens like Hello
  3. Model then predicts the next token based on Hi Hello
  4. This continues until the model generates an end-of-sequence token
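The loop below sketches these four steps in Python. The `predict_next_token` stand-in and its canned responses are hypothetical; in a real LLM, this step is a neural network forward pass that scores every token in the vocabulary.

```python
END = "<eos>"  # end-of-sequence token

def predict_next_token(context: list[str]) -> str:
    """Stand-in for the model: maps a context to its most likely next token."""
    canned = {
        ("Hi",): "Hello",
        ("Hi", "Hello"): "!",
        ("Hi", "Hello", "!"): END,
    }
    return canned.get(tuple(context), END)

def generate(prompt: list[str]) -> list[str]:
    context = list(prompt)
    while True:
        token = predict_next_token(context)  # steps 2-3: pick the next token
        if token == END:                     # step 4: stop at end-of-sequence
            break
        context.append(token)                # generated text joins the context
    return context[len(prompt):]

print(generate(["Hi"]))  # ['Hello', '!']
```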

Influencing the output of a response

Since models cannot learn after deployment, how do they remember previous messages or incorporate new information? The answer lies in the context window.

Anatomy of a message sent to the model

Each request to the model includes everything needed for that specific response: your current message, the entire chat history, attached documents, system instructions, and any relevant knowledge base content. This complete context is packed into the model’s context window (the maximum amount of text it can process in a single request). The model treats each request as completely independent, but because all relevant context is included, it can maintain coherent conversations and reference previous information.
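The sketch below illustrates what such a request might contain and how a client could keep it within the context window. All field names (`system`, `messages`, `attachments`, `knowledge`) and the word-based token estimate are illustrative assumptions, not any specific provider’s API.

```python
# Illustrative request payload: everything the model needs is re-sent
# on every call, because the model itself retains nothing between calls.
request = {
    "system": "You are a helpful assistant. Answer concisely.",
    "messages": [
        # The entire chat history travels with every request ...
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello! How can I help?"},
        # ... plus the current message:
        {"role": "user", "content": "Summarize the attached report."},
    ],
    "attachments": ["quarterly_report.txt"],     # documents added to context
    "knowledge": ["Relevant wiki excerpt ..."],  # retrieved reference content
}

MAX_CONTEXT_TOKENS = 8_000  # hypothetical context window size

def trim_to_window(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest turns until the history fits the context window.
    Token counting is crudely approximated by whitespace-split word counts."""
    def count(msgs: list[dict]) -> int:
        return sum(len(m["content"].split()) for m in msgs)
    msgs = list(messages)
    while msgs and count(msgs) > max_tokens:
        msgs.pop(0)  # the oldest turn is sacrificed first
    return msgs

request["messages"] = trim_to_window(request["messages"], MAX_CONTEXT_TOKENS)
```

Trimming the oldest turns first is one simple policy; it also explains why very long conversations can make a model appear to “forget” their beginning.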