Selecting a model

  • Whenever you start a new chat, you can use the model you want to work with at the top left.

  • You can still change the model at the top left if you have already started a chat. For example, you can start with GPT-4o and, after three messages, switch to Claude 3.7 Sonnet.

  • When you switch models in an ongoing chat, the entire context of the previous chat history and your data (documents, texts, websites) is always passed to the selected model. So, you don’t need to worry about keeping the context when switching models.

  • As no data is ever stored in the models, the system provides all context to the model with every request you make.

  • You can also set your personal default model in the account settings here. The default for new users is GPT-4o.

Selecting the right model

Here we provide you with an overview of which models do exceptionally well at which use-cases that you might encounter. Below, we also outline our personal recommendations based on our current user-feedback.

ModelProviderStrengthsToken LimitsNotes
GPT-4oOpenAIContent Generation & Creative Writing, Data Analysis (Excel & CSV files), Image Analysis, Translation and Multilingual SupportInput: 128,000 Output: 16,384OpenAI’s versatile flagship model with excellent multimodal capabilities. Balances performance and speed with strong creative writing and reasoning skills. Best all-rounder.
GPT-4o MiniOpenAISpeed and Efficiency, Image AnalysisInput: 128,000 Output: 16,384Faster and more cost-effective than GPT-4o but with reduced capabilities for complex tasks. Ideal for quick responses and lightweight applications where speed matters more than depth.
o3 MiniOpenAIReasoning and Problem-Solving, Translation and Multilingual Support, Hallucination ResistanceInput: 200,000 Output: 100,000OpenAI’s specialized reasoning model with excellent efficiency. Balances speed and accuracy for complex math, physics, and coding tasks. More focused on reasoning than general-purpose GPT models.
Claude 3.7 SonnetAnthropicSoftware Development & Coding, Content Generation & Creative Writing, Image Analysis, Translation and Multilingual SupportInput: 200,000 Output: 8,192Anthropic’s most advanced model with unique dual-mode capability. Features standard mode for quick answers and extended thinking mode for complex reasoning tasks. Significantly improved coding capabilities over previous Claude models.
Gemini 2.0 FlashGoogleLong-Context Analysis & Document Processing, Image Analysis, Speed and EfficiencyInput: 1,048,576 Output: 8,192Google’s high-speed model with massive context window. Outperforms the previous Gemini 1.5 Pro model on key benchmarks while being twice as fast.
Gemini 1.5 ProGoogleLong-Context Analysis & Document Processing, Image AnalysisInput: 2,097,152 Output: 8,192Google’s previous flagship model with 2M token context window. Strong multimodal capabilities and excellent for long-context applications.
Mistral Large 2411MistralSoftware Development & Coding, Multilingual CapabilitiesInput: 131,000 Output: 4,096Mistral’s top model with excellent reasoning capabilities. Strong performance on sophisticated conversations and complex problem-solving. More refined than previous versions with better instruction following.
LLaMA 3.3 70BMetaSpeed and Efficiency, Multilingual CapabilitiesInput: 128,000 Output: 2,048Meta’s leading open-source model with competitive performance against proprietary alternatives. Significant improvement over previous Llama models with better reasoning and instruction following.
DeepSeek R1 32BDeepSeekSoftware Development & Coding, Hallucination ResistanceInput: 128,000 Output: 8,000As the first competitive model from China, Deep Seek R1 is particularly impressive, considering it was built with significantly fewer resources than competitors like OpenAI. Optimized for quality of reasoning rather than speed, it demonstrates exceptional performance on benchmarks like math and coding challenges.
DeepSeek V3DeepSeekReasoning and Problem-Solving, Software Development & CodingInput: 128,000 Output: 8,000General-purpose model, made for everyday AI tasks where speed and versatility matter more than deep reasoning.

Our Recommendations

Our Default for Everyday Tasks: GPT-4o (OpenAI)

GPT-4o from OpenAI is our top recommendation for a versatile standard model, excelling in a wide range of tasks with its exceptional multimodal capabilities. This flagship model is ideal for users who need a powerful all-rounder, offering excellent performance in content generation, creative writing, image analysis, and multilingual support. GPT-4o stands out for its ability to balance performance and speed while maintaining strong creative writing and reasoning skills.

A Hybrid Model for Everything Else: Claude 3.7 Sonnet (Anthropic)

Claude 3.7 Sonnet from Anthropic is our top recommendation for coding or text generation. Many of our users prefer Claude over GPT-4o for writing their software code, emails, texts, translations etc. This hybrid model is ideal if you need both quick answers and in-depth analysis, offering a standard mode for efficient responses and an extended thinking mode for complex tasks. Claude 3.7 Sonnet offers amazing instruction-following abilities, minimal hallucination, and improved coding capabilities. We recommend it for its consistently high user satisfaction in language-related and software development tasks.

The Specialized Model for Complex Reasoning: o3 Mini (OpenAI)

o3 Mini from OpenAI is our top recommendation for specialized reasoning and problem-solving tasks, offering a unique balance of efficiency and analytical power. This model can solve very complex mathematical, physics, and coding challenges, making it ideal for tasks that require deep analytical capabilities. Also, o3 Mini stands out for its excellent efficiency, larger input context window, and strong resistance to hallucination. We recommend it for its ability to handle complex scenarios and problems with high accuracy and speed.

Special Use-Cases

The other models are good for specialized tasks or specific languages over other models. Also, o1 and DeepSeek’s R1 reasoning is different compared to o3 Mini, so feel free to try different models to find your favorites.