Tips: Training a Custom Model

  • Start with a few-shot prompt. Don't jump straight into training a model. The playground exists for a reason, so try out different prompt structures to see what works best for you. Found one that works? Use that exact structure to train your model (a prompt-structure sketch appears after this list).
  • Test the different models in the playground before choosing which one to train.
  • Model training is not a one-time affair but an iterative process. For complex tasks, we recommend training several versions on smaller parts of your dataset, both to root out problems in the data that cannot be identified by few-shot prompts alone and to settle on the right hyperparameters (see the dataset-splitting sketch after this list).
  • We selected the default hyperparameters after testing several benchmarks and choosing those that gave optimal results across the board. Nevertheless, if your trained model doesn't work as expected, you may want to train another version with different hyperparameters.
  • Sometimes custom models work better when fed a few-shot prompt, so be sure to make some calls with few-shot prompts when testing your trained model (the prompt sketch after this list applies here as well).
  • There is no guarantee that more epochs will yield a better result. In our experience, this is especially true when you want your model to generate expressive and diverse content. A relatively short training run helps retain J-1’s original abilities, while longer training produces a model that behaves more like the data in your dataset (for better or worse).
  • The model reflects the data, so if the data has biases, the model will learn them. For example, if most of your data begins with the words “I think that”, the model will learn to assign high probability to those words. This can also help in debugging the model: look at the token probabilities (using alternative tokens) to understand the biases in your data (see the probability-inspection sketch after this list).
  • For classification tasks, try to balance the data, meaning an equal number of examples for each class (a balancing sketch follows this list).
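
The first and fifth tips are about nailing down a prompt structure in the playground and reusing it, including when you test the trained model. Below is a minimal sketch of what that might look like in Python. The endpoint URL, request field names, and response fields are assumptions based on AI21 Studio's v1 completion API, and the model name "my-custom-model", the prompt content, and the parameter values are illustrative only; check the API reference for the exact request and response format.

```python
import os
import requests

# A few-shot prompt with a fixed structure (instruction, examples, separator).
# The same structure you settled on in the playground should be reused in your
# training examples and when testing the trained model.
FEW_SHOT_PROMPT = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It stopped working after a week and support never answered.
Sentiment: Negative

Review: {review}
Sentiment:"""

def complete(model: str, prompt: str) -> dict:
    """Call the completion endpoint. The URL and field names below are assumptions
    about the v1 REST API; verify them against the current docs."""
    response = requests.post(
        f"https://api.ai21.com/studio/v1/{model}/complete",
        headers={"Authorization": f"Bearer {os.environ['AI21_API_KEY']}"},
        json={
            "prompt": prompt,
            "maxTokens": 2,           # the label is a single word
            "temperature": 0.0,       # deterministic output for classification
            "stopSequences": ["\n"],  # stop after the label
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Compare the base model and your custom model on the same few-shot prompt.
prompt = FEW_SHOT_PROMPT.format(review="Arrived late but works perfectly.")
for model in ["j1-large", "my-custom-model"]:  # "my-custom-model" is hypothetical
    result = complete(model, prompt)
    print(model, "->", result["completions"][0]["data"]["text"].strip())
```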
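
For the iterative-training tip, one simple way to train several versions on smaller parts of your dataset is to shuffle it once and write out nested slices of increasing size, then train a version on each slice. The sketch below assumes a JSON Lines training file with one example per line; the file names are hypothetical.

```python
import json
import random

def write_slices(path: str, fractions=(0.25, 0.5, 1.0), seed: int = 0) -> None:
    """Shuffle the dataset once and write nested slices (25%, 50%, 100%) so each
    smaller training run uses a subset of the next larger one."""
    with open(path, encoding="utf-8") as f:
        examples = [json.loads(line) for line in f if line.strip()]
    random.Random(seed).shuffle(examples)
    for frac in fractions:
        subset = examples[: max(1, int(len(examples) * frac))]
        out_path = f"{path}.{int(frac * 100)}pct.jsonl"
        with open(out_path, "w", encoding="utf-8") as out:
            for ex in subset:
                out.write(json.dumps(ex, ensure_ascii=False) + "\n")
        print(f"wrote {len(subset)} examples to {out_path}")

write_slices("train.jsonl")  # hypothetical file name
```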
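
For the bias-debugging tip, the idea is to look at the per-token probabilities the model assigns; the playground's alternative-tokens view shows the same information. The sketch below requests alternatives via the API instead. The parameter name "topKReturn" and the response fields (completions[0].data.tokens, generatedToken.logprob, topTokens) are assumptions about the v1 completion API, and the model name and prompt are illustrative; verify the details against the API reference.

```python
import math
import os
import requests

# Request a completion and ask for alternative tokens; "topKReturn" is an
# assumed parameter name, and "my-custom-model" is a hypothetical model.
response = requests.post(
    "https://api.ai21.com/studio/v1/my-custom-model/complete",
    headers={"Authorization": f"Bearer {os.environ['AI21_API_KEY']}"},
    json={"prompt": "Review: A solid, boring phone.\nSentiment:",
          "maxTokens": 2, "temperature": 0.0, "topKReturn": 5},
    timeout=30,
)
response.raise_for_status()
tokens = response.json()["completions"][0]["data"]["tokens"]

# Print the probability of each generated token and its top alternatives.
# If the first tokens are near-certain (an "I think that"-style opening, say),
# your training data probably starts the same way in most examples.
for tok in tokens:
    generated = tok["generatedToken"]
    print(f"{generated['token']!r}: p={math.exp(generated['logprob']):.3f}")
    for alt in tok.get("topTokens") or []:
        print(f"    alt {alt['token']!r}: p={math.exp(alt['logprob']):.3f}")
```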
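
For the class-balance tip, a quick way to check and fix an imbalanced classification dataset is to downsample every class to the size of the smallest one. The sketch below assumes JSON Lines examples whose class label sits in a "completion" field; the field and file names are hypothetical, so adjust them to your own format.

```python
import json
import random
from collections import defaultdict

def balance(path: str, label_field: str = "completion", seed: int = 0) -> list:
    """Group examples by label and downsample each class to the smallest class size."""
    by_label = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                ex = json.loads(line)
                by_label[ex[label_field].strip()].append(ex)

    counts = {label: len(exs) for label, exs in by_label.items()}
    print("class counts before balancing:", counts)

    target = min(counts.values())
    rng = random.Random(seed)
    balanced = []
    for exs in by_label.values():
        balanced.extend(rng.sample(exs, target))
    rng.shuffle(balanced)
    return balanced

examples = balance("train.jsonl")  # hypothetical file name
print(f"{len(examples)} examples after balancing")
```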