Build a Dataset
To train a custom model, you will need to collect a dataset of training examples. Each example consists of a prompt, which is a valid input text, and a completion, which is a correct output for that specific prompt.
How many examples do I need?
At least 50 training examples are required to train a custom model. If you upload a file that contains fewer than 50 examples, you will encounter an error. More is better, so provide as many examples as you can.
Is it all about quantity?
No! Data quality and diversity matter as well, so it's a good idea to go over your examples manually and make sure they accurately capture various aspects of your task. It's critical that your examples reflect the real-world distribution of your problem.
File Formats
The dataset must be submitted in a single .csv or .jsonl file. We recommend using a .jsonl file due to better handling of line breaks.
If you are submitting a .csv file, it should have two columns labeled "prompt" and "completion". Each row should contain a single training example, with no empty rows in between.
If you are submitting a .jsonl file, each line in the file should be a JSON object with two fields: "prompt" and "completion".
Below is the same simple dataset, first in .csv format and then in .jsonl format:
prompt,completion
"Where was the steam engine invented?",England
"In what year did the US gain its independence?",1776
"Who was the longest reigning Roman emperor?",Augustus
{"prompt": "Where was the steam engine invented?\n", "completion": "England"}
{"prompt": "In what year did the US gain its independence?\n", "completion": "1776"}
{"prompt": "Who was the longest reigning Roman emperor?\n", "completion": "Augustus"}
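As a rough illustration of producing both formats, the sketch below writes a list of examples to a .jsonl file and to a .csv file using Python's standard library. The helper names `write_jsonl` and `write_csv` and the output file names are hypothetical, chosen only for this example; they are not part of any official tooling.

```python
import csv
import json

# Hypothetical sample data; a real dataset needs at least 50 examples.
examples = [
    {"prompt": "Where was the steam engine invented?\n", "completion": "England"},
    {"prompt": "In what year did the US gain its independence?\n", "completion": "1776"},
]


def write_jsonl(examples, path):
    """Write one JSON object per line with "prompt" and "completion" fields."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            record = {"prompt": ex["prompt"], "completion": ex["completion"]}
            # json.dumps escapes line breaks as \n, so each example stays on one line.
            f.write(json.dumps(record) + "\n")


def write_csv(examples, path):
    """Write a two-column CSV with a "prompt,completion" header row."""
    # newline="" lets the csv module handle line endings and quoting itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["prompt", "completion"])
        writer.writeheader()
        writer.writerows(examples)


if __name__ == "__main__":
    write_jsonl(examples, "dataset.jsonl")
    write_csv(examples, "dataset.csv")
```

Because the JSON encoder escapes line breaks inside strings, multi-line prompts remain on a single line of the .jsonl file, which is one reason that format tends to be easier to work with than .csv.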
Common Errors
Uploading a file with fewer than two columns or with inconsistent columns in different lines will result in an error.
You may upload files with different column names. In that case, you will be prompted to identify which column contains the prompts and which contains the completions.
Completions cannot be empty. Prompts, however, can be empty.
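Before uploading, it may help to check a file locally against the rules described on this page. The following is a minimal sketch for .jsonl files only, not the validation the upload service actually performs. The function name `validate_jsonl` is hypothetical, and treating blank lines as errors is an assumption carried over from the .csv rule about empty rows.

```python
import json

MIN_EXAMPLES = 50  # minimum number of training examples stated in this guide


def validate_jsonl(path):
    """Return a list of problems found in a .jsonl dataset (empty if none)."""
    problems = []
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                # Assumption: blank lines are flagged, mirroring the CSV rule.
                problems.append(f"line {line_no}: empty line")
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append(f"line {line_no}: invalid JSON ({err.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {line_no}: not a JSON object")
                continue
            missing = {"prompt", "completion"} - set(record)
            if missing:
                problems.append(f"line {line_no}: missing field(s) {sorted(missing)}")
                continue
            # Completions cannot be empty; an empty prompt is allowed.
            if not record["completion"]:
                problems.append(f"line {line_no}: empty completion")
                continue
            count += 1
    if count < MIN_EXAMPLES:
        problems.append(
            f"only {count} valid examples; at least {MIN_EXAMPLES} are required"
        )
    return problems
```

Calling `validate_jsonl("dataset.jsonl")` returns an empty list when no problems are found; otherwise each entry describes one issue, prefixed with its line number where applicable.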