Generation sets handbook

Generation sets

1. Introduction

a. Overview of "Generation Sets"

"Generation Sets" is a feature of the AI21 Studio application for generating completions for many prompts at once. You can create and manage up to 1000 prompts in a single set and run bulk generation without writing code or calling the API directly, which makes it useful for language model application developers and prompt engineers alike.

Note: Bulk generation is supported only for Jurassic-2 foundation models.

"Generation Sets" cannot be used with task-specific models. It is intended for users who need the control and customization offered by foundation and custom models.

b. Purpose and Benefits

The primary purpose of "Generation Sets" is to increase productivity and efficiency. Whether testing a prompt, evaluating model parameters, or analyzing custom models, users can generate a multitude of completions and evaluate their quality with ease. Key benefits include:

  • Bulk Generation: Manage and generate up to 1000 prompts in a single set, speeding up the development process.
  • Customizable Model Parameters: Tailor generations with specific settings, such as temperature and max tokens, for individualized or uniform output.
  • Seamless Evaluation: Integrate with evaluation tools for a thorough analysis of generated results.
  • Cost-Effective: Use of "Generation Sets" is subject only to the API usage pricing associated with your account, without additional costs for the feature itself.
  • User-Friendly Interface: Perform complex actions with minimal effort through an intuitive interface.

c. Typical Use Cases

"Generation Sets" is particularly valuable for those engaged in continuous testing, evaluation, and development of language model based applications. Here are common scenarios where it excels:

  • Prompt Testing: Quickly generate multiple completions to evaluate and refine prompts.
  • Model Evaluation: Analyze the robustness and quality of various models by generating completions under controlled parameters.
  • Custom Model Development: Streamline the process of creating and fine-tuning custom models by managing bulk generations efficiently.

In each of these scenarios, the feature makes large-scale analysis and development practical without handling prompts one at a time.


2. Getting Started

a. Accessing "Generation Sets" in AI21 Studio

To access "Generation Sets" in the AI21 Studio application, navigate to the "Assets" section and open the "Generation Sets" page.

b. Prerequisites and Requirements

Before using "Generation Sets," make sure you have the following:

  • A Dataset with Prompts: Prepare a dataset that includes a list of LLM input prompts. Optionally, you can also specify individual model parameters for each prompt.

c. Creating a Dataset (Including Formatting Guidelines)

Creating a dataset that complies with formatting guidelines is crucial for successful use of "Generation Sets." You can include up to 1000 prompts in one file. Here's a general outline for formatting:

  • File Format: The dataset must be in either CSV or JSONL format.
  • Prompts: Include the prompts you wish to generate completions for.
  • Parameters (Optional): Specify individual model parameters such as temperature, max tokens, etc., if desired.

For detailed guidelines and examples, you can download the "sample generation set" CSV file provided in the application. This sample will guide you in structuring your own datasets.
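As a rough illustration of the outline above, the following Python sketch writes a small generation set as CSV text. The column names "prompt", "temperature", and "maxTokens" are assumptions made for this example, not the confirmed schema. Check the sample generation set from the application for the actual headers before uploading anything.

```python
import csv
import io

# Hypothetical column names; the sample generation set defines the real ones.
FIELDNAMES = ["prompt", "temperature", "maxTokens"]


def build_generation_set_csv(rows):
    """Serialize prompt rows to CSV text, leaving missing optional parameters blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDNAMES, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


rows = [
    # A prompt with its own parameters.
    {"prompt": "Write a tagline for a coffee shop.", "temperature": 0.7, "maxTokens": 30},
    # A prompt with no per-prompt parameters.
    {"prompt": "Summarize: The meeting moved to Friday."},
]
csv_text = build_generation_set_csv(rows)
print(csv_text)
```

Using the csv module rather than joining strings by hand means prompts that contain commas, quotes, or line breaks are quoted correctly.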

d. Downloading Sample Generation Set

If you're new to "Generation Sets," the sample generation set is a helpful starting point. Available for download within the application, this pre-formatted CSV file provides an example of how to structure your prompts and parameters. Simply modify this template to suit your needs.


3. Using Generation Sets

a. Uploading a new generation set

To begin using "Generation Sets," go to the "Generation Sets" page and select a file to upload as a new generation set. The file can contain up to 1000 prompts in CSV or JSONL format. Follow the instructions on the page to complete the upload.

b. Setting Customizable Model Parameters

"Generation Sets" allows for full customization of model parameters for each prompt, including temperature, max tokens, and more. You can tailor each generation to your specific needs or apply uniform parameters across all generations for consistent output. Simply define these settings in your uploaded file. An excellent example can be found in the sample generation set CSV file.
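One way to combine uniform parameters with per-prompt settings is to start from shared defaults and override them only where needed. The sketch below does this and produces JSONL text. The field names "prompt", "temperature", and "maxTokens", and the default values, are assumptions for illustration only; the sample generation set shows the names the application expects.

```python
import json

# Hypothetical parameter names and default values, chosen for this example only.
DEFAULTS = {"temperature": 0.7, "maxTokens": 64}


def to_jsonl(prompts, overrides=None):
    """Build JSONL text: one record per prompt, defaults plus optional overrides.

    `overrides` maps a prompt's position in the list to the parameters that
    should differ from DEFAULTS for that prompt.
    """
    overrides = overrides or {}
    lines = []
    for index, prompt in enumerate(prompts):
        record = {"prompt": prompt, **DEFAULTS, **overrides.get(index, {})}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


prompts = ["Name three uses for a brick.", "Translate 'good morning' to French."]
# Only the second prompt gets a lower temperature; the first keeps the defaults.
jsonl_text = to_jsonl(prompts, overrides={1: {"temperature": 0.0}})
print(jsonl_text)
```

Keeping the defaults in one place makes it easy to rerun the same prompts under a different uniform setting and compare the results.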

c. Initiating Bulk Generation

Once you've uploaded your file and configured any necessary settings, click the "upload and generate" button to start the bulk generation process. A table displays the set you've uploaded, and the generated completions fill in as they are produced.

d. Monitoring Progress and Completing Generations

You can monitor the progress of the generations in real time. If, for some reason, not all the prompts receive generated completions, you can click the "complete generation" button, and it will continue from where it stopped. This ensures that you get completions for all prompts without any further intervention.

e. Understanding Limitations (e.g., Maximum Number of Prompts)

"Generation Sets" supports up to 1000 prompts in one file. To generate more, upload several generation sets of up to 1000 prompts each and generate them in sequence. Make sure each file complies with the formatting guidelines so that generation runs smoothly.
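If you have more prompts than fit in one file, the split can be done before upload. The following is a minimal sketch of that step. The function name and the way the prompts are held in memory are choices made for this example.

```python
MAX_PROMPTS_PER_SET = 1000  # the per-file limit stated in this handbook


def split_into_sets(prompts, max_per_set=MAX_PROMPTS_PER_SET):
    """Split a list of prompts into consecutive chunks of at most max_per_set."""
    if max_per_set < 1:
        raise ValueError("max_per_set must be at least 1")
    return [
        prompts[start:start + max_per_set]
        for start in range(0, len(prompts), max_per_set)
    ]


all_prompts = [f"Prompt number {n}" for n in range(2500)]
sets = split_into_sets(all_prompts)
print([len(chunk) for chunk in sets])
```

Each chunk can then be saved as its own CSV or JSONL file and uploaded as a separate generation set.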


4. Exporting and Evaluating Results

a. Exporting Generated Completions

Once all generations are completed, you can export the entire set as a CSV or JSONL file. This gives you flexibility in handling the results, whether you use them in other applications or analyze them further. Locate the export options on the "Generation Sets" page and follow the prompts to complete the export.

b. Evaluating Completions Using the "Evaluation" Tool

"Generation Sets" works together with the "Evaluation" tool to support a comprehensive analysis of the generated completions. Currently, you export the generation set and then upload it manually as an evaluation set. A streamlined one-click action will soon be introduced, making the transition from generation to evaluation more straightforward.

c. Understanding the Use of Exported Files

The exported CSV or JSONL files can be used for various purposes, depending on your needs:

  • Quality Analysis: You can analyze the generated completions to assess their quality and robustness, determining if they meet your specific requirements.
  • Model Fine-Tuning: If the quality of the generated completions meets your requirements, the exported set can be added to a fine-tuning set or serve as a new fine-tuning set. It can then be used to train a custom model, enhancing its performance and alignment with your specific needs and preferences.
  • Archival and Record-Keeping: The files serve as a permanent record of the generated completions, allowing for easy access and reference in the future.
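To illustrate the first two uses above, the sketch below reads exported JSONL text, drops records whose completion is empty or too short, and keeps the rest as prompt/completion pairs for a fine-tuning set. The field names "prompt" and "completion" and the length check are assumptions for this example. The real export schema and your own quality criteria may differ.

```python
import json


def select_for_fine_tuning(exported_jsonl, min_chars=1):
    """Return prompt/completion pairs whose completion has at least min_chars characters.

    Assumes each exported line is a JSON object with hypothetical
    "prompt" and "completion" fields.
    """
    selected = []
    for line in exported_jsonl.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        completion = (record.get("completion") or "").strip()
        if len(completion) >= min_chars:
            selected.append({"prompt": record["prompt"], "completion": completion})
    return selected


exported = (
    '{"prompt": "Capital of France?", "completion": " Paris"}\n'
    '{"prompt": "Capital of Peru?", "completion": ""}\n'
)
pairs = select_for_fine_tuning(exported)
print(pairs)
```

A length check is only a placeholder for a real quality review. In practice you would replace it with whatever criteria you apply during evaluation.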

5. Pricing and Subscription Considerations

a. Cost Associated with "Generation Sets"

The "Generation Sets" feature itself does not carry an additional cost. The only cost of using it is the usage cost: the completions you generate are billed at the API usage pricing associated with your account.


6. Best Practices and Common Pitfalls

a. Best Practices

  • Understanding File Formats: Make sure to use CSV or JSONL files when uploading your prompt sets, as these are the supported formats. Utilize the provided "sample generation set" file to understand the correct formatting.
  • Planning Bulk Generations: Think through your bulk generation needs in advance and break them into manageable sets if more than 1000 prompts are required.
  • Utilizing Model Parameters Effectively: Carefully consider the customizable model parameters to tailor each generation to your specific needs. Uniform parameters across all generations can also provide consistent output.
  • Evaluating Generation Sets: Evaluating the generation set after bulk generation allows you to systematically analyze the quality and robustness of the completions, ensuring that they meet your requirements.

b. Common Pitfalls

  • Not Following Content Formatting Guidelines: Mistakes in content formatting, such as entering a temperature value that isn't a number between 0 and 1, can lead to problems in generating the completions. Always adhere to the specific guidelines for each parameter, as seen in the sample generation set CSV file.
  • Exceeding Prompt Limits: Trying to upload more than 1000 prompts in one file can lead to issues. Always comply with the limitation, and create separate sets if needed.
  • Exceeding Max Tokens per Prompt: Attempting to generate completions that exceed the maximum number of tokens allowed per prompt can cause errors. Be sure to adhere to the specified token limits for the selected model, as detailed in the model's documentation.
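Some of these pitfalls can be caught before upload with a simple local check. The sketch below validates CSV text against the 1000-prompt limit and the 0 to 1 temperature range mentioned above. The column names "prompt", "temperature", and "maxTokens" are assumptions for this example. It checks only that the max tokens value is a positive whole number, because the actual token limit depends on the selected model and must be taken from that model's documentation.

```python
import csv
import io

MAX_PROMPTS = 1000  # the per-file limit stated in this handbook


def validate_generation_set(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) > MAX_PROMPTS:
        problems.append(f"{len(rows)} prompts exceed the limit of {MAX_PROMPTS}")
    for number, row in enumerate(rows, start=1):
        # Hypothetical column names; adjust to match the sample generation set.
        if not (row.get("prompt") or "").strip():
            problems.append(f"row {number}: empty prompt")
        temperature = (row.get("temperature") or "").strip()
        if temperature:
            try:
                value = float(temperature)
            except ValueError:
                value = None
            if value is None or not 0 <= value <= 1:
                problems.append(f"row {number}: temperature must be a number between 0 and 1")
        max_tokens = (row.get("maxTokens") or "").strip()
        if max_tokens and not (max_tokens.isdigit() and int(max_tokens) > 0):
            problems.append(f"row {number}: maxTokens must be a positive whole number")
    return problems


sample = "prompt,temperature,maxTokens\nHello,0.5,20\n,1.5,abc\n"
for problem in validate_generation_set(sample):
    print(problem)
```

An empty result means no problems were found by these checks. It does not guarantee that the application will accept the file.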