Troubleshooting Task-Specific Models

Guidance for increasing accuracy when using TSMs

This documentation provides guidance for troubleshooting and improving the AI21 Contextual Answers and Semantic Search Task-Specific Models. It addresses common issues you may face and suggests methodologies to improve performance when results do not meet expectations.

Benchmarking Contextual Answers

When evaluating Contextual Answers (as with any model), a well-designed evaluation process is crucial for assessing performance. Follow these general steps to refine your evaluation methods:

Create a Test Set: Begin with 10 or more questions, each with its own context and "golden answer." This helps establish a baseline for measuring improvements. We recommend including a diverse set of question types.
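
A test set can be stored as a simple JSONL file with one record per question. The sketch below is a minimal illustration; the field names (question, context, golden_answer) are conventions chosen here, not part of any AI21 schema.

```python
import json

# Minimal sketch of a test set: each record pairs a question with its context
# and a human-verified "golden answer". Field names are illustrative only.
test_set = [
    {
        "question": "What were the earnings in Q1 2023?",
        "context": "...full text of the relevant quarterly report...",
        "golden_answer": "Earnings in Q1 2023 were $1.2M.",
    },
    {
        # Include negative cases where the answer is NOT in the context,
        # so "Answer not in document" behavior can be measured as well.
        "question": "Who is the company's auditor?",
        "context": "...a document that does not mention an auditor...",
        "golden_answer": None,  # None marks "answer not in document"
    },
]

# Persist as JSONL so the test set is easy to version and extend over time.
with open("ca_test_set.jsonl", "w", encoding="utf-8") as f:
    for record in test_set:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```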

Ensure the Accuracy of Your Test Set: Verify that the golden answers are correct and are indeed contained within the given context. While you can use answers sourced from other large language models (e.g. Jurassic), it is essential to confirm that those responses are accurate and grounded in the context.

Comprehensive Evaluation: Evaluate not only the True Positive instances (correct answers) but also the True Negative instances (correctly identifying that the answer is not in the document). This ensures a balanced evaluation of the model.
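
One way to keep the evaluation balanced is to tally true positives, true negatives, false positives, and false negatives explicitly. The sketch below assumes each test record has already been graded; the field names are illustrative, and counting a wrong answer as a false positive is just one reasonable convention.

```python
def confusion_summary(results):
    """Tally TP/TN/FP/FN from graded test results.

    Each result is a dict with illustrative fields:
      "answer_in_doc"   -- True if the golden answer exists in the context
      "model_abstained" -- True if the model returned "Answer not in document"
      "answer_correct"  -- True if the model's answer matched the golden answer
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for r in results:
        if r["answer_in_doc"]:
            if r["model_abstained"]:
                counts["FN"] += 1  # missed an answer that was in the document
            elif r["answer_correct"]:
                counts["TP"] += 1  # surfaced the correct answer
            else:
                counts["FP"] += 1  # answered, but incorrectly
        else:
            if r["model_abstained"]:
                counts["TN"] += 1  # correctly reported "Answer not in document"
            else:
                counts["FP"] += 1  # produced an answer that is not in the document
    return counts
```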

Evaluate the Correctness of the Response: Evaluation can be done either manually or automatically (e.g. using an LLM). If an LLM is used, take care to avoid biases in evaluation, since LLMs tend to prefer responses produced by similar LLMs. We recommend using human evaluation either entirely, or at least to verify the LLM's classification of the correctness of the Contextual Answers responses.
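
One way to reduce judge bias is to auto-grade only unambiguous matches and escalate everything else to a human reviewer rather than to another LLM. The helper below is a sketch of that approach, not an AI21 tool; it reuses the golden_answer-is-None convention from the test-set example above.

```python
import re

def _normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace for a lenient comparison."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", text.lower())).strip()

def grade(golden_answer, model_answer):
    """Return "correct", "incorrect", or "needs_human_review".

    Only clear-cut cases are auto-graded; ambiguous ones go to a person.
    A model_answer of None stands for an "Answer not in document" response.
    """
    if golden_answer is None:
        # The golden label says the answer is not in the document.
        return "correct" if model_answer is None else "needs_human_review"
    if model_answer is None:
        return "incorrect"  # the model abstained although an answer exists
    golden, model = _normalize(golden_answer), _normalize(model_answer)
    if golden == model or golden in model:
        return "correct"
    return "needs_human_review"
```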

Analyzing PDFs

When analyzing PDFs, the recommended approach is to use AI21's native PDF support. If you are using a custom parser, ensure that it accurately parses tables and other structured information, as Contextual Answers can be sensitive to incorrectly parsed input data. Note that Contextual Answers also supports .docx, .html and standard .txt files.

When analyzing tables, we recommend passing the table contents as JSONL, where each row is an object whose keys are the column names and whose values are the corresponding row entries. Note that for smaller tables, or for tables embedded within a larger text, this step can frequently be skipped, as Contextual Answers will generally be able to surface answers from the raw table.
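
For example, a table loaded into pandas can be serialized one row per line, with column names as keys. This is a minimal sketch; pandas is just one convenient way to produce the JSONL.

```python
import pandas as pd

# Example table; in practice this would come from your PDF or CSV parser.
df = pd.DataFrame(
    {
        "Quarter": ["Q1 2023", "Q2 2023"],
        "Revenue": ["$1.2M", "$1.5M"],
        "Headcount": [41, 44],
    }
)

# One JSON object per row, keyed by column name -- the format recommended above.
table_as_jsonl = df.to_json(orient="records", lines=True)
print(table_as_jsonl)
# {"Quarter":"Q1 2023","Revenue":"$1.2M","Headcount":41}
# {"Quarter":"Q2 2023","Revenue":"$1.5M","Headcount":44}
```

The resulting string can then be passed, alone or alongside the surrounding text, as the context for Contextual Answers.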

Improving the Question and Answers

Both RAG Engine and Contextual Answers expect a single, concise question. These questions should focus on surfacing answers directly from the document, rather than reasoning based on clues in the document. Use the following guidelines:

  • Queries should contain a single question only; avoid compound questions or additional instructions.

  • Simply ask the question; do not use prompt-engineering techniques such as system prompts or statements about the importance of the problem. These will not help Contextual Answers.

  • The question should be unambiguously answerable. Word questions to clearly specify what is desired. For example, if your document corpus covers quarterly reports for the past 10 years, then "What were the earnings in Q1 2023?" is much clearer than "What were the earnings last quarter?"

  • If you wish to have answers be longer or shorter, set the answerLength parameter to "short", "medium", or "long" (illustrated in the sketch below).
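
As a rough illustration, a direct REST call to Contextual Answers might look like the sketch below. The endpoint path, header format, and response field names are assumptions based on the AI21 Studio API and should be confirmed against the current API reference; answerLength is the parameter described in the last bullet above.

```python
import os
import requests

API_KEY = os.environ["AI21_API_KEY"]

# Endpoint path and field names are assumptions -- verify them against the
# current AI21 API reference before relying on this sketch.
resp = requests.post(
    "https://api.ai21.com/studio/v1/answer",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "context": "...full document text...",
        # A single, unambiguous question -- no compound questions or instructions.
        "question": "What were the earnings in Q1 2023?",
        # Optional: request a shorter or longer answer.
        "answerLength": "short",
    },
    timeout=30,
)
resp.raise_for_status()
data = resp.json()
print(data.get("answer"))           # the extracted answer, if one was found
print(data.get("answerInContext"))  # assumed flag for "answer not in document" cases
```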

Improving Semantic Search

  • If you find that Semantic Search is providing irrelevant extracts, use labels and paths to make the search more focused.

  • Adjust the maxSegments parameter to control the number of returned results (see the sketch below).
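
The sketch below illustrates both adjustments: restricting the search with labels and a path, and limiting the number of returned segments with maxSegments. The endpoint path, parameter names, and response shape are assumptions based on the AI21 Studio library search API and should be checked against the current API reference.

```python
import os
import requests

API_KEY = os.environ["AI21_API_KEY"]

# Endpoint path and parameter names are assumptions -- confirm them against
# the current AI21 API reference.
resp = requests.post(
    "https://api.ai21.com/studio/v1/library/search",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "query": "What were the earnings in Q1 2023?",
        # Narrow the search to a labeled subset of the library and a folder path.
        "labels": ["quarterly-reports"],
        "path": "finance/2023",
        # Fewer segments -> tighter, more focused results.
        "maxSegments": 3,
    },
    timeout=30,
)
resp.raise_for_status()
# The response is assumed here to be a list of segment objects with a "text"
# field; check the API reference for the exact schema.
for segment in resp.json():
    print(segment.get("text", "")[:200])
```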