Data Extraction and Structured Output
Travel Booking Information Extraction
Challenge: Extract structured information from conversational data while handling missing information appropriately and avoiding hallucinations.
Scenario: Processing travel booking conversations to extract key booking details in a structured format.
Despite explicit instructions, GPT-4.1 mishandles partial information and infers details that the conversation does not state. The prompt and its output:
You are an expert in extracting structured information from unstructured text, specifically travel booking conversations.
You are given a conversation as plain text between a user and an AI travel agent. Extract and return the following fields:
Name: Full name of the traveler
HotelName: Name of the hotel selected by the traveler
CheckInDate: Check-in date in YYYY-MM-DD format if the date can fully be extracted
CheckOutDate: Check-out date in YYYY-MM-DD format if the date can fully be extracted
City: City where the hotel is located
Country: Country where the hotel is located
NumberOfGuests: Total number of guests
Requirements:
1. Return a bulleted list where each field is a bullet
2. If any field is not mentioned or can’t be fully extracted from the conversation, fill it with "NA"
3. Do not invent, infer, or assume any details that are not explicitly stated in the conversation.
Conversation:
User: Hi there! I'm planning a trip to Rome next month and need help booking a hotel.
AI: I'd be happy to help! Could you tell me your travel dates and how many people will be staying?
User: Sure. I'll be arriving on July 10th and checking out on July 15th. It'll be just me and my wife, so 2 guests.
AI: Great. Would you like me to recommend some hotels in Rome, Italy?
User: Yes, please.
AI: Here are a few options:
Hotel Artemide
The Rome EDITION
Hotel Nazionale
Do any of these interest you?
User: Hotel Artemide sounds perfect. Let's go with that.
AI: Excellent choice! May I have your full name for the reservation?
User: It's Jonathan Rivera.
AI: Thank you, Jonathan. I've reserved a room at Hotel Artemide in Rome, Italy for 2 guests from July 10 to July 15.
GPT-4.1 Output:
- Name: Jonathan Rivera
- HotelName: Hotel Artemide
- CheckInDate: 2023-07-10 ❌ VIOLATION: Inferred year (2023) not in conversation
- CheckOutDate: 2023-07-15 ❌ VIOLATION: Inferred year (2023) not in conversation
- City: Rome
- Country: Italy
- NumberOfGuests: 2
Issues Identified:
- Instruction Violation: Added year “2023” despite explicit instruction not to infer
- Compliance Failure: Ignored “fill with NA” requirement for incomplete dates
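A violation like this can also be caught after the fact. The following is a minimal sketch, not part of the original evaluation: it parses the bulleted output and flags any date field whose four-digit year never appears in the conversation. The function names (`parse_bullets`, `unsupported_years`) and the shortened sample strings are assumptions made for illustration.

```python
import re

# Fields that the prompt requires in YYYY-MM-DD format.
DATE_FIELDS = ("CheckInDate", "CheckOutDate")


def parse_bullets(output_text):
    """Turn '- Field: value' lines into a dict, ignoring anything else."""
    fields = {}
    for line in output_text.splitlines():
        match = re.match(r"^\s*-\s*(\w+):\s*(.*?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def unsupported_years(output_text, conversation):
    """Return date fields whose year is not present in the conversation."""
    flagged = {}
    for name, value in parse_bullets(output_text).items():
        if name not in DATE_FIELDS:
            continue
        date = re.match(r"^(\d{4})-\d{2}-\d{2}$", value)
        # A full date is only acceptable if its year was actually stated.
        if date and not re.search(r"\b" + date.group(1) + r"\b", conversation):
            flagged[name] = value
    return flagged


# Shortened, hypothetical stand-ins for the conversation and model output.
conversation = "User: I'll be arriving on July 10th and checking out on July 15th."
model_output = """- Name: Jonathan Rivera
- CheckInDate: 2023-07-10
- CheckOutDate: 2023-07-15"""

print(unsupported_years(model_output, conversation))
```

A field set to NA passes the check because it never matches the date pattern, so only fully formed dates with an unstated year are reported.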
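A second prompt with the same fields and conversation, but with the requirements placed after the conversation:
You are an expert in extracting structured information from unstructured text, specifically travel booking conversations.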
You are given a conversation as plain text between a user and an AI travel agent. Extract and return the following fields:
Name: Full name of the traveler
HotelName: Name of the hotel selected by the traveler
CheckInDate: Check-in date in YYYY-MM-DD format if the date can fully be extracted
CheckOutDate: Check-out date in YYYY-MM-DD format if the date can fully be extracted
City: City where the hotel is located
Country: Country where the hotel is located
NumberOfGuests: Total number of guests
Conversation:
User: Hi there! I'm planning a trip to Rome next month and need help booking a hotel.
AI: I'd be happy to help! Could you tell me your travel dates and how many people will be staying?
User: Sure. I'll be arriving on July 10th and checking out on July 15th. It'll be just me and my wife, so 2 guests.
AI: Great. Would you like me to recommend some hotels in Rome, Italy?
User: Yes, please.
AI: Here are a few options:
Hotel Artemide
The Rome EDITION
Hotel Nazionale
Do any of these interest you?
User: Hotel Artemide sounds perfect. Let's go with that.
AI: Excellent choice! May I have your full name for the reservation?
User: It's Jonathan Rivera.
AI: Thank you, Jonathan. I've reserved a room at Hotel Artemide in Rome, Italy for 2 guests from July 10 to July 15.
Requirements:
- If any field is not mentioned or can’t be fully extracted from the conversation, fill it with “NA”
- Do not invent, infer, or assume any details that are not explicitly stated in the conversation.
- Return a bulleted list where each field is a bullet
Output:
- Name: Jonathan Rivera
- HotelName: Hotel Artemide
- CheckInDate: NA ✅ CORRECT: Year not specified, properly marked as NA
- CheckOutDate: NA ✅ CORRECT: Year not specified, properly marked as NA
- City: Rome
- Country: Italy
- NumberOfGuests: 2
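The two prompts shown above differ mainly in where the requirements sit relative to the conversation. The sketch below, with the hypothetical helper `build_prompt` and a shortened conversation, shows one way to assemble both orderings so they can be compared side by side. It only builds the prompt text and does not call any model; the field list is left out to keep the example short.

```python
# Opening instruction, copied from the prompts above (field list omitted).
INSTRUCTIONS = (
    "You are an expert in extracting structured information from unstructured "
    "text, specifically travel booking conversations."
)

# Requirements as worded in the prompts above.
REQUIREMENTS = [
    "If any field is not mentioned or can't be fully extracted from the "
    'conversation, fill it with "NA"',
    "Do not invent, infer, or assume any details that are not explicitly "
    "stated in the conversation.",
    "Return a bulleted list where each field is a bullet",
]


def build_prompt(conversation, requirements_last=True):
    """Assemble the prompt with the requirements before or after the conversation."""
    requirements = "Requirements:\n" + "\n".join(f"- {r}" for r in REQUIREMENTS)
    convo = "Conversation:\n" + conversation.strip()
    body = [convo, requirements] if requirements_last else [requirements, convo]
    return "\n".join([INSTRUCTIONS, *body])


# Shortened, hypothetical stand-in for the full conversation.
demo = "User: I'll be arriving on July 10th and checking out on July 15th."
print(build_prompt(demo))
```

Calling `build_prompt(demo, requirements_last=False)` yields the first ordering, which makes it straightforward to run both variants against the same conversation and compare the extracted dates.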