Agentic automation using GPT-o1 and GPT-4o — flight cancellation example

Narayana Swamy
5 min read · Dec 28, 2024

Agentic automation was talked about a lot in 2024. A few weeks ago, Microsoft's CEO said in a podcast that AI agents will, or at least could, replace enterprise business software, which today runs on code written to implement a company's complex business rules. He even mused that Excel could become a scratch pad, with all the work we currently do in Excel done by an AI agent running Python. That would be a huge change from how the business and consumer world runs today, on purpose-built software applications.

Around the same time, I came across a short course on DeepLearning.AI that showcased ‘reasoning with o1’ and included a lesson on ‘meta-prompting’. Because o1 has reasoning capabilities, it can be asked to turn a business policy document into a detailed prompt of if-then rules. That rule prompt can then be given to GPT-4o (o1 costs roughly 6x more than 4o) to assist a customer in a chat session. The prompt can be run through multiple test cases to measure the accuracy of its predictions, and the inaccurate results can be fed back to o1 to revise the rule prompt. This means we can use basic Python to create an agentic prompt with o1 and then run the chat with 4o, without needing sophisticated agentic frameworks like LangGraph, CrewAI etc. We will take an in-depth look at how such an agentic system can be built for a flight change/cancellation use case.

Flight reservation is a very complex system because many degrees of freedom interact with each other: date, fare, and connections are all interdependent for a given origin and destination. Customers and airlines regularly need to change a previously booked flight, and many rules govern those changes. The code in this blog borrows heavily from the DLAI codebase, with a lot of updates to showcase the use case. 4o-mini achieved 88% accuracy on the test examples after some slight modifications to the cancellation routine generated by o1-mini (for learning purposes, o1-mini was used rather than o1). Even the 2 test cases evaluated as wrong are not really wrong: the customer query is vague, so both the ground truth and the generated answer could be correct.

We start with the airline's change/cancellation policy, excerpted below.

### **1\. General Guidelines for Handling Customer Requests**

* **Confirm Identity**: Verify the customer's identity by asking for their booking reference and any additional required details (e.g., name and flight number).
* **Listen and Understand**: Clarify if the customer is looking to cancel, change, or inquire about compensation.
* **Check the Ticket Type**: Determine if the ticket is non-refundable, refundable, or flexible. This will affect the available options.
### **2\. Cancellations: Types and Policies**

This policy is provided to o1-mini together with a prompt and a list of functions, so that each proposed action can be linked to a function. The list of function definitions is critical, because the chat model will return a tool function to call based on the customer request and the rules in the prompt.

Please follow these instructions:
1. **Review the customer service policy carefully** to ensure every step is accounted for. It is crucial not to skip any steps or policies.
2. **Organize the instructions into a logical, step-by-step order**, using the specified format.
3. **Use the following format**:
- **Main actions are numbered** (e.g., 1, 2, 3).
- **Sub-actions are lettered** under their relevant main actions (e.g., 1a, 1b).

**Important**: Always wrap the functions you return in backticks i.e. `check_ticket_type`. Do not include the arguments to the functions.

Here are the currently available set of functions in JSON format:
TOOLS: {TOOLS}
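
The step above can be sketched roughly as follows. This is a minimal sketch, assuming the official `openai` Python SDK and an `OPENAI_API_KEY` in the environment; the `POLICY` and `TOOLS` stand-ins are hypothetical and heavily shortened, the template is an abridged version of the prompt above, and where the policy text is placed in the prompt is my assumption, so the notebook's actual code may differ.

```python
import json

# Hypothetical, shortened stand-ins for the real policy text and tool list.
POLICY = "Confirm Identity: verify the booking reference, name, and flight number."
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "verify_identity",
            "description": "Verifies the customer's identity.",
            "parameters": {"type": "object", "properties": {}},
        },
    }
]

# Abridged version of the meta-prompt; the POLICY placement is an assumption.
META_PROMPT_TEMPLATE = """Please follow these instructions:
1. Review the customer service policy carefully to ensure every step is accounted for.
2. Organize the instructions into a logical, step-by-step order.

Important: Always wrap the functions you return in backticks i.e. `check_ticket_type`.

Here are the currently available set of functions in JSON format:
TOOLS: {tools}

POLICY:
{policy}
"""


def build_meta_prompt(policy, tools):
    """Fill the template with the policy text and the JSON tool definitions."""
    return META_PROMPT_TEMPLATE.format(tools=json.dumps(tools, indent=2), policy=policy)


def generate_routine(policy, tools, model="o1-mini"):
    """Ask the reasoning model to convert the policy into a step-by-step routine."""
    from openai import OpenAI  # imported lazily so the prompt builder works without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        # The whole prompt is sent as a single user message.
        messages=[{"role": "user", "content": build_meta_prompt(policy, tools)}],
    )
    return response.choices[0].message.content
```

Calling `generate_routine(POLICY, TOOLS)` would return the routine text; only `build_meta_prompt` runs without network access.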

o1-mini generates a detailed prompt that looks like:

1. **Confirm Identity**
   a. Prompt the customer to provide their booking reference, full name, and flight number.
      - Call the `verify_identity` function.

2. **Listen and Understand**
   a. Ask the customer to clarify if they wish to cancel, change, or inquire about compensation.
      - Call the `ask_clarification` function with the prompt: "Could you please clarify if you would like to cancel your booking, make changes to your flight, or inquire about compensation?"

3. **Check the Ticket Type**
   a. Retrieve the ticket type based on the booking reference.
      - Call the `check_ticket_type` function.

4. **Handle Cancellations**
   a. If the customer initiated the cancellation:
      i. If the ticket is refundable:
         - Check specific fare rules for applicable refund amounts.
         - Call the `check_fare_rules` function.
         1. If a full refund is applicable, then:
            Process a full refund.
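
Because the routine references functions by name in backticks, a quick sanity check is to confirm that every referenced name actually exists in the tool list before handing the routine to the chat model. The helper names and the shortened `ROUTINE` and `TOOLS` below are hypothetical; this is a sketch of one possible check, not something taken from the notebook.

```python
import re


def extract_function_names(routine):
    """Return every identifier wrapped in single backticks in the routine."""
    return set(re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", routine))


def find_unknown_functions(routine, tools):
    """Return names the routine calls that are missing from the tool definitions."""
    defined = {tool["function"]["name"] for tool in tools}
    return extract_function_names(routine) - defined


# Shortened example routine in the style generated by o1-mini.
ROUTINE = """
1. **Confirm Identity**
   - Call the `verify_identity` function.
3. **Check the Ticket Type**
   - Call the `check_ticket_type` function.
   - Call the `check_fare_rules` function.
"""

# Tool list that deliberately omits check_fare_rules.
TOOLS = [
    {"type": "function", "function": {"name": name}}
    for name in ("verify_identity", "check_ticket_type")
]

print(find_unknown_functions(ROUTINE, TOOLS))  # → {'check_fare_rules'}
```

An empty set means every function the routine mentions can be resolved to a tool definition.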

All that remains is to send this prompt to 4o-mini as the system message, so that it can handle customer requests for flight changes.

{
"role": "system",
"content": f"""
You are a customer service agent that is responsible for handling airline related issues.
Below is the exact policy that you must follow to address the customer's issue.

POLICY:
{policy}
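
Put together, one chat turn could look something like the sketch below. It assumes the official `openai` Python SDK with an `OPENAI_API_KEY` in the environment; `build_messages` and `next_assistant_turn` are hypothetical helper names, and the notebook's actual code may be structured differently.

```python
def build_messages(policy, user_message):
    """Build the message list: the generated routine as system prompt, then the customer turn."""
    system_prompt = (
        "You are a customer service agent that is responsible for handling airline related issues.\n"
        "Below is the exact policy that you must follow to address the customer's issue.\n\n"
        f"POLICY:\n{policy}\n"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]


def next_assistant_turn(messages, tools, model="gpt-4o-mini"):
    """Send the conversation plus tool definitions and return the assistant message."""
    from openai import OpenAI  # imported lazily so build_messages works without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(model=model, messages=messages, tools=tools)
    # The returned message carries either text content or a list of tool_calls.
    return response.choices[0].message
```

If the returned message has `tool_calls`, each entry names a function and carries its arguments as a JSON string, which the calling code is expected to execute and report back.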

The notebook contains the detailed code to evaluate and respond to the various functions that 4o-mini returns as tool calls. Some of these functions are linked to a SQLite table, and the model generates a query for them based on the table schema.

"function": {
    "name": "verify_identity",
    "description": "Verifies the customer's identity using booking reference, full name, and flight number.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": f"""
                        SQL query extracting info to answer the user's question.
                        SQL should be written using this database schema:
                        {self.database_schema_string}
                        The query should be returned in plain text, not in JSON.
                        """,
            }
        },
        "required": ["query"],

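To make the SQLite link concrete, here is a minimal, self-contained sketch of how a `verify_identity` tool call might be executed. The `bookings` table, its columns, the sample row, and the helper names are all hypothetical; the notebook's schema and dispatch code may differ.

```python
import json
import sqlite3


def get_schema_string(conn):
    """Return the CREATE TABLE statements, suitable for embedding in a tool description."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def run_query(conn, query):
    """Run a model-written query, refusing anything that is not a SELECT."""
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(query).fetchall()


def handle_tool_call(conn, name, arguments_json):
    """Dispatch one tool call and return a JSON string to send back to the model."""
    args = json.loads(arguments_json)
    if name == "verify_identity":
        rows = run_query(conn, args["query"])
        return json.dumps({"verified": len(rows) > 0})
    return json.dumps({"error": f"unknown function: {name}"})


# Hypothetical in-memory table standing in for the airline's booking data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (booking_reference TEXT PRIMARY KEY, full_name TEXT, flight_number TEXT)"
)
conn.execute("INSERT INTO bookings VALUES ('ABC123', 'Jane Doe', 'XY101')")

# Example of the arguments string the model might return in a tool call.
arguments = json.dumps(
    {"query": "SELECT * FROM bookings WHERE booking_reference = 'ABC123' AND full_name = 'Jane Doe'"}
)
print(handle_tool_call(conn, "verify_identity", arguments))  # → {"verified": true}
```

The SELECT-only guard is a deliberately crude safeguard for this sketch; executing model-written SQL in production would need stricter controls, such as a read-only connection.
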
Test Results

One thing to note: I was not able to run all the test cases in parallel on the free tier, because the requests to 4o hit the tokens-per-minute rate limit. A higher paid API usage tier should have much higher token rate limits (API limits are tied to the API usage tier, which is billed separately from a ChatGPT Plus subscription).
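
One common way to live within a tight rate limit is to retry with exponential backoff. The sketch below is generic and uses a fake error class so it runs without network access; `call_with_backoff` and `FakeRateLimitError` are hypothetical names, not part of the notebook.

```python
import random
import time


def call_with_backoff(fn, is_rate_limit_error, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff plus jitter on rate-limit errors."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt, or immediately on unrelated errors.
            if attempt == max_retries or not is_rate_limit_error(exc):
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


class FakeRateLimitError(Exception):
    """Stand-in for the SDK's rate-limit exception in this offline demo."""


attempts = {"count": 0}


def flaky_call():
    # Fails twice with a simulated rate-limit error, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimitError("429")
    return "ok"


result = call_with_backoff(
    flaky_call,
    lambda exc: isinstance(exc, FakeRateLimitError),
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result, attempts["count"])  # → ok 3
```

With the `openai` SDK, the predicate could be `lambda exc: isinstance(exc, openai.RateLimitError)`, and running the test cases sequentially rather than in parallel reduces the pressure further.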

The DLAI example goes one step further: o1-mini is given information on the test cases that failed and is asked to update the policy prompt it generated earlier. I didn't explore that step in this example, but it is possible and would likely improve the accuracy further.
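
A rough sketch of that evaluation and feedback step is shown below. The test cases, the canned predictions, and the wording of the revision prompt are all hypothetical and only illustrate the shape of the loop; in the real flow `predict` would call 4o-mini, and the revision prompt would be sent to o1-mini.

```python
def evaluate(test_cases, predict):
    """Return (accuracy, failures) by comparing predicted and expected function names."""
    failures = []
    for case in test_cases:
        actual = predict(case["request"])
        if actual != case["expected_function"]:
            failures.append({**case, "actual_function": actual})
    accuracy = (len(test_cases) - len(failures)) / len(test_cases)
    return accuracy, failures


def build_revision_prompt(routine, failures):
    """Assemble a prompt asking the reasoning model to fix the routine."""
    lines = [
        "The routine below produced wrong function calls for some test cases.",
        "Revise the routine so these cases are handled correctly.",
        "",
        "ROUTINE:",
        routine,
        "",
        "FAILED CASES:",
    ]
    for failure in failures:
        lines.append(
            f"- request: {failure['request']!r}, "
            f"expected: {failure['expected_function']}, "
            f"got: {failure['actual_function']}"
        )
    return "\n".join(lines)


# Hypothetical test cases and canned model predictions for an offline demo.
TEST_CASES = [
    {"request": "Here is my booking reference ABC123", "expected_function": "verify_identity"},
    {"request": "Is my ticket refundable?", "expected_function": "check_ticket_type"},
    {"request": "What refund applies to my fare?", "expected_function": "check_fare_rules"},
    {"request": "I need help with my flight", "expected_function": "ask_clarification"},
]
CANNED = {
    "Here is my booking reference ABC123": "verify_identity",
    "Is my ticket refundable?": "check_ticket_type",
    "What refund applies to my fare?": "check_ticket_type",  # deliberately wrong
    "I need help with my flight": "ask_clarification",
}

accuracy, failures = evaluate(TEST_CASES, CANNED.get)
print(f"{accuracy:.0%}")  # → 75%
revision_prompt = build_revision_prompt("1. Confirm Identity ...", failures)
```

The loop can be repeated until accuracy stops improving, keeping a held-out set of test cases to check that the revised routine has not simply memorized the failures.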

Conclusion

The release of o1, with its strong reasoning capabilities, will probably unleash many more business use cases for agentic AI processes in 2025. This simple flight cancellation example shows the power of using o1 and 4o together to build an agentic system from scratch that handles customer requests to change or cancel flights. In production, many edge cases would need to be tested to make sure the chat takes no unintended actions and routes the customer to a human customer service agent when it can't determine the right action with certainty. As with any enterprise software, sufficient guardrails will need to be built. But the reasoning capability of o1 opens up a lot of opportunities to automate the business logic layer of an enterprise application.

If you liked the article, please clap a few times! The example code is here: https://github.com/kswamy15/GPT-o1-4o-flight-changes/tree/main

Written by Narayana Swamy

Over 17 years of diverse global experience in Data Science, Finance and Operations. Passionate about using data to unlock Business value.
