Taming the AI Machine

Necessary Action

The business community is catching on to what we’ve been sharing over the last year: if you want to avoid a colossal wreck, you need to design AI apps with controls and supervision. Air Canada’s experience is a fair representation of the issue. The airline was directed by a Canadian tribunal to compensate a passenger for a policy error made by its AI chatbot. Companies have gained solid productivity results with Generative AI apps deployed internally, but the risk of aberrant behavior by a GenAI app is too great for customer touch points. Even for internal apps, caution is warranted for certain use cases, as highlighted by Microsoft in a recent survey.

Our most recent post described the success we’ve seen with function calling as a mechanism to restrict model access to trusted processes. This architecture uses the LLM more as a logic model than a language model: it selects from a range of available functions to execute the next best action in a process.
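
To make the idea concrete, here is a minimal sketch of that pattern. It is not our production code: the function names (`check_availability`, `get_cancellation_policy`), the `dispatch` helper and the placeholder data are all assumptions for illustration. The model only proposes a function name and arguments, and the dispatcher runs the call only if the function is on the allow-list.

```javascript
// Hypothetical allow-list of trusted functions (names and data are illustrative).
const trustedFunctions = new Map([
  // Placeholder logic: pretend every room type except 'suite' is available.
  ['check_availability', ({ roomType }) => ({ roomType, available: roomType !== 'suite' })],
  // Placeholder policy text; a real system would read this from a trusted source.
  ['get_cancellation_policy', () => ({ policy: 'Placeholder policy text from a trusted source' })],
])

// Execute a call proposed by the model, but only if it names a trusted function.
function dispatch(modelCall) {
  const fn = trustedFunctions.get(modelCall.name)
  if (!fn) {
    return {
      error: 'Unknown function',
      message: `No trusted function named "${modelCall.name}"`,
    }
  }
  return fn(modelCall.arguments ?? {})
}

// Simulated model output: a function name plus arguments.
console.log(dispatch({ name: 'check_availability', arguments: { roomType: 'double' } }))
console.log(dispatch({ name: 'issue_refund', arguments: { amount: 500 } }))
```

A proposed call to an unregistered function, such as `issue_refund` here, comes back as an error object instead of being executed, so the model cannot improvise a policy on its own.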

Dr. David Ferrucci from Elemental Cognition (EC) explains it this way:

“Put simply, we don’t use the LLM to establish facts. We use it only to communicate facts that are established by trustworthy components.”

Moreover, EC benchmarks show very impressive results when LLMs are combined with a finite set of ‘trustworthy components’ to address complex problems. The performance reported by EC points to an enduring architecture for the design and deployment of GenAI apps, even among customer-facing systems.

Our most recent prototype was centered on handling a customer reservation at a micro-resort. If you haven’t read about the micro-resort rage, you can learn more about it here. We recognized that a GenAI app could provide winsome interactions for a customer, but that it needs to be harnessed to ‘trusted components’ that handle everything from verifying room availability to processing payments.

In our prototype, we built a set of functions to address a host of typical customer interactions, such as booking a room reservation, capturing special requests and even finding tickets to local events. Each of these functions is defined with a schema that tells the model what information is required to complete the activity. The model is free to select which function to execute in the process, and our tests have shown promising (though not perfect) results.
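
As a rough illustration of what "providing the schemas" can look like, the sketch below describes two tools in a JSON Schema style, which many function-calling APIs accept. The tool names (`find_event_tickets`, `capture_special_request`), their fields and the `describeTools` helper are assumptions made for this example, not the definitions used in our prototype.

```javascript
// Hypothetical tool catalogue; names, descriptions and fields are illustrative.
const tools = [
  {
    name: 'find_event_tickets',
    description: 'Find tickets to local events near the resort',
    parameters: {
      type: 'object',
      properties: {
        eventDate: { type: 'string', description: "Date of the event in 'YYYY-MM-DD' format" },
        partySize: { type: 'number', description: 'Number of tickets needed' },
      },
      required: ['eventDate', 'partySize'],
    },
  },
  {
    name: 'capture_special_request',
    description: 'Record a special request for an existing reservation',
    parameters: {
      type: 'object',
      properties: {
        reservationId: { type: 'string', description: 'Identifier of the reservation' },
        request: { type: 'string', description: 'The request in the customer\'s own words' },
      },
      required: ['reservationId', 'request'],
    },
  },
]

// Summarize the catalogue as plain text, one tool per line, e.g. for logging.
function describeTools(toolList) {
  return toolList
    .map((tool) => {
      const required = tool.parameters.required.join(', ')
      return `${tool.name}: ${tool.description} (required: ${required})`
    })
    .join('\n')
}

console.log(describeTools(tools))
```

The descriptions matter as much as the types: they are the only guidance the model has when deciding which function fits the customer's request and which details it still has to ask for.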

Here is an example of a function from our prototype:

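import { z } from 'zod'

// Note: Tool, QuickCache, getUniqueId and FLAG_ALLOW_UNKNOWN_ERROR are helpers
// defined elsewhere in the prototype and are not shown here.

// Define the schema using zod based on the JSON structure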
const ReservationSchema = z.object({
  hotel: z.string().describe('Name of the hotel'),
  location: z.string().describe('Location or branch of the hotel'),
  fullName: z.string().describe('Full name of the user making reservation'),
  numberOfGuests: z.number().describe('Total number of people who will be staying in the room'),
  checkInDate: z.string().describe("Date when the guests will arrive in 'YYYY-MM-DD' format"),
  checkOutDate: z.string().describe("Date when the guests will leave in 'YYYY-MM-DD' format"),
  roomType: z.enum(['single', 'double', 'suite']).describe('Type of room desired'),
  specialRequests: z
    .array(z.string())
    .optional()
    .describe(
      'Any specific requests like a room on a certain floor, near the elevator, extra bed, etc.'
    ),
})

const reserveHotel = async (args) => {
  const {
    hotel,
    location,
    fullName,
    numberOfGuests,
    checkInDate,
    checkOutDate,
    roomType,
    specialRequests,
  } = args

  // Optionally simulate an intermittent failure (about 1 call in 15)
  if (FLAG_ALLOW_UNKNOWN_ERROR) {
    const chance = Math.round(15 * Math.random())

    if (chance === 13) {
      return {
        error: 'Unknown error',
        message: 'Failed to make hotel reservation. Please try again.',
        hotel,
        location,
      }
    }
  }

  if (!location)
    return { error: 'Invalid location', message: 'Please specify the location or branch' }
  if (!hotel) return { error: 'Invalid hotel name', message: 'Please specify the name of hotel' }
  if (!fullName) return { error: 'Invalid name', message: 'Please specify your full name' }
  if (!numberOfGuests)
    return { error: 'Invalid guest number', message: 'Please specify the number of guests' }
  if (!checkInDate)
    return { error: 'Invalid Check-In date', message: 'Please specify the Check-In date' }
  if (!checkOutDate)
    return { error: 'Invalid Check-out date', message: 'Please specify the Check-out date' }
  if (!roomType) return { error: 'Invalid room type', message: 'Please specify the room type' }

  // Reject a placeholder such as "Full Name" passed in place of a real name
  if (fullName.toLowerCase().includes('full name')) {
    return {
      status: 'No name provided',
      message: 'Please ask the user to provide their full name',
      hotel,
      location,
      numberOfGuests,
      checkInDate,
      checkOutDate,
      roomType,
      specialRequests,
    }
  }

  const reservationId = getUniqueId()
  const reservation_data = {
    status: 'Reservation successful',
    reservationId,
    message:
      'Your reservation has been completed. Please present your reservationId at the front desk.',
    hotel,
    location,
    fullName,
    numberOfGuests,
    checkInDate,
    checkOutDate,
    roomType,
    specialRequests,
  }

  QuickCache.save('reservation', JSON.stringify(reservation_data), location, hotel, reservationId)

  return reservation_data
}

// Tool metadata from the JSON schema
const toolName = 'reserve_hotel'
const toolDescription = 'Reserve a room for the user in the hotel'

// Creating the tool instance
const reserveHotelTool = Tool(ReservationSchema, toolName, toolDescription, reserveHotel)

export { reserveHotelTool as reserveHotel }
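
The function above checks that the dates are present but not that they are well formed or in the right order. As a sketch only, and not part of our prototype, a trusted component could add a check along these lines. The helper names `parseIsoDate` and `validateStayDates` are hypothetical; the return value follows the same `{ error, message }` shape used above.

```javascript
// Parse a 'YYYY-MM-DD' string as a UTC date; return null if it is not a real date.
function parseIsoDate(value) {
  if (typeof value !== 'string' || !/^\d{4}-\d{2}-\d{2}$/.test(value)) return null
  const date = new Date(`${value}T00:00:00Z`)
  if (Number.isNaN(date.getTime())) return null
  // Reject values that would roll over into another month, such as a 30th of February.
  return date.toISOString().slice(0, 10) === value ? date : null
}

// Return an error object if the stay dates are unusable, or null if they are fine.
function validateStayDates(checkInDate, checkOutDate) {
  const checkIn = parseIsoDate(checkInDate)
  const checkOut = parseIsoDate(checkOutDate)
  if (!checkIn || !checkOut) {
    return { error: 'Invalid date', message: "Please provide dates in 'YYYY-MM-DD' format" }
  }
  if (checkOut <= checkIn) {
    return {
      error: 'Invalid date range',
      message: 'The Check-out date must be after the Check-In date',
    }
  }
  return null
}

console.log(validateStayDates('2025-07-10', '2025-07-12'))
console.log(validateStayDates('2025-07-12', '2025-07-10'))
```

Because the error message is returned to the model, the model can relay the problem to the customer and ask for corrected dates without the reservation ever reaching the backend.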

The job of the LLM is to recognize the information it needs to collect from the customer to complete the reservation. In our experience, the GenAI model handles customer interactions cleverly and does an impressive job of gathering what the schema requires to complete the process. The function is responsible for calling the API that is tied to a backend production system and completing the reservation with precision.
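
In practice the model makes this decision itself, guided by the schema. The sketch below only spells out the underlying logic so the division of labor is easier to see. The `nextStep` helper and its `action` values are hypothetical; the required fields mirror the non-optional fields of the reservation schema above.

```javascript
// Required fields, mirroring the non-optional fields of the reservation schema.
const requiredFields = [
  'hotel',
  'location',
  'fullName',
  'numberOfGuests',
  'checkInDate',
  'checkOutDate',
  'roomType',
]

// Decide the next step: keep asking the customer, or hand off to the trusted function.
function nextStep(collected) {
  const missing = requiredFields.filter(
    (field) => collected[field] === undefined || collected[field] === null || collected[field] === ''
  )
  if (missing.length > 0) {
    return { action: 'ask_customer', missing }
  }
  return { action: 'call_function', name: 'reserve_hotel', arguments: collected }
}

// Early in the conversation only some details are known.
console.log(nextStep({ hotel: 'Example Resort', roomType: 'double' }))
```

Only when nothing is missing does the conversation hand off to `reserve_hotel`; everything before that point is conversation, and everything after it is deterministic code.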

We hope you agree that GenAI technology holds great promise for the way critical systems can be built and delivered in the future. While more work is needed before we can unleash GenAI at scale, the developments in the past year have been promising, and the investments worthwhile.

Give us a call if you would like to further explore the actions needed to tame GenAI for your organization.