Raphael Ancellin, Product Manager, AXA
Fellow, CodeX—The Stanford Center for Legal Informatics
NOTE ON THE DISCUSSION PAPERS: The CodeX Insurance Initiative has invited leaders from industry, academia, and the regulatory community to contribute short papers describing the authors’ views on important issues relating to the application of computable contracting in the insurance industry. The development of computable contracting for insurance is still a work in progress, and the sharing of ideas and approaches within the community of interest is a major goal of the Insurance Initiative. As a part of this conversation, these papers present the views of their authors, and do not necessarily reflect the views of CodeX, of the Insurance Initiative, or of any of its participants.
What is the problem?
Data analysis is a key tool for effectively and profitably managing an insurance business. Insurance policy contracts, or “products,” as a central component of the insurance business, are a primary target for such analytics, with many types of analyses possible. In this paper, we will focus on risk analytics applied to contracts at the portfolio level (active contracts grouped together for administration/management purposes), a topic at the very heart of the insurance business. The eventual goal of this inquiry is to answer automatically questions such as, “What is the cumulative risk in US dollars for peril X in portfolio Z if event A and/or B happens?”
Insurance should be one of the best industries at analyzing or simulating the risk exposure on its portfolios. However, in traditional insurance practice, only pre-defined data points are reported in policy manager systems during contract issuance. In a world of paper and natural language, this is a mostly offline process involving multiple players (insurers, brokers, customers…), which is difficult to standardize and govern. The reporting system records and transmits only limited aspects of the transaction, which are not always accurate or standardized. Therefore, accuracy and granularity of the data are often insufficient to support complex forms of analysis such as coverage calculations.
Traditionally, when insurers seek to perform in-depth risk analysis, they must go back to “ground truth,” aka the signed paper contract – where all the data resides. Accurate extraction of these details is time-consuming and expensive.
Contract automation can greatly improve insurance data
Since traditional insurance risk analysis requires experts to process millions of documents, insurance companies are trying to automate the process to get answers faster and at lower cost. Currently, a complete analysis of a corporate contract can take up to two days. One strategy has been to turn Machine Learning (ML) and Natural Language Processing (NLP) loose on the legacy natural language policy documents, and such solutions have been tested multiple times. Despite some progress, there are a number of obstacles that must still be overcome, including the following:
Challenge #1: Accessing and cleaning the data. This first phase begins with identifying the latest versions of the documents that have been stored in a variety of formats across several enterprise Content Management Systems (CMS) in multiple countries. The second step consists of connecting all the documents required for the analysis of a single contract (e.g., general conditions, specific conditions …). Currently, grouping of documents requires human analysis and cannot be fully automated because it involves complex reasoning. Then, the documents are “OCRized” to convert the PDF format into a computable data format.
Even with the most sophisticated solutions, data extraction (parsing and named entity recognition) can only approach 70-80% accuracy on average. This figure excludes documents containing elements such as infographics and tables, which can significantly lower accuracy.
Challenge #2: Identifying relevant clauses for analysis. After natural language and numerical data have been converted from PDF to a computable format, analysis requires that the AI/ML tool identifies key clauses and information such as coverage definitions and limits, and exclusions.
Currently, ML-driven solutions can help analysts by identifying the recurrence of certain key words (e.g., “cyber,” “data”) and combining this with metadata (e.g., customer industry sector such as healthcare or energy) to retrieve contracts, and prioritize portions of the contract, that ought to be checked first by human analysts. Prototypes developed at AXA over the past four years have shown that a human expert training an ML solution could help further automate this triage task over time.
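As a rough illustration of the triage just described, the sketch below scores contracts by keyword recurrence combined with sector metadata, then orders them for human review. The keywords, sector weights, and field names are hypothetical and chosen only for the example; this is not a description of the AXA prototypes.

```python
import re
from dataclasses import dataclass

# Hypothetical keywords and sector weights, for illustration only.
KEYWORDS = ("cyber", "data")
SECTOR_WEIGHT = {"healthcare": 2.0, "energy": 1.5}


@dataclass
class Contract:
    contract_id: str
    sector: str
    text: str


def triage_score(contract: Contract) -> float:
    """Count keyword occurrences and scale by a sector weight (default 1.0)."""
    words = re.findall(r"[a-z]+", contract.text.lower())
    hits = sum(words.count(keyword) for keyword in KEYWORDS)
    return hits * SECTOR_WEIGHT.get(contract.sector, 1.0)


def prioritize(contracts: list[Contract]) -> list[Contract]:
    """Return contracts ordered from highest to lowest review priority."""
    return sorted(contracts, key=triage_score, reverse=True)


portfolio = [
    Contract("C1", "retail", "Fire and flood cover for stored goods."),
    Contract("C2", "healthcare", "Cyber incidents and loss of patient data are excluded."),
    Contract("C3", "energy", "Data restoration costs are covered."),
]
review_order = [c.contract_id for c in prioritize(portfolio)]
print(review_order)
```

A score of this kind only decides which contracts a human analyst looks at first; it does not interpret the clauses themselves.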
Challenge #3: Transforming content into quantifiable exposures. This step involves a qualitative analysis of the whole contract to understand if a certain loss, such as business interruption, is covered. What are the exclusions, limits, etc. in a given situation? Handling rules are also included to calculate claim payments.
This analysis must take each clause into account as well as the relationship between clauses. For example, an exclusion previously mentioned can change the output of another clause in the contract. After developing multiple prototypes, AXA technical teams concluded that there is currently no ML-driven solution that automates the task with “business-ready” accuracy. This conclusion is becoming consensus in the industry.
Challenge #4: Aggregating risk exposure (on selected perils) at portfolio level for further analysis. It would be tempting to skip Challenge #3 above, and instead extract maximum coverage limits from each contract and add them across a portfolio to calculate risk. However, this calculation is insufficient because exclusions and handling rules are not included. Limit totals do not equal total risk. The proper automation of this phase is necessarily linked to overcoming Challenges #2 and #3 described above, which, as we have shown, is not an easy undertaking.
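A toy calculation can make the gap between limit totals and actual exposure concrete. In the sketch below, all policies, amounts, and the single deductible "handling rule" are hypothetical; real contracts carry far richer rules.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    policy_id: str
    limit: float                 # maximum payout for the peril, in USD
    deductible: float = 0.0      # simple stand-in for a handling rule
    excluded_events: set[str] = field(default_factory=set)


def naive_total(policies: list[Policy]) -> float:
    """Sum of coverage limits, ignoring exclusions and handling rules."""
    return sum(p.limit for p in policies)


def exposure(policy: Policy, event: str, loss: float) -> float:
    """Payout for one policy: zero if excluded, else loss net of deductible, capped at the limit."""
    if event in policy.excluded_events:
        return 0.0
    return min(max(loss - policy.deductible, 0.0), policy.limit)


def portfolio_exposure(policies: list[Policy], event: str, loss_per_policy: float) -> float:
    """Aggregate exposure for a scenario where each policy suffers the same loss."""
    return sum(exposure(p, event, loss_per_policy) for p in policies)


# Hypothetical portfolio and scenario.
policies = [
    Policy("P1", limit=1_000_000, deductible=50_000),
    Policy("P2", limit=500_000, excluded_events={"flood"}),
    Policy("P3", limit=2_000_000, deductible=100_000),
]
limits_sum = naive_total(policies)
flood_exposure = portfolio_exposure(policies, "flood", 800_000)
print(limits_sum, flood_exposure)
```

In this made-up scenario the summed limits are 3,500,000 USD while the exposure to the flood event is 1,450,000 USD, because one policy excludes flood and the other two apply deductibles.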
Currently, NLP and ML simply do not provide safe automation of risk assessment, and there is a long road ahead. Even if 80% accuracy in risk data extraction and analysis could be reached in the near term, a very impressive result for data science, it would remain insufficient for risk management.
Automated Computable Contracts could be the solution
Prototypes of “insurance contracts as code” show promising results in calculating risk automatically and instantly in large portfolios of contracts, with an accuracy approaching 100%.
The NLP approaches previously described are based on interpreting a “paper-first” contract and providing a result with only a percentage of certainty. Computable products, by contrast, are code first, and their reasoning is based on logic programming. Provided the data on the contract are properly entered to begin with, the data and outcome are fully accessible for analysis as they are already set out in structured formats, with the logic of the agreement fully realized in the computer code embodying the contract.
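To give a feel for what "code first" can mean, here is a minimal sketch in which coverage and exclusion clauses are explicit rules and the payout is computed deterministically from them. It is written in plain Python rather than a logic programming language, and the class names, clause names, and claim fields are assumptions made for the example, not the representation used in the prototypes.

```python
from dataclasses import dataclass
from typing import Callable

# A claim is modeled as a plain dict, e.g.
# {"peril": "fire", "cause": "electrical", "amount": 120_000}
Claim = dict


@dataclass(frozen=True)
class Clause:
    name: str
    applies: Callable[[Claim], bool]   # rule deciding whether the clause is triggered


@dataclass(frozen=True)
class ComputableContract:
    coverages: tuple[Clause, ...]
    exclusions: tuple[Clause, ...]
    limit: float

    def payout(self, claim: Claim) -> float:
        """An exclusion overrides any coverage; covered losses are capped at the limit."""
        covered = any(c.applies(claim) for c in self.coverages)
        excluded = any(e.applies(claim) for e in self.exclusions)
        if not covered or excluded:
            return 0.0
        return min(claim["amount"], self.limit)


# Hypothetical contract: fire is covered, arson is excluded.
contract = ComputableContract(
    coverages=(Clause("fire cover", lambda c: c["peril"] == "fire"),),
    exclusions=(Clause("arson exclusion", lambda c: c.get("cause") == "arson"),),
    limit=100_000.0,
)

electrical_fire = {"peril": "fire", "cause": "electrical", "amount": 120_000}
arson_fire = {"peril": "fire", "cause": "arson", "amount": 120_000}
print(contract.payout(electrical_fire), contract.payout(arson_fire))
```

Because the rules are explicit, the same input always produces the same answer, and the relationship between clauses (here, an exclusion overriding a coverage) is stated in the code rather than inferred from text.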
In addition to improved risk analysis, other potential benefits include improved operations: a solution representing contracts as code can be plugged into an API to then “feed” many other insurance systems (e.g., the claim manager, policy manager, and/or call center software) with the data source derived directly from the contracts. The approach can work with both legacy and natively computational contracts:
- For legacy contracts/products, our work at AXA has demonstrated that it typically takes two days for an insurance agent to convert one traditional insurance contract into code, leveraging a “no-code” interface developed at our company. AI/ML solutions are already supporting this effort – extracting data points to be verified by a human. Ongoing prototyping efforts leveraging cutting-edge technologies such as GPT-3 are encouraging. These technologies could open the way to automatic conversion from text to “product as code” in the future.
- For new products/contracts, technology solutions exist for designing insurance products as code. At AXA, we leverage computable clauses associated with certain risk and pricing models. A human agent can assemble these clauses using a no-code interface to build the computable insurance product, incorporating the associated model(s). Then, the tool generates the contract document to be signed by the customer. This approach has been tested in production for a direct-to-customer distribution model for personal insurance. However, this is not the only distribution scenario. For instance, in corporate insurance, brokers are typically involved in the process, and there is no standard way to design a contract. We could envision working with brokers to co-build a solution allowing them to maintain a central role in customization while still capturing the benefits of standardized automation. The standardization of this design process (business, data and tech standards) across the industry will bring benefits from actuarial analysis, to product comparison, to ultimately a streamlined reinsurance market.
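The assembly step for new products can be pictured with a small sketch: clauses are picked from a pre-approved library, filled with parameters, and rendered into the contract text. The clause identifiers, wording, and the `build_document` helper are hypothetical; an actual tool would also attach the associated risk and pricing models, which are omitted here.

```python
from string import Template

# Hypothetical library of pre-approved clause templates, keyed by clause ID.
LIBRARY = {
    "COV-THEFT": Template("Theft is covered up to $limit USD."),
    "EXC-WAR": Template("Losses caused by war are excluded."),
}


def build_document(selections: list[tuple[str, dict]]) -> str:
    """Render the selected clauses, in order, as a numbered contract text."""
    lines = []
    for number, (clause_id, params) in enumerate(selections, start=1):
        template = LIBRARY[clause_id]   # KeyError if the clause is not in the library
        lines.append(f"{number}. {template.substitute(params)}")
    return "\n".join(lines)


# The agent's choices, as a no-code interface might record them.
selections = [("COV-THEFT", {"limit": 5000}), ("EXC-WAR", {})]
document = build_document(selections)
print(document)
```

Because the structured selection is the source and the document is generated from it, the data needed for later analysis exists before the contract is signed.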
How to move forward?
Here are steps that can help insurance companies accelerate automation of risk analytics at portfolio level – and that will also contribute to the work on computable products:
- Acknowledge that fully automating analysis of the paper-first contract is some distance off, and focus, instead, on machine-assisted human analysis. This feedback loop requires the agent to do most of the tasks at the beginning to train an AI/ML solution. Then the AI/ML will take over some tasks, such as clause identification, and accelerate other tasks, such as classification of the contract to be checked by humans in priority order, depending on established criteria. Extracting the reasoning of the contract itself – transforming the content into quantifiable exposures – will probably stay human-driven until a major technological breakthrough. And, when that breakthrough does occur, insurance executives will need to accept that results of these analyses are an estimate based on a machine-led interpretation that can be difficult to explain a posteriori.
- Build standardized clause libraries. When we compare insurance contracts of different industry players, we quickly notice that many clauses are quite similar. Short term, having a standardized clause library at the enterprise level, and possibly at the industry level, will support better NLP-driven text search and clause comparisons. In the longer term, standardized clauses will also help insurers build faster and safer contracts by assembling clauses that have been pre-approved. There is a balance to be found between a contract designed in “total free text” and a too-structured/constrained format that hampers creativity or negotiation. Finally, these standardized clauses could be progressively converted from “text first” to “code first” to lay the foundation for computable contracts.
- Structure the data for improved analytics during the subscription phase of the contract. This is a challenging technological and business transformation involving multiple players (insurers, brokers, customers …). Archaic tools and habits must be changed. Replacing email and word processing solutions with contract lifecycle management systems appears to be an efficient way to streamline the subscription process, including contract drafting and clause comparison; collect the right data, including the mapping of contract clauses and limits; and support basic automation of analytics.
- Start to create computable product and contract portfolios in selected lines of business. Low-cost and/or ultra-tailored products, as well as product lines with problematic claims experience, can provide suitable initial targets for portfolio conversion into computable products for marketing and claims operations. The automation of analytics will come along naturally as the conversion goes forward. This trailblazing exercise will allow an insurer to learn as it goes.
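For the clause-library step in the list above, a very simple form of clause comparison can be sketched with Python's standard `difflib` module: an incoming clause is matched to the most similar entry in a standardized library. The library contents and clause IDs are invented for the example, and character-level similarity is only a crude stand-in for the NLP-driven comparison the text has in mind.

```python
from difflib import SequenceMatcher

# Hypothetical standardized clause library, keyed by clause ID.
STANDARD_CLAUSES = {
    "EXC-WAR": "Loss or damage caused by war or civil unrest is excluded.",
    "COV-FIRE": "Loss or damage caused by fire is covered up to the policy limit.",
}


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def closest_standard_clause(clause_text: str) -> tuple[str, float]:
    """Return the ID of the most similar library clause and its similarity score."""
    scored = {cid: similarity(clause_text, text) for cid, text in STANDARD_CLAUSES.items()}
    best_id = max(scored, key=scored.get)
    return best_id, scored[best_id]


incoming = "Loss and damage caused by war or civil unrest are excluded."
match_id, score = closest_standard_clause(incoming)
print(match_id, round(score, 2))
```

A near match like this one could be flagged for a human to confirm whether the wording difference changes the meaning before the clause is mapped to the standard version.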
Risk analysis and simulation for insurance contracts at the portfolio level is a critical area to automate. Despite progress, AI/ML approaches do not yet provide a magic solution, and their accuracy will stay well below acceptable levels for actuarial and risk analysis for some time to come.
A different strategy can help. It’s time to change how we represent insurance contracts, using software to make them computable, not only improving their interpretation and data production, but also providing benefits to claim administration, sales, and many other aspects of the insurance business, while improving consumer choice and experience.
In order to fully capture these benefits, the industry should come together to work on a shared specification that can be used between and across insurance enterprises.