J P Morgan Case Study
Introduction:
J P Morgan announced it had developed and deployed new software called COIN (short for
Contract Intelligence) that automates document review for a certain class of contracts. Within
seconds, COIN does the mind-numbing job of interpreting commercial loan agreements, work that
previously consumed 360,000 hours of lawyers’ and loan officers’ time each year.
Learning Outcomes:
Convert a business problem into an analytical problem and break the process down using
CRISP-DM.
Background Information: The software employs image recognition to identify patterns in these
agreements. While JPMorgan has been tight-lipped about the details of its proprietary technology, the
bank has stated that the algorithm digests data from the bank’s numerous contracts and can identify
and categorize repeated clauses. The bank reports that the algorithm classifies each clause into one of
about 150 different “attributes” of credit contracts. For example, it may detect patterns based on a
clause’s wording or its location in the agreement.
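The idea that a clause can be recognized by its wording and its location can be illustrated with a toy rule-based tagger. This is only a sketch under stated assumptions: COIN's actual technique is proprietary, and the attribute names, regular expressions and section names below (`RULES`, `tag_clause`, `interest_rate`, `governing_law`, `events_of_default`) are hypothetical.

```python
import re

# Hypothetical rules: (attribute, wording pattern, expected section or None).
# COIN's real attributes and method are proprietary; these are illustrative only.
RULES = [
    ("interest_rate", re.compile(r"\binterest rate\b|\bSOFR\b|\bLIBOR\b", re.I), None),
    ("governing_law", re.compile(r"\bgoverned by\b|\bgoverning law\b", re.I), "miscellaneous"),
    ("events_of_default", re.compile(r"\bevents? of default\b", re.I), "defaults"),
]


def tag_clause(text: str, section: str) -> list[str]:
    """Return each attribute whose wording pattern matches the clause text
    and whose expected section (if one is given) matches the clause location."""
    tags = []
    for attribute, pattern, expected_section in RULES:
        wording_matches = pattern.search(text) is not None
        location_matches = expected_section is None or section.lower() == expected_section
        if wording_matches and location_matches:
            tags.append(attribute)
    return tags


print(tag_clause("The interest rate shall be SOFR plus 2.00%.", "Loans"))  # ['interest_rate']
print(tag_clause("This Agreement shall be governed by New York law.", "Miscellaneous"))  # ['governing_law']
print(tag_clause("This Agreement shall be governed by New York law.", "Definitions"))  # []
```

Hand-written rules like these do not scale to around 150 attributes across many contract types. For that reason, the CRISP-DM walkthrough treats clause classification as a learned text-classification problem.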
Scenario:
The software reviews, in seconds, a volume of contracts that previously took lawyers over 360,000
person-hours. Apart from shortening document review time, COIN has also helped J P Morgan
reduce its number of loan-servicing mistakes. J P Morgan intends to deploy COIN for more complex
filings, such as credit-default swaps and custody agreements. In the medium and long term, the bank
also hopes to use machine learning to interpret altogether new regulations.
Problem Statement / Business Objectives:
How would you break the problem/scenario down using the CRISP-DM methodology?
“Automate the classification of various legal documents”.
Question:
Distinguish the steps of the process according to each phase of CRISP-DM. (100 Marks)
Solution:
CRISP-DM Methodology
CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It provides a structured
approach to planning a data mining project. It is widely used as an open standard for organizing
big data and analytics projects, regardless of industry or domain. The model also leaves room for
software platforms that perform or augment some of its tasks.
Phases of CRISP-DM Methodology
There are six phases in the CRISP-DM methodology:
1. Business Understanding: Analyzing customer needs is a crucial part of any business. Data
mining projects are no exception, and CRISP-DM puts this first.
This phase focuses on understanding the objectives and requirements of the project. Its key tasks
are:
• Determine Business Objectives
• Assess Situation
• Determine Data Mining Goals
• Produce Project Plan
2. Data Understanding: This phase focuses on identifying, collecting, and analyzing the data sets
that can help accomplish the project goals. It also has four tasks:
• Collect Initial Data
• Describe Data
• Explore Data
• Verify Data Quality
3. Data Preparation: A common rule of thumb is that data preparation takes about 80% of the
project effort. This phase, often referred to as “data munging”, prepares the final data set(s) for
modeling. It has five tasks:
• Select Data
• Clean Data
• Construct Data
• Integrate Data
• Format Data
4. Modeling: In this phase, various models are built and assessed using several different
modeling techniques. This phase has four tasks:
• Select Modeling Techniques
• Generate Test Design
• Build Model
• Assess Model
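5. Evaluation: Whereas the Assess Model task of the Modeling phase focuses on technical
model assessment, the Evaluation phase looks more broadly at which model best meets the
business objectives and what to do next. This phase has three tasks:
• Evaluate Results
• Review Process
• Determine Next Steps
6. Deployment: The chosen model is put into use, and its ongoing monitoring and maintenance
are planned. This phase has four tasks:
• Plan Deployment
• Plan Monitoring and Maintenance
• Produce Final Report
• Review Project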
J P Morgan – Applying CRISP-DM to COIN
1. Business Understanding
• Objective: Automate the classification of legal documents to reduce review time and
improve accuracy.
• Business Goals:
➢ Enhance efficiency by cutting down document review time.
➢ Minimize errors in loan servicing.
➢ Extend COIN’s functionality to more complex legal documents.
• Data Mining Goals: Develop a model that accurately classifies clauses in legal documents into
predefined attributes (about 150 for credit contracts).
2. Data Understanding
• Data Collection:
➢ Compile a dataset of annotated legal documents, including commercial loan
agreements, credit-default swaps and custody agreements.
➢ Gather metadata such as document type, date and relevant attributes.
• Data Exploration:
➢ Perform exploratory data analysis to understand data structure and
distribution.
➢ Identify patterns and common features in the documents.
➢ Assess data quality, noting any gaps or inconsistencies.
• Data Quality:
➢ Address missing, incomplete or inconsistent data.
➢ Validate the accuracy and relevance of annotations.
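The exploration and quality checks above can be sketched in Python over a small, hypothetical set of annotated clause records. The record fields (`doc_id`, `doc_type`, `clause`, `label`) and the example clauses are assumptions for illustration, not JPMorgan's actual data schema.

```python
from collections import Counter

# Hypothetical annotated clause records; field names and contents are illustrative.
records = [
    {"doc_id": "L-001", "doc_type": "commercial_loan",
     "clause": "The interest rate shall be SOFR plus 2.00%.", "label": "interest_rate"},
    {"doc_id": "L-001", "doc_type": "commercial_loan",
     "clause": "This Agreement shall be governed by New York law.", "label": "governing_law"},
    {"doc_id": "S-001", "doc_type": "credit_default_swap",
     "clause": "Credit Event means Bankruptcy or Failure to Pay.", "label": None},
    {"doc_id": "C-001", "doc_type": "custody", "clause": "   ", "label": "custody_fees"},
]


def profile(records):
    """Summarize dataset structure and flag basic quality problems."""
    doc_types = {}  # doc_id -> doc_type, so each document is counted once
    for r in records:
        doc_types.setdefault(r["doc_id"], r["doc_type"])
    return {
        "docs_per_type": dict(Counter(doc_types.values())),
        "label_counts": dict(Counter(r["label"] for r in records if r["label"])),
        "missing_label": [r["doc_id"] for r in records if not r["label"]],
        "empty_text": [r["doc_id"] for r in records if not r["clause"].strip()],
    }


for key, value in profile(records).items():
    print(key, value)
```

On real data, a heavily skewed `label_counts` would be an early signal that some of the roughly 150 attributes have too few examples to learn from.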
3. Data Preparation
• Data Cleaning:
➢ Handle missing values by imputation or removal.
➢ Normalize and standardize text data.
• Data Transformation:
➢ Convert the cleaned text into structured features suitable for modeling, using tools
such as MS-Excel, SQL or Python.
• Data Integration:
➢ Merge data from different sources to create a comprehensive dataset.
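The following is a minimal sketch of these preparation steps. It assumes that clause records and document metadata arrive as two separate hypothetical sources keyed by `doc_id`. Clause text is normalized, records with missing labels or empty text are removed (one of the two options listed above), the sources are merged, and each clause is turned into term counts. The function names and fields are illustrative only.

```python
import re
import unicodedata
from collections import Counter


def normalize(text: str) -> str:
    """Standardize clause text: Unicode NFKC, lowercase, keep only letters,
    digits, '%' and whitespace, then collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^a-z0-9%\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def prepare(clauses, metadata):
    """Clean clause records and merge them with document metadata on doc_id.
    Records with no label, empty text or no matching metadata are removed."""
    meta_by_id = {m["doc_id"]: m for m in metadata}
    prepared = []
    for c in clauses:
        text = normalize(c["clause"])
        if not c.get("label") or not text or c["doc_id"] not in meta_by_id:
            continue
        prepared.append({**meta_by_id[c["doc_id"]], "text": text, "label": c["label"]})
    return prepared


def term_counts(text: str) -> Counter:
    """Bag-of-words features: how often each token occurs in the clause."""
    return Counter(text.split())


# Two hypothetical sources: clause annotations and document metadata.
clauses = [
    {"doc_id": "L-001", "clause": "Interest  RATE: 5%.", "label": "interest_rate"},
    {"doc_id": "L-001", "clause": "Governing law.", "label": None},
    {"doc_id": "X-999", "clause": "Orphan clause.", "label": "other"},
]
metadata = [{"doc_id": "L-001", "doc_type": "commercial_loan"}]

rows = prepare(clauses, metadata)
print(rows)
print(term_counts(rows[0]["text"]))
```

Removal is used here for simplicity. Imputing a missing label would mean re-annotating the clause, which is usually a manual step for legal text.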
4. Modeling
• Model Selection:
➢ Choose appropriate text-classification algorithms, ranging from classical
machine learning models to deep learning models.
• Model Training:
➢ Split data into training, validation and test sets.
➢ Train multiple models and fine-tune hyperparameters using cross-validation.
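As a hedged illustration of the training steps, the sketch below splits labelled clauses into training, validation and test sets. It then uses k-fold cross-validation to tune one hyperparameter, the smoothing constant `alpha` of a multinomial Naive Bayes text classifier. Naive Bayes is swapped in here as a simple, dependency-free baseline, not as what COIN uses, and all function names and data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(pairs, alpha):
    """Fit a multinomial Naive Bayes classifier on (text, label) pairs,
    using additive (Lidstone) smoothing with constant alpha."""
    class_docs = Counter(label for _, label in pairs)
    word_counts = defaultdict(Counter)
    for text, label in pairs:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"class_docs": class_docs, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha, "n_docs": len(pairs)}


def predict_nb(model, text):
    """Return the label with the highest log-posterior for the text."""
    best_label, best_score = None, -math.inf
    v = len(model["vocab"])
    for label, n_label in model["class_docs"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(n_label / model["n_docs"])
        for w in text.split():
            if w in model["vocab"]:  # words unseen in training are ignored
                score += math.log((counts[w] + model["alpha"]) / (total + model["alpha"] * v))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def split(pairs, seed=0, train_frac=0.7, val_frac=0.15):
    """Shuffle, then split into training, validation and test sets."""
    pairs = pairs[:]
    random.Random(seed).shuffle(pairs)
    n_train = int(train_frac * len(pairs))
    n_val = int(val_frac * len(pairs))
    return pairs[:n_train], pairs[n_train:n_train + n_val], pairs[n_train + n_val:]


def cv_accuracy(pairs, alpha, k=5):
    """Mean held-out accuracy over k folds of the training data."""
    if len(pairs) < k:
        raise ValueError("need at least k examples for k-fold cross-validation")
    folds = [pairs[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        model = train_nb(train, alpha)
        correct = sum(predict_nb(model, text) == label for text, label in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


def tune_alpha(train, alphas=(0.1, 0.5, 1.0), k=5):
    """Pick the smoothing constant with the best cross-validated accuracy."""
    return max(alphas, key=lambda a: cv_accuracy(train, a, k))


# Tiny synthetic dataset of hypothetical, already-normalized clauses.
data = ([("interest rate shall be sofr plus margin", "interest_rate")] * 10
        + [("agreement governed by laws of new york", "governing_law")] * 10)
train, val, test = split(data)
alpha = tune_alpha(train)
model = train_nb(train, alpha)
val_acc = sum(predict_nb(model, t) == l for t, l in val) / len(val)
print(alpha, val_acc)
```

The test set is deliberately left untouched here so it can serve as the unseen data for the Evaluation phase.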
5. Evaluation
• Model Assessment
➢ Ensure the selected model meets business objectives and performs well on
unseen data.
➢ Conduct a thorough review to identify any biases or limitations.
• Stakeholder Review
➢ Present results to stakeholders and gather feedback.
➢ Ensure the model’s predictions are interpretable and actionable.
➢ Make necessary adjustments based on feedback.
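Per-attribute metrics support both the performance check and the search for biases or limitations, because a single overall accuracy can hide attributes the model handles poorly. The sketch below computes precision, recall and F1 for each clause attribute, plus a macro-averaged F1. The label names and predictions in the example are hypothetical.

```python
from collections import Counter


def per_attribute_report(y_true, y_pred):
    """Precision, recall and F1 for each attribute, plus macro-averaged F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this attribute, but it was wrong
            fn[truth] += 1  # missed the true attribute
    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    macro_f1 = sum(r["f1"] for r in report.values()) / len(labels)
    return report, macro_f1


# Hypothetical test-set labels and model predictions.
y_true = ["interest_rate", "interest_rate", "governing_law", "governing_law"]
y_pred = ["interest_rate", "governing_law", "governing_law", "governing_law"]
report, macro_f1 = per_attribute_report(y_true, y_pred)
for label, scores in report.items():
    print(label, {k: round(v, 3) for k, v in scores.items()})
print("macro F1:", round(macro_f1, 3))
```

Macro averaging weights every attribute equally, so a rare attribute with poor recall pulls the score down. This matches the goal of reducing loan-servicing mistakes better than an accuracy figure dominated by common clauses.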
6. Deployment
• Implementation:
➢ Deploy the model into the production environment, integrating it with
existing systems.
➢ Develop a user interface for easy access to the model’s predictions.
• Monitoring
➢ Set up monitoring to track model performance and detect any drift over time.
➢ Regularly update the model with new data to maintain accuracy.
• Documentation:
➢ Document the entire project, including data sources, modeling techniques and
evaluation results.
➢ Create user guides and training materials for business users.
• Communication:
➢ Share insights and recommendations with relevant teams (e.g., legal,
compliance).
➢ Use the model to inform decision-making and improve business processes.
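One way to approximate drift monitoring is to compare the distribution of predicted clause attributes in production against the distribution seen at training time. The sketch uses the Population Stability Index (PSI), one common choice among several. The 0.2 alert threshold is a widely used rule of thumb rather than a JPMorgan figure, and the function names and data are hypothetical. Note that this tracks shifts in predictions, so it complements rather than replaces periodic accuracy checks on newly annotated documents.

```python
import math
from collections import Counter

DRIFT_THRESHOLD = 0.2  # common rule of thumb for a significant shift; an assumption here


def distribution(labels, categories, eps=1e-6):
    """Share of each category, floored at eps so the log term stays finite."""
    counts = Counter(labels)
    return {c: max(counts[c] / len(labels), eps) for c in categories}


def population_stability_index(reference, current):
    """PSI = sum over categories of (current - reference) * ln(current / reference)."""
    categories = sorted(set(reference) | set(current))
    ref = distribution(reference, categories)
    cur = distribution(current, categories)
    return sum((cur[c] - ref[c]) * math.log(cur[c] / ref[c]) for c in categories)


# Hypothetical predicted attributes: at training time vs. in a recent production window.
training_preds = ["interest_rate"] * 50 + ["governing_law"] * 50
production_preds = ["interest_rate"] * 90 + ["governing_law"] * 10
psi = population_stability_index(training_preds, production_preds)
print(round(psi, 3), "drift" if psi > DRIFT_THRESHOLD else "stable")  # 0.879 drift
```

A PSI above the threshold would be a trigger to investigate and, if needed, retrain the model on new data.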
CONCLUSION
JP Morgan’s COIN software represents a significant advancement in legal document review,
demonstrating the potential of AI to revolutionize the industry. As regulations continue to evolve, the
adoption of AI-powered solutions like COIN will become increasingly crucial for financial institutions
and legal professionals.