How to use Generative AI Securely

Apr 2, 2024 | Blog

Allen Perkins – Allen began working with data over 35 years ago. His work spans aligning business needs, constructing queries, gathering data, building data models, conducting analysis, preparing visualizations, generating recommendations, making it all run faster, and then iterating for continuous improvement and alignment with security, privacy, efficiency, and business objectives. The overarching process has not changed; the tools, on the other hand, have.

His experience stretches from mainframe Virtual Storage Access Method (VSAM) files, hierarchical file structures, and early relational databases in the 1980s to modern commercial, open-source, and cloud-based data environments; from analysis written in C, and later SQL and SAS, to R, Python, and Julia. Allen leverages modern machine learning and artificial intelligence tools and techniques to support data exploration, optimization, and prediction.

Allen is a dedicated technologist who constantly learns, leverages, and applies the latest proven tools, techniques, and processes for our data practice clients.

In the complex world of banking, errors in coding loans can pose substantial compliance and auditing risks. A recent whitepaper explores an innovative solution to this problem, leveraging the power of Generative AI, more specifically, Retrieval Augmented Generation (RAG).

This particular client had been struggling with inaccuracies in their loan coding procedures. The aim of this project was to utilize RAG to identify and rectify loan coding errors. RAG’s ability to use a local language model instead of one hosted by a third party allowed us to securely augment a large language model (LLM) with a corpus of correctly coded loans.

How Can Generative AI Become a Security Risk?

While Generative AI, such as the Retrieval Augmented Generation (RAG) model, introduces remarkable efficiencies and error reduction in tasks like loan coding, it is crucial to recognize the potential security risks associated with its implementation in business operations.

One significant concern lies in the model’s need to access a vast corpus of data to learn and make predictions. This exposure potentially puts sensitive information at risk, particularly if the AI system encounters a security breach. Furthermore, the generative nature of these models can sometimes produce unexpected outcomes, including the generation of data that mimic proprietary or confidential information.

The most recent McKinsey Global Survey on AI revealed that advancements in generative AI (gen AI) had driven 40% of organizations to plan an increase in their overall AI investment. However, readiness for the broad application of gen AI and understanding of its potential risks seem lacking among many companies. Over half of the surveyed organizations (53%) recognized cybersecurity as a risk associated with gen AI, yet only 38% are taking steps to address it.

It’s imperative for businesses to implement robust data protection measures, ensuring that the AI models are trained in secure environments and that their learning databases do not inadvertently expose or replicate sensitive data. Balancing the innovative capabilities of Generative AI with rigorous security protocols is essential in harnessing its potential while safeguarding against vulnerabilities.

How Our Doozers Navigated This Unique Challenge

Think of all the data you had to disclose the last time you applied for a loan. Would you want that data sent across the internet to a cloud-based generative AI company? Well, neither did our client.

Large language models (LLMs) are certainly impressive at answering questions by drawing on their immense stores of parameters. However, training your own LLM requires expensive computing resources, assuming you even have access to them, and sending your company’s confidential and proprietary internal documents to a third party may create a security risk. Here is the approach we followed for our client.

There are many LLMs available for download, and each was designed with a different purpose or use case in mind. First, we evaluated different LLMs and selected several to validate against our client’s specific task. Second, we integrated knowledge from our client’s corpus of loan documents with different LLMs by tokenizing, encoding, and vectorizing that corpus. Throughout this process, all activities were performed on a local copy of the LLM, and as an added precaution, the machine we used had no outbound access to the internet. Third, loans previously unseen by the now-augmented LLM were presented, and the model suggested codes along with a ranked list of similar loans from the corpus from which those codes were derived. Finally, after some iteration to find the best LLM to augment and the best approach to tokenization, we were able to correctly code 75–80% of the unseen loans.
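The retrieval step at the heart of this pipeline can be illustrated with a minimal sketch. The loan texts, loan codes, and the bag-of-words vectors below are hypothetical stand-ins for illustration only; the actual engagement used a locally hosted LLM's embeddings rather than raw token counts, but the ranked-similarity retrieval works the same way.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Naive lowercase tokenizer; a production pipeline would use the
    # embedding model's own tokenizer.
    return re.findall(r"[a-z0-9-]+", text.lower())

def cosine(a, b):
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical corpus of correctly coded loans: (loan description, code).
corpus = [
    ("30-year fixed rate residential mortgage, owner occupied", "RES-30F"),
    ("commercial real estate loan for office building purchase", "CRE-OFF"),
    ("auto loan for new vehicle purchase, 60 month term", "AUTO-NEW"),
]
corpus_vecs = [(Counter(tokenize(text)), code, text) for text, code in corpus]

def suggest_codes(loan_text, top_k=2):
    """Return a ranked list of (code, source document) for an unseen loan."""
    query = Counter(tokenize(loan_text))
    ranked = sorted(corpus_vecs, key=lambda cv: cosine(query, cv[0]), reverse=True)
    return [(code, text) for _, code, text in ranked[:top_k]]

suggestions = suggest_codes("fixed rate mortgage on owner occupied residence")
```

Returning the source documents alongside the suggested code, as the last step does, is what lets a reviewer verify where each suggestion came from.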

We learned that by having the model return several of the documents it relied on, we were better able to focus on the tokenizing of the source documents. We knew from our experience with full-text search, pattern matching, and deduplication in other natural language processing tasks that a word carrying little meaning in one context may be highly relevant in another. Bringing this experience to RAG proved valuable.
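That observation about context-dependent word relevance can be made concrete with a toy inverse-document-frequency calculation (the loan vocabulary here is invented for illustration): a word that appears in every document in the corpus contributes no discriminating signal, while a term confined to a few documents contributes a lot.

```python
import math

def idf_weights(docs):
    """Inverse document frequency over a list of tokenized documents.
    Words present in every document score zero; rarer words score higher."""
    n = len(docs)
    vocab = {w for doc in docs for w in doc}
    return {w: math.log(n / sum(1 for doc in docs if w in doc)) for w in vocab}

# In a loan corpus, "loan" appears everywhere and carries little signal,
# while "balloon" appears in one document and carries a lot.
docs = [
    ["loan", "fixed", "rate"],
    ["loan", "balloon", "payment"],
    ["loan", "variable", "rate"],
]
w = idf_weights(docs)
```

In a general-English corpus the weighting would look very different, which is exactly why tokenization choices tuned for one domain do not transfer unchanged to another.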

Our approach was methodical and focused on privacy first and foremost. Maintaining the confidentiality of our client’s sensitive information was paramount. This endeavor underscored the potential of Generative AI in revolutionizing a traditional business process, even in sectors as regulated and sensitive as banking. It was a learning curve for our team as well, pushing us to innovate while upholding the highest standards of data security, privacy, and confidentiality.

Leveraging Generative AI is Key to Efficient and Accurate Processes

Data security and privacy are paramount in today’s digital landscape, making careful consideration of these factors crucial when implementing AI technologies. That said, using RAG, when combined with careful selection and tokenization of documents, could lead to considerable savings, both in terms of time and funds, by minimizing the need for manual processing in back-office operations.

Taking this a step further, Generative AI has the potential not only to improve coding processes but also to contribute significantly to broader business efforts. It can be utilized to populate drop-down menus with suggested codes, analyze conversations, and offer suitable codes based on the information exchanged. As such, it can efficiently assist loan officers in correctly categorizing loan applications. Additionally, it can boost upselling, cross-selling, and suggestive selling tactics in marketing campaigns.

Our exploration into RAG’s application within the context of correcting loan coding inaccuracies has not only demonstrated the model’s capability to significantly minimize errors but also highlighted the critical importance of ensuring stringent data privacy and security measures.

The success of this project serves as a testament to the potential of Generative AI to revolutionize industry standards, offering a glimpse into a future where AI-driven solutions become integral to overcoming traditional challenges in banking and beyond. As we move forward, it is vital for businesses to continue to tread the fine line between innovation and security, ensuring that advancements in AI are leveraged responsibly and ethically.