A proof-of-concept description of extracting, unifying, and simplifying internal and external compliance requirements using Large Language Models (LLMs), the Latent Dirichlet Allocation (LDA) algorithm, and other publicly available libraries.
In a dynamic business environment where policy and customer requirements are dispersed across many documents and resources, efficiently understanding and managing those requirements is merely the starting point. The real work extends far beyond that: it involves developing a detailed understanding of your compliance posture against the stipulated requirements, culminating in a system that enables swift, informed responses to both customer and internal queries about your compliance status. This approach not only mitigates risk but also fosters a culture of transparency and trust, essential in today's business landscape.
Through this solution description, we explain how CyVidia employs Large Language Models (LLMs) to extract, standardize, and analyze these dispersed requirements, creating a unified compliance framework. This framework ensures a holistic understanding of the compliance landscape and facilitates informed, efficient responses to customer and internal queries.
This solution was recently implemented as part of a multi-stage customer proof of concept (POC).
AI Enabled Cybersecurity Compliance - Proof of Concept Overview
For a large organization, it is challenging to understand and communicate its cybersecurity posture against overlapping customer, internal, and regulatory directives.
Why It Happens
External and internal requirements are spread across multiple documents and repositories, making it hard to maintain a single view of the obligations.
Leverage LLMs and generative probabilistic models to extract, normalize, simplify, and assess requirements, and then efficiently answer internal and customer queries based on the curated information.
Proof of Concept Stages
Stage 1: Requirements Extraction
As a starting point for unifying requirements, we extracted the organization's requirements, scattered across multiple PDF files, using Python extraction libraries and consolidated them in a single location. We also extracted keywords from the requirements and transformed them into LLM embeddings (vectors). These vectors are later matched against the vectors created from customer and internal questions to retrieve the appropriate responses.
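The extraction step can be sketched as follows. The source does not name the specific libraries, so `pypdf` is an illustrative choice of extractor, and the clause-splitting regex is an assumed heuristic for numbered requirement documents; the embedding call is described only in a comment because it depends on the provider used.

```python
# Sketch of Stage 1: pull text out of requirement PDFs and split it into
# individual requirement statements. pypdf and the numbered-clause regex
# are illustrative assumptions, not the POC's confirmed tooling.
import re

def extract_pdf_text(path):
    """Concatenate the text of every page in a PDF (requires pypdf)."""
    from pypdf import PdfReader  # assumed extraction library
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)

def split_requirements(text):
    """Split raw text into candidate requirements, one per numbered
    clause such as '1.2 Keys must be rotated annually.'"""
    parts = re.split(r"\n(?=\d+(?:\.\d+)*\s)", text)
    return [p.strip().replace("\n", " ") for p in parts if p.strip()]

# Embedding step (provider-dependent): each requirement string would then
# be sent to an embedding model and the resulting vector stored alongside
# the text for later matching against question vectors.
```

In practice, each extracted requirement keeps a reference back to its source document so answers can be traced to the original obligation.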
Stage 2: Topic Modeling with LDA
We applied Latent Dirichlet Allocation (LDA) topic modeling to the extracted requirements, an essential step in categorizing them and assigning relevant topics to individual requirements. Accurately assigning standardized topics laid the groundwork for organized, targeted compliance management.
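A minimal LDA sketch, using scikit-learn as one possible implementation (the POC's actual library is not specified); the tiny corpus and topic count here are illustrative only.

```python
# Minimal LDA topic-modeling sketch with scikit-learn. The requirements
# and n_components=2 are toy values for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

requirements = [
    "passwords must be rotated every ninety days",
    "password complexity must include symbols",
    "backups must be encrypted at rest",
    "backup media must be stored offsite and encrypted",
]

counts = CountVectorizer(stop_words="english").fit_transform(requirements)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each requirement receives a probability distribution over topics;
# the most probable topic becomes its standardized category.
topic_of = lda.transform(counts).argmax(axis=1)
```

The per-requirement topic assignment is what downstream stages group and unify.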
Stage 3: Grouping Similar Requirements
Building on the topic modeling stage, we applied clustering algorithms such as K-Means and DBSCAN to group similar requirements. By concentrating on the standardized topics, we reduced the data to distinct, manageable sets, simplifying subsequent analyses.
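The grouping step can be sketched with K-Means over TF-IDF vectors, a common pairing; the source names K-Means and DBSCAN but not the vectorization, so TF-IDF is an assumption, and the corpus and cluster count are toy values.

```python
# Sketch of Stage 3: cluster similar requirements. TF-IDF features and
# n_clusters=2 are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

requirements = [
    "rotate passwords every 90 days",
    "enforce password complexity rules",
    "encrypt backups at rest",
    "store encrypted backup media offsite",
]

X = TfidfVectorizer().fit_transform(requirements)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Collect requirements by cluster label into manageable groups.
groups = {}
for req, label in zip(requirements, labels):
    groups.setdefault(int(label), []).append(req)
```

DBSCAN can be swapped in the same way (`sklearn.cluster.DBSCAN`) when the number of groups is not known up front.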
Stage 4: Unifying requirements into control questions and statements
During this stage we identified control questions that encapsulate the central idea of each requirement group. Using tools such as NLTK and spaCy, we defined the intent of each group, improving clarity and producing actionable, measurable controls. Finally, we mapped the control questions to NIST guidelines, as required by the customer.
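To illustrate the idea without the NLTK/spaCy pipelines the POC used, the sketch below derives a control question from a requirement group with a plain keyword-frequency heuristic; the stop-word list, question template, and the NIST control reference in the comment are all illustrative assumptions.

```python
# Sketch of Stage 4: condense a requirement group into one control
# question. The POC used NLTK/spaCy for intent; this self-contained
# stand-in uses a simple keyword heuristic instead.
import re
from collections import Counter

STOP = {"must", "be", "the", "a", "an", "and", "at", "every", "all", "of"}

def control_question(group):
    """Build a control question from the group's most frequent content words."""
    words = Counter(
        w for req in group for w in re.findall(r"[a-z]+", req.lower())
        if w not in STOP
    )
    focus = " ".join(w for w, _ in words.most_common(3))
    return f"Does the organization have controls covering: {focus}?"

q = control_question([
    "backups must be encrypted at rest",
    "backup media must be encrypted and stored offsite",
])
# The customer-required mapping then links each question to a NIST
# control, e.g. {"question": q, "nist": "CP-9 (System Backup)"}.
```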
Stage 5: Internal Assessment and Control Statement Repository
In this stage, we took a manual approach to understanding the compliance posture against the unified requirements. The result was a detailed internal assessment and a set of answers to the control questions, forming a repository of “golden questions and answers”. This curated data set is leveraged both internally and externally, providing a ready-to-access pool of information for swift, informed responses.
Stage 6: Chatbot Assistant Integration
To culminate the proof of concept, we integrated a chatbot assistant that sources answers from the repository of "golden questions and answers." Incoming queries are transformed into vectors and matched against the vectors created in the previous stages to retrieve the appropriate responses. This vector matching significantly reduces the size of the context sent to the OpenAI LLM for each response. The AI-driven assistant can answer a variety of queries from customers and internal users (e.g., sales teams) alike, offering prompt, accurate answers that improve the user experience and speed up response time.
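The retrieval step described above can be sketched as cosine-similarity matching. The `embed()` here is a toy bag-of-words stand-in for the real embedding model, and the golden Q&A entries are invented examples; in the POC the best-matching pair becomes the compact context sent to the OpenAI LLM.

```python
# Sketch of Stage 6 retrieval: embed the query, find the closest golden
# question by cosine similarity, return its curated answer. embed() is a
# toy substitute for a real embedding model; the Q&A data is illustrative.
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a real system calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

golden = {
    "Are backups encrypted at rest?": "Yes, with AES-256.",
    "How often are passwords rotated?": "Every 90 days.",
}
vectors = {q: embed(q) for q in golden}

def answer(query):
    """Return the curated answer for the closest golden question."""
    best = max(vectors, key=lambda q: cosine(embed(query), vectors[q]))
    return golden[best]  # in the POC, this context goes to the OpenAI LLM
```

Because only the matched Q&A pair (rather than the whole repository) is sent as context, the prompt stays small and the response stays grounded in curated answers.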
As we look forward to the near future, the next evolutionary step in our customer POC is the implementation of a feature to automatically respond to customer due diligence questions coming in through spreadsheets. This enhancement promises to streamline the response process further, adding another layer of efficiency and speed to our unified compliance solution.
Through our proof of concept, we demonstrated a clear pathway to not just understanding and categorizing requirements efficiently but also preparing businesses to respond swiftly and accurately to both customer and internal queries through AI-enabled solutions.
As we move forward, our objective is clear – to continuously develop and refine AI enabled solutions that bring tangible business efficiencies, reducing the operational burden and fostering growth.