The proof of concept (PoC) described here was intended to demonstrate the feasibility of building an ontology and a knowledge graph from simple product data and implementing both in SAP HANA Cloud. Although the data structure was deliberately kept simple, the product name and product description columns contained extensive unstructured information, including additional features and entities. The goal was to extract this “hidden” information and make it queryable, extending the classic relational data structure.
Technologies and frameworks
The project used the SAP HANA Cloud Graph Engine to model and visualize entities and relationships, openCypher for querying, LangChain with OpenAI to extract entities from unstructured data, and hana-ml to connect Python to SAP HANA. Together, these technologies provided a flexible framework that combines structured and unstructured data in a dynamic ontology.
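As an illustration of how these pieces can fit together, the following minimal sketch wraps an openCypher query in SAP HANA's `OPENCYPHER_TABLE` SQL function and, optionally, runs it through hana-ml. It is not the PoC's actual code: the schema `POC`, the graph workspace `PRODUCT_GRAPH`, the `NAME` attribute, and the helper names are all hypothetical.

```python
def build_opencypher_sql(schema: str, workspace: str, cypher: str) -> str:
    """Wrap an openCypher query in SAP HANA's OPENCYPHER_TABLE SQL function."""
    # Double any quotes so the identifiers and the SQL string literal stay valid.
    schema = schema.replace('"', '""')
    workspace = workspace.replace('"', '""')
    escaped = cypher.replace("'", "''")
    return (
        "SELECT * FROM OPENCYPHER_TABLE("
        f'GRAPH WORKSPACE "{schema}"."{workspace}" '
        f"QUERY '{escaped}')"
    )


def run_query(sql: str, **connection_args):
    """Run the statement through hana-ml and return a pandas DataFrame."""
    # Imported lazily so the query builder works without hana-ml installed.
    from hana_ml.dataframe import ConnectionContext

    # connection_args: address, port, user, password of your HANA Cloud instance.
    conn = ConnectionContext(**connection_args)
    try:
        return conn.sql(sql).collect()
    finally:
        conn.close()


# Hypothetical graph: product vertices linked to feature vertices.
CYPHER = (
    "MATCH (p)-[e]->(f) "
    "WHERE f.NAME = 'stainless steel' "
    "RETURN p.NAME AS PRODUCT, f.NAME AS FEATURE"
)
SQL = build_opencypher_sql("POC", "PRODUCT_GRAPH", CYPHER)
print(SQL)
```

Keeping the query builder separate from the connection code makes the generated SQL easy to inspect before it is sent to the database.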
Workflow and methodology
The workflow was divided into three main phases:
- Creating the ontology: First, an ontology was created consisting of predefined entities and new entities extracted from the product descriptions. The unstructured descriptions were processed using NLP techniques to identify relevant features and relationships.
- Validation with a subset of data: A subset of data was used to verify the quality of the ontology. Entities and relationships that could not be confirmed by the ontology were discarded to ensure that only validated information was included in the knowledge graph.
- Dynamic adaptation: In the long term, the goal was to dynamically expand the ontology to automatically incorporate new information from future data sets and continuously enhance the graph.
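The validation phase can be sketched as a simple filter: each extracted relationship is kept only if the ontology allows its combination of subject type, relation, and object type. The entity types, relation names, and sample triples below are hypothetical and stand in for whatever the real ontology defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    subject_type: str
    relation: str
    object: str
    object_type: str


# Hypothetical ontology: allowed (subject type, relation, object type) patterns.
ONTOLOGY = {
    ("Product", "HAS_MATERIAL", "Material"),
    ("Product", "HAS_COLOR", "Color"),
    ("Product", "MADE_BY", "Brand"),
}


def validate(triples, ontology):
    """Split triples into those the ontology confirms and those it does not."""
    kept, discarded = [], []
    for t in triples:
        pattern = (t.subject_type, t.relation, t.object_type)
        (kept if pattern in ontology else discarded).append(t)
    return kept, discarded


# Illustrative extraction output for one product description.
extracted = [
    Triple("Bottle X", "Product", "HAS_MATERIAL", "stainless steel", "Material"),
    Triple("Bottle X", "Product", "HAS_COLOR", "blue", "Color"),
    Triple("Bottle X", "Product", "SHIPS_TO", "Spain", "Country"),  # not in ontology
]
kept, discarded = validate(extracted, ONTOLOGY)
print(len(kept), "kept,", len(discarded), "discarded")
```

Returning the discarded triples alongside the kept ones is a deliberate choice here: patterns that are rejected often are natural candidates when the ontology is later extended.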
Challenges and solutions
The main challenges were extracting entities from unstructured data, validating the ontology, and automating the ontology’s evolution as new data arrives. A crucial element was versioning the input prompt templates so that the semantic extraction logic stayed consistent.
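One possible way to version prompt templates is a small registry that makes each version immutable and stores a content fingerprint with every extraction result. This is a sketch using only the Python standard library, not the mechanism used in the PoC; the class names, the version string, and the template text are assumptions.

```python
import hashlib
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class PromptVersion:
    version: str
    template: str

    @property
    def fingerprint(self) -> str:
        # Short content hash to detect silent edits to a template.
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:12]

    def render(self, **fields) -> str:
        # substitute() raises KeyError if a placeholder is left unfilled.
        return Template(self.template).substitute(**fields)


class PromptRegistry:
    def __init__(self):
        self._versions = {}

    def register(self, version: str, template: str) -> PromptVersion:
        # A published version is never overwritten; changes need a new version.
        if version in self._versions:
            raise ValueError(f"version {version} already registered")
        self._versions[version] = PromptVersion(version, template)
        return self._versions[version]

    def get(self, version: str) -> PromptVersion:
        return self._versions[version]


registry = PromptRegistry()
v1 = registry.register(
    "1.0.0",
    "Extract entities of types $entity_types from this product description:\n"
    "$description",
)
prompt = v1.render(entity_types="Material, Color", description="Blue steel bottle")

# Store this next to each extracted entity so results can be traced to a prompt.
record = {"prompt_version": v1.version, "prompt_fingerprint": v1.fingerprint}
print(record["prompt_version"])
```

Recording the version and fingerprint with each result makes it possible to re-extract only the data produced by an outdated prompt.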
Results
The PoC produced an ontology based on structured and unstructured product information. The SAP HANA Graph Engine and Graph Viewer enabled visual exploration and analysis of the graph. This demonstrated the potential of dynamic ontologies that can be extended automatically to gain new insights from the data.
Future developments
A new SAP HANA Knowledge Graph Engine was announced at SAP TechEd 2024, with release planned for Q1/2025. The new engine is designed to support advanced querying and representation standards (RDF, Turtle), providing additional flexibility and open standards for ontology- and graph-based applications.
Author: Philipp Nell, Solution Architect Data Management and AI, Sulzer España