In Part 1, we created a Neo4j instance and ported Global-Chem into the Aura Database.
Our next goal is to give users a way to query molecules in that database. ChatGPT offers an easy-to-use interface that lets us start sharing chemical lists from different chemical communities with the public.
As in Part 1, we start from the Neo4j database. Here we describe its Cypher schema to ChatGPT, using APOC (Awesome Procedures on Cypher) to extract that schema from Neo4j.
Schema Setup
First, log in to Neo4j:
In the Aura Database query editor, we can type:
CALL apoc.meta.stats
YIELD labels, relTypes
This gives us the following in the Text format:
╒═══════════════════════════════════════════╤══════════════════════════════════════════════════════════════════════╕
│"labels" │"relTypes" │
╞═══════════════════════════════════════════╪══════════════════════════════════════════════════════════════════════╡
│{"Molecule":3256,"Category":37,"Name":2930}│{"()-[:IN_CATEGORY]->()":8993,"()-[:NAMED]->()":3256,"()-[:IN_CATEGORY│
│ │]->(:Category)":8993,"(:Name)-[:NAMED]->()":3256,"()-[:NAMED]->(:Molec│
│ │ule)":3256,"(:Molecule)-[:IN_CATEGORY]->()":8993} │
└───────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────┘
We can then copy and paste that with the following message into ChatGPT:
I have a Neo4j Database with the Following Schema
Response:
Now copy and paste the APOC schema:
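If you would rather assemble this message in code than copy it by hand, a minimal Python sketch could look like the following. The helper name build_schema_message is hypothetical, the relTypes dictionary is shortened to two entries from the output above, and the sketch combines the introduction and the schema into a single message, which is an assumption rather than the exact two-step exchange shown here.

```python
import json

# Counts copied from the apoc.meta.stats output above.
# rel_types is abbreviated to two of the six entries for readability.
labels = {"Molecule": 3256, "Category": 37, "Name": 2930}
rel_types = {
    "(:Molecule)-[:IN_CATEGORY]->()": 8993,
    "(:Name)-[:NAMED]->()": 3256,
}


def build_schema_message(labels, rel_types):
    """Format the schema statistics as one plain-text chat message.

    Hypothetical helper: it only builds a string, it does not call any API.
    """
    lines = [
        "I have a Neo4j Database with the Following Schema",
        "",
        "labels: " + json.dumps(labels),
        "relTypes: " + json.dumps(rel_types),
    ]
    return "\n".join(lines)


schema_message = build_schema_message(labels, rel_types)
print(schema_message)
```

The printed text can be pasted into ChatGPT as is.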
Property Setup
Next we collect the property keys stored on each node label and relationship type. Run the following query:
MATCH (n)
UNWIND keys(n) AS key
RETURN labels(n)[0] AS label, collect(distinct key) AS propertyKeys
UNION
MATCH ()-[r]->()
UNWIND keys(r) AS key
RETURN type(r) AS label, collect(distinct key) AS propertyKeys
And the response text we get back (only node labels appear, because the relationships in this database carry no properties):
╒══════════╤═══════════════╕
│"label" │"propertyKeys" │
╞══════════╪═══════════════╡
│"Category"│["category"] │
├──────────┼───────────────┤
│"Name" │["name"] │
├──────────┼───────────────┤
│"Molecule"│["id","smiles"]│
└──────────┴───────────────┘
We can then send this to ChatGPT with the message:
Nodes and Relationships can have the following properties
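As a rough sketch, the same message can be built from the rows of the property table. The rows list below is typed in from the output above, and build_property_message is a hypothetical helper name, not part of any library.

```python
# Rows copied from the property-key query output above.
rows = [
    {"label": "Category", "propertyKeys": ["category"]},
    {"label": "Name", "propertyKeys": ["name"]},
    {"label": "Molecule", "propertyKeys": ["id", "smiles"]},
]


def build_property_message(rows):
    """Turn label/propertyKeys rows into a plain-text chat message."""
    lines = ["Nodes and Relationships can have the following properties"]
    for row in rows:
        # One line per label, e.g. "Molecule: id, smiles"
        lines.append(f"{row['label']}: {', '.join(row['propertyKeys'])}")
    return "\n".join(lines)


property_message = build_property_message(rows)
print(property_message)
```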
Querying Molecules
Alright, now that ChatGPT knows our schema, we can start asking it to generate queries to run against our Neo4j database:
Let’s take this query:
MATCH (m:Molecule)-[:IN_CATEGORY]->(c:Category)
WHERE c.category = "environment"
RETURN m
We then run it in our AuraDB instance, which returns the matching Molecule nodes along with their SMILES strings.
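The same query can also be run from a script instead of the Aura console. The sketch below assumes the official neo4j Python driver is installed; the function names are hypothetical, and the query is a slight variation on the one above: it passes the category as a parameter and returns only m.smiles rather than the whole node.

```python
def build_category_query(category):
    """Return a parameterized Cypher query and its parameters.

    Variation on the query above: returns only the SMILES string.
    """
    query = (
        "MATCH (m:Molecule)-[:IN_CATEGORY]->(c:Category) "
        "WHERE c.category = $category "
        "RETURN m.smiles AS smiles"
    )
    return query, {"category": category}


def fetch_smiles(uri, user, password, category):
    """Run the category query and return a list of SMILES strings.

    uri, user and password are whatever your own Aura instance uses.
    """
    # Imported here so build_category_query works without the driver installed.
    from neo4j import GraphDatabase

    query, params = build_category_query(category)
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            result = session.run(query, params)
            return [record["smiles"] for record in result]


query, params = build_category_query("environment")
print(query)
print(params)
```

Passing the category as a parameter rather than pasting it into the query string keeps user-supplied text out of the Cypher itself, which matters once the queries come from a chat interface.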
From here we can do a lot more with relationship setup and more complex queries. ChatGPT offers an easy way for the public to query data in a Neo4j database. Since Global-Chem is an open-source store of chemical lists managed by the public, it would be very cool to make this an official data connection.
Stay tuned for Part 3.