It’s time to start embedding our common chemical language from GlobalChem into ChatGPT so folks can query the data effectively. Our goal is to extend our Discord application to pull data from GlobalChem.
So let’s get into it. First, let’s handle our imports. We will be using LangChain as the main connector tool.
pip install langchain
pip install python-dotenv
pip install pandas
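Depending on your LangChain version, the OpenAI wrapper may also need the openai client installed:
pip install openai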
Let’s import the packages and submodules we will need
import os
import pandas as pd
from dotenv import load_dotenv
from langchain.agents import create_pandas_dataframe_agent
from langchain.memory import ConversationBufferWindowMemory
from langchain import OpenAI, ConversationChain, LLMChain, PromptTemplate
For our integration we are going to need an OpenAI API key, which is fairly simple to set up. Create a .env file, store your key in it, and load the environment.
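The .env file only needs a single line (placeholder value shown):
OPENAI_API_KEY=sk-your-key-here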
load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
Next we want to load GlobalChem into a dataframe:
globalchem_dataframe = pd.read_csv(
    'https://raw.githubusercontent.com/Global-Chem/global-chem/development/global_chem/global_chem_outputs/global_chem.tsv',
    sep='\t',
    header=None,
    names=['name', 'smiles', 'node', 'predicate', 'path']
)
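If you want to confirm the load worked, a quick peek doesn’t hurt:
print(globalchem_dataframe.shape)
print(globalchem_dataframe['node'].unique()[:5])  # first few node labels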
And now we want to create an agent that is only responsible for the GlobalChem data.
agent = create_pandas_dataframe_agent(
    OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY),
    globalchem_dataframe,
    verbose=True
)
We are using the OpenAI object, where temperature controls the diversity of the text the agent generates. A higher temperature means more varied wording, and vice versa for a lower one. We set ours to 0 because we don’t want diversity; we want answers pulled directly from the data.
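For comparison, a higher-temperature wrapper would produce more varied phrasing; both lines below are illustrative only and the variable names are not part of the bot:
factual_llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)    # deterministic, sticks to the data
creative_llm = OpenAI(temperature=0.8, openai_api_key=OPENAI_API_KEY) # more varied wording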
To test it we can do something pretty simple:
agent.run('return a list of names of the node rings_in_drugs')
Here we can see the agent is now pulling data from our GlobalChem repository. Next we want to add ChatGPT from OpenAI. The ConversationBufferWindowMemory sets how many interactions between the agent and the user are kept in memory.
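To see the windowing in isolation, here is a purely illustrative check with a tiny window of k=2 (the variable name and sample exchanges are just for the demo); only the last two exchanges survive:
demo_memory = ConversationBufferWindowMemory(k=2)
demo_memory.save_context({'input': 'hi'}, {'output': 'hello'})
demo_memory.save_context({'input': 'what is benzene?'}, {'output': 'c1ccccc1'})
demo_memory.save_context({'input': 'thanks'}, {'output': 'you are welcome'})
print(demo_memory.load_memory_variables({}))  # only the last two exchanges remain
For the bot we keep a much larger window (k=100). Now for the prompt and the chain; the template string below is a minimal placeholder, with slots for the conversation history and the user’s message: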
# Placeholder prompt -- adapt the wording to your bot; the memory fills in
# {history} and the user's message fills in {human_input}
template = """Assistant is a chemistry helper for the GlobalChem Discord bot.

{history}
Human: {human_input}
Assistant:"""

prompt = PromptTemplate(
    template=template,
    input_variables=['history', 'human_input']
)

chatgpt_chain = LLMChain(
    llm=OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY),
    prompt=prompt,
    verbose=True,
    memory=ConversationBufferWindowMemory(k=100),
)
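A quick manual test of the chain (the question here is arbitrary):
reply = chatgpt_chain.predict(human_input='What functional groups are in aspirin?')
print(reply)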
The goal is for the user to issue commands, with different commands querying different functionality. This is an excerpt of our bot code that parses the commands and delegates the functionality.
# text_message and langchain_keywords are defined earlier in the handler
# Global-Chem Integration
if 'dataframe' in message.content.lower():
    question = text_message.split(':')[1]  # everything after the colon is the question
    output = agent.run(question)
    await message.channel.send(str(output))

# Language Chaining Chemicals
if any(word in text_message for word in langchain_keywords):
    output = chatgpt_chain.predict(
        human_input=message.content.lower()
    )
    await message.channel.send(output)
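For context, the excerpt above lives inside the bot’s message handler; a minimal discord.py sketch (the keyword list and token variable name are placeholders) looks roughly like this:
import discord

intents = discord.Intents.default()
intents.message_content = True  # required to read message text
client = discord.Client(intents=intents)

# Hypothetical trigger words for the ChatGPT chain; use whatever fits your bot
langchain_keywords = ['chem', 'smiles', 'compound']

@client.event
async def on_message(message):
    if message.author == client.user:
        return
    text_message = message.content.lower()
    # ... the command handling from the excerpt above goes here ...

client.run(os.getenv('DISCORD_BOT_TOKEN'))  # hypothetical env variable for your bot token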
Our final result is something like this, and this is where prompt engineering really comes in. Depending on how you prompt the agent, it can have some mishaps.
Prompt Engineering
Prompt engineering is communicating with the chat system you are integrating with to get the answer you want. It’s like being on a customer service call and pressing a series of numbers until you reach a representative.
Prompt: can you give me the names of the node of rings_in_drugs?
Answer: rings_in_drugs
Prompt: in the dataframe: can you give me a list of the names of the node rings_in_drugs?
Answer: The names of the nodes in the rings_in_drugs column are: ['rings_in_drugs']
Prompt: in the dataframe: can you give me the names of the node of rings_in_drugs?
Answer: rings_in_drugs
Prompt: in the dataframe: return a list of names of the node rings_in_drugs
Answer: ['alpha-ethylmescaline', '4-allyloxy-3,5-dimethoxyphenethylamine', ...]
It took me a couple of iterations to get the prompt right, and lo and behold, the data I wanted came back.
The next step is to integrate the tools component of LangChain to combine agents together. Stay tuned.
All the code is available here: