Neo4j
Neo4j is a graph database management system developed by
Neo4j, Inc
.
The data elements
Neo4j
stores are nodes, edges connecting them, and attributes of nodes and edges. Described by its developers as an ACID-compliant transactional database with native graph storage and processing,Neo4j
is available in a non-open-source "community edition" licensed with a modification of the GNU General Public License, with online backup and high availability extensions licensed under a closed-source commercial license. Neo also licensesNeo4j
with these extensions under closed-source commercial terms.
This notebook shows how to use LLMs to provide a natural language interface to a graph database you can query with the
Cypher
query language.
Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph.
Setting upโ
You will need to have a running Neo4j
instance. One option is to create a free Neo4j database instance in their Aura cloud service. You can also run the database locally using the Neo4j Desktop application, or running a docker container.
You can run a local docker container by running the executing the following script:
docker run \
--name neo4j \
-p 7474:7474 -p 7687:7687 \
-d \
-e NEO4J_AUTH=neo4j/password \
-e NEO4J_PLUGINS=\[\"apoc\"\] \
neo4j:latest
If you are using the docker container, you need to wait a couple of second for the database to start.
from langchain_neo4j import GraphCypherQAChain, Neo4jGraph
from langchain_openai import ChatOpenAI
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")
We default to OpenAI models in this guide.
import getpass
import os
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
Seeding the databaseโ
Assuming your database is empty, you can populate it using Cypher query language. The following Cypher statement is idempotent, which means the database information will be the same if you run it one or multiple times.
graph.query(
"""
MERGE (m:Movie {name:"Top Gun", runtime: 120})
WITH m
UNWIND ["Tom Cruise", "Val Kilmer", "Anthony Edwards", "Meg Ryan"] AS actor
MERGE (a:Actor {name:actor})
MERGE (a)-[:ACTED_IN]->(m)
"""
)
[]
Refresh graph schema informationโ
If the schema of database changes, you can refresh the schema information needed to generate Cypher statements.
graph.refresh_schema()
print(graph.schema)
Node properties:
Movie {runtime: INTEGER, name: STRING}
Actor {name: STRING}
Relationship properties:
The relationships:
(:Actor)-[:ACTED_IN]->(:Movie)
Enhanced schema informationโ
Choosing the enhanced schema version enables the system to automatically scan for example values within the databases and calculate some distribution metrics. For example, if a node property has less than 10 distinct values, we return all possible values in the schema. Otherwise, return only a single example value per node and relationship property.
enhanced_graph = Neo4jGraph(
url="bolt://localhost:7687",
username="neo4j",
password="password",
enhanced_schema=True,
)
print(enhanced_graph.schema)
Node properties:
- **Movie**
- `runtime`: INTEGER Min: 120, Max: 120
- `name`: STRING Available options: ['Top Gun']
- **Actor**
- `name`: STRING Available options: ['Tom Cruise', 'Val Kilmer', 'Anthony Edwards', 'Meg Ryan']
Relationship properties:
The relationships:
(:Actor)-[:ACTED_IN]->(:Movie)
Querying the graphโ
We can now use the graph cypher QA chain to ask question of the graph
chain = GraphCypherQAChain.from_llm(
ChatOpenAI(temperature=0), graph=graph, verbose=True, allow_dangerous_requests=True
)
chain.invoke({"query": "Who played in Top Gun?"})
[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE m.name = 'Top Gun'
RETURN a.name[0m
Full Context:
[32;1m[1;3m[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}][0m
[1m> Finished chain.[0m
{'query': 'Who played in Top Gun?',
'result': 'Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.'}
Limit the number of resultsโ
You can limit the number of results from the Cypher QA Chain using the top_k
parameter.
The default is 10.
chain = GraphCypherQAChain.from_llm(
ChatOpenAI(temperature=0),
graph=graph,
verbose=True,
top_k=2,
allow_dangerous_requests=True,
)
chain.invoke({"query": "Who played in Top Gun?"})
[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE m.name = 'Top Gun'
RETURN a.name[0m
Full Context:
[32;1m[1;3m[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}][0m
[1m> Finished chain.[0m
{'query': 'Who played in Top Gun?',
'result': 'Tom Cruise, Val Kilmer played in Top Gun.'}
Return intermediate resultsโ
You can return intermediate steps from the Cypher QA Chain using the return_intermediate_steps
parameter
chain = GraphCypherQAChain.from_llm(
ChatOpenAI(temperature=0),
graph=graph,
verbose=True,
return_intermediate_steps=True,
allow_dangerous_requests=True,
)
result = chain.invoke({"query": "Who played in Top Gun?"})
print(f"Intermediate steps: {result['intermediate_steps']}")
print(f"Final answer: {result['result']}")
[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE m.name = 'Top Gun'
RETURN a.name[0m
Full Context:
[32;1m[1;3m[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}][0m
[1m> Finished chain.[0m
Intermediate steps: [{'query': "MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)\nWHERE m.name = 'Top Gun'\nRETURN a.name"}, {'context': [{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}]}]
Final answer: Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.
Return direct resultsโ
You can return direct results from the Cypher QA Chain using the return_direct
parameter
chain = GraphCypherQAChain.from_llm(
ChatOpenAI(temperature=0),
graph=graph,
verbose=True,
return_direct=True,
allow_dangerous_requests=True,
)
chain.invoke({"query": "Who played in Top Gun?"})
[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE m.name = 'Top Gun'
RETURN a.name[0m
[1m> Finished chain.[0m
{'query': 'Who played in Top Gun?',
'result': [{'a.name': 'Tom Cruise'},
{'a.name': 'Val Kilmer'},
{'a.name': 'Anthony Edwards'},
{'a.name': 'Meg Ryan'}]}