DEV Community

Cover image for YugabyteDB as a Graph database with PuppyGraph
Franck Pachot for YugabyteDB

Posted on

YugabyteDB as a Graph database with PuppyGraph

PostgreSQL can handle any workload without streaming data to a purpose-built database. This includes graph databases using PuppyGraph, which maps relational tables to a graph model.
When resilience, scalability, and elasticity are necessary, YugabyteDB provides a distributed alternative to PostgreSQL.

This article details the step-by-step tutorial to using PuppyGraph to query data in PostgreSQL, but using YugabyteDB instead of PostgreSQL.


Deployment

I Create a file docker-compose.yaml with the following content:

services:
  puppygraph:
    image: puppygraph/puppygraph:stable
    pull_policy: always
    container_name: puppygraph
    environment:
      - PUPPYGRAPH_USERNAME=puppygraph
      - PUPPYGRAPH_PASSWORD=puppygraph123
    networks:
      yugabytedb_net:
    ports:
      - "8081:8081"
      - "8182:8182"
      - "7687:7687"
  yugabytedb:
    image: yugabytedb/yugabyte:2024.1.2.0-b77
    command: yugabyted start --enable_pg_parity_tech_preview --background=false
    environment:
      - YSQL_USER=puppygraph
      - YSQL_PASSWORD=puppygraph123
    networks:
      yugabytedb_net:
    ports:
      - "5433:5433"
    volumes:
      - ./yugabyte-data:/root/var/data
networks:
  yugabytedb_net:
    name: puppy-yugabytedb
Enter fullscreen mode Exit fullscreen mode

I run the following command to start YugabyteDB and PuppyGraph:

docker compose up -d
Enter fullscreen mode Exit fullscreen mode

I can connect on http://localhost:8080/

Image description

Image description


Data Preparation

I run the following command to start ysqlsh, the PostgreSQL shell, to access the database:


docker compose run -it -e PGPASSWORD=puppygraph123 -e PGUSER=puppygraph -e PGHOST=yugabytedb -e PGPORT=5433 -e PGDATABASE=yugabyte yugabytedb ysqlsh 

Enter fullscreen mode Exit fullscreen mode

I execute the following SQL statements in the shell to create tables and insert data:

create schema modern;
create table modern.person (id text, name text, age integer);
insert into modern.person values
                              ("v1", "marko", 29),
                              ("v2", "vadas", 27),
                              ("v4", "josh", 32),
                              ("v6", "peter", 35);

create table modern.software (id text, name text, lang text);
insert into modern.software values
                                ("v3", "lop", "java"),
                                ("v5", "ripple", "java");

create table modern.created (id text, from_id text, to_id text, weight double precision);
insert into modern.created values
                               ("e9", "v1", "v3", 0.4),
                               ("e10", "v4", "v5", 1.0),
                               ("e11", "v4", "v3", 0.4),
                               ("e12", "v6", "v3", 0.2);

create table modern.knows (id text, from_id text, to_id text, weight double precision);
insert into modern.knows values
                             ("e7", "v1", "v2", 0.5),
                             ("e8", "v1", "v4", 1.0);

analyze;

Enter fullscreen mode Exit fullscreen mode

Modeling a Graph

I use the PuppyGraph schema file schema.json with the following content:

{
  "catalogs": [
    {
      "name": "postgres_data",
      "type": "postgresql",
      "jdbc": {
        "username": "puppygraph",
        "password": "puppygraph123",
        "jdbcUri": "jdbc:postgresql://yugabytedb:5433/yugabyte",
        "driverClass": "org.postgresql.Driver"
      }
    }
  ],
  "vertices": [
    {
      "label": "person",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "modern",
        "table": "person",
        "metaFields": {
          "id": "id"
        }
      },
      "attributes": [
        {
          "type": "Int",
          "name": "age"
        },
        {
          "type": "String",
          "name": "name"
        }
      ]
    },
    {
      "label": "software",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "modern",
        "table": "software",
        "metaFields": {
          "id": "id"
        }
      },
      "attributes": [
        {
          "type": "String",
          "name": "lang"
        },
        {
          "type": "String",
          "name": "name"
        }
      ]
    }
  ],
  "edges": [
    {
      "label": "knows",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "modern",
        "table": "knows",
        "metaFields": {
          "id": "id",
          "from": "from_id",
          "to": "to_id"
        }
      },
      "from": "person",
      "to": "person",
      "attributes": [
        {
          "type": "Double",
          "name": "weight"
        }
      ]
    },
    {
      "label": "created",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "modern",
        "table": "created",
        "metaFields": {
          "id": "id",
          "from": "from_id",
          "to": "to_id"
        }
      },
      "from": "person",
      "to": "software",
      "attributes": [
        {
          "type": "Double",
          "name": "weight"
        }
      ]
    }
  ]
}
Enter fullscreen mode Exit fullscreen mode

I upload the schema by selecting the file schema.json in the

Image description

Image description

Querying the graph via CLI

I execute the following command to access the PuppyGraph Gremlin Console

docker compose exec -it puppygraph ./bin/console
Enter fullscreen mode Exit fullscreen mode

I run the following queries in the console to query the Graph:

puppy-gremlin> g.V().has("name", "marko").valueMap()
Done! Elapsed time: 0.359s, rows: 1
==>map[age:29 name:marko]
puppy-gremlin> g.V().has("name", "marko").out("knows").out("created").valueMap()
Done! Elapsed time: 0.108s, rows: 2
==>map[lang:java name:lop]
==>map[lang:java name:ripple]
puppy-gremlin>
Enter fullscreen mode Exit fullscreen mode

SQL Queries

I refresh the local cache to check from pg_stat_statements that the queries are fast:

Image description

yugabyte=# select total_time/calls as time, calls, substr(query,1,80)
 from pg_stat_statements order by 1
;
Enter fullscreen mode Exit fullscreen mode

Image description

Scale-Out YugabyteDB

To add more nodes to the YugabyteDB cluster, you can add a service that starts YugabyteDB and join the first node. Add the following to docker-compose.yaml:

  more-nodes:
    image: yugabytedb/yugabyte:2024.1.2.0-b77
    command: yugabyted start --join yugabytedb --enable_pg_parity_tech_preview --background=false
    networks:
      yugabytedb_net:
    deploy:
     replicas: 0
Enter fullscreen mode Exit fullscreen mode

Start a second node with:

docker compose up -d --scale more-nodes=1
Enter fullscreen mode Exit fullscreen mode

Start a third node with:

docker compose up -d --scale more-nodes=2
Enter fullscreen mode Exit fullscreen mode

With three nodes, the cluster is automatically set to Replication Factor 3 and becomes resilient to one node failure

Conclusion

This tutorial demonstrates using PuppyGraph to query YugabyteDB as a graph database, leveraging the PostgreSQL compatibility of YugabyteDB"s distributed architecture. It follows the simple deployment steps from PuppyGraph documentation.

With PuppyGraph, you can directly query your existing data stores without needing ETL pipelines. PuppyGraph and YugabyteDB together provide a powerful solution for modern graph-based analytics.


YugabyteDB. The Distributed SQL Database for Mission-Critical Apps

YugabyteDB is the 100% open source cloud native database for mission critical applications. YugabyteDB runs in any public or hybrid cloud.

favicon yugabyte.com

Top comments (0)