PyGWalker integration in Notebooks

Available since version 6.2.0-Xenon

Introduction

Onesait Platform integrates PyGWalker, a data analysis and visualisation tool for Jupyter that turns pandas DataFrames into an interactive user interface for visual exploration.

image-20240917-140058.png

About PyGWalker

PyGWalker (pronounced ‘Pig Walker’) is short for ‘Python binding of Graphic Walker’. It integrates Jupyter Notebook with Graphic Walker, an open source alternative to Tableau. It allows data scientists to visualise, cleanse and annotate data with simple drag-and-drop operations, and even to run natural language queries.

The following video explains how it works:

https://www.youtube.com/watch?v=rprn79wfB9E

How to use PyGWalker from Notebooks

Create a new Notebook

First, create a new Notebook. From the Control Panel, with an ‘administrator’ or ‘analyst’ account, navigate to the menu ML & AI > My Notebooks.

image-20241211-150925.png

From the list of Notebooks, create a new one by clicking the ‘+’ button:

The first thing to do is to indicate the name of the Notebook:

Once this is done, you can proceed to create the Notebook.

Configuring the Notebook

The first step is to install the PyGWalker library. Do this in the first paragraph of the Notebook with pip, invoked through the Zeppelin shell interpreter:

%sh pip install pygwalker

When the paragraph is executed, the necessary library and dependencies will be installed.
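If pip installed the package into a different Python environment from the one the Python interpreter uses, the import in the next paragraph will fail. A quick check can confirm that the library is visible before going further. The following is a minimal sketch that uses only the Python standard library; installed_version is a hypothetical helper name, not part of PyGWalker or Onesait Platform. In a Notebook it would go in a paragraph that starts with %python.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing.

    Hypothetical helper for illustration; it only wraps importlib.metadata.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report whether the Python interpreter can see the package installed by pip
print("pygwalker:", installed_version("pygwalker") or "not installed")
```

If the output is "not installed", the shell and Python interpreters are probably using different environments.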

In a second paragraph, import the ‘pandas’ and ‘pygwalker’ libraries using the Python interpreter:

%python
import pandas as pd
import pygwalker as pyg

With this, the environment is already set up and ready to work.

Load data to the Notebook

As example data, we will use the following CSV file found in a GitHub repository: https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv

To include it in the Notebook, create a new paragraph with the Python interpreter that reads the CSV file into a DataFrame:

%python
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
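Before handing the DataFrame to PyGWalker, it can help to confirm that the header and column types were parsed as expected. The sketch below shows the kind of check that could follow the load. As an assumption made only so that it runs without network access, it inlines the first three rows of iris.csv; in the Notebook, iris would come from the pd.read_csv() call on the URL.

```python
import io

import pandas as pd

# Assumption: the first three rows of iris.csv, inlined so the sketch runs offline.
# In the Notebook, `iris` would come from pd.read_csv() on the GitHub URL instead.
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
"""

iris = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(iris.shape)   # (rows, columns)
print(iris.dtypes)  # four numeric measurements and one text column
print(iris.head())  # first rows, to confirm the header was parsed
```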

Next, pass the DataFrame read from the CSV to PyGWalker so that it can be explored interactively with charts. This is done with the pyg.walk() function:
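The paragraph itself is not reproduced here. A minimal sketch of what it could look like follows, assuming the DataFrame is named iris as above and that the returned object is kept in a variable called walker for later rendering. start_walker is a hypothetical helper name; the only PyGWalker call involved is pyg.walk().

```python
def start_walker(df):
    """Hand a pandas DataFrame to PyGWalker and return the walker object.

    `start_walker` is a hypothetical helper; it only wraps pyg.walk().
    """
    import pygwalker as pyg  # imported lazily, so the helper can be defined before use

    return pyg.walk(df)


# In a %python paragraph, with `iris` already loaded:
# walker = start_walker(iris)   # equivalent to: walker = pyg.walk(iris)
```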

If you run this paragraph, it outputs a long alphanumeric string rather than anything visual.

To display it correctly, print the result of walker.to_html(), which generates the data viewer:
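One way to do this in Zeppelin is to print the HTML behind the %html display directive, which tells Zeppelin to render the output as HTML instead of plain text. The sketch below assumes that approach and assumes walker is the object returned by pyg.walk(); show_walker is a hypothetical helper name.

```python
def show_walker(walker):
    """Render a PyGWalker object in Zeppelin.

    Hypothetical helper: prints the walker's HTML prefixed with Zeppelin's
    %html display directive so the output is rendered rather than shown as text.
    """
    print("%html " + walker.to_html())


# In a %python paragraph, after creating the walker:
# show_walker(walker)   # equivalent to: print("%html " + walker.to_html())
```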

The result will look like the following:

Full Code