PyGWalker integration in Notebooks
Available since version 6.2.0-Xenon
Introduction
Onesait Platform integrates PyGWalker, a data analysis and visualisation tool for Jupyter that turns pandas DataFrames into an interactive user interface for visual exploration.
About PyGWalker
PyGWalker (pronounced "Pig Walker") is short for "Python binding of Graphic Walker". It integrates Jupyter Notebook with Graphic Walker, an open-source alternative to Tableau. It lets data scientists visualise, clean and annotate data with simple drag-and-drop operations, and even run natural-language queries.
The following video explains how it works:
How to use PyGWalker from Notebooks
Create a new Notebook
First, create a new Notebook. Log in to the Control Panel with an "administrator" or "analyst" account and go to the ML & AI > My Notebooks menu.
In the list of Notebooks, click the "+" button to create a new one:
Next, enter a name for the Notebook:
Once this is done, you can proceed to create the Notebook.
Configuring the Notebook
The first step is to install the PyGWalker library. Do this with pip in the first paragraph of the Notebook, using the Zeppelin shell interpreter (%sh):
%sh
pip install pygwalker
When the paragraph is executed, the necessary library and dependencies will be installed.
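Depending on how the Zeppelin interpreters are configured, the %sh paragraph and the %python paragraph might not use the same Python environment. Treat this as an assumption to check on your own installation. The following minimal sketch uses only the standard library and can be run from a %python paragraph. It checks whether a package is importable and, if it is not, builds a pip command bound to the interpreter that is actually running. `package_status` and `pip_command` are hypothetical helper names, not part of PyGWalker or Zeppelin.

```python
import importlib.util
import sys
from importlib import metadata
from typing import List, Optional


def package_status(name: str) -> Optional[str]:
    """Return the installed version of `name`, or None if it cannot be imported."""
    # find_spec() returns None when no top-level module with that name is importable
    if importlib.util.find_spec(name) is None:
        return None
    try:
        # Assumes the distribution name matches the import name (true for pygwalker)
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        # Importable but without distribution metadata (e.g. a stdlib module)
        return "unknown"


def pip_command(package: str) -> List[str]:
    """Build a pip install command for the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    status = package_status("pygwalker")
    if status is None:
        print("pygwalker not found; install it with:", " ".join(pip_command("pygwalker")))
    else:
        print("pygwalker version:", status)
```

The helper builds the command but does not run it. If `pygwalker` is missing from the %python environment, you can run that exact command (for example with `subprocess.check_call`), so the package is installed into the same interpreter that will import it.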
In a second paragraph, import the "pandas" and "pygwalker" libraries with the Python interpreter (%python):
%python
import pandas as pd
import pygwalker as pyg
The environment is now set up and ready to use.
Load data to the Notebook
As sample data, we will use the following CSV file (the Iris dataset) from a GitHub repository: https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv
To load it into the Notebook, create a new paragraph with the Python interpreter and read the file with pandas' read_csv() function:
%python
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
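Before handing the data to PyGWalker, it is useful to check which columns it contains and whether they are numeric or text. In the Notebook, `iris.head()` and `iris.dtypes` on the pandas DataFrame give this information directly. The following sketch does the same check with the standard `csv` module so that it runs offline. It uses a few inline rows copied from the start of the same iris.csv file. `column_kinds` is a hypothetical helper. The numeric/categorical split only roughly mirrors how drag-and-drop tools tend to separate measures from dimensions, and is an assumption, not PyGWalker's own type inference.

```python
import csv
import io

# Header plus the first three rows of iris.csv, inlined so the sketch runs without network access
SAMPLE = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
"""


def column_kinds(text: str) -> dict:
    """Classify each CSV column as 'numeric' or 'categorical' from its values."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    kinds = {}
    for column in reader.fieldnames:
        try:
            # A column is numeric only if every value parses as a float
            for row in rows:
                float(row[column])
            kinds[column] = "numeric"
        except ValueError:
            kinds[column] = "categorical"
    return kinds


print(column_kinds(SAMPLE))
```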
Next, have PyGWalker load the DataFrame read from the CSV so that the data can be explored interactively through charts. This is done with the pyg.walk() function:
If you run this paragraph, it outputs a long string of alphanumeric code instead of a visual interface.
To display the viewer correctly, print the result of walker.to_html(), which generates the data viewer:
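Zeppelin chooses how to render a paragraph's output through its display system: output text that starts with the `%html` directive is rendered as HTML rather than plain text. If printing `walker.to_html()` still shows raw markup, prefixing the directive may help. The following minimal sketch assumes, as described above, that `walker.to_html()` returns the viewer as an HTML string. `as_zeppelin_html` is a hypothetical helper, and the stand-in fragment replaces the real PyGWalker output so that the example runs on its own.

```python
def as_zeppelin_html(html: str) -> str:
    """Prefix an HTML fragment with Zeppelin's %html display directive."""
    # Zeppelin renders paragraph output as HTML when the text begins with %html
    return "%html " + html


# Stand-in for walker.to_html(); in the Notebook, pass the real result instead
fragment = "<div id='viewer'>PyGWalker viewer</div>"
print(as_zeppelin_html(fragment))
```

In the Notebook paragraph, this would become `print(as_zeppelin_html(walker.to_html()))`.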
The result will look like the following: