Virtual Buckets on S3

Available since Release 6.0.0

Introduction

This functionality allows a platform administrator to segment an S3 bucket (either AWS S3 or MinIO) into virtual buckets assigned to different platform users.

In this way, without the need for different physical buckets, each virtual bucket can be used for a specific purpose, keeping its uses separated (datamart, staging, ...).

How to use it?

Required set-up configuration

In order to enable this functionality, some preliminary steps must be completed:

  • Create a new metastore service in the CaaS, configured for this S3 repository. It’s possible to reuse the platform’s own Presto Metastore image. In this tutorial, we’ll create a new service called presto-metastore-server-aws. The current version of the image is:

presto-metastore-server:5.0.0

Then, set the environment variables with the AWS credentials and the S3 service endpoint:

- MINIO_ROOT_USER → the AWS Access Key

- MINIO_ROOT_PASSWORD → the AWS Secret Key

- MINIO_SERVER_ENDPOINT → the http/https endpoint of the S3 service
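As an illustration, the values could look like the following (all three are placeholders; the endpoint shown assumes the standard AWS S3 endpoint of the eu-west-1 region):

MINIO_ROOT_USER=<access-key>
MINIO_ROOT_PASSWORD=<secret-key>
MINIO_SERVER_ENDPOINT=https://s3.eu-west-1.amazonaws.com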

[Screenshot: environment variables of the presto-metastore-server-aws service]

This will result in the service being up and running for AWS:

[Screenshot: presto-metastore-server-aws service up and running]
  • Configure the S3 system in the platform’s centralized configuration. In the platform configuration settings, set the path in the onesaitplatform/env/externals3 property.

By default, these settings will already be configured, but they must be set like this:

onesaitplatform/env/database/prestodb-externals3-catalog → presto catalog name (externals3 by default)

onesaitplatform/env/database/prestodb-externals3-schema → presto schema name (default by default)

  • Create a new Presto catalog in the platform (with the same name as the onesaitplatform/env/database/prestodb-externals3-catalog setting). It must be configured with the URL of the previously created metastore service (the hive.metastorage.url setting), as sketched below.
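The exact catalog settings depend on the platform release. As an orientation only, a Presto Hive-connector catalog pointing at the new metastore service could look like the following sketch (host, port and placeholder credentials are assumptions; in standard Presto the metastore address property is hive.metastore.uri):

# Hive connector pointing at the new metastore service
connector.name=hive-hadoop2
hive.metastore.uri=thrift://presto-metastore-server-aws:9083
# S3 credentials and endpoint for the bucket
hive.s3.aws-access-key=<access-key>
hive.s3.aws-secret-key=<secret-key>
hive.s3.endpoint=<s3-endpoint>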

Creation of AWS S3 Bucket

After the previous steps, and with the right AWS credentials, the next step is to access the AWS console:

Then, navigate to the Amazon S3 page

Finally, click on the “Create bucket” button to access the creation form. Inside it, we’ll fill in all the inputs and create our AWS bucket:

After that, the system notifies us that the bucket has been created, and it’ll appear in the bucket list.
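Alternatively, the bucket could be created from the AWS CLI; a minimal sketch, assuming the CLI is configured with the same credentials and that the region shown is the one actually in use:

# create the bucket used in this example (region is illustrative)
aws s3api create-bucket --bucket onesaitdatamart --region eu-west-1 --create-bucket-configuration LocationConstraint=eu-west-1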

Associated Virtual Bucket creation in the platform

In the platform, with a user with the administrator role, we’ll navigate to the Virtual Buckets Management section.

We’ll click on Create and start filling in the input fields. We can also see all the AWS buckets in the S3 Bucket Name dropdown.

We’ll select the newly created bucket and fill in all the input fields.

After clicking on the Create button, we’ll see the detail view and the full generated path.

At this point, it’s important to authorize a user to use this new Virtual Bucket for entity creation. In the example, this user will be able to create entities under the path “data/input” in the AWS bucket onesaitdatamart.

Entity creation in the Virtual Bucket

Finally, we can create the entity in this Virtual Bucket with the user authorized in the previous step.

After logging in, we navigate to the Virtual Bucket list, which shows us which buckets we’re allowed to use. With this user, we’re not allowed to create, edit or delete Virtual Buckets.

In order to create the new entity for this Virtual Bucket, we’ll navigate to the “Create entity in Historical Database” option:

After that, the Create Entity from Virtual Bucket form will appear.

In a similar way to historical entities, we’ll fill in the different options of the entity creation form.

Below, we can select the entity location with the Virtual Bucket Identification dropdown. After that, if we update the SQL (with the Update SQL button), we’ll see the new SQL statement with the real bucket location in the EXTERNAL_LOCATION attribute.
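As an orientation, the generated statement could look like the following sketch (the entity name and columns are illustrative assumptions; the location combines the onesaitdatamart bucket and the data/input path from the previous steps, and the location property is shown with the standard Presto spelling):

-- illustrative entity created in the virtual bucket
CREATE TABLE sensordata (
    id VARCHAR,
    value DOUBLE,
    ts TIMESTAMP
)
WITH (
    format = 'PARQUET',
    -- real bucket location resolved from the virtual bucket
    external_location = 's3a://onesaitdatamart/data/input/sensordata'
)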

Finally, by clicking on the Create button, we’ll create our new entity.

If we navigate to the AWS Console, we can see that the full path for the new entity has been created.
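The same check can be done from the AWS CLI (the trailing entity folder is the illustrative one used above):

# list the objects created under the virtual bucket path
aws s3 ls s3://onesaitdatamart/data/input/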

Entity operations

It will be possible to insert data, which will appear as a new file in AWS S3.
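For example, an insert over the illustrative entity sketched above could be issued from the platform’s query tool or any Presto client:

-- each insert lands as a new file under the entity path
INSERT INTO sensordata (id, value, ts)
VALUES ('sensor-01', 21.5, TIMESTAMP '2024-03-26 14:00:00');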

 

And query the data:
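A query over the entity then reads the files stored under the virtual bucket path (again using the illustrative entity name):

-- reads the files stored under data/input/sensordata
SELECT id, value, ts
FROM sensordata
WHERE value > 20;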