The technology behind our Data Labeling Engine: Label Studio

Platform Release 4.2.0 integrates a data labeling engine that lets you label the information stored in the platform, working either from files (stored in the FileRepository or in the Platform MinIO) or from the Entities stored in the platform repositories.

The engine is built on the Label Studio tool.

Label Studio is an open-source data labeling tool. It lets you label data such as audio, text, images, video and time series through a simple user interface, and then export the results to various model formats.

It can be used to prepare raw data or to improve existing training data in order to obtain more accurate ML models.

Its main features are:

  • Multiple data types, such as images, audio, text, HTML, time series and video.

  • Multi-user: registration and login for multiple users, with each annotation linked to the account that created it.

  • Multiple projects to work on all your datasets in a single instance.

  • Configurable label formats that allow you to customize the visual interface to meet your specific labeling needs.

  • Import from local files (JSON, CSV, TSV, RAR and ZIP) or from cloud storage in Amazon S3 or Google Cloud Storage.

  • Export through the label-studio-converter module, a library that takes Label Studio's internal JSON-based format and converts it to general-purpose formats (JSON, CSV, TSV) or to model-specific formats such as CoNLL for text taggers, or Pascal VOC and COCO for computer vision models.

  • Integration with machine learning models to visualize and compare predictions from different models and perform pre-labeling using the Label Studio SDK:

  • Comparison of Predictions:
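
As a rough illustration of what comparing predictions involves at the data level, the sketch below builds a single task in Label Studio's JSON task format, carrying predictions from two models, and checks whether they agree. It uses only the Python standard library and does not call the Label Studio SDK. The model names ("model-a", "model-b"), the tag names ("sentiment", "text"), the sample text and the helper functions are all assumptions made for this example; in a real project the tag names must match the project's labeling configuration.

```python
import json


def make_prediction(model_version, label, score):
    """Build one prediction entry in Label Studio's JSON task format."""
    return {
        "model_version": model_version,  # hypothetical model identifier
        "score": score,
        "result": [
            {
                "from_name": "sentiment",  # assumed name of the Choices tag
                "to_name": "text",         # assumed name of the Text tag
                "type": "choices",
                "value": {"choices": [label]},
            }
        ],
    }


def predicted_label(prediction):
    """Return the single choice stored in a prediction built above."""
    return prediction["result"][0]["value"]["choices"][0]


def models_agree(task):
    """True when every prediction attached to the task has the same label."""
    labels = {predicted_label(p) for p in task["predictions"]}
    return len(labels) == 1


# One task with pre-labels from two hypothetical models.
task = {
    "data": {"text": "The delivery arrived two days late."},
    "predictions": [
        make_prediction("model-a", "Negative", 0.91),
        make_prediction("model-b", "Neutral", 0.55),
    ],
}

# A list of such tasks, serialized as JSON, is what would be imported.
payload = json.dumps([task], indent=2)
print(payload)
print("models agree:", models_agree(task))
```

Tasks on which the models disagree are usually the ones most worth sending to a human annotator first, which is one reason to keep the model version alongside each prediction.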