(Revolution Team) Data tokenization service for Data Scientists

During their participation in the Onesait Platform initiative, the Revolution Team, made up of Rafael, Pablo and Matteo and mentored by Ángel, implemented a complete data tokenization application for data scientists, built entirely on platform capabilities.

The challenge:

When dealing with confidential information from proprietary external systems, data analysts and developers often face restrictions on its access and use. To implement their algorithms and developments, they then fall back on mock or test data, which usually do not reflect all the cases present in the original datasets.

As a result, the outcomes obtained are not correct, which forces multiple iterations until a suitable development is achieved, delaying the move to production.

To address this, the team developed the Tokenify application, which, given a data file, allows the user to select which fields should be transformed into tokenized values and to carry out the transformation, producing as output another file with the tokenized fields.

To provide greater flexibility, Tokenify offers three tokenization methods:

  • FPE (format-preserving encryption), which transforms the values through encryption but preserves the original format of the data, so that the properties needed to verify the suitability of the algorithms are maintained.
  • AES (symmetric encryption), which also uses encryption but does not preserve the format. This technique is more secure but less convenient.
  • Random map, which uses a trivial obfuscation technique. It is the least secure of the three, but also the fastest.
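As an illustration of the simplest of the three, a random-map tokenizer can be sketched in a few lines of Python. This is a hedged sketch under assumed behavior, not the team's actual implementation; all names are illustrative:

```python
import secrets

def random_map_tokenize(values):
    """Replace each distinct value with a random hex token.

    Returns the tokenized list and the key (token -> original value)
    needed to reverse the process, as Tokenify delivers to the user.
    """
    forward = {}  # original value -> token
    key = {}      # token -> original value (the "key" handed back)
    out = []
    for v in values:
        if v not in forward:
            token = secrets.token_hex(8)
            forward[v] = token
            key[token] = v
        out.append(forward[v])
    return out, key

def random_map_detokenize(tokens, key):
    """Reverse the process using the delivered key."""
    return [key[t] for t in tokens]
```

Note that the same input value always maps to the same token, which preserves joins and group-bys in the tokenized data but also leaks equality between records, one reason the post calls this the least secure option.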

The features incorporated in the application are:

  • Reception of the files to be tokenized (in this first version, only .CSV).
  • Selection of the sensitive fields and of the tokenization technique to be applied.
  • Tokenization according to the three techniques described.
  • Delivery of the key used in the process so that the user can reverse the process.
  • Platform dashboard for activity tracking.
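Taken together, the workflow above (receive a CSV, pick the sensitive columns, tokenize them, hand back the key) could look roughly like the following stdlib-only sketch using the random-map technique; function and variable names are assumptions for illustration:

```python
import csv
import io
import secrets

def tokenize_csv(csv_text, sensitive_fields):
    """Tokenize the given columns of a CSV string.

    Returns the tokenized CSV text plus the key needed
    for the user to reverse the process.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    forward = {}  # (field, original) -> token
    key = {}      # (field, token) -> original
    rows = []
    for row in reader:
        for field in sensitive_fields:
            original = row[field]
            if (field, original) not in forward:
                token = secrets.token_hex(8)
                forward[(field, original)] = token
                key[(field, token)] = original
            row[field] = forward[(field, original)]
        rows.append(row)
    # Write the result back out with the same header row
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue(), key
```

Keying the map on (field, value) pairs keeps tokens consistent within a column without linking identical values across unrelated columns.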

Platform components

Main modules:

  • Identity SSO: The authentication services provided by the platform are used, with OAuth2 as the protocol.
  • Flow Engine: Defines the business flow and integrates the algorithms into it.
  • Semantic Models: A data model has been created to store the information used in the application for each user.
  • Notebooks: The three tokenization methods were implemented in platform notebooks, using the Python interpreter. They are integrated through the REST interface provided by the platform.
  • API Manager: Makes available the REST services that implement the functionality. It also provides a REST API to access the utility’s audit data.
  • DataHub – Binary Repository: Isolates the application from the file system, delegating all file management to the platform. It is especially useful in this case because it makes it possible to share the tokenized file among different authorized users.
  • Dashboards: The data generated by the ontology are shown in a dashboard built in the platform. These dashboards can be exported as images or in PDF format.
  • Web Projects: The web interface developed is displayed in the platform using the Web Projects module.
  • Marketplace: The application has been made available as a resource in the platform market to make it accessible to potential users.
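Since the notebooks are invoked through a REST interface and the API Manager exposes the services, a client call might be assembled roughly as follows. The endpoint path, header names and payload fields here are hypothetical placeholders, not the application's real API:

```python
import json
import urllib.request

def build_tokenize_request(base_url, api_token, file_id, fields, technique):
    """Build (but do not send) a request against a hypothetical
    Tokenify job endpoint exposed through the API Manager."""
    payload = json.dumps({
        "fileId": file_id,       # id of the file in the binary repository
        "fields": fields,        # sensitive columns to tokenize
        "technique": technique,  # "fpe" | "aes" | "random-map"
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/tokenify/jobs",  # hypothetical path
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The platform SSO uses OAuth2, so an Identity-issued
            # bearer token would accompany each call
            "Authorization": f"Bearer {api_token}",
        },
    )
```

Sending the request with `urllib.request.urlopen` (or any HTTP client) would then return the job result or an identifier for polling, depending on how the service is designed.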

