GIS data extraction, transformation and loading with FME

11/07/2022 jefernandezm

How many of the GIS projects you take part in require you to extract alphanumeric or geographic information from the information systems of other organizations, public or private? And beyond extraction, you also need to transform that information, adapt it to your needs and, finally, load it into your platforms so that your clients can access and exploit it.

These initial loading, migration, synchronization or data dumping procedures, commonly called ETL (Extract, Transform and Load) processes, are extremely important because they provide a fundamental element: the data, just the way you need it, and in its correct location.
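As a rough illustration of the three ETL stages, here is a minimal sketch in plain Python. It does not use FME; the CSV content, the table name `depots` and the function names are all hypothetical, chosen only to show data being extracted from a source, transformed, and loaded into a target.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a CSV export with point coordinates stored as text.
SOURCE_CSV = """id,name,lon,lat
1,Depot A,-3.7038,40.4168
2,Depot B,2.1734,41.3851
3,Depot C,,
"""


def extract(text):
    """Extract: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: skip rows without coordinates and build a WKT point for the rest."""
    records = []
    for row in rows:
        if not row["lon"] or not row["lat"]:
            continue  # incomplete geometry, not loaded
        lon, lat = float(row["lon"]), float(row["lat"])
        records.append((int(row["id"]), row["name"], f"POINT ({lon} {lat})"))
    return records


def load(records, conn):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS depots (id INTEGER PRIMARY KEY, name TEXT, wkt TEXT)"
    )
    conn.executemany("INSERT INTO depots VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn):
    """Run the three stages in order and return what ended up in the target."""
    load(transform(extract(SOURCE_CSV)), conn)
    return conn.execute("SELECT id, name, wkt FROM depots ORDER BY id").fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two complete rows and leaves out the one without coordinates. A tool such as FME replaces each of these hand-written functions with configurable readers, transformers and writers.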

Among the existing ETL platforms and products, one of the most complete and widely used is Safe Software’s FME.

What is FME?

FME is an ETL platform oriented mainly to the processing of geographic data. It allows you to connect applications, transform data and automate workflows, both in real time and on a schedule. It is very versatile: its many transformers can be combined in numerous ways, and it works with many formats, both geographic and alphanumeric.

It also integrates with many different systems and components (Autodesk AutoCAD, Esri, Excel, SAP, Oracle, PostGIS, BIM, etc.), which facilitates dynamic data exchange without the need to write code. The FME platform comprises three products:

FME Desktop.
FME Server.
FME Cloud.

Each of them is explained in detail in the following sections.

FME Desktop

FME Desktop is the desktop version of the FME platform. With it, you can build a workflow for each data source you want to process and apply the relevant transformations.

This software has a simple drag-and-drop interface for creating workspaces, in which you design customized data transformation workflows by setting parameters. These settings can be saved and reused. Creating the workflows, therefore, requires no coding. However, if the basic functionality is not enough for certain transformations, you can extend the flows using languages such as Python or R.
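As a hedged sketch of what such a Python extension can look like: FME's PythonCaller transformer calls a class with an `input` method for each feature and a `close` method at the end, and features expose `getAttribute` and `setAttribute`. The `StubFeature` class, the `pyoutput` stub and the attribute names `name` and `name_clean` below are hypothetical stand-ins so that the example runs outside FME; inside FME the feature objects and the output method are supplied by the transformer.

```python
class StubFeature:
    """Hypothetical stand-in for an FME feature, only so the sketch runs outside FME."""

    def __init__(self, **attributes):
        self._attributes = dict(attributes)

    def getAttribute(self, name):
        return self._attributes.get(name)

    def setAttribute(self, name, value):
        self._attributes[name] = value


class FeatureProcessor:
    """Shaped like a PythonCaller class: input() per feature, close() once at the end."""

    def __init__(self):
        self.count = 0
        self.emitted = []

    def input(self, feature):
        # Collapse repeated whitespace and title-case a (hypothetical) "name" attribute.
        name = feature.getAttribute("name") or ""
        feature.setAttribute("name_clean", " ".join(name.split()).title())
        self.count += 1
        self.pyoutput(feature)

    def close(self):
        # Nothing to flush in this sketch.
        pass

    def pyoutput(self, feature):
        # Stub: inside FME the transformer provides this method; here it collects output.
        self.emitted.append(feature)
```

Feeding it `StubFeature(name="  main   street ")` yields a feature whose `name_clean` attribute is a tidied version of the input. The exact class template may differ between FME versions, so treat this as an outline rather than a drop-in script.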

The workflows you design and customize in FME Desktop for one data source can be reused for any new source that you want to convert. This simplifies and speeds up the workflow creation process.

Another advantage of FME Desktop is that you can customize its interface to organize data transformation flows more easily, using bookmarks, styled connection lines, etc. In addition, the interface can display a preview of the data at any transformation point in the workflow, without saving a new file on the computer where it is running. This preview lets you check whether the designed transformations achieve the goal of that specific flow. You can even visualize dynamic changes and render 3D models.

As its main functional features, FME Desktop allows for:

The integration of data to form a unified view of all the collected sources of information.
The transformation of data by modifying its structure, content and characteristics.
The conversion of the data to configure it for use in specific applications.
The integration of applications through connections, to allow the direct dissemination of data between them and to generate workflows that allow the same data to be used in different structures, depending on where they are exploited.
Data validation, either as a workflow of its own or as an intermediate step within another workflow. Validation can cover format, structure, data type, range (for numerical data), uniqueness, consistency of expressions, whether null values are permitted in a field, etc.
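The kinds of checks listed under data validation can be sketched in plain Python as follows. This is not FME code; the field names `code` and `area_m2`, the code pattern and the permitted range are hypothetical rules invented for the example.

```python
import re

# Hypothetical format rule: two capital letters, a hyphen, four digits.
CODE_PATTERN = re.compile(r"^[A-Z]{2}-\d{4}$")


def validate(records):
    """Return a list of (row_index, message) for every failed check."""
    errors = []
    seen_codes = set()
    for i, rec in enumerate(records):
        code = rec.get("code")
        area = rec.get("area_m2")

        # Null, format and uniqueness checks on the identifier.
        if code is None:
            errors.append((i, "code must not be null"))
        elif not isinstance(code, str) or not CODE_PATTERN.match(code):
            errors.append((i, "code has an invalid format"))
        elif code in seen_codes:
            errors.append((i, "code is not unique"))
        else:
            seen_codes.add(code)

        # Data type and range checks on the numeric field.
        if not isinstance(area, (int, float)):
            errors.append((i, "area_m2 must be numeric"))
        elif not 0 < area <= 1_000_000:
            errors.append((i, "area_m2 is out of range"))
    return errors
```

A record that passes every rule contributes nothing to the result, so an empty list means the dataset is valid under these assumed rules. In a workflow, the failing rows would typically be routed to a separate output for review.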

In conclusion, FME Desktop is a very useful and versatile component to perform data integration tasks in one go, and thus avoid recurring data processing and transformation tasks.

If you want to automate processes as event-driven workflows that integrate data, send notifications, or load data stored in the cloud, then it’s best to combine FME Desktop and FME Server, connecting the workflows of the former to the latter. As for licensing, FME Desktop is licensed according to the number of users who design and develop the transformation processes.

To choose the edition that fits your needs, this link shows the differences between the various editions of the product. These variations basically come down to the formats that each edition can read and write.

FME Server

FME Server is the online server product of the FME platform. Through FME Server, you can publish online the workspaces and workflows created in FME Desktop, so that any user with access to FME Server can take advantage of the data integration and transformation capabilities that FME provides.

Workflows in FME Server can be automated by scheduling their execution or by triggering it when a specific event occurs. FME Server also allows you to send automatic notifications to interested parties.

Broadly speaking, the main functionalities that FME Server allows are:

Processing data in real time using event-based triggers in Automations.
Creating schedules to run workflows at regular intervals.
Running workflows in parallel with job orchestration.
Processing streaming data in real time with Streams.
Tracking exactly where the data is and how it’s integrated in a private and secure environment.
Establishing roles and rules according to data governance policies.
Seeing what jobs are running, queued, or completed.
Checking online the workspace details, without opening FME Desktop.
Creating FME Server applications that anyone can access.
Creating and sharing projects on the web.
Integrating applications to maintain consistency across data sets.
FME Server allows you to use a configurable REST API to control data integration.
FME Server is a highly scalable system: if you need more processing power, you can add more engines to your existing license, or use dynamic engines for one-off projects or occasional peaks that require some additional capacity.

FME Server has a fault-tolerant architecture, that is, a robust architecture with integrated components and translation recovery, designed to keep working when individual components fail.

This server can be implemented in several ways:

Local Infrastructure (Physical Hardware): this is FME Server’s traditional configuration, in which it is installed on your own hardware.
Infrastructure as a Service (IaaS – Virtual Hardware): here, you buy FME Server and install it on virtual hardware provided, as a service, by a company such as Amazon.
Platform as a Service (PaaS – FME Cloud): FME Server is delivered pre-installed on an Amazon virtual machine, with the entire platform provided by Safe Software on a pay-as-you-go basis.

If you would rather use a cloud system to avoid the hardware requirements, costs and maintenance of an FME Server installation, FME Cloud is the best choice for migrating all the workflows contained in FME Server. It is also possible to deploy FME Server on any of the cloud platforms offered by the providers on the market.

The different installation options for FME Server are explained in detail on the Safe website. FME Server engines can be scaled in the following two ways, which can be combined:

Additional FME Engines licenses: An FME Server license includes the first engine.
Dynamic Engines: Dynamic Engines consume hours from a balance of credits, based on CPU processing time, and these hours do not expire. Credits are sold in packages, with the smallest package including 3,500 hours.

A basic installation of FME Server should have at least two engines. The transformation processes, or “Jobs”, in FME Server are queued per engine and executed one after the other. If you have at least two engines, you can run processes in parallel. An FME Server license is required for each environment.
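The effect of adding a second engine can be illustrated with a small simulation. This is a deliberately simplified model, not FME Server's actual scheduler: it assumes that jobs wait in a single queue, that each job goes to whichever engine becomes free first, and that the job durations (in minutes) are known in advance. The function name and the sample durations are hypothetical.

```python
import heapq


def makespan(job_minutes, engines):
    """Total elapsed minutes until all queued jobs finish on the given number of engines."""
    if engines < 1:
        raise ValueError("at least one engine is required")
    # Each entry is the time at which an engine becomes free.
    free_at = [0] * engines
    heapq.heapify(free_at)
    for minutes in job_minutes:
        start = heapq.heappop(free_at)  # earliest available engine
        heapq.heappush(free_at, start + minutes)
    return max(free_at)


jobs = [10, 5, 5, 20]
one_engine = makespan(jobs, 1)   # jobs run strictly one after the other
two_engines = makespan(jobs, 2)  # two jobs can run at the same time
```

With these sample durations, one engine needs 40 minutes in total while two engines finish in 30, because the short jobs run alongside the first long one. The gain depends heavily on how the job durations are distributed.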

FME Server Fault Tolerant Architecture Deployment Example.

FME Cloud

FME Cloud is the cloud product that allows you to automate data integration workflows without installing the hardware and resources that FME Server requires, since it runs on hosted instances.

The main functionalities of FME Cloud are practically the same as those of FME Server. The difference is that some features available in a local implementation of FME Server cannot be used in FME Cloud, or have limited use there. In addition, FME Cloud suits particular workflows because its server is hosted in the public cloud rather than on proprietary infrastructure.

The main implementation and usage differences between FME Server and FME Cloud are described in this Safe Community article.

In general, FME Cloud provides the following functionality:

It allows you to process data in real time with event-based triggers in Automations.
Schedules can be created to run workflows at regular intervals.
You can run workflows in parallel with job orchestration.
You can also create FME Server applications that anyone can access.
It maintains consistency across data sets with application integration.
It completes the tasks in a highly secure system.
It establishes roles and rules in accordance with data governance policies.
You can see what jobs are running, queued, or completed.
You can view workspace details online without opening FME Desktop.
You can launch instances from one of the seven AWS Regions around the world.
FME Cloud is scalable both up and down, allowing cores and RAM to be added or removed at any time as needed.
You can easily configure alerts based on conditions that can affect an instance’s uptime and performance. This makes it easy to optimize performance.
There is no limitation of engines in FME Cloud – you can use as many as needed at no additional cost. This allows you to process multiple jobs at the same time, and keep all your data organized, clean, and ready to use.
The FME Cloud architecture is built using Amazon Web Services, which uses high-grade encryption and supports all major compliance standards, making it a completely secure environment.

FME Cloud instances differ from each other in cores and RAM. A standard instance is the minimum recommended (2 cores, 8.0 GB of RAM and unlimited engines).

Why use FME?

Thanks to the FME platform and products, it is much easier to design, execute and automate classic spatial operations and their relationships with other tables, especially with massive data volumes or in repetitive processes. FME supports a wide variety of formats (readers and writers) and transformers, so we do not have to worry about interoperability between existing technologies in our information extraction, transformation and loading processes, whether the platform is deployed ad hoc on the client's infrastructure or on a cloud platform.