
Loading GeoJSON files into ontologies using DataFlow

The Platform ingests raw data in many different forms, and GeoJSON files are one of them.

A GeoJSON file is simply a JSON file that carries spatial location information, so that simple geometries can be drawn in a map viewer. These geometries are points, lines and polygons.

Examples of geometries. Source: Wikipedia.

This type of JSON is structured as a FeatureCollection object made up of a number of Features. Each Feature holds one or more geometric elements together with their properties. For example, a GeoJSON file containing a point that marks the location of the Minsait headquarters would look like this:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "lugar": "Minsait"
      },
      "geometry": {
        "type": "Point",
        "coordinates": [
          -3.6412596702575684,
          40.52915218240344
        ]
      }
    }
  ]
}
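
To make the structure above concrete, here is a minimal Python sketch that parses the same sample and pulls out the point. It uses only the standard `json` module; the helper name `iter_points` is invented for this illustration and is not part of the Platform. Note that GeoJSON stores each position as longitude first, then latitude.

```python
import json

# The sample FeatureCollection from the text, inlined so the sketch is
# self-contained. "lugar" is the property key used in the original sample.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"lugar": "Minsait"},
      "geometry": {
        "type": "Point",
        "coordinates": [-3.6412596702575684, 40.52915218240344]
      }
    }
  ]
}
"""


def iter_points(collection):
    """Yield (properties, longitude, latitude) for every Point feature.

    Hypothetical helper written for this example only.
    """
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Point":
            # GeoJSON positions are ordered [longitude, latitude].
            longitude, latitude = geometry["coordinates"][:2]
            yield (feature.get("properties") or {}), longitude, latitude


collection = json.loads(GEOJSON_TEXT)
points = list(iter_points(collection))
for properties, longitude, latitude in points:
    print(properties.get("lugar"), longitude, latitude)
```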

When creating an ontology that incorporates this information, we can find ourselves in one of two situations. Either every feature in the GeoJSON shares the same type of geometry (a layer made only of points, only of lines, or only of polygons), in which case the ontology needs a field of the corresponding geometry type (geometry-point, for example); or the GeoJSON includes multigeometries, in which case the field has to be defined in a more traditional way.
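
Before defining the ontology it can help to check which of the two situations a given file falls into. The following is a rough sketch, not a Platform utility: the function names are made up for this example, and the coordinates in the sample data are placeholders rather than real locations. It collects the geometry types that appear in a FeatureCollection and reports whether there is exactly one.

```python
def geometry_types(collection):
    """Return the set of geometry types used by the features of a collection."""
    types = set()
    for feature in collection.get("features", []):
        geometry = feature.get("geometry")
        if geometry:  # a Feature is allowed to have a null geometry
            types.add(geometry["type"])
    return types


def shares_single_geometry(collection):
    """True when every feature with a geometry uses the same geometry type."""
    return len(geometry_types(collection)) == 1


def make_feature(geometry_type, coordinates):
    """Build a bare Feature; only used to create sample data for this sketch."""
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": geometry_type, "coordinates": coordinates},
    }


# Placeholder coordinates, chosen only to have something to classify.
points_only = {
    "type": "FeatureCollection",
    "features": [
        make_feature("Point", [0.0, 0.0]),
        make_feature("Point", [1.0, 1.0]),
    ],
}
mixed = {
    "type": "FeatureCollection",
    "features": [
        make_feature("Point", [0.0, 0.0]),
        make_feature("LineString", [[0.0, 0.0], [1.0, 1.0]]),
    ],
}

print(sorted(geometry_types(points_only)), shares_single_geometry(points_only))
print(sorted(geometry_types(mixed)), shares_single_geometry(mixed))
```

A file that passes this check is a candidate for a single typed geometry field such as geometry-point; one that does not falls into the second situation described above.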

Once the ontology is defined, there are several ways to ingest data into it, from entering records manually one by one (which is not practical at all) to using data-loading tools that optimize the process.

One of the tools available in the Platform for these data loads is DataFlow, which is based on StreamSets.

After configuring the input parameters (the data source) and the output parameters (the ontology), launching the DataFlow automatically loads all the information contained in the GeoJSON into the ontology, making its content available for use in our projects.

To walk through this entire procedure, we have created a tutorial in our Confluence on how to load a GeoJSON into an ontology using DataFlow. Using practical examples, we explain step by step how to generate the ontology, how to prepare a GeoJSON with the Autonomous Communities of the Peninsula area for loading, how to create the DataFlow Pipeline, and how to configure it to run.

As a result, we will have a functional ontology that can be used, for example, to generate a layer to be displayed on a map in a Dashboard Gadget.
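
As an illustration of what preparing a GeoJSON for loading might involve, the sketch below flattens a FeatureCollection into one record per Feature and serializes the records as one JSON object per line. This is an assumption made for the example, not the procedure from the tutorial: the record layout (the Feature's properties plus a `geometry` key) and the function names are hypothetical, and the actual ontology schema and pipeline settings may call for a different shape.

```python
import json


def features_to_records(collection):
    """Turn each Feature into a flat record: its properties plus its geometry.

    Hypothetical layout for this sketch; a property that is itself named
    "geometry" would be overwritten by the Feature's geometry.
    """
    records = []
    for feature in collection.get("features", []):
        record = dict(feature.get("properties") or {})
        record["geometry"] = feature.get("geometry")
        records.append(record)
    return records


def to_json_lines(records):
    """Serialize the records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


# The Minsait sample from earlier in the post, reused as input.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"lugar": "Minsait"},
            "geometry": {
                "type": "Point",
                "coordinates": [-3.6412596702575684, 40.52915218240344],
            },
        }
    ],
}

records = features_to_records(sample)
json_lines = to_json_lines(records)
print(json_lines)
```

Writing one object per line keeps each Feature independent, so a loader can process the file record by record instead of parsing the whole collection at once.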


We hope that the tutorial will be of interest to you and that you can take advantage of it. Remember that this is a very simple example of a data load, so if you are interested in learning more about this tool, we recommend our help guides on DataFlow.

If you have any questions or run into any issues, please leave us a comment and we will look into it as soon as possible.

