KEN Interviews: Just van den Broecke on Stetl

For the third interview of the INSPIRE KEN Series, Just van den Broecke is my guest. Just is an independent Open Source Geospatial Professional, self-employed at Just Objects B.V. He provides consultancy, support and development exclusively in the domain of Free and Open Source for Geospatial and is the core developer of the Stetl open source framework.

Thorsten: What was the big idea that led to the development of your software?

Just: Not a single big idea. This goes back to 2009 when I consulted for the Dutch Kadaster. Within the EURADIN and ESDIN projects we prototyped various solutions dealing with INSPIRE Data Transformation and harmonized Data Download Services (via WFS). For the transformation we used a combination of shell-scripting, GDAL/OGR, XSLT and PostGIS. For the WFS we (and many others in ESDIN) used the deegree WFS server. It struck us that this combination was really powerful; above all, we were one of the first members in ESDIN that delivered harmonized and valid (via the ETF test tool) INSPIRE GML via WFS. We then realized that any working INSPIRE solution should be an integrated toolset, i.e. ETL plus datastore (PostGIS) plus WFS (deegree). Later, in 2011-2012, the ETL tooling became more integrated into what became Stetl, completely rewritten in Python and more of a generic ETL tooling, not just for INSPIRE harmonization but also for local Dutch GML-based datasets.

Thorsten: What is the main strength of your software? What are you particularly proud of?

Just: I think the ability to produce valid GML from any source, dealing with terabytes, the streaming architecture, the speed and the versatility to integrate with other geo-tools, in particular in Open Source (Linux) server-environments. For example, the integration with the deegree WMS/WFS allows for a turnkey INSPIRE solution. In effect Stetl is standing on the shoulders of the giants that developed XSLT, GDAL/OGR and PostGIS.
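Editor's note: the streaming idea mentioned above can be illustrated with a minimal sketch using only the Python standard library. This is not Stetl's own code; the element names (`collection`, `feature`, `name`) and the function `stream_features` are made up for the illustration. The point is that features are handed on one at a time instead of loading the whole document first.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a very large feature collection.
# Element names are hypothetical, not taken from any real GML schema.
SAMPLE = b"""<collection>
  <feature><name>Amsterdam</name></feature>
  <feature><name>Utrecht</name></feature>
  <feature><name>Delft</name></feature>
</collection>"""


def stream_features(source, tag="feature"):
    """Yield one feature name at a time while the document is being parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem.findtext("name")
            # Drop the finished feature's children and text to limit memory use;
            # the emptied element itself stays attached to the root.
            elem.clear()


names = list(stream_features(io.BytesIO(SAMPLE)))
print(names)
```

With a real multi-gigabyte file, the caller would consume the generator item by item (for example writing each feature to a database) rather than collecting everything into a list as done here for brevity.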

Thorsten: For which problems or use cases is it particularly well suited? What differentiates it from other products or solutions?

Just: Stetl is particularly well suited in Open Source server-contexts. The two downsides may be that Stetl does not have a GUI and that XSLT has some learning curve. The first is a matter of time. As for XSLT: this is still a mainstream tool/standard. When well-developed, XSLT scripts can be highly structured, as we did for various INSPIRE Annex I transformations. For example: we have high reuse for common structures like INSPIRE Identifiers and Geographical Names. A new data theme transformation can often be derived quickly (in a matter of days) by working by example from previous transforms.

Thorsten: Please provide a number of your choice and its meaning.

Just: 2 hours: complete transformation of Dutch Addresses to the INSPIRE Annex I AD theme.

Thorsten: On which phases of a schema transformation project does your software focus?

Just: Mainly on the execution phase.

Thorsten: Describe (ideally use a concrete example) how users of your software should design the schema transformation in a project.

Just: Start with a semi-formal description (for example in an Excel sheet) of the source and destination features/attributes. Use a small subset of the data.

Thorsten: Describe how users of your software develop the actual schema transformation. Which tools do they need, which knowledge should they have?

Just: At a minimum just a text editor is required. But to be effective with GML and XSLT it is best to use an Integrated Development Environment (IDE) like Eclipse or IntelliJ. Also use a version control system (SVN, GitHub).

Thorsten: Describe how users of your software should debug and validate the schema transformation process they have developed. Again, which additional tools and knowledge do they require?

Just: One of the filter modules in Stetl is an XML Validator. During development this filter can be applied to check the output. In addition, to test the WFS output, the Open Source ESDIN Test Framework can be used.

Thorsten: Describe how users of your software should document a schema transformation project.

Just: In practice there is not so much code, and a lot of it should be self-documenting. XSLT design can be documented as described here.

Thorsten: Describe how users of your software should maintain a schema transformation project.

Just: Best is to have an integrated architecture (ETL + Web Services) and maintain all sources/configs in a version control system like SVN or Git. This is one of the weaknesses of some desktop tools I’ve seen: multiple versions lying around, deployment issues. With Stetl + deegree everything can be maintained and built from a single repository.

Thorsten: Please classify your software according to the criteria given in this article:

Just:

  • Paradigm: Stetl uses a combined declarative and procedural approach. Stetl uses a text file (.ini format) to specify the ETL as chained processing modules: inputs, filters, outputs.
  • Execution: Running the “stetl” command with a given Stetl config file (see Paradigm) executes the processing modules. Typically, input modules use OGR for reading from a source, for Coordinate Transformation and for “schema flattening”, while an XSLT-Filter module does the Schema Transformation.
  • Representation: Stetl uses a text file (.ini format) that is a specification of the entire ETL chain. Specific processing and transformation steps, for example XSLT scripts, are parameterized and referenced from that specification.
  • Expressiveness: Out of the box, Stetl offers the ETL-framework via a Pipes-and-filters Pattern. Many modules are already available, like streaming GML parsing, XSLT-processing and OGR and deegree-integration. XSLT scripts need to be specified by the ETL-developer. Additional processing elements can be added via Python scripting by inheriting from existing processing modules.
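Editor's note: the classification above (an .ini file that chains inputs, filters and outputs, with new modules added by inheritance) can be sketched in a few lines of Python. This is a toy illustration of the Pipes-and-filters Pattern under stated assumptions, not Stetl's actual API or config format: the section names, the `chains` and `module` options, the classes `Module`, `Source`, `Upper`, `Sink` and the function `run_chain` are all hypothetical.

```python
import configparser

# Hypothetical config modelled on the description above: one chain of
# named sections, each section configuring one processing module.
CONFIG = """
[etl]
chains = read_names|to_upper|collect

[read_names]
module = source
text = amsterdam,utrecht

[to_upper]
module = upper

[collect]
module = sink
"""


class Module:
    """Base class for a processing module (hypothetical, for illustration)."""

    def __init__(self, options):
        self.options = options

    def process(self, records):
        # Default behaviour: pass the records through unchanged.
        return records


class Source(Module):
    def process(self, records):
        # An input module ignores upstream data and produces records itself.
        yield from self.options["text"].split(",")


class Upper(Module):
    def process(self, records):
        # A filter module transforms each record as it streams past.
        for record in records:
            yield record.upper()


class Sink(Module):
    def process(self, records):
        # An output module consumes the stream; here it just collects it.
        return list(records)


REGISTRY = {"source": Source, "upper": Upper, "sink": Sink}


def run_chain(config_text):
    """Build the modules named in the chain and connect them in order."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    data = None
    for name in parser["etl"]["chains"].split("|"):
        section = parser[name.strip()]
        module = REGISTRY[section["module"]](dict(section))
        data = module.process(data)
    return data


result = run_chain(CONFIG)
print(result)
```

Adding a new processing step in this sketch means writing one more `Module` subclass, registering it, and naming its section in the chain; the config file stays the single description of the whole pipeline.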

Thorsten: Give an example of a schema transformation project in the INSPIRE context using your software.

Just: The Dutch National INSPIRE SDI PDOK for the Addresses Theme (together with deegree WFS/WMS) and various EURADIN and ESDIN projects.

Thorsten: What do you think of creating a schema transformation standard language, e.g. in the OGC?

Just: No strong opinion: the combination of GDAL/OGR + XSLT + PostGIS plus some custom Python coding, thus Stetl, should be able to handle any INSPIRE transform. The advantage is that these are solid and widely supported tools.

Thorsten: Anything else that you would like to explain? Future plans for the software? The next big thing?

Just: Well, Stetl is more and more used for National Data transformations (Dutch Topography and Ordnance Survey Mastermap). Integration with WPS is foreseen in combination with a (web-based) GUI, mainly for execution and parameterization. No plans for developing transformations graphically as in HALE or FME; Stetl will remain firstly a commandline approach, like ogr2ogr.

Thorsten: Thank you very much, Just!
