KEN Interviews: Just van den Broecke on Stetl

For the third interview of the INSPIRE KEN Series, Just van den Broecke is my guest. Just is an independent Open Source Geospatial Professional, self-employed at Just Objects B.V. He provides consultancy, support and development exclusively in the domain of Free and Open Source for Geospatial and is the core developer of the Stetl open source framework.

Thorsten: What was the big idea that led to the development of your software?

Just: Not a single big idea. This goes back to 2009 when I consulted for the Dutch Kadaster. Within the EURADIN and ESDIN projects we prototyped various solutions dealing with INSPIRE Data Transformation and harmonized Data Download Services (via WFS). For the transformation we used a combination of shell-scripting, GDAL/OGR, XSLT and PostGIS. For the WFS we (and many others in ESDIN) used the deegree WFS server. It struck us that this combination was really powerful; above all, we were one of the first members in ESDIN to deliver harmonized and valid (via the ETF test tool) INSPIRE GML via WFS. We then realized that any working INSPIRE solution should be an integrated toolset, i.e. ETL + datastore (PostGIS) + WFS (deegree). Later, in 2011-2012, the ETL tooling became more integrated into what became Stetl, completely rewritten in Python and more of a generic ETL tool, not just for INSPIRE harmonization but also for local Dutch GML-based datasets.

Thorsten: What is the main strength of your software? What are you particularly proud of?

Just: I think the ability to produce valid GML from any source, dealing with terabytes, the streaming architecture, the speed and the versatility to integrate with other geo-tools, in particular in Open Source (Linux) server environments. For example, the integration with the deegree WMS/WFS allows for a turnkey INSPIRE solution. In effect, Stetl is standing on the shoulders of the giants that developed XSLT, GDAL/OGR and PostGIS.
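The streaming idea mentioned here can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not Stetl's own implementation; the sample document, the `ex` namespace and the function name `stream_features` are assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Assumed namespaces for this sketch; real data may use GML 3.2 or other schemas.
GML_MEMBER = "{http://www.opengis.net/gml}featureMember"
EX_NAME = "{http://example.org/ex}name"

SAMPLE = b"""<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml"
                       xmlns:ex="http://example.org/ex">
  <gml:featureMember><ex:City><ex:name>Amsterdam</ex:name></ex:City></gml:featureMember>
  <gml:featureMember><ex:City><ex:name>Otterlo</ex:name></ex:City></gml:featureMember>
</gml:FeatureCollection>"""


def stream_features(source, member_tag):
    """Yield one feature member at a time instead of loading the whole file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == member_tag:
            yield elem
            # Runs when the consumer asks for the next feature:
            # drop the subtree so memory use stays roughly constant.
            elem.clear()


# Each feature is processed before the next one is parsed.
names = [
    member.findtext(".//" + EX_NAME)
    for member in stream_features(io.BytesIO(SAMPLE), GML_MEMBER)
]
print(names)
```

Because each element is cleared after it has been consumed, any data needed later has to be extracted while iterating, as the list comprehension does here.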

Thorsten: For which problems or use cases is it particularly well suited? What differentiates it from other products or solutions?

Just: Stetl is particularly well suited in Open Source server contexts. The two downsides may be that Stetl does not have a GUI and that XSLT has some learning curve. The first is a matter of time. As for XSLT: this is still a mainstream tool/standard. When well developed, XSLT scripts can be highly structured, as ours were for various INSPIRE Annex I transformations. For example, we have high reuse for common structures like INSPIRE Identifiers and Geographical Names. A new data theme transformation can often be derived quickly (in a matter of days) by working by example from previous transforms.

Thorsten: Please provide a number of your choice and its meaning.

Just: 2 hours: complete transformation of Dutch Addresses to the INSPIRE Annex I AD theme.

Thorsten: On which phases of a schema transformation project does your software focus?

Just: Mainly on the execution phase.

Thorsten: Describe (ideally use a concrete example) how users of your software should design the schema transformation in a project.

Just: Start with a semi-formal description (for example in an Excel sheet) of the source and destination features/attributes. Use a small subset of the data.

Thorsten: Describe how users of your software develop the actual schema transformation. Which tools do they need, which knowledge should they have?

Just: At a minimum just a text editor is required. But to be effective with GML and XSLT it is best to use an Integrated Development Environment (IDE) like Eclipse or IntelliJ. Also use a version control system (SVN, GitHub).

Thorsten: Describe how users of your software should debug and validate the schema transformation process they have developed. Again, which additional tools and knowledge do they require?

Just: One of the filter modules in Stetl is an XML Validator. During development this filter can be applied to check the output. In addition, the Open Source ESDIN Test Framework can be used to test the WFS output.

Thorsten: Describe how users of your software should document a schema transformation project.

Just: In practice there is not much code, and a lot of it should be self-documenting. XSLT design can be documented as described here.

Thorsten: Describe how users of your software should maintain a schema transformation project.

Just: Best is to have an integrated architecture (ETL + Web Services) and maintain all sources/configs in a version control system like SVN or Git. This is one of the weaknesses of some desktop tools I’ve seen: multiple versions lying around and deployment issues. With Stetl + deegree everything can be maintained and built from a single repository.

Thorsten: Please classify your software according to the criteria given in this article:

Just:

  • Paradigm: Stetl uses a combined declarative and procedural approach. Stetl uses a text file (.ini format) to specify the ETL as chained processing modules: inputs, filters, outputs.
  • Execution: The “stetl” command executes the processing modules specified in a given Stetl config file (see Paradigm). Typically, input modules use OGR for reading from a source input, for Coordinate Transformation and for “schema flattening”, while an XSLT-Filter module does the Schema Transformation.
  • Representation: Stetl uses a text file (.ini format) that is a specification of the entire ETL chain. Specific processing and transformation steps, for example XSLT scripts, are parameterized and referenced from that specification.
  • Expressiveness: Out of the box, Stetl offers the ETL framework via a Pipes-and-filters Pattern. Many modules are already available, like streaming GML parsing, XSLT processing, and OGR and deegree integration. XSLT scripts need to be specified by the ETL developer. Additional processing elements can be added via Python scripting by inheriting from existing processing modules.
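To make the idea of an ETL chain declared in an .ini file more concrete, here is a minimal pipes-and-filters sketch in plain Python. It does not use Stetl and does not reproduce its actual configuration keys or module classes: the section names, the `chain` and `module` options, and the three toy modules are all hypothetical, chosen only to show how a text specification can be turned into chained processing steps.

```python
import configparser

# Hypothetical module factories: each takes its config section and returns
# a function that turns an upstream iterator into a downstream iterator.
def make_input(opts):
    values = opts["values"].split(",")
    return lambda _upstream: iter(values)

def make_upper(opts):
    return lambda upstream: (value.upper() for value in upstream)

def make_prefix(opts):
    prefix = opts["prefix"]
    return lambda upstream: (prefix + value for value in upstream)

MODULES = {"input": make_input, "upper": make_upper, "prefix": make_prefix}

# Hypothetical config: one section names the chain, the others configure modules.
CONFIG = """
[etl]
chain = source|shout|tag

[source]
module = input
values = ad,gn,cp

[shout]
module = upper

[tag]
module = prefix
prefix = inspire:
"""

def run_chain(config_text):
    """Build the chain named in [etl] and pull all records through it."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    stream = iter(())
    for name in parser["etl"]["chain"].split("|"):
        section = parser[name.strip()]
        stream = MODULES[section["module"]](section)(stream)
    return list(stream)  # collecting into a list stands in for an output module

result = run_chain(CONFIG)
print(result)
```

Because every step is a generator, records flow through the chain one at a time, and a new step can be added by registering one more factory and naming it in the config.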

Thorsten: Give an example of a schema transformation project in the INSPIRE context using your software.

Just: The Dutch National INSPIRE SDI PDOK for the Addresses Theme (together with deegree WFS/WMS) and various EURADIN and ESDIN projects.

Thorsten: What do you think of creating a schema transformation standard language, e.g. in the OGC?

Just: No strong opinion: the combination of GDAL/OGR + XSLT + PostGIS plus some custom Python coding, thus Stetl, should be able to handle any INSPIRE transform. The advantage is that these are solid and widely supported tools.

Thorsten: Anything else that you would like to explain? Future plans for the software? The next big thing?

Just: Well, Stetl is more and more used for National Data transformations (Dutch Topography and Ordnance Survey Mastermap). Integration with WPS is foreseen in combination with a (web-based) GUI, mainly for execution and parameterization. No plans for developing transformations graphically as in HALE or FME – Stetl will remain first of all a command-line approach, like ogr2ogr.

Thorsten: Thank you very much, Just!

KEN Interviews: Simon Templer of Fraunhofer IGD

In the second interview of the INSPIRE KEN series, my guest is Simon Templer. He is a researcher at the Fraunhofer Institute for Computer Graphics Research IGD and is the lead developer for HALE. His Spatial Information Management group at Fraunhofer IGD was the coordinator of the HUMBOLDT research project and is still the driving force behind the continuous development of HALE.

Thorsten: What was the big idea that led to the development of your software?

Simon: Enabling domain experts to easily create and maintain Schema Mappings – even for complex schemata.

Thorsten: What is the main strength of your software? What are you particularly proud of?

Simon: The main strength is the instant feedback you get on the transformation during the whole process of creating the Schema Mapping. Starting with the first relation you create you can observe how the transformation takes form and visually verify the results.

As I’m also a developer working on HALE I see a different strength – HALE is designed to be extendable. It offers a large set of extension points, e.g. to add support for additional schema and data formats, transformation functions or User Interface components.

Thorsten: For which problems or use cases is it particularly well suited? What differentiates it from other products or solutions?

Simon: The INSPIRE GML Application Schemata and many other XML schemata are really complex. Other tools tend to work well with simple feature models, but get exponentially harder to use and understand with increased schema complexity. Scaling an intuitive, simple user experience to schemata of any complexity and data sets of any size is one thing we focused on when developing HALE – but without hard-coding everything, so that power users still have as much choice as they need.

Thorsten: On which phases of a schema transformation project does your software focus?

Simon: The main focus lies on the development and maintenance of the Schema Mapping. The design, development, debug and validation phases are not strictly separated in HALE, but are rather tied into a single, fast feedback loop. However, for each phase you can create specific resources.

Thorsten: Describe (ideally use a concrete example) how users of your software should design the schema transformation in a project.

Simon: To design a schema transformation, people use HALE’s schema explorer as well as the source data view to analyze the schemata and data. Especially helpful in this phase are the statistics on the schema, such as which elements are actually filled with data, and with what kinds of data. The schemata can also be exported as a matching table to get feedback from persons used to working with matching tables. The next step is identifying correspondences and deciding how to express them as relations in HALE, first on type, then on property level.

Thorsten: Describe (ideally use a concrete example) how users of your software develop the actual schema transformation. Which tools do they need, which knowledge should they have?

Simon: The development tool is HALE, and it can be used without programming skills – just bring your data model knowledge. Developing the Schema Mapping is a short cycle of defining or adapting individual relations and immediately getting feedback on how the changes influence the transformation. Rapid development is supported through the accessible schema documentation, instant transformation feedback, validation of transformed instances, as well as generated and user defined mapping documentation.

In addition to the regular relations that are used to define the Schema Mapping, advanced users have the possibility to combine them with custom Groovy scripts.

Thorsten: Describe (ideally use a concrete example) how users of your software should debug and validate the schema transformation process they have developed. Again, which additional tools and knowledge do they require?

Simon: During the creation of the Schema Mapping sample data is transformed and validated based on the constraints defined by the associated schema. Transformed objects can be inspected individually or as a whole, and compared with the objects they originated from. Errors in individual relations won’t stop the transformation – HALE always provides as complete results as possible.

Thorsten: Describe (ideally use a concrete example) how users of your software should document a schema transformation project.

Simon: Documentation can be generated automatically for a mapping project in a variety of formats, such as HTML or Excel. It includes detailed information on all defined relations. In addition, notes and comments can be attached to each individual relation as well as the whole project.

Thorsten: Describe (ideally use a concrete example) how users of your software should maintain a schema transformation project.

Simon: Due to the declarative nature of our mapping language, each relation can be independently added, removed, edited or disabled. Mappings can be versioned, forked and merged. Furthermore it is possible to import existing mappings in your own project. As an example, you can import a CityGML to INSPIRE Buildings base mapping and then create your own mapping to deal with an ADE.
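The following Python sketch illustrates, in a very reduced form, what it means for relations to be independent units that can be added, removed or disabled one by one. It is not HALE's mapping language or API; the `Relation` class, the attribute names and the error handling are assumptions made up for illustration. It also shows the behaviour described earlier, where an error in one relation does not stop the rest of the transformation.

```python
from dataclasses import dataclass
from typing import Callable


def identity(value):
    return value


@dataclass
class Relation:
    """One declarative source-to-target rule (hypothetical structure)."""
    source: str
    target: str
    function: Callable = identity
    enabled: bool = True


def transform(record, relations):
    """Apply every enabled relation on its own; collect errors instead of aborting."""
    result, errors = {}, []
    for relation in relations:
        if not relation.enabled:
            continue
        try:
            result[relation.target] = relation.function(record[relation.source])
        except Exception as exc:  # a failing relation must not block the others
            errors.append((relation.target, repr(exc)))
    return result, errors


# Hypothetical mapping: attribute names are invented for this example.
MAPPING = [
    Relation("label", "name"),
    Relation("floors", "numberOfFloors", int),
    Relation("remark", "comment", enabled=False),
]

record = {"label": "Town hall", "floors": "three", "remark": "draft"}
result, errors = transform(record, MAPPING)
print(result)
print(errors)
```

Here the `floors` relation fails because "three" is not an integer, the `remark` relation is skipped because it is disabled, and the `label` relation still produces its output.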

Thorsten: Classify your software according to the criteria given in this article:

Simon:

  • Paradigm: HALE uses a declarative approach.

  • Execution: HALE combines Schema-driven and Instance-driven elements during the transformation – schema and mapping are compiled into a transformation graph which can be modified for individual instances.

  • Representation: HALE uses a graphical representation of relations as well as an RDF-Text-based one. The user creates and configures relations guided step-by-step by specific wizards.
  • Expressiveness: Out of the box, HALE offers almost complete expressiveness. It only lacks spatial filtering and loop constructs. Both are rarely used (so far they haven’t been requested by the users) and can be added through scripting or custom extensions.

Thorsten: Give an example of a schema transformation project in the INSPIRE context using your software.

Simon: In our newest user project, KU Leuven from Belgium uses HALE to produce Air Quality data compliant to INSPIRE and the EU Air Quality Directive IPR, based on their existing Web Feature Service.

Thorsten: What do you think of creating a schema transformation standard language, e.g. in the OGC?

Simon: A schema transformation standard language should be relatively easy to design and implement. It should focus on defining a framework, with some basic transformation functionality and mechanisms to extend it, coupled with a public registry to discover transformations.

Thorsten: Anything else that you would like to explain? Future plans for the software? The next big thing?

Simon: Be sure to check out the next release which will be out in the first half of November. It adds transformation based on PostgreSQL/PostGIS databases, an improved user interface for defining classifications and advanced scripting functions.

Thorsten: Thank you very much, Simon!

KEN Interviews: Ken Bragg of Safe Software

To support the work of the INSPIRE Knowledge Exchange Network, I have started to interview schema transformation software providers, specifically those who took part in the INSPIRE KEN workshop. These interviews provide a starting point for comparing strengths and weaknesses of different approaches and outline best practices for phases such as design, development or maintenance. I conducted the first interview with Ken Bragg, who is the European Services Manager for Safe Software. Safe Software are the makers of FME, perhaps the most well-known spatial data translation and transformation tool.

TR: What was the big idea that led to the development of FME?

Ken: We believe you should have complete mastery of and access to your data where and how you need it. FME lets you transform your data to use and share.

TR: What is the main strength of your software? What are you particularly proud of?

Ken: FME supports over 300 data formats and enables users to transform data in limitless ways. We are proud of the way our users simply love working with FME and become incredibly enthusiastic about our products.

TR: For which problems or use cases is it particularly well suited? What differentiates it from other products or solutions?

Ken: FME is very well suited for virtually any kind of data transformation including: format, coordinate system, schema and content transformation. No other product supports the range of formats and transformers supported by FME.

TR: On which phases of a schema transformation project does your software focus?

Ken: FME is well suited for format transformation, coordinate system transformation and particularly attribute mapping including name, values and data type mapping.

TR: Describe (ideally use a concrete example) how users of your software should design the schema transformation in a project.

Ken: Many of our users use FME to migrate data from their own format and schema into an INSPIRE staging database, for example an ArcGIS Inspire Geodatabase. This transformation can be designed and edited in FME Workbench, which is an easy-to-use and mature graphical environment. The design can be documented within FME Workbench and saved or edited as an FME Workspace file.

TR: What you are saying is that there is no explicit design step, right? The implementation of the project is the design?

Ken: Yes you’re right – there is no explicit design step.

TR: Describe (ideally use a concrete example) how users of your software develop the actual schema transformation. Which tools do they need, which knowledge should they have?

Ken: FME Workbench is the key tool we use to develop schema mapping. The basic steps for defining a transformation into an INSPIRE staging database are as follows:

  1. Add a reader to the source data in whichever format it exists into FME Workbench. This will add the source feature types and their schema into your Workspace.
  2. Add a writer to the destination database and import the required destination feature types and their schema from an existing database or template.
  3. Define the schema mapping by connecting the feature types and using FME transformers such as AttributeRenamer, AttributeCopier etc. Or use the SchemaMapper transformer in FME to read a set of mapping rules from a table or spreadsheet.

The actual mapping requires a domain expert in the source data who also has some knowledge of FME Workbench.
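The idea of reading mapping rules from a table or spreadsheet, as in step 3, can be illustrated with a small Python sketch. This is not FME's SchemaMapper and makes no claim about how it behaves; the column names `source_attribute` and `target_attribute`, the attribute names and the choice to drop unmapped attributes are assumptions for the example.

```python
import csv
import io

# Hypothetical mapping table, as it might be exported from a spreadsheet.
MAPPING_TABLE = """source_attribute,target_attribute
STR_NAME,thoroughfareName
HOUSE_NO,locatorDesignator
"""


def load_mapping(table_text):
    """Read the rename rules into a {source: target} dictionary."""
    reader = csv.DictReader(io.StringIO(table_text))
    return {row["source_attribute"]: row["target_attribute"] for row in reader}


def map_feature(feature, mapping):
    """Rename mapped attributes; attributes without a rule are dropped (an assumption)."""
    return {mapping[key]: value for key, value in feature.items() if key in mapping}


mapping = load_mapping(MAPPING_TABLE)
source_feature = {"STR_NAME": "Hoofdstraat", "HOUSE_NO": "7", "INTERNAL_ID": "x9"}
mapped = map_feature(source_feature, mapping)
print(mapped)
```

Keeping the rules in a table means the domain expert can review and edit the mapping without touching the transformation logic itself.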

TR: Describe (ideally use a concrete example) how users of your software should debug and validate the schema transformation process they have developed. Again, which additional tools and knowledge do they require?

Ken: Errors in the transformation process (using the above example) can be easily trapped in FME Workbench by disabling and enabling connections and then using the transformers DataInspector and Logger to see features at various points in the transformation. Break-points can also be created by inserting Inspection Points along connections.
Output data can be validated with other FME Workspaces which can verify schema, attribute values and geometry.

TR: Describe (ideally use a concrete example) how users of your software should document a schema transformation project.

Ken: FME Workbench includes annotation tools for any objects in the canvas and for general annotation. For example, in an INSPIRE schema mapping workspace we might annotate a StringConcatenator transformer to say this is where the _NationalID is defined. Workbench also includes a rich set of “Workspace Properties” for metadata such as data, history, usage, requirements, etc. These can be edited in a Workspace Properties dialog.

TR: Describe (ideally use a concrete example) how users of your software should maintain a schema transformation project.

Ken: Schema transformation workflows are maintained in workspaces which can be edited at any time. Also, if the SchemaMapper transformer is used then the mapping rules can be maintained in database tables or spreadsheets if preferable.

TR: Do you have any best practices when it comes to versioning or even merging workspaces?

Ken: No, there aren’t really best practices around versioning or merging workspaces yet. This is something on our list of things to do.

TR: Classify your software according to the criteria given in this article.

Ken:

  1. FME uses a procedural paradigm for schema mapping
  2. FME can be both schema and feature driven depending on the transformation defined in FME Workbench
  3. FME uses a graphical representation for defining workflows
  4. Arguably FME is completely expressive when it comes to INSPIRE transformation requirements.

TR: Please give an example of a schema transformation project in the INSPIRE context using your software.

Ken: BKG (the German Federal Agency for Cartography and Geodesy) uses FME to perform schema mapping from Esri Geodatabases into INSPIRE staging Geodatabases or EuroGraphics datasets.

TR: What do you think of creating a schema transformation standard language, e.g. in the OGC?

Ken: In my personal opinion this would add an unnecessary layer of complexity and abstraction to schema transformation.

TR: Anything else that you would like to explain? Future plans for the software? The next big thing?

Ken:

  1. FME’s GML writer for FME 2014 fully supports XSD schema driven GML writing
  2. FME Server 2014 has an improved Streaming Service which allows flexible support for workspace-driven WFS and other Web Services.
  3. FME already supports the 3D, AIXM and raster features which will be required in INSPIRE Annex III
  4. FME can support Application Domain Extensions (ADEs) for CityGML, which should also ease production of 3D Building datasets for INSPIRE Annex III.

TR: Thank you very much, Ken!