On the 8th and 9th of October, around 50 people gathered for the joint EuroSDR/INSPIRE Knowledge Exchange Network (KEN) Workshop on Schema Transformation. The workshop gave all participants the opportunity to get an overview of pretty much all approaches on the market that help complete schema transformation projects.
For the full program, all slides and video recordings of the workshop, please go to the EuroGeographics website. What follows is not a detailed report of every presentation, but rather an account of my personal highlights, including the two discussion sessions that concluded each day. Marie-Lise Vautier (IGN France) and I started with presentations that set the frame by defining what schema transformation is and what general approaches are available. Morten then continued with experiences highlighting the schema matching methods originally developed by ESDIN and now widely used by Cadastral Agencies and other LMOs. He also mentioned that matching tables in particular can get hard to create and maintain.
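To make the point about matching tables a bit more concrete, here is a minimal sketch of what such a table boils down to when written as code. This is not the ESDIN method itself: the attribute names, the code list values and the helper functions `transform` and `unmatched` are all hypothetical and only serve as illustration.

```python
# Hypothetical matching table: each row maps one source attribute to one
# target attribute, together with a conversion applied to the value.
# None of these names come from ESDIN or from a real data set.
MATCHING_TABLE = [
    # (source attribute, target attribute, conversion)
    ("NAME", "geographicalName", str.strip),
    ("AREA_HA", "areaSquareMetres", lambda ha: float(ha) * 10_000),
    ("LEVEL", "nationalLevel", {"1": "1stOrder", "2": "2ndOrder"}.get),
]


def transform(record, table=MATCHING_TABLE):
    """Apply every row of the matching table to one source record."""
    target = {}
    for source_attr, target_attr, convert in table:
        if source_attr in record:
            target[target_attr] = convert(record[source_attr])
    return target


def unmatched(record, table=MATCHING_TABLE):
    """List the source attributes that no row of the table covers."""
    covered = {source_attr for source_attr, _, _ in table}
    return sorted(set(record) - covered)
```

Even in this toy form, every new source attribute or code list value means another row or dictionary entry, and attributes that nobody mapped only show up if you check for them explicitly, as `unmatched` does. Real tables additionally have to deal with conditions and structural changes, which is probably where much of the maintenance effort goes.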
After a break, Just van den Broecke opened the block on Open Source Schema Transformation software. He has developed a streaming ETL framework called STETL, which is based on GDAL/OGR, XSLT and other libraries and ties everything together using Python. Python support throughout the world of geospatial tools is very good – you can see it becoming a lingua franca for scripting in GIS. Just, like me in my earlier presentation, made it clear that schema transformation projects are essentially programming projects and thus have a certain level of complexity. I fully agree, but I see it as a disadvantage that you have to learn multiple languages to use STETL. Consequently, I see STETL mostly as a tool for programmers who want to use the tools it is based on anyhow and need a rich “boilerplate”. Just also held up the flag for Open Source as a major community enabler, which I see as especially important for INSPIRE.
I was also very interested in the presentations on GeoKettle and Talend Spatial Data Integrator, which at first glance seem to have similar capabilities. Both presentations were given by users who had completed transformation projects using them. What I like about both is that they are derived from general-purpose, non-GIS tools, which proves tool reusability. Talend was showcased by Jean-Loup Delaveau of CERTU. He explained how to create INSPIRE Planned Land Use data by setting up a workflow in Talend that used components such as XSLT translators. An interesting note from his side was that GML should really be used as a machine-to-machine exchange format, and that providers and users should not see much of it.
Edith Vautard of IGN France explained how her group evaluated GeoKettle for generating INSPIRE Administrative Units. One thing that really impresses me is that IGN France is very open and tries out many approaches and tools to build up rich internal knowledge. On GeoKettle, I made a note that I’d like to investigate their workspace format a bit. Edith ended with an overall positive assessment of GeoKettle, citing from her slides:
- + It’s intuitive and easy to use
- + powerful and performant
- + provides a sufficient diversity of functions
- + reads the schema from the data
- – Transformations are only stored in the internal XML format and cannot be exported as executable files (e.g. XSLT)
- – INSPIRE complex structures are not supported, nor can you create non-simple GML 3.2.1
- – There is no help in the software, and documentation is light; however, there is good support.
The first day was then completed by an update on the model-driven WFS work done by TU Munich, presented by Tatjana Kutzner. She highlighted findings of her recent research, which has been published under the title “Critical Remarks on the Use of Conceptual Schemas in Geospatial Data Modelling – A Schema Translation Perspective” (Kutzner, Donaubauer 2012). The core question they researched was what a core model of all UML profiles in use would look like and how to provide encoding rules for conceptual models in machine-readable formats.
After Tatjana’s presentation, only the discussion round stood between us and dinner – and everybody stayed for an interesting, engaged discussion, with these core findings on the subject “what are the main drivers to choose methods and tools for schema transformation” (citing from Dominique Laurent’s summary):
- Maintenance and documentation of tools are significant criteria
- Choice of tools depends on the business models of data providers: some want the best tool for each step (even if using many tools increases complexity), some want only a single supplier (or at least a small number of tool suppliers) and tender accordingly
- Choice of tools also depends on national policy; there may be an order to use open-source tools
- Skills will also influence the methods and tools: if skills are limited, it would be better to choose a tool that is simple to use and/or to envisage training
- Choice of tools and methods will depend on the existing systems already in place (tools, data, …) and on the organization (e.g. one or several data producers)
Another item of discussion started from my earlier presentation on schema transformation approach classification: “To be able to choose our tools and methods, we need [a framework] to analyse the potential ones, to get an overview”. Meanwhile, I have posted a more extensive description of the framework presented in Paris here. The day then really ended with a very nice dinner :).