Tools for data harmonisation

Implement INSPIRE for Geology Data – Workshop at BGR, Hannover

Around 70 people participated in a German-language workshop with the goal of exchanging information on the technical implementation of the INSPIRE directive. The workshop, which took place on 20 and 21 February 2013, was organised by the German Federal Institute for Geosciences and Natural Resources (BGR).

After three introductory presentations on INSPIRE and the state of implementation of the directive, four presentations went into the nitty-gritty details of harmonising and publishing data sets. In the first example, geological survey data from the city of Freiburg was harmonised using a methodology similar to the HUMBOLDT approach, but with FME for the mapping and transformation step. In a second presentation, from Lower Saxony, street network data was published as an Atom Pre-defined Dataset Download Service. Other presentations showed results of the GS-SOIL project, in which we also participated, and the final one reported on harmonisation work done in the INSPIRE DS teams on geology.

The third block focused on product presentations, including GO Loader and GO Publisher by Snowflake, XtraServer by Interactive Instruments, FME Inspire Solution Pack and dhp HALE. While all products support aspects of the data harmonisation and publishing process, each has a different focus:

  • Snowflake GO Loader and GO Publisher: consume any GML, store a corresponding structure in a relational database and provide access to this data through different interfaces and content formats, ranging from GML to Atom. To publish data in an XML schema that differs from the database schema, GO Publisher Desktop supports an optional step that maps the database structure to the interface structure. To summarize: “GML to database, database to GML”.
  • Interactive Instruments XtraServer: a Web Feature Service (supporting all relevant versions) that can serve arbitrary GML schemas and can also serve other formats by performing server-side format transformations using XSLT. It also supports on-the-fly reprojection. As with GO Publisher, database structures are mapped to the output schema using an internal configuration mechanism. To summarize the focus: “database to GML, with comprehensive WFS support”.
  • Safe FME (without extensions): supports GML, almost any spatial database and about 100 other I/O formats, and performs best when input and output schemas are relatively simple. FME contains a huge gallery of different transformation functions. Due to this breadth of capabilities, different data harmonisation aspects (format, schema, projection systems, geometry/topology, metadata, …) can be covered using FME. In the context of the INSPIRE use cases, the focus is “Data transformation for preparing data for publishing”.
  • conterra FME Inspire Solution Pack: addresses the problem that, for complex schemas, working with “vanilla” FME leads to extremely complex workspaces and can be daunting. The solution pack contains preconfigured FME workbenches, schema-specific transformer packs (“INSPIRE_Lifespan_Setter”, “auAdmUnit_nationalLevel”) as well as integrated INSPIRE documentation. The focus is “configure FME so that it can be used better for transformation of specific input data to INSPIRE GML”.
  • Esri ArcGIS for INSPIRE: an ArcGIS for Server extension that provides INSPIRE-compliant View and Download services, as well as an ArcGIS for Desktop extension supporting data and metadata editing, based on standard ArcGIS editing tools. It uses a specific geodatabase layout underneath and comes with ETL tools to load data from other specific geodatabases. To summarize: “maintain and publish your data for INSPIRE in a single place”.
  • dhp HALE and dhp HALE Web: the only application that enables users to do interactive mapping of arbitrary, complex schemas. It has a high-performance transformation engine with a set of ~15 transformation functions, also available as a web service. It performs advanced validation and quality assessment on the fly and generates rich metadata. The summary: “Understand your schemas and mapping, ensure that the transformed data is of high quality”.

At this stage, most organisations will rely on more than one solution to cover the different aspects of data harmonisation and INSPIRE-compliant publishing of data. A typical workflow could involve FME for reading formats that other solutions don’t support and for geometric harmonisation, HALE for the schema mapping, and then XtraServer, GO Publisher, deegree or ArcGIS for INSPIRE for publishing the data. It will be interesting to see how the existing solutions converge and provide integration points to each other.
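
To make the schema mapping step of such a workflow a little more concrete, here is a minimal Python sketch. It is not how HALE, FME or any of the other products work internally. It only illustrates the general idea of a declarative mapping in which each target property is filled from a source property through a transformation function. All field names, the lookup table and the function names are made up for this illustration and are not taken from any INSPIRE schema.

```python
from typing import Any, Callable, Dict, Tuple

Feature = Dict[str, Any]
# target property -> (source property, transformation function)
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]


def centimetres_to_metres(value: Any) -> float:
    """Unit conversion, as one example of a transformation function."""
    return float(value) / 100.0


def classify_rock(value: Any) -> str:
    """Translate a local code list value into a (hypothetical) target term."""
    lookup = {"Sst": "sandstone", "Kst": "limestone"}
    return lookup.get(value, "unknown")


# Hypothetical mapping from a local source schema to a target schema.
MAPPING: Mapping = {
    "identifier": ("ID", str),
    "thickness_m": ("THICKNESS_CM", centimetres_to_metres),
    "rock_type": ("ROCK_CODE", classify_rock),
}


def transform(feature: Feature, mapping: Mapping) -> Feature:
    """Apply every mapping rule whose source property is present."""
    target: Feature = {}
    for target_prop, (source_prop, func) in mapping.items():
        if source_prop in feature:
            target[target_prop] = func(feature[source_prop])
    return target


if __name__ == "__main__":
    source = {"ID": 17, "THICKNESS_CM": 250, "ROCK_CODE": "Sst"}
    print(transform(source, MAPPING))
```

Keeping the mapping as data rather than code is what makes it possible, in principle, to define it in one tool and execute it in another, which is the kind of integration point mentioned above.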

For us, it was a very encouraging day with positive feedback from the audience.

Thanks to Simon Templer of Fraunhofer IGD, who presented the current state of HALE to the participants, and provided me with the material for this report.

Feature Wish of the Month Polls

It’s important to us to know what you really need in tools to get to grips with data integration and harmonisation. Therefore we’ll be asking you once a month for your feature wishes. We’ll be offering three features each month. The one that gets the most votes will be prioritized for the next release. Of course, you can always send us additional ideas via the comment function on the poll!

Is there a Data Harmonisation Market?

Determining what kind of market exists for the toolset developed in the original HUMBOLDT project (2006 to 2011) was a major part of the work. Among other activities, we conducted two market studies (2009 and 2011) to determine whether a market for data harmonisation products and services exists, and what the properties of that market are. These market studies used questionnaires and expert interviews to characterize the market, its actors and their needs.

Here are some of our findings from the 2011 study, which had 29 participants.

1. Importance and Use of Data Harmonisation

  • The participants in the study assess the importance of data harmonisation as very high, but the expected benefit varies by industry. The following diagram shows the expectations voiced:
Figure: Assessment of the importance of spatial data harmonisation for different industries (n = 29)

However, many actors perform almost no data harmonisation, and if they do, it is focused on two fields: geographic names and geometric harmonisation. The effort involved is seen as prohibitive for all but the most important data sets. If the costs of data harmonisation and reuse could be reduced, the full benefits of SDIs and INSPIRE could be unlocked.

  • In general, participants in the studies mentioned the following main benefits of data harmonisation activities:
    • Reducing duplication of data collection costs
    • Enabling easier discovery of datasets using standardized metadata and publishing such metadata electronically
    • Improved cross-departmental co-ordination of spatial data collection and publishing regimes due to harmonized datasets
    • Faster access to spatial data, especially using web-based delivery
    • Huge efficiency gains derived from a wider access to data of better quality within organizations / disciplines and across them
    • Benefits to society (better foundation for political decisions and monitoring)
    • Development of standardized fundamental core spatial databases from which new products and services can be developed more cheaply and quickly

2. Market structure and actors

For the data harmonisation services and products developed during the HUMBOLDT project, the following primary and secondary markets were identified:

Primary market:

  • National INSPIRE-responsible bodies (LMOs)
  • GIS developers implementing applications with data harmonisation issues
  • Parties that have to use/offer heterogeneous data from various sources (Data Custodians/Data Integrators)

Secondary market:

  • People/institutions faced with spatial data interoperability difficulties in a cross-border situation or other application fields.
  • Other parties interested in data harmonisation.
  • Thematically related European or Industry Projects.

In total, this amounts to several thousand potential customers in Europe alone. For the most part, and largely due to a lack of tools and processes, they are currently not investing in data harmonisation, but instead re-collect data or simply work with heterogeneous data sets.

To summarize, the market for data harmonisation services and products is in a competitive situation. It is evolving and changing quickly, with both customers and market actors using very different approaches to data harmonisation. Almost no actors actively promote data harmonisation activities as such; instead, they perform them under different titles such as data transformation, integration or spatial/business analytics.

What is your take on this? Is there a market for specific data harmonisation products and services or are these just special fields of Business Intelligence, Business Integration, or Spatial Data Value Added Services? What’s the right label?



A Special Preview: The Next-Generation Schema Transformation Service is coming

The cst-wps project wraps the core transformation engine of HALE 2.1 and provides its functionality through a WPS interface. However, cst-wps hasn’t been in active development for some time, so what has happened?

The cst-wps project in its first incarnation was a collaboration project using the 52°North WPS framework to expose our transformation engine as an OGC Web Processing Service. The 52°N framework handled all requests and parsed and encoded GML, while the cst engine performed the schema transformation according to a previously uploaded configuration. However, due to incompatible licensing at the time (52°North has since changed its licensing rules), we decided to go ahead with a different WPS framework, PyWPS. Integration proved to be complex, and the resulting project was hard to maintain. Thus, cst-wps became unsupported.

Of course, this wasn’t to be the last word. Based on our all-new transformation engine that is now shipped with HALE 2.5.0, we have also created a new cst web service. It is not yet officially released, but we want to give you a preview of what this entirely new piece of software is capable of.