Workflow: 3 Steps to using SQL Databases in HALE

The headline feature of the HALE 2.8.0 release is the support for reading schemas and data from SQL Databases. Out of the box, PostGreSQL is supported. How you can access a PostGreSQL/PostGIS database in HALE and work with its content is detailed in this workflow. It will take you through the basic steps for mapping a database in HALE. In this concrete case the target is the current version of the INSPIRE Planned Land Use GML schema. Thus the starting point is the “Map to INSPIRE Planned Land Use” template which already includes the target schema and the respective code lists. You can access the template projects in HALE by clicking File -> New project from template....

1. Importing the Schema

After the template project has been loaded, we have to load the database schema as the source schema. To import the database schema (File -> Import -> Source Schema -> From Database (JDBC)) you need the host your database is running on, the name of the database and a user and password with at least read privileges:

Import a Schema from a SQL Database

Import a Schema from a SQL Database

HALE then reads the schema from the database, and each table or view is represented as a type with its columns as properties. More information is available in the online help. At this stage, foreign key relations are not yet taken into account:

Source Schema imported from the SQL Database

Source Schema imported from the SQL Database

Depending on your database the list of tables might include much more than you need for the mapping. Remove any tables that are not relevant – this removes the clutter from the Schema Explorer and ensures that if you are working with source data, only the relevant data is loaded. Note that the tables are organized per schema in the dialog, you can easily select or de-select a whole schema. In our example we only keep tables from the u_plan_v2 schema.

2. Importing sample data

An important feature of HALE is the continous testing of a defined mapping using real data. Sometimes using an entire dataset will slow down the workflow, though. To take only a subset of the data, enable instance sampling, e.g. to only load the first 1000 rows for each table. You can configure Instance Sampling by going to Window -> Settings -> Project -> Source Data:

Using Instance Sampling if the Source Dataset is too big for real-time work

Using Instance Sampling if the Source Dataset is too big for real-time work

Now import the data from your database, as you did before with the schema. To connect to the same database as before you can use the Recent resources in the From URL tab of the Load Source Data Dialog:

Importing Source Data from the SQL DB

Importing Source Data from the SQL DB

Once the data is loaded you can use the usual methods in HALE to inspect and analyse it.

3. Perform the Mapping

If you have only one relevant source type – i.e. all information you need for mapping to a target type is in a single table, you can simply use the Retype function. In most cases however, as we are dealing with a relational database, we need to Join multiple tables to combine the information.
In our example we want to map objects from the ft_landuse table to the ZoningElement type, but only objects that have the value planned in the discr column. So first we add a condition context on the ft_landuse type:

Adding a Type condition to match only a part of the Features of this type that satisfies the condition

Adding a Type condition to match only a part of the Features of this type that satisfies the condition

In the example database, several tables are linked to ft_landuse that hold information we need to populate ZoningElement properly. So we use the Join function to combine these types for the mapping:

Adding a 5-fold Join Retype Function

Adding a 5-fold Join Retype Function

After selecting and confirming which types from the source need to be joined, the next step in the configuration of the Join is specifying the Join order. The first type must be the main source type that corresponds to the target, in this case ft_landuse. In other words, for every source Feature with type ft_landuse, a target Feature of type ZoningElement will be created, with attributes of the other source features merged in. The main source type is followed by the other feature types it is directly related to. Next, for each feature type, the Join condition that specifies how it is linked to other types is specified. This is determined automatically if there are corresponding foreign key constraints defined in the database, so you can save yourselves this step if your database is set up completely. In many cases, a primary key and a foreign key need to be equivalent to have the features be joined:

Adding the Join conditions

Adding the Join conditions

After creating the type relation the transformation will produce empty ZoningElements. So, for each type relation you should then proceed with mapping the properties. With a Join just map the properties of the individual source types to the target type, as you would do it with a simple retype. The complete mapping for the ZoningElement example looks like this:

Complete Mapping with Join and Attribute Mappings

Complete Mapping with Join and Attribute Mappings

As we configured HALE before to only load a sub-set of the data, the transformation may not find all relevant objects to Join and thus information may be missing in the transformed data. To transform the whole data either disable the sampling to load all data in HALE or use Transform project data in the Transformation menu to directly transform and encode the complete data. Having done that, we now have a nice, INSPIRE-compliant dataset:

Completed INSPIRE PLU Data

Completed INSPIRE PLU Data

This post was prepared by Simon Templer. Thanks a lot for his contribution!

dhp-blogpost-standardimage

Workflow: Loading the INSPIRE Annex II and III schemas

Working with large GML schemas and their dozens of dependencies to ISO, OGC and INSPIRE application schemas can be quite a pain – it’s one of these cases where I can’t help but agree to James Fee’s statement about GIS being complicated. Issues related to just loading them were, for quite some time, the most commonly asked support queries we got. This workflow post shows you which options you have in HALE to load a complex schema easily.

1. Import from preset: The simple option:

  1. Go to File -> Import source (or target) schema
  2. Select the “From preset” tab
  3. Pick one of the “bundled” schemas (all dependencies are included and don’t need to be fetched online)
  4. Click “Finish”. You are done.

hale-import-preset

2. Load a schema from an online repository: If the schema you want to work with is not bundled with HALE, you can access one of the online repositories, which usually have all imported schemas and other resources.

  1. In HALE, go to File -> Import source (or target) schema
  2. Select the “From URL” tab
  3. Copy/Write the URL of your root schema into the text box, e.g. http://inspire.ec.europa.eu/…/PlannedLandUse.xsd. Example repositories include:
    1. INSPIRE draft schemas: http://inspire.ec.europa.eu/draft-schemas/
    2. INSPIRE final schemas: http://inspire.ec.europa.eu/schemas/
    3. OGC schemas: http://schemas.opengis.net/
  4. Click “Finish”. All dependencies are retrieved automatically.
dhp-blogpost-standardimage

Workflow: Create Code List References in GML 3.2 and GML 3.3

  One of the most common questions we got is to explain how to create and use coded values from lists. This article provides a summary of how to use HALE to create the appropriate references in a GML 3.2 or 3.3 target schema.

GML 3.2: CodeType

In GML 3.2(.1), references are modeled using the CodeType element. Citing from the GML spec, “gml:CodeType is a generalized type to be used for a term, keyword or name. It adds a XML attribute codeSpace to a term, where the value of the codeSpace attribute (if present) shall indicate a dictionary, thesaurus, classification scheme, authority, or pattern for the term.”

In hale, such a CodeType consists of two elements, the main attribute, which will hold the classification value, and the codeSpace subattribute, which will contain a URL/URI to identify the code list from which this classification value was selected:

hale_codetype_refs_01

To map such a CodeType, you need to create two cells – one usually a reclassification cell which has the CodeType as target, and an Assign cell to provide the URL to the codeSpace:

hale_codetype_refs_02

The result will look like this in the Transformed Data view:

hale_codetype_refs_03

One issue with this was that there was no consent on how these codeSpace URLs should look like, and there was no requirement for them to be resolvable. This changed in GML 3.3, which is now also used by INSPIRE for Annex II and III specifications.

GML 3.3: ReferenceType

With these shortcomings in the current approach, GML 3.3 does things different: It uses a single URL to reference both codelist and the value therein. In the case of INSPIRE GML, such a single URL follows the pattern

<namespace>/<value>

An example would be http://inspire.ec.europa.eu/codelist/HILUCSValue/1_PrimaryProduction/. We are using the following approach to create the reference, which looks like this in the Schema Explorer:

hale_codetype_refs_04 This is the typical structure of a full XLink. Usually, for pointing to a code list value, creating a simple XLink is completely sufficient. For this, you only need to set type to “simple” using an Assign mapping, and a classification mapping cell which targets the href attribute and writes the complete URL according to the pattern described above:

hale_codetype_refs_05

However, there is also a choice of more complex options available such as relative pointing in a document tree using an Xpointer. You can freely change the approach to take in HALE – it’s not fixedly bound to a single schema.

hale_codetype_refs_06

dhp-blogpost-standardimage

Workflow: INSPIRE Planned Land Use for the Trento Province

At the INSPIRE Conference 2013 HALE workshop in Firenze, we presented two case studies, where current published data was used to create INSPIRE-compliant data sets. This post explains how we created Planned Land Use data for the Province of Trento. A more detail version of this workflow is also available as a PDF, and you can see the accompanying slides on slideshare.

The data used for this example is the land use official plan of the Province of Trento, approved in February 2006, which consists of:

  1. a shapefile with the land use plan, including two updates, one approved with a regional decree of February 2013 and the other one was the previous one, approved in July 2011.
  2. a shapefile with the most current hydro-geomorphological risk zones.
  3. main official documentation (law text).

The INSPIRE PLU conceptual schema corresponds to a dataset that corresponds to a spatial planning document. Geographic information as well as the informative or descriptive parts contained in a spatial planning documents are taken into consideration in the LandUse data application schema.

Original Map of the Planned Land Use in the Province of Trento

Original Map of the Planned Land Use in the Province of Trento

Data Preparation

In most data harmonisation workflows, a data preparation step is required. In this case, this involved creating a HILUCS code mapping table and a resource for the official documentation.

In Trento, the PLU is classified with a local classification, since Inspire requires to use HILUCS classification for land use a reclassification is needed. A classification operation can be done directly in HALE or by loading a CSV file containing the correspondence between the original and the HILUCS classes. Regarding the PLU of the Province of Trento the following classification mapping has been used.

COD_TOT

DESCRIPTION

HILUCS CLASSIFICATION

1

Aree Residenziali 5_1_PermanentResidentialUse

2

Aree Produttive

2_SecondaryProduction

3

Aree Ricreative

3_4_4_OpenAirRecreationalArea

4

Aree Agricole

1_1_1_CommercialAgriculturalProduction

5

Improduttivo

6_3_1_LandAreasNotInOtherEconomicUse

6

Campeggi

5_3_OtherResidentialUse

7

Depuratori e Discariche

4_3_3_WasteTreatment

8

Aree Sciabili

3_4_3_SportsInfrastructure

9

Aree a bosco, prato e prateria alpina

1_2_Forestry

101

Strade di importanza primaria

4_1_1_RoadTransport

102

Ferrovie

4_1_2_RailwayTransport

103

Strade di importanza secondaria

4_1_1_RoadTransport

Another preliminary operation that would help us during the mapping process is the creation of a library of information and links for the documentation of the plan. These information are stored in a CSV file containing the basic properties requested in the Insprire schema for OfficialDocumentation, in particular:

  1. an identifier
  2. legislationCitation: reference to the document that contains the text of the regulation
  3. DocumentCitation: citation of scanned plans and structural drawings being sometimes georeferenced and sometimes not (raster images, vector drawings or scanned text)

The content of the CSV file for the use case of the PLU of the Province of Trento is:

ID documentCitation legislationCitation

Doc_1

http://pguap.provincia.tn.it/GIS/tiff/uso.html

http://www.delibere.provincia.tn.it/CercaSpecifica.asp?Modalita=Delibere&anno=2013&numero=228

Doc_2

http://www.delibere.provincia.tn.it/CercaSpecifica.asp?Modalita=Delibere&anno=2011&numero=1551

Doc_3

http://pguap.provincia.tn.it/pdf/Piano/0IndiceGenerale.pdf

The Alignment in HALE

The very first operation to do in HALE is to load the source schema, the source data and the target schema. This operation is very easy, select the correspondent element to import from the menu File → Import → …

HALE can load source schema from different sources and in particular from any GML/XML, shapefiles and CSV files. Considering the source data and the target schema there are different FeatureType that has to be considered in this example, in particular:

  1. ZoningElement: contains the information of the geometries of the Land Use
  2. SpatialPlan: contains all the information related to the official plan
  3. OfficialDocumentation: contains the links and reference for the available documentation
  4. SupplementaryRegulation: contains the information of additional regulation which supplement the zoning.

The first type to map is the LandUse shapefile which contains the basic information for both the ZoningElement and the SpatialPlan. All the mapping is based on the retype operation. The Retype function expresses that a source and a target type are semantically equal: for each instance of the source type, an instance of the target type is created. Property relations only take effect in the context of a type relation. First a type relation must be defined, then property relations between the involved types can be specified.

Initial HALE setup after loading source & target schema, source data

Initial HALE setup after loading source & target schema, source data

Mapping Process for PLU.ZoningElement

The operations to execute to map the original field of the shapefile to the properties of the types of the PLU target schema are basically rename, assign, date extraction, generation of unique ID and generation of the mandatory inspireID.

One of the most powerful operation in HALE is the possibility to integrate groovy script to map different properties.

Using the Groovy Script function to extract a date from a string

Using the Groovy Script function to extract a date from a string

Mapping Process for PLU.SpatialPlan

With the same source type it is then possible to map also the SpatialPlan. In this case, a merge operation has to be used to set a relation between the types. This function merges multiple instances of the source type into one instance of the target type based on one or more matching properties.

The basic operations for the mapping process are similar to the ones used with ZoningElement. HALE gives the possibility to generate the mandatory inspireID property which requires some more information on the origin of the data.

Using the Inspire Identifier Function in HALE

Using the Inspire Identifier Function in HALE

Link between ZoningElement and SpatialPlan

Since all land use polygons are part of one plan we can directly use the assign operation to assign plan → href in ZoningElement the name of the reference plan.

 

Mapping Process for PLU.OfficialDocumentation

A very important request in the Inspire schema for PLU is the link and reference to the official documentation of the plan. For the example only web links are used as listed in the CSV file. The basic operation to start with here is retype then other basic operation like rename, assign and generate ID are used to provide at least all the mandatory information.

Link between ZoningElement and OfficialDocumentation

Use a groovy script to assign the reference to the officialDocumentation of the features in ZoningElement, since ZoningElement contains two different update of the land use polygons, the official documentation will be different for the two and it is based on the date of the update.

Link between SpatialPlan and OfficialDocumentation

Since all the documents included in the OfficialDocumentation are related to the plan it is necessary to link all of them to SpatialPlan → officialDocumentation → href. In this case we have to create two other new instances of the same property officialDocumentation and assign to each of them the link to the documents .

 

Mapping Process for PLU.SupplementaryRegulation

The last information to be mapped is the additional regulation on hydro-geomorphological risk mapping. This information is stored in an other shapefile and the retype operations is used to map the contained elements to SupplementaryRegulation.

Link between SupplementaryRegulation and SpatialPlan

Since all risk zones polygons are part of one plan we can use directly assign operation to assign to plan → href in SupplementaryRegulation the name of the reference plan.

Link between SupplementaryRegulation and OfficialDocumentation

Since all risk zones polygons are part of the same update of the land use plan we can use directly assign operation to assign to officialDocument → href in SupplementaryRegulation the name of the reference documentation.

Final mapping for the SupplementaryRegulations type.

Final mapping for the SupplementaryRegulations type.

NOTE: the detailed description step by step of the alignment process is reported in the following presentation: http://www.slideshare.net/silli/2013-05-24hydrologisinspirepres

 

dhp-blogpost-standardimage

HALE 2.6.0 brings integration with FME

Last week, we published the new HALE 2.6.0 release. There is one particular feature that we have added in collaboration with Safe Software that I’d like to highlight: The integration of HALE and the CST engine as a GML Writer in the upcoming FME 2014 release. This integration reflects what I and other had been doing for a while now:

  1. Start FME and use the readers to import from a wide range of formats such as an Esri File Geodatabase,
  2. Perform operations that are only possible in FME, such as Geometry calculations,
  3. then write out a simple feature style output schema using a GML or SHP writer,
  4. then start HALE to map the data to the actual target schema, e.g. an INSPIRE Application Schema,
  5. and lastly, use HALE/CST to create the final GML product.

In other words, you can use HALE to either perform the mapping to the complex schema after all other data preparation in FME, or you create a subset of a complex data set before you go to FME. To make the new integrated workflow work, you’ll need FME 2014 and HALE 2.6.0 installed. These are the steps after installation of both tools:

1. Start FME. Add readers and transformers as needed.

2. Add the HALE GML/XML Writer.

fmehale01

3. Open the Parameters for the HALE GML/XML Writer and set at least a *.halex project file location and the path to the HALE executable:

fmehale02

4. Add a Feature Type to the Writer, e.g. by right-clicking on the canvas and selecting “Insert Feature Type…”. Either import a Feature type from any data set or, after creating the Feature type, manually define its schemas using the “User Attributes” tab in the Feature Type’s properties.

5. Connect the last transformer in the workspace to the writer as appropriate to have a complete workspace such as this one:

fmehale03

6. Set any necessary additional writer attributes, such as the number of features you want to use in HALE for interactive transformation. Also pay specific attention to the “Execution Mode” setting. “Schema Transformation” will just execute the project given in Step 3, while “Update Mapping” will launch the HALE UI to enable you to create/update a halex project. “Auto” switches between these two modes depending on whether a file is already present in the indicated halex location:

fmehale04

7. Execute the workbench, and see either HALE or CST get fired up in the final writing phase!

A press release giving more information is also available.