SQL DDL

This target derives SQL Data Definition Language (DDL) instructions from an application schema. The DDL can be used to set up an SQL database which can store application data compliant to the schema.

The overall workflow consists of four steps:

  1. Loading the model with the application schema
  2. Flattening the schema
  3. Handling keywords
  4. Deriving SQL DDL from the schema

Steps 2-4 are described in the following sections.

Pre-Processing: Flattening

In order to achieve a simple structure of tables, and also in order to cope with the potential complexity of the application schema, a number of flattening rules need to be executed:

  • rule-trf-cls-flatten-inheritance
  • rule-trf-prop-flatten-multiplicity
    • if associative tables shall be created, also enable the following subrules:
      • rule-trf-prop-flatten-multiplicity-keepBiDirectionalAssociations
      • rule-trf-prop-flatten-multiplicity-withMaxMultiplicityThreshold
  • rule-trf-prop-flatten-types
  • rule-trf-all-flatten-name

NOTE: the flattening rules and parameters are documented in detail on the Flattener transformation page.

The following parameters should be set for the flattening transformation:

  • maxOccurs: set the default value for representing properties with maximum cardinality > 1 (and a type that is not represented by a separate table)
  • flattenObjectTypes: setting this parameter to false will result in object types not being flattened, thus allowing the SQL DDL derivation process to generate instructions for creating tables not only for feature types but also for object types.
  • flattenDataTypesExcludeRegex: if the SQL DDL target shall create tables for particular data types, these data types can be excluded from type flattening.
  • codeByTaggedValue: setting this parameter allows you to indicate that the code to replace the name of a model element should not be looked up in the alias of that model element, but in the tagged value identified by the value of this parameter

Pre-Processing: Keyword Handling

If the application schema contains classes whose name is a SQL keyword (see this list, for example), the Naming Modifier transformation must be applied to change the names of these classes. The transformation simply adds a suffix to the class name, thus preventing a conflict with reserved SQL keywords.

NOTE: the same applies to property names. However, because the Naming Modifier transformation does not support modification of property names at the moment (Sep 29th, 2015), you need to avoid using SQL keywords as property names in your application schema – or postprocess the DDL created by ShapeChange.

Conversion to SQL DDL

Overview

In general, the target outputs a file with DDL statements for:

  • creating tables
  • altering tables
    • to add foreign key constraints
  • adding geometry metadata update statements
  • creating geometry indexes

Tables

For each type that is to be converted (as defined by the encoding rule) the target generates CREATE TABLE statements:

  • The (normalized) name of the type is used as the table name.
  • Primary key:
    • The name of the primary key column is defined via the idColumnName parameter (see parameters section).
    • The data type of the primary key column depends upon the database system:
      • PostgreSQL: bigserial
      • Oracle: integer
  • The conversion of properties into column definition statements is described in the following section.

Associative tables can also be created. The behavior for encoding various model constructs when creation of associative tables is enabled (and assuming that multiplicity was not totally flattened) is depicted in the following figure (click to enlarge):

relationship mapping with and without associative tables enabled

Property Conversion

A property is converted to a column definition statement as follows:

  • A derived property will not be encoded if rule-sql-prop-exclude-derived is enabled.
  • If the maximum cardinality of the property equals 1 a column definition statement is created; otherwise:
    • If the property belongs to a bi-directional association and the reverse property has maximum cardinality of 1 then we have an n:1 relationship in the model which can be represented.
    • Otherwise:
      • If rule-sql-all-associativetables is enabled we can represent the n:m relationship via an associative table.
      • Otherwise the property is ignored.
  • The data type for a column is defined as follows:
    • If the value type of the property is a type represented by a table, then the column contains a foreign key. As such, the data type is the same as that for primary keys as defined by the chosen database system. Furthermore, the (normalized) column name receives a suffix as specified via the foreignKeyColumnSuffix parameter (see parameters section).
      • NOTE: in order for a type to be recognized as being represented by a table, the following rules apply:
        • a map entry is defined for the type, containing the table parameter,
        • or:
          • the type is a feature, object, or data type that:
      • NOTE: via the createReferences parameter one can omit the creation of foreign key constraints. These constraints ensure referential integrity. For performance reasons (such as on insert or update, or to allow temporarily inconsistent data) a user may choose to omit the creation of these constraints. In any case, the data type of fields representing a property with a value type that is encoded as a table is still the one to represent primary keys (instead of just being a more generic textual data type). A foreign key can thus still be stored in the database.
        • Primary as well as foreign key constraints are always created for associative tables.
    • If a type mapping is available via the map entries set for this target (for further details, see the Map Entries section) then the type defined by the mapping is used.
      • A type mapping also provides information if a property is mapped to a geometry type. For fields with a geometry type, the target includes DDL statements to define a) the specific geometry type that applies for this field (POINT, POLYGON etc) and b) the (integer) identifier of the spatial reference system that shall be used to store geometric information (see srid in the parameters section)
    • If no type mapping is available then for properties with the value type being an enumeration, codelist, object or feature type (if the encoding rule does not cover the type) a textual data type is applied.
    • Otherwise “unknown” is set as data type.
  • If an initial value is set for the property in the UML model, this value is set as the DEFAULT.
  • If the property is NOT optional (minimum cardinality < 1), voidable, or nillable then “NOT NULL” is added to the column definition statement.
  • If the parameter createDocumentation is set to true, the documentation set for the property is appended to the column definition statement.

Normalization of Names

Database systems may enforce certain restrictions on names, for example table names. The target therefore normalizes names taken from the model when using them in DDL statements. Normalization is applied to names of: tables, columns, and indexes. In general, all occurrences of “.” and “-” are replaced by “_”. In addition, database system specific normalization is applied.

  • PostgreSQL: the name is converted to lower case.
  • Oracle: the name is converted to upper case and clipped to a length of 30 characters (omitting characters that exceed this range).

Textual Data Type with Limited or Unlimited Length

Database systems usually support textual data types with limited and unlimited length for table fields. In order to identify whether a textual field should have limited length or not, the target reviews the ‘size’ setting for the field. The value of ‘size’ is defined via the tagged value ‘size’ on the property, or by the value of the configuration parameter ‘size’ (which, if not explicitly set in the configuration, has a default value). If ‘size’ is 0 or negative, the data type for unlimited text size is chosen; otherwise it is for limited size (with the limit being defined by the value of ‘size’).

Geometry Metadata Update Statements

For some database systems, the target also creates statements to update geometry metadata. For Oracle databases, for example, these statements update the USER_SDO_GEOM_METADATA table.

Indexes

The target creates indexes for fields with a geometry value.

If needed, indexes for other fields must be added later on.

Configuration

Class

The class for the target implementation is de.interactive_instruments.ShapeChange.Target.SQL.SqlDdl.

Rules

An <EncodingRule> element defines an encoding rule.

Example 1:

<EncodingRule name="sql">
  <rule name="rule-sql-cls-feature-types"/>
</EncodingRule>

The name attribute of the <EncodingRule> element defines the identifier of the encoding rule to be used. The value of the target parameter defaultEncodingRule must contain this name.

NOTE: The encoding rule with name ‘sql’ is pre-configured in ShapeChange. It can thus be used as-is, without the need to explicitly define it in the configuration. It can also be extended as required (see example 2).

The optional extends attribute (shown in example 2) of the <EncodingRule> element includes all rules from the referenced encoding rule in this encoding rule, too.

Example 2:

<EncodingRule name="my_sql_rule" extends="sql">
  <rule name="rule-sql-all-associativetables"/>
  <rule name="rule-sql-prop-exclude-derived"/>
  <rule name="rule-sql-cls-data-types"/>
  <rule name="rule-sql-cls-object-types"/>
  <rule name="rule-sql-prop-check-constraints-for-enumerations"/>
</EncodingRule>

Each <rule> references either a conversion rule or – possibly in the future – a requirement or recommendation to be tested during the validation before the conversion process.

The default behavior of this target is described in the conversion section.

The following sections list the rules that are supported by this target.

rule-sql-all-associativetables

Contained in named encoding rule: none

Behavior:

This rule ensures that associative tables are created for cases in which an n:m relationship exists between types.

The name of the associative table is taken from the tagged value associativeTable – which exists either on an association or an attribute. If the tagged value is not present or empty, the name is created as follows:

  • If the table represents an n:m relationship represented by an association, then:
    • for a bi-directional association: the name of the class (from both ends of the association) that is lower in alphabetical order is used, concatenated with the according property name
    • for a uni-directional association: the name of the inClass of the navigable property is used, concatenated with the property name
  • If the table represents an n:m relationship that is caused by an attribute with max multiplicity greater than one, then the name of the class that the attribute belongs to is used, concatenated with the property name

Parameters: none

rule-sql-all-exclude-abstract

Contained in named encoding rule: none

Behavior: If this rule is enabled, abstract classes will be ignored by the target.

Parameters: none

rule-sql-all-foreign-key-oracle-naming-style

Contained in named encoding rule: none

Behavior: Under this rule, foreign key identifiers are generated as follows:
“fk_” + tableNameForFK + “” + targetTableNameForFK + “” + fieldNameForFK + count

where:

  • tableNameForFK is the name of the table that contains the field with the foreign key, clipped to the first eight characters
  • targetTableNameForFK is the name of the table that the field with foreign key references, clipped to the first eight characters
  • fieldNameForFK is the name of the field that contains the foreign key, clipped to the first eight characters
  • count is the number of times the foreign key identifier has been assigned; it ranges from 0-9 and can also be omitted, thus supporting eleven unambiguous uses of the foreign key identifier (NOTE: if the foreign key identifier is used more than eleven times, ShapeChange logs an error)

Parameters: none

rule-sql-cls-code-lists

Contained in named encoding rule: none

Behavior: Tables are generated for code lists. The first column will contain the initial value of the code, if set – otherwise, it will contain the code name. Other columns will contain descriptor information (see this page on further information on descriptors). Which descriptors are represented is controlled via the parameter descriptorsForCodelist.

NOTE: The name of the column containing the code name (or initial value) can be controlled via the parameter codeNameColumnName. By default, the type of this column will be textual with unlimited length. The parameter codeNameSize can be used to restrict the text length. The name and length of columns with descriptor information can be controlled via a specific metadata syntax of the descriptorsForCodelist parameter value.

Insert statements are created for the codes of a code list. As mentioned before, the first column (whose name can be controlled via the configuration parameter codeNameColumnName) contains the initial value of the code, if defined by the application schema, otherwise the name of the code. Properties with a code list as value type will be converted to fields with foreign key type.

Parameters:

rule-sql-cls-data-types

Contained in named encoding rule: none

Behavior: Ensures that table creation statements are generated for data types.

Parameters: none

rule-sql-cls-feature-types

Contained in named encoding rule: sql

Behavior: Ensures that table creation statements are generated for feature types.

Parameters: none

rule-sql-cls-object-types

Contained in named encoding rule: none

Behavior: Ensures that table creation statements are generated for object types.

Parameters: none

rule-sql-cls-references-to-external-types

Contained in named encoding rule: none

Behavior: If this rule is enabled, then a property whose type is neither covered by a type mapping entry nor contained in the currently processed schema – but in the overall model – is still encoded as a field with a foreign key – if other rules allow table creation for this type. Otherwise the field is encoded using a textual data type.

Parameters: none

rule-sql-prop-check-constraints-for-enumerations

Contained in named encoding rule: none

Behavior: Ensures that CHECK constraints are created for fields representing enumeration values.

Parameters: none

rule-sql-prop-exclude-derived

Contained in named encoding rule: none

Behavior: If this rule is enabled derived properties will be ignored.

Parameters: none

Parameters

The <targetParameters> recognized for this target are described in the following sections.

NOTE: The following parameter(s) should be set in the <input> element of the ShapeChange configuration:

  • addTaggedValues – If the parameter ‘codeByTaggedValue’ is set for the flattening transformation then the tagged value identified by the parameter value should be added via the input parameter ‘addTaggedValues’.

codeNameColumnName

Alias: none

Required / Optional: optional

Type: String

Default Value: ‘name’

Explanation: This parameter controls the name of the column that contains the name or – if available – the initial value of a code. NOTE: The column name will be normalized according to the rules of the chosen database system.

Applies to Rule(s): rule-sql-cls-code-lists

codeNameSize

Alias: none

Required / Optional: optional

Type: Integer

Default Value: none (meaning that the name can have any length)

Explanation: This parameter controls the maximum length that a code name may have. This influences the type of the required field of a code list table that contains the name of a code.

Applies to Rule(s): rule-sql-cls-code-lists

createDocumentation

Alias: none

Required / Optional: optional

Type: Boolean

Default Value: true

Explanation: True if model documentation shall be added to the DDL via comments, else false.

Applies to Rule(s): none – default behavior

createReferences

Alias: none

Required / Optional: optional

Type: Boolean

Default Value: false

Explanation: True if the creation of foreign key constraints is desired, else false. The constraints are used to ensure referential integrity. However, in some cases it can be advantageous to omit them to improve performance.

Applies to Rule(s): none – default behavior

databaseSystem

Alias: none

Required / Optional: optional

Type: String

Default Value: PostgreSQL

Explanation: Identifier of the database system for which SQL DDL shall be created. Supported systems – and also relevant identifiers – are:

  • PostgreSQL
    • NOTE: the geometry format used in the generated DDL requires PostGIS 2 or higher
  • Oracle
    • NOTE: The DDL produced by ShapeChange has been tested with Oracle version 12.1. (Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 – 64bit Production)

Applies to Rule(s): none – default behavior

defaultEncodingRule

Alias: none

Required / Optional: optional

Type: String

Default Value: sql

Explanation: The identifier of the default encoding rule governing the conversion into SQL DDL.To use a custom encoding rule defined in the configuration, simply provide the name of the custom encoding rule via this parameter.

Applies to Rule(s): none – default behavior

descriptorsForCodelist

Alias: none

Required / Optional: optional

Type: String (with specific syntax)

Default Value: ‘documentation’

Explanation: Comma-separated list of descriptors that shall be encoded as individual columns in a table representing a code list. The descriptors are specified by their identifier (‘alias’, ‘definition’, ‘description’, ‘example’, ‘legalBasis’, ‘dataCaptureStatement’, ‘primaryCode’). In addition to the descriptors, the following identifiers can be used:

  • ‘documentation’ – to include documentation that is derived from descriptors using the parameters documentationTemplate and documentationNoValue
  • ‘name’ – to include the name of the code again, if necessary

An identifier can have metadata attached to it (in parentheses). Each metadata item is given as a key-value-pair. Metadata items are separated by a semicolon. The following metadata keys are supported:

  • columnName – alternative name to be used as name for the column representing the specific code property (descriptor, documentation, name)
  • size – the maximum length that the column representing the specific code property (descriptor, documentation, name) may have

EXAMPLE:

<targetParameter name="descriptorsForCodelist" value="name(columnName=MY_NAM;size=255),definition(columnName=MY_DEF;size=255)" />

Applies to Rule(s): rule-sql-cls-code-lists

documentationNoValue

(since 2.0.2)

Alias: none

Required / Optional: optional

Type: String

Default Value: “” (i.e., the empty string)

Explanation: If a descriptor is used in the documentation template, but has no value, this parameter will be used. See here for more information about descriptors.

Applies to Rule(s): none – default behavior

documentationTemplate

(since 2.0.2)

Alias: none

Required / Optional: optional

Type: String

Default Value: [[definition]]

Explanation: The template for the documentation that is placed in the notes field in the EA model. The patterns “[[descriptor]]” will be replaced by the value of the descriptor, or the value of documentationNoValue, if the descriptor has no value for the model element. See here for more information about descriptors.

Applies to Rule(s): none – default behavior

foreignKeyColumnDatatype

Alias: none

Required / Optional: optional

Type: String

Default Value: the primary key type defined by the database strategy, e.g. ‘bigserial’ for PostgreSQL and ‘INTEGER’ for Oracle

Explanation: Datatype to use for foreign key fields, for example ‘bigint’ in case of a PostgreSQL database.

Applies to Rule(s): none – default behavior

foreignKeyColumnSuffix

Alias: none

Required / Optional: optional

Type: String

Default Value: <the empty string>

Explanation: Suffix to append to the name of columns that contain foreign keys (except if it references a table that represents a data type).

Applies to Rule(s): none – default behavior

foreignKeyColumnSuffixDatatype

Alias: none

Required / Optional: optional

Type: String

Default Value: <the empty string>

Explanation: Suffix to append to the name of columns that contain foreign keys referencing tables that represent data types.

Applies to Rule(s): none – default behavior

idColumnName

Alias: none

Required / Optional: optional

Type: String

Default Value: _id

Explanation: Name for the identifier column when generating table creation statements.

Applies to Rule(s): none – default behavior

outputDirectory

Alias: none

Required / Optional: optional

Type: String

Default Value: <the current run directory>

Explanation: The path to the folder in which the resulting SQL DDL file will be created.

Applies to Rule(s): none – default behavior

size

Alias: none

Required / Optional: optional

Type: Integer

Default Value: 1024

Explanation: Size for fields representing textual properties with limited length, to be used in case that the property represented by the field does not have a ‘size’ tagged value.

Applies to Rule(s): none – default behavior

srid

Alias: none

Required / Optional: optional

Type: Integer

Default Value: 4326

Explanation: EPSG code of the spatial reference system to use for geometries.

Applies to Rule(s): none – default behavior

Map Entries

<mapEntries> contain individual <MapEntry> elements, which for this target contain information for mapping specific types (classes) from the UML model to SQL.

Examples:

<mapEntries>
  <MapEntry type="CharacterString" rule="*" targetType="cond:textOrCharacterVarying" param=""/>
  <MapEntry type="Boolean" rule="*" targetType="boolean" param=""/>
  <MapEntry type="GM_Point" rule="*" targetType="POINT" param="geometry"/>
</mapEntries>

A <MapEntry> element contains the attributes described in the following sections.

NOTE: The file StandardSqlMapEntries-PostgreSQL.xml defines mappings for a number of types of the ISO Harmonized Model when deriving SQL DDL for a PostgreSQL database. The file StandardSqlMapEntries-Oracle.xml does the same in the case of an Oracle database. This file can be included in ShapeChange configuration files (via XInclude). Additional XInclude files, or individual <MapEntry> elements added to the <mapEntries> section of the configuration file, may be used to customize the map entries to support additional pre-defined conceptual UML classes.

type

Required / Optional: Required

Explanation: In general, the UML type/class name to be mapped.

 

rule

Required / Optional: Required

Explanation: The encoding rule to which this mapping applies. May be “*” to indicate that the mapping applies to all encoding rules.

 

targetType

Required / Optional: Required

Explanation: Name of the type to use in the DDL statement(s).Use ‘cond:’ to indicate that conditional type selection needs to be performed, with the following string identifying the nature of the type mapping. Currently supported are the following conditions:

  • textOrCharacterVarying – use this type to indicate that the UML type should be mapped to a textual type, the length of which depends upon the setting of ‘size’ for this property (as described before).

 

param

Required / Optional: Optional

Explanation: Defines one or more parameters for the mapping. If no parameter is provided (leaving the ‘param’ attribute empty) then the map entry contains a straightforward mapping, usually to a simple type.

Each parameter has a name. A list of parameters is separated by commas. Each parameter can also have characteristics defined for it, providing even further information for the conversion to SQL. Characteristics for a parameter are provided within curly braces. A characteristic usually is provided by a key-value pair, with the key being the identification of the characteristic.

Examples:

  • geometry
  • geometry{layer_gtype=line}
  • table
  • table{representedCategory=datatype}

Supported parameters, their interpretation as well as characteristics are described in the following sections.

Parameter: geometry

Explanation: Indicates that the map entry contains a mapping for a geometry type (specifically from ISO 19107). IMPORTANT: if geometry types are not identified via map entries containing this parameter then ShapeChange cannot determine for which table fields geometry indexes need to be established!

Characteristics:

  • layer_gtype:
    • Applies to database type: Oracle
    • Description: the value specifies the exact geometry type to assign to a field with geometry type when building a spatial index for it. This is necessary because the restriction to specific geometry types within an Oracle spatial database is specified via the spatial index. The type of a field with geometry type is always SDO_GEOMETRY.

Parameter: table

Explanation: Indicates that the map entry contains a mapping for a type that is represented by a table. This is important for correctly setting the type of fields representing properties with this type as value, as well as creating foreign key constraints.

Characteristics:

  • representedCategory:
    • Applies to database type: all
    • Description: the value identifies the category of the conceptual type that is identified by the map entry. Recognized values are (currently there is only one):
      • datatype

 

Configuration Example

<Target class="de.interactive_instruments.ShapeChange.Target.SQL.SqlDdl" inputs="step2"
  mode="enabled">
  <targetParameter name="defaultEncodingRule" value="sql"/>
  <targetParameter name="outputDirectory" value="testResults/sql/ddl"/>
  <targetParameter name="size" value="-1"/>
  <targetParameter name="srid" value="31467"/>
  <targetParameter name="createReferences" value="true"/>
  <targetParameter name="createDocumentation" value="true"/>
  <xi:include href="http://shapechange.net/resources/config/StandardSqlMapEntries-PostgreSQL.xml"/>
</Target>