FSSI File Processor

The data warehouse swiss army knife

View project on GitHub

Schema Guide


Schemas are the centerpoint of the FSSI File Processor. They contain all of the business rules for the validator organizer functions which are the meat and potatoes of the whole system.

At runtime, Schemas are loaded into the system. Then, when a source file is being processed, if a schema is associated (via Provider object), it is personalized and then binded to the source file object. This allows the FSSI File Processor to be more flexible in

leveraging the principles of CSV Lint and based upon the JSON Table Schema RFC, The FSSI File Processor Schemas are written in XML.

Features

Here are some of the things that set our schemas and validation process apart from other tools and services:

  1. Multiple Error Levels: Constraints and Types can be configured to fail as a Warning, Error, or Fatal. This helps teams better define which issues are imporant.
  2. Effective Date Based Validations: As business rules change for providers over time, your schemas can too. Constraints can be configured with effective dates that will actually tailor the schema at runtime to only enforce validations relevant to that file.
  3. Alias' and Field Name Mappings: Sometimes, field names change over time. In other scenarios, different providers call the same field 2 different things. With Alias', you can actually create a data dictionary that will map fields to the correct field.

Basic Structure of a Schema

<schema>
    <fields>
        <field>
            <!-- Field 1 -->
            <constraints>
                <constraint_name><!-- Constraint 1 --></constraint_name>
                <constraint_name><!-- Constraint n --></constraint_name>
                </constraints>
            </field>
        <field><!-- Field 2 --></field>
        <field><!-- Field 3 --></field>
        <field><!-- Field n --></field>
        etc...
    </fields>
</schema>

As you can see, there is a parent->child relationship between the schemas, fields, and constraints objects. There is no limit as to the number of fields a schema can have, nor the number of constraints a field can have.

Schema Attributes


The schema consists of the following top level elements:

Name (Required)

Schemas are linked to source files by their name, which is documented in the provider file. Schema names should be unique. If 2 schemas have the same name, the FSSI-File-Processor will ignore the second. Name is a required field.

<name>example_schema</name>

Provider

In prior versions of the FSSI File Processor, providers and schemas were loosly coupled. After some thought, it made sense to not depend upon this field to physically map a schema to a provider. This is still, however, a valid field that could be useful for documentation purposes.

<provider>example_provider</provider>

Version

This is a user defined field. The system does not currently leverage it. This is still, however, a valid field that could be useful for documentation purposes.

<version>1.0</version>

Title

This is a user defined field. The system does not currently leverage it. This is still, however, a valid field that could be useful for documentation purposes.

<title>1.0</title>

Fields - Required

Fields defines all of the field objects within the schema. Please see the "Field Attributes" section for more information. Fields are required.

<fields>
    <field>
        <name>field 1</name>
    </field>
    <field>
        <name>field 2</name>        
    </field>
</fields>

Field Attributes


        <description>Each order should be identified by a unique order identification number</description>
        <constraints>
            <required level="error">true</required>
            <maxLength level="error">30</maxLength>
        </constraints>
        <alias>Ord_Num</alias>

Name - Required

This is the standard or dictionary name for the field. Used for mapping to the source file as well as outputting to a staged file. Though the system does not enforce strict naming standards, it behooves the user to develop a consistant naming convention amongst fields. This is a Required field.

<name>PRODUCT_DESCRIPTION</name>

Title

This is the informal or long name for the field. Used for mapping to the source file as well as outputting to a staged file. The system does not currently leverage this field. This is still, however, a valid field that could be useful for documentation purposes.

<title>Product Description</title>

Description

This is a user defined field. The system does not currently leverage this field. This is still, however, a valid field that could be useful for documentation purposes.

<description>Description of the product as provided by the supplier.</description>

Alias

Alias' are used to map variations and permutations of field names to a standard field name. If you have a provider that sends files with little tweaks and difference in the field names, you can use Alias' to map them all to the standard Name.

<alias>agn_code</alias>

A more practical usage of Alias can be seen below:

<field>
    <!-- First we have our standard field name -->
    <name>AGENCY_CODE</name>
    <!-- now we put in a list of alias' that map to that standard field name. -->
    <alias>agn_code</alias>
    <alias>agency_dc</alias>
    <alias>agency code</alias>
    ...etc
</field>

Anytime a file had a field with a permutation (agn_code, agency_cd, agency code), it would map to the AGENCY_CODE standard name.

NOTE: If 2 fields have the same alias', the FSSI File Processor will only use the first one.

Type

A field’s type attribute is a string indicating the data type of this field.

The Type field is Optional. If a type is not defined, then the type is defaulted to any (AKA string). Additionally, if type is not defined, format ignored.

The system recognizes the following data types.

Any

Default format when another is not applied.

<type>any</type>

String

A string (of arbitrary length). Example: "Hello World"

<type>string</type>

Number

A number including floating point numbers. Example: 123456.99

<type>number</type>

Integer

An integer. No floating point numbers. If floating points are provided, this will fail validation. Example: 123456.

<type>integer</type>

Date

a date. This MUST be in ISO6801 format "YYYY-MM-DD" or, if not, a format field must be provided describing the structure. Date validation is strict and format should be based upon Joda Time Format. Example: 2014-02-28

NOTE: There is currently an ancillary type validation in place that will fail records that are + or - 50 years from system date. This was designed to compensate for ussues related to poorly designed date formats that don't have a separator.

<type>date</type>

A more practical usage of the Date Type can be seen below:

<field>
    <name>ORDER_DATE</name>
    <!-- First, we define the type -->
    <type>date</type>
    <!-- Now we define the format.  -->
    <type>yyyy-MM-dd</type>
</field>

Type Error Levels

In addition to the different types of types that you can define, each type can have the error level modified. If no error level is provided, it is defaulted to "error".

Warning
<type_name level="warning"></type_name>
Error
<type_name level="error"></type_name>
Fatal
<type_name level="fatal"></type_name>

Constraint Attibues

Each field can have many constraints. Constraints, in addition to types, control the validation process. Think of constraints as business rules that you want the data to meet.

Required

A boolean value which indicates whether a field must have a value in every row of the table. An empty string is considered to be a missing value.

Options * true - data is required * false - data is not required

Example:

<!-- This field is required -->
<required>true</required>

Minimum Length

An integer that specifies the minimum number of characters for a value.

Example:

<!-- field must be 7 characters long -->
<minLength>7</minLength>

Maximum Length

An integer that specifies the maximum number of characters for a value.

Example:

<!-- data can be a maximum of 7 characters long -->
<maxLength>7</maxLength>

Pattern

A regular expression that can be used to test field values. If the regular expression matches then the value is valid. Values will be treated as a string of characters. It is recommended that values of this field conform to the standard XML Schema regular expression syntax.

NOTE: Regular expressions are not case sensitive

Example Either/Or:

<!-- data can only either be Yes or No -->
<pattern>Yes|No</pattern>

Minimum

Specifies a minimum value for a field. This is different to minLength which checks number of characters. A minimum value constraint checks whether a field value is greater than or equal to the specified value. The range checking depends on the type of the field. E.g. an integer field may have a minimum value of 100; a date field might have a minimum date. If a minimum value constraint is specified then the field descriptor MUST contain a type key

Example with Number:

<!-- Must be above the number 6 -->
<minimum>7</minimum>
Example with String:
<!-- String must be at least 7 characters long -->
<minimum>7</minimum>

Example with Date:

NOTE: Dates must be in ISO 8601 yyyy-MM-dd format

<!-- Dates must come on or after January 1st, 2014 -->
<minimum>2014-01-01</minimum>

Maximum

Specifies a maximum value for a field. This is different to maxLength which checks number of characters. A maximum value constraint checks whether a field value is less than or equal to the specified value. The range checking depends on the type of the field. E.g. an integer field may have a maximum value of 100; a date field might have a maximum date. If a maximum value constraint is specified then the field descriptor MUST contain a type key

Example with Number:

<!-- Must be below the number 8 -->
<maximum>7</maximum>

Example with String:

<!-- String must be at less than 8 characters long -->
<maximum>7</maximum>

Example with Date:

NOTE: Dates must be in ISO 8601 yyyy-MM-dd format

<!-- Dates must come on or after January 1st, 2014 -->
<maximum>2014-01-01</maximum>

Constraint Effective Dates

As programs mature and age, it is common that business rules age. The FSSI File Processor handles this by allowing each constraint to be modified by an "effective date". At runtime, the FSSI File Processor looks at the reporting period of the file and dynamically personalizes the constraints to only apply the business rules that were effective for that reporting period.

The Effective Date field must be put in ISO 1801 Format ('yyyy-MM-dd').

NOTE: If you have 2 of the same constraints with the same effectiveDate, the FSSI File Processor will only use the first one.

NOTE: Even though file reporting periods are in a month/year format, effective dates are required to be in a date format. When the schema is personalized it converts the effectiveDate to month/year and performs a date comparison.

Example:

<!-- If the source files reporting period is anytime before March of 2015, this constraint will be ignored -->
<required effectiveDate="2015-03-01">true</required>

Constraint Error Levels

In addition to the different types of constraints that you can create, each constraint can have the error level modified. If no error level is provided, it is defaulted to "error".

Warning

<constraint_name level="warning"></constraint_name>

Error

<constraint_name level="error"></constraint_name>

Fatal

<constraint_name level="fatal"></constraint_name>

Example Schemas

Example schemas can be found in the samplefiles/schema folder in Github.

XSD

When creating your schema, we highly recommend performing XSD Schema Validation (which can be easily done through a website like http://www.xmlvalidation.com/). The current version of the FSSI File Processor XSD can be found here: https://github.com/GSA/fssi-file-processor/blob/master/src/main/resources/schema.xsd.

This will allow you to quickly root out issues with the schema.