POSC Specifications
Version 2.3
Epicentre Subsetting and Extension

Subsetting the Epicentre Logical Data Model

This document focuses on the problem of creating subsets of the Epicentre Logical Data Model. It is intended to define the POSC standard subsetting operations which may be applied to the complete Epicentre Data Model to produce valid Epicentre Logical Subsets.

This document does not discuss in detail the methods that can be used to decide what scope a subset should have. Many in industry have advocated the use of Common Business Objects (CBOs) to delineate the scope of functionality required by users operating in the same business domain [1]. By mapping a CBO to the Epicentre entities that represent it, one can discover the portions of Epicentre that must be in a subset to support the business functions the CBO embodies. A complete discussion of this topic is beyond the scope of this paper.

Preface

Creating a physical implementation of the Epicentre Logical Data Model can be divided into three main steps:

Step 1: Subsetting Operations
Subset the logical model as necessary based on the scope of data the target implementation is intended to hold.
Step 2: Projection Operations
Project the subsetted logical model to a physical schema. In the case of a relational implementation, this consists of creating a DDL file of CREATE TABLE statements which define the valid tables and columns that represent the logical subset.
Step 3: Data Integrity Enforcement
Apply the referential data integrity enforcement mechanism for the data store. This can be accomplished by applying integrity constraints and triggers in the relational database management system (RDBMS), applying integrity constraints through the DAE compatibility layer and/or documenting procedures to be followed by the people and applications that maintain the data.
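
The first two steps above can be illustrated with a minimal sketch. It assumes a toy in-memory model rather than the real Epicentre EXPRESS file. The entity names, attribute names, SQL types and the functions subset_model and project_to_ddl are hypothetical and are not part of any POSC specification. Step 3 is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Attribute:
    name: str
    sql_type: str
    optional: bool = True


@dataclass
class Entity:
    name: str
    attributes: List[Attribute] = field(default_factory=list)


def subset_model(model: List[Entity], keep: Set[str]) -> List[Entity]:
    """Step 1 (simplified): retain only the entities named in `keep`."""
    return [entity for entity in model if entity.name in keep]


def project_to_ddl(model: List[Entity]) -> str:
    """Step 2 (simplified): emit one CREATE TABLE statement per entity."""
    statements = []
    for entity in model:
        columns = ",\n".join(
            f"  {a.name} {a.sql_type}{'' if a.optional else ' NOT NULL'}"
            for a in entity.attributes
        )
        statements.append(f"CREATE TABLE {entity.name} (\n{columns}\n);")
    return "\n\n".join(statements)


# Hypothetical two-entity "complete" model.
complete = [
    Entity("well", [
        Attribute("identifier", "VARCHAR(40)", optional=False),
        Attribute("remark", "VARCHAR(255)"),
    ]),
    Entity("seismic_line", [
        Attribute("identifier", "VARCHAR(40)", optional=False),
    ]),
]

ddl = project_to_ddl(subset_model(complete, {"well"}))
print(ddl)
```

A real projection would also have to map relationships to foreign keys and carry the subset's rules into the integrity mechanism of Step 3.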

Steps of Subsetting

Subsetting Operations

Applying the valid subsetting operations [2] defined herein to the Epicentre Logical Data Model results in subsets which modify model behavior. Because of this, Epicentre subsets do not precisely fit the classical definition of a subset [3]. However, modifying model behavior creates the opportunity for simplifying physical implementations of Epicentre. At the same time, it has implications for application portability and interoperability and data portability (see Consequences of Subsetting).

This section defines what constitutes a valid subsetting operation and specifies a set of valid, POSC-supported subsetting operations.

Criteria for Identifying Valid Subsetting Operations

Uploadable Subset Requirement

One requirement a subsetting operation must meet is that applying it will result in a recoverable subset as specified by the POSC Board of Directors Project Work Order Team (PWOT) [4]. The PWOT used the term recoverable subset to mean that data can be moved from a subset to the complete Epicentre model and back again without loss of semantics. POSC is choosing to use the term uploadable subset for this criterion, as it is more descriptive of the supported function.

DAE Compatibility Layer Requirement

The PWOT also recommended that POSC design and specify a DAE compatibility layer, provide a DAE sample implementation and encourage its development as a commercial product. This recommendation supports the following requirement: the application of a valid subsetting operation must not result in a subset which cannot support a POSC-compliant DAE implementation.

Valid Subsetting Operations

The subsetting operations described below are capable of producing simpler logical subsets of Epicentre which result in less complex physical implementations. They also meet the two requirements above: they produce uploadable subsets, and they permit the potential use of a commercial compatibility layer implementation of the DAE specifications.

Valid subsetting operations are described in each of the subsections below with examples where necessary. Each description is followed by the formal definition of a subsetting operation or set of operations that may be applied to modify the EXPRESS file defining the complete Epicentre Logical Data Model. Each formal operation is prefixed with the letter 'S' and a number. These are the only kinds of operations that may be applied to the complete Epicentre EXPRESS file to create a valid subset.

Element Subsetting

Epicentre Version 2.0 contained rules for subsetting which allowed for removing optional attributes and removing entities to which no entity in the desired subset has a mandatory relationship. This type of subsetting, which simply removes elements that are in the inclusive set, is referred to as element subsetting [5]. Element subsetting allows one to exclude discipline-specific portions of Epicentre, such as seismic exploration, and retain other entities directly related to the applications the subset is being designed to support, such as core sample analysis.
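
The element subsetting rule can be sketched as a closure over mandatory relationships. The following is a minimal illustration only, assuming a toy dictionary representation of the model. The names Attr, element_subset, core_sample, wellbore and seismic_line are hypothetical and are not taken from the Epicentre EXPRESS file.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Attr:
    name: str
    optional: bool
    target: Optional[str] = None  # related entity name, if the attribute is a relationship


Model = Dict[str, List[Attr]]


def element_subset(model: Model, wanted: Set[str]) -> Model:
    """Keep the wanted entities plus everything they mandatorily relate to."""
    keep: Set[str] = set()
    pending = list(wanted)
    while pending:
        name = pending.pop()
        if name in keep:
            continue
        keep.add(name)
        for attr in model[name]:
            # A mandatory relationship forces its target into the subset.
            if attr.target is not None and not attr.optional:
                pending.append(attr.target)
    # Only optional relationships can point at removed entities; drop them.
    return {
        name: [a for a in model[name] if a.target is None or a.target in keep]
        for name in keep
    }


# Hypothetical model: core samples must relate to a wellbore,
# and may optionally relate to a seismic line.
complete = {
    "core_sample": [
        Attr("identifier", optional=False),
        Attr("wellbore", optional=False, target="wellbore"),
        Attr("seismic_tie", optional=True, target="seismic_line"),
    ],
    "wellbore": [Attr("identifier", optional=False)],
    "seismic_line": [Attr("identifier", optional=False)],
}

subset = element_subset(complete, {"core_sample"})
```

In this sketch the seismic entity and the optional attribute that referenced it are removed, while the wellbore entity is retained because a kept entity has a mandatory relationship to it.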

Functional Subsetting

An EXPRESS modeling concept that can be used to build valid subsets of Epicentre is to allow model characteristics to be more constrained. This type of subsetting is referred to as functional subsetting. POSC has identified one valid type of functional subsetting: constrained relationships.

Constrained Relationships

Epicentre has many more relationships than most legacy data models used in the E&P industry today. These additional relationships allow Epicentre to keep track of more information related to the complete life cycle of E&P data (e.g., the versioning of properties). To reduce the complexity of Epicentre, some of these relationships can be simplified by constraining, for example, a one-to-many relationship in the complete model to a one-to-one relationship in the subset [6].
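
As a rough illustration of constraining a relationship, the sketch below tightens the upper bound of a cardinality and refuses any change that would relax the complete model. The Relationship class, the constrain_upper function and the well/wellbore relationship are assumptions made for the example, not POSC-defined operations.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    lower: int
    upper: Optional[int]  # None means unbounded ("many")


def constrain_upper(rel: Relationship, new_upper: int) -> Relationship:
    """Return a copy of `rel` with a tighter upper bound."""
    if new_upper < 1:
        raise ValueError("upper bound must be a positive integer")
    if new_upper < rel.lower:
        raise ValueError("upper bound cannot fall below the lower bound")
    if rel.upper is not None and new_upper > rel.upper:
        raise ValueError("a subset may only tighten a bound, not relax it")
    return replace(rel, upper=new_upper)


# Hypothetical one-to-many relationship in the complete model.
well_to_wellbore = Relationship("well", "wellbore", lower=0, upper=None)

# Constrained to at most one wellbore per well in the subset.
constrained = constrain_upper(well_to_wellbore, 1)
```

Because the constrained bound always lies within the original bound, any data valid in the subset remains valid under the complete model's cardinality.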

Population Subsetting

Model constraints may also include operations that define subsets of Epicentre based on restricted populations of standard instance data. This technique is referred to as population subsetting.

Constrained Reference Entity Populations

The population of instances of a reference entity may be constrained to contain a subset of the POSC standard reference instances. As a simple example of constrained reference populations, consider an entity which specified the kind of existence of an instance (e.g., planned, actual, required, predicted). One could create a subset that constrained standard instances of that entity to be those where its identifier had a value of 'actual'. Under this rule, every instance of any entity having a mandatory relationship to the existence kind entity could only be instantiated as being an actual object and not as being a planned, required or predicted object.
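
The existence kind example can be sketched as follows. The four kinds come from the example above; the constant and function names are hypothetical, and the check stands in for whatever rule mechanism an implementation actually uses.

```python
from typing import FrozenSet, Iterable

# Standard reference instances of the (hypothetical) existence kind entity.
STANDARD_EXISTENCE_KINDS: FrozenSet[str] = frozenset(
    {"planned", "actual", "required", "predicted"}
)


def constrain_reference_population(
    standard: FrozenSet[str], allowed: Iterable[str]
) -> FrozenSet[str]:
    """Return the constrained population, which must be a subset of the standard one."""
    allowed_set = frozenset(allowed)
    unknown = allowed_set - standard
    if unknown:
        raise ValueError(f"not standard reference instances: {sorted(unknown)}")
    return allowed_set


def can_instantiate(existence_kind: str, population: FrozenSet[str]) -> bool:
    """True if an instance with this existence kind is permitted in the subset."""
    return existence_kind in population


subset_kinds = constrain_reference_population(STANDARD_EXISTENCE_KINDS, {"actual"})
```

With this constraint in place, any entity having a mandatory relationship to the existence kind entity can only be instantiated as an actual object.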

Constrained Non-Reference Entity Populations

The population of a non-reference entity may be constrained based on the values of POSC standard instances in reference entities to which it has mandatory relationships. As an example, one might wish to restrict a well entity to contain only instances with an existence kind of actual, while at the same time letting other entities with a mandatory relationship to the existence entity have instances which are actual, planned, required or predicted. In that case, one can constrain the population of wells with a rule on the well entity specifying that it can only be instantiated with a relationship to an instance of the existence entity, where its identifier is 'actual'.
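
The difference from the previous case is that the rule is attached to one entity rather than to the reference entity itself. A minimal sketch, assuming a hypothetical per-entity rule table (ENTITY_RULES) and hypothetical entity names:

```python
from typing import Dict, FrozenSet

STANDARD_EXISTENCE_KINDS: FrozenSet[str] = frozenset(
    {"planned", "actual", "required", "predicted"}
)

# Hypothetical per-entity rules: only the well entity is constrained.
ENTITY_RULES: Dict[str, FrozenSet[str]] = {
    "well": frozenset({"actual"}),
}


def is_valid_instance(entity: str, existence_kind: str) -> bool:
    """Check an instance's existence kind against the rule for its entity.

    Entities without a rule accept every standard existence kind.
    """
    if existence_kind not in STANDARD_EXISTENCE_KINDS:
        return False
    allowed = ENTITY_RULES.get(entity, STANDARD_EXISTENCE_KINDS)
    return existence_kind in allowed
```

Here a planned well is rejected, while a planned instance of any unconstrained entity is still accepted.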

Consequences of Subsetting

The consequences of subsetting are discussed below in terms of access to the subset at the logical level.

Potential Epicentre Computing Environment

To discuss the effects of subsetting, it is necessary to understand the ways in which Epicentre will be implemented. The figure below illustrates a potential computing environment that includes complete and subset implementations of Epicentre.

Four scenarios are depicted:

  1. Scenario A: A subset of the Epicentre logical model is implemented on an RDBMS with the compatibility layer running on the complete logical model and having knowledge of the subset logical model.
  2. Scenario B: The complete Epicentre logical model is implemented on a Hybrid DBMS (relational/object) with the DAE.
  3. Scenario C: The complete Epicentre logical model is implemented on an RDBMS with the compatibility layer on top.
  4. Scenario D: A logical subset of Epicentre is implemented on an RDBMS with the compatibility layer implemented directly on the subset.

[Figure: Potential Epicentre Computing Scenarios]

Application Isolation/Portability [7]

Applying any subsetting operation to the complete Epicentre model causes behavioral changes in the resulting subset by modifying or removing entities. Therefore, regardless of the computing environment in which a subset is implemented, applications accessing that subset which were designed to run against the complete Epicentre model or a different Epicentre subset will not be isolated from the effects of subsetting operations on entities they utilize.

The compatibility layer will be able to assist application developers in determining which entities have been modified and removed if it is implemented on the complete model as in Scenario A [8]. If the compatibility layer were implemented directly on a subset, as in Scenario D, however, it would not be able to provide any assistance in determining the differences between the subset and the complete model. POSC recommends that Epicentre subsets not be implemented as shown in Scenario D.
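
The kind of assistance described for Scenario A amounts to comparing the subset against the complete model. The sketch below is purely illustrative: describe_subset is a hypothetical helper, not a DAE function, and the models are reduced to sets of attribute names per entity.

```python
from typing import Dict, List, Set, Tuple

Model = Dict[str, Set[str]]  # entity name -> attribute names


def describe_subset(
    complete: Model, subset: Model
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Report entities and attributes present in `complete` but absent from `subset`."""
    removed_entities = sorted(set(complete) - set(subset))
    removed_attributes = {}
    for entity, attributes in subset.items():
        missing = sorted(complete[entity] - attributes)
        if missing:
            removed_attributes[entity] = missing
    return removed_entities, removed_attributes


# Hypothetical models for illustration.
complete_model = {
    "well": {"identifier", "existence_kind", "remark"},
    "wellbore": {"identifier", "well"},
    "seismic_line": {"identifier"},
}
subset_model = {
    "well": {"identifier", "existence_kind"},
    "wellbore": {"identifier", "well"},
}

removed_entities, removed_attributes = describe_subset(complete_model, subset_model)
```

A layer implemented only on the subset, as in Scenario D, has no complete model to compare against and so cannot produce such a report.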

Applications designed to access a particular logical subset through the compatibility layer should be portable to other implementations of the exact same logical subset.

Application Interoperability [9]

Applications designed to work on the complete Epicentre model or a different subset may or may not interoperate with applications designed to work on a specific subset. The problem is the same as for application portability. If the entities that an application utilizes have been modified by subsetting operations, the application may exhibit impaired functionality or not work at all. It is difficult to see how interoperability could consistently be achieved between applications designed to work on a particular subset and applications designed to work on the complete model or another subset.

Different applications designed to use the same logical subset should be able to fully interoperate with each other.

Data Portability [10]

Long Transactions

When data is transferred from a complete Epicentre data store to a subset data store, the situation generally represents a long transaction [11]. The long transaction problem is not restricted to the subject of subsetting, but also applies to transfers between complete Epicentre data stores. Long transaction problems are out of scope for this discussion, as POSC specifications do not address the rules or mechanisms for handling them. It is important to recognize, however, that changes to either the complete data store or the subset data store could cause the transferred data to get out of sync with the data in the source data store, thus creating situations where data is lost or its meaning is modified.

Logical Portability Issues

Data being transferred from a complete Epicentre data store to a subset data store may have to be reduced in scope, depending on which entities have been modified in or removed from the subset. This reduction in semantics can be relatively benign, e.g., a subset may only allow one well location per well, or it could be significant, e.g., a subset may only allow one wellbore per well. In the latter case, for instance, one could no longer determine if a well had been sidetracked.

Data being populated in a subset data store, whether coming from another Epicentre data store (complete or subset) or some non-Epicentre data store, could be populated in ways which are inconsistent with the complete Epicentre model. For example, in the case of a subset that supports only one wellbore per well, a well having two wellbores could be handled in three ways: 1) the well could be rejected as inappropriate to store in the subset; 2) one of the wellbores could be chosen to be stored, rejecting the other; or 3) two instances of well could be created, one having the first wellbore and the other having the second wellbore.

Clearly, method three above is incorrect with respect to the complete Epicentre model -- even though there is no mechanical problem storing the subset data in the complete data store. A simple transfer of this data to a complete data store would present problems in reconciling its meaning. To transfer the data correctly to the complete model would require that the relationship of the two wellbores to one well be reestablished.

Method two is correct with respect to the complete Epicentre model, but demonstrates a significant loss of information about the well.

Method one seems to be the most correct from a theoretical point of view, but may not represent a practical solution in the real world.
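
The three methods can be contrasted in a small sketch. It assumes a subset that allows one wellbore per well and represents stored data as (well, wellbore) pairs. The Policy enumeration, the load_well function and the identifier scheme used when splitting are hypothetical.

```python
from enum import Enum
from typing import List, Tuple


class Policy(Enum):
    REJECT = 1      # method one: refuse the well
    KEEP_FIRST = 2  # method two: store one wellbore, discard the rest
    SPLIT = 3       # method three: create one well instance per wellbore


def load_well(
    well_id: str, wellbores: List[str], policy: Policy
) -> List[Tuple[str, str]]:
    """Return the (well, wellbore) pairs stored in a one-wellbore-per-well subset."""
    if len(wellbores) <= 1:
        return [(well_id, wb) for wb in wellbores]
    if policy is Policy.REJECT:
        raise ValueError(f"well {well_id} has {len(wellbores)} wellbores")
    if policy is Policy.KEEP_FIRST:
        return [(well_id, wellbores[0])]
    # SPLIT: the resulting wells do not exist in the complete model's terms.
    return [(f"{well_id}-{i}", wb) for i, wb in enumerate(wellbores, start=1)]


kept = load_well("W1", ["WB1", "WB2"], Policy.KEEP_FIRST)
split = load_well("W1", ["WB1", "WB2"], Policy.SPLIT)
```

The split result shows why method three is problematic: the subset now holds two wells where the complete model has one, and that relationship must be reestablished before the data can be uploaded correctly.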

Physical Portability Issues

Moving data from a subset to any other environment requires that the complete Epicentre semantics be restored to the data before it is transferred. POSC's recommended solution for this requirement is to build the compatibility layer on the complete logical model with knowledge of the mapping to the logical subset. In this environment, the compatibility layer could be used to build an Epigramme exchange file (which is defined in the context of the complete Epicentre logical model) for moving data from the subset to any other Epicentre implementation.

Logically, moving data from a particular subset to another subset having the exact same specifications could be done without bringing the data back into the context of the complete model. Exchanges of data between like subsets on the physical level must conform to the constraints imposed by the subsetted logical model.

Subsetting Recommendations

The following are POSC's recommendations regarding Epicentre subsets:

  1. To achieve the goal of simpler and more performant POSC data stores, POSC is developing subsetting operations which enable the creation of uploadable subsets of Epicentre.
  2. The subsetting operations outlined by POSC in this paper are the only valid subsetting operations that may be applied to the complete Epicentre model to create subsets.
  3. To ensure maximum portability and interoperability of applications between different implementations of Epicentre, data store providers are advised to implement the DAE on the complete Epicentre logical model with a known mapping to the underlying logical subset.
  4. To ensure maximum portability and interoperability of applications between different implementations of Epicentre, application developers are advised to:
    1. Write against the complete Epicentre model using the DAE for data access.
    2. Design applications to dynamically handle exceptions which result from Epicentre subsetting operations by using new functions provided in the DAE.

Notes:

1 The GEMSIG group has described a CBO as a single definition of a thing that services the needs of a set of activities. The definition is not specific to a given representation. It is a user sensible description, not one which arises from the world of computer data representations. And, the description is intended to define the state of a thing as opposed to its activities or attributes. For more information on GEMSIG's concept of a CBO, see their World Wide Web Page at http://www.sas.com/standards/GEMSIG/GEMSIG.html.

2 Beginning with POSC Specifications for Version 2.1, the Epicentre deliverables will serve as the official documentation for the valid subsetting operations.

3 A subset, as defined by Webster's Ninth New Collegiate Dictionary, is "a set each of whose elements is an element of an inclusive set". With regard to Epicentre, the elements of the set are Epicentre's entities, attributes (which includes the relationships between entities following the rules of EXPRESS), rules and POSC-specified standard reference data. The inclusive set is the complete set of Epicentre entities, attributes, rules and standard reference data. Applying the subsetting operations outlined in this paper may modify the elements of the subset by removing optional attributes, changing attributes, changing model rules, etc. Therefore, some elements of a subset could have modified behavior and would not be considered exact copies of the corresponding element of the complete set.

4 Recommendations of the POSC Board of Directors Project Work Order Team (PWOT), May 15, 1995.

5 Element subsets of Epicentre can introduce behavior changes to entities by removing optional attributes and, therefore, do not strictly fit the classical definition of subsets.

6 The upper bound of the constrained relationship does not have to be one; it may be any positive integer.

7 In the context of subsetting operations, application isolation/portability refers to the ability to isolate an application from changes in the underlying Epicentre subset relative to the complete model.

8 POSC will need to define new functions in the DAE specifications which will assist users in determining what entities and attributes exist in a subset relative to the complete Epicentre model.

9 In the context of this subsetting operations discussion, application interoperability refers to the ability for different applications to logically interoperate asynchronously through the data store. Other types of interoperability are not excluded; they are simply not in the scope of this discussion.

10 In the context of subsetting operations, data portability refers to the ability to logically move data between Epicentre implementations without loss of information or change of meaning.

11 A long transaction is characterized by situations where data may be checked out of one environment for an extended period of time, perhaps modified or added to, and then checked back into the original environment. Many things can change in either one or both of the environments managing the original data and the checked-out data, which could make it difficult to successfully execute the check-in process.


© Copyright 1997-2001 POSC. All rights reserved.