POSC Specifications Version 2.3
Epicentre Subsetting and Extension
This document focuses on the problem of creating subsets of the Epicentre Logical Data Model. It is intended to define the POSC standard subsetting operations which may be applied to the complete Epicentre Data Model to produce valid Epicentre Logical Subsets.
This document does not discuss in detail the methods that can be used to decide what scope a subset should have. Many in the industry have advocated the use of Common Business Objects (CBOs) to delineate the scope of functionality required by users operating in the same business domain[1]. By mapping a CBO to the Epicentre entities that represent it, one can discover the portions of Epicentre that must be in a subset to support the business functions the CBO embodies. A complete discussion of this topic is beyond the scope of this paper.
Creating a physical implementation of the Epicentre Logical Data Model can be divided into three main steps:
Applying the valid subsetting operations[2] defined herein to the Epicentre Logical Data Model produces subsets that modify model behavior. Because of this, Epicentre subsets do not precisely fit the classical definition of a subset[3]. However, modifying model behavior creates an opportunity to simplify physical implementations of Epicentre. At the same time, it has implications for application portability, application interoperability, and data portability (see Consequences of Subsetting).
This section defines what constitutes a valid subsetting operation and specifies a set of valid, POSC-supported subsetting operations.
One requirement a subsetting operation must meet is that applying it results in a recoverable subset, as specified by the POSC Board of Directors Project Work Order Team (PWOT)[4]. The PWOT used the term recoverable subset to mean that data can be moved from a subset to the complete Epicentre model and back again without loss of semantics. POSC has chosen to use the term uploadable subset for this criterion, as it more accurately describes the supported function.
The PWOT also recommended that POSC design and specify a DAE compatibility layer, provide a DAE sample implementation, and encourage its development as a commercial product. This recommendation supports the following requirement: applying a valid subsetting operation must not produce a subset that is incapable of supporting a POSC-compliant DAE implementation.
The subsetting operations described below can produce simpler logical subsets of Epicentre, which in turn lead to less complex physical implementations. They also meet the requirement of producing uploadable subsets that could be accessed through a commercial compatibility-layer implementation of the DAE specifications.
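The uploadable-subset criterion can be illustrated with a small check. The sketch below assumes a deliberately simplified, hypothetical schema form in which each entity lists only its mandatory and optional attribute names; it is not the Epicentre EXPRESS schema, and `EntityDef`, `upload`, `download` and `is_uploadable` are illustrative names, not DAE functions. Under these assumptions a subset population is uploadable when every instance is also a valid instance of the complete model and survives an upload/download round trip unchanged.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class EntityDef:
    """Hypothetical, simplified entity definition: attribute names only."""
    mandatory: FrozenSet[str]
    optional: FrozenSet[str]


Schema = Dict[str, EntityDef]


def is_valid_instance(entity: str, instance: dict, schema: Schema) -> bool:
    """True if the instance has all mandatory attributes and no unknown ones."""
    definition = schema.get(entity)
    if definition is None:
        return False
    keys = set(instance)
    return definition.mandatory <= keys <= (definition.mandatory | definition.optional)


def upload(instance: dict) -> dict:
    # Assumption: a subset instance is stored unchanged in the complete model.
    return dict(instance)


def download(entity: str, instance: dict, subset: Schema) -> dict:
    # Drop attributes the subset does not carry (e.g. removed optional ones).
    d = subset[entity]
    return {k: v for k, v in instance.items() if k in d.mandatory | d.optional}


def is_uploadable(population: Iterable[Tuple[str, dict]],
                  subset: Schema, complete: Schema) -> bool:
    """Every instance must be valid in the complete model and round-trip intact."""
    return all(
        is_valid_instance(e, inst, complete)
        and download(e, upload(inst), subset) == inst
        for e, inst in population
    )


# Hypothetical example: the subset removes the optional attribute 'operator'.
complete = {"well": EntityDef(frozenset({"identifier"}), frozenset({"spud_date", "operator"}))}
subset = {"well": EntityDef(frozenset({"identifier"}), frozenset({"spud_date"}))}
data = [("well", {"identifier": "W-1", "spud_date": "1996-05-01"})]
print(is_uploadable(data, subset, complete))  # True
```

A subset that introduced an attribute unknown to the complete model would fail this check, which is why the valid operations only remove or constrain model elements.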
Valid subsetting operations are described in each of the subsections below with examples where necessary. Each description is followed by the formal definition of a subsetting operation or set of operations that may be applied to modify the EXPRESS file defining the complete Epicentre Logical Data Model. Each formal operation is prefixed with the letter 'S' and a number. These are the only kinds of operations that may be applied to the complete Epicentre EXPRESS file to create a valid subset.
Epicentre Version 2.0 contained subsetting rules that allowed removing optional attributes and removing entities to which no entity in the desired subset has a mandatory relationship. This type of subsetting, which simply removes elements of the inclusive set, is referred to as element subsetting[5]. Element subsetting allows one to exclude discipline-specific portions of Epicentre, such as seismic exploration, while retaining the entities directly related to the applications the subset is designed to support, such as core sample analysis.
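The element-subsetting rule can be read as a reachability computation: starting from the desired entities, follow mandatory relationships transitively and keep everything reached; all other entities, and optionally all optional attributes, may be removed. The sketch below assumes a hypothetical model fragment (entity names and relationship attributes are illustrative, not Epicentre's) that shows only relationship-valued attributes.

```python
from collections import deque

# Hypothetical model fragment: entity -> {attribute: (target_entity, mandatory?)}.
MODEL = {
    "core_sample": {"taken_from": ("wellbore", True), "analysed_by": ("laboratory", False)},
    "wellbore": {"part_of": ("well", True)},
    "well": {"location": ("well_location", False)},
    "well_location": {},
    "laboratory": {},
    "seismic_survey": {"covers": ("area", True)},
    "area": {},
}


def required_entities(model, desired):
    """Entities reachable from `desired` through mandatory relationships only."""
    keep, queue = set(desired), deque(desired)
    while queue:
        entity = queue.popleft()
        for target, mandatory in model[entity].values():
            if mandatory and target not in keep:
                keep.add(target)
                queue.append(target)
    return keep


def element_subset(model, desired, drop_optional=True):
    """Keep the required entities; drop optional attributes if requested.

    Attributes pointing at removed entities are always dropped; they can only
    be optional, because every mandatory target is kept.
    """
    keep = required_entities(model, desired)
    return {
        e: {a: (t, m) for a, (t, m) in attrs.items()
            if t in keep and (m or not drop_optional)}
        for e, attrs in model.items() if e in keep
    }


print(sorted(required_entities(MODEL, {"core_sample"})))  # ['core_sample', 'well', 'wellbore']
```

In this example the seismic entities and the optional laboratory and well-location entities fall outside the core sample subset.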
An EXPRESS modeling concept that can be used to build valid subsets of Epicentre is to constrain model characteristics more tightly than the complete model does. This type of subsetting is referred to as functional subsetting. POSC has identified one valid type of functional subsetting: constrained relationships.
Epicentre has many more relationships than most legacy data models used in the E&P industry today. These additional relationships allow Epicentre to keep track of more information related to the complete life cycle of E&P data (e.g., the versioning of properties). To reduce the complexity of Epicentre, some of these relationships can be simplified by constraining, for example, a one-to-many relationship in the complete model to a one-to-one relationship in the subset[6].
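A constrained relationship amounts to a tighter upper bound on a cardinality that is unbounded in the complete model. The sketch below shows how data destined for such a subset might be screened; the bounds table, the entity names and the link tuple layout are hypothetical, and any positive integer could serve as a bound.

```python
from collections import Counter

# Hypothetical subset bounds: (parent_entity, target_entity) -> upper bound.
# In the complete model these relationships are one-to-many (unbounded).
SUBSET_UPPER_BOUNDS = {
    ("well", "well_location"): 1,  # one location per well in this subset
    ("well", "wellbore"): 1,       # one wellbore per well in this subset
}


def cardinality_violations(links, upper_bounds):
    """Report parents with more related children than the subset allows.

    `links` holds one (parent_entity, parent_id, target_entity, child_id)
    tuple per relationship instance. Unconstrained pairs are never reported.
    """
    counts = Counter((p, pid, t) for p, pid, t, _child in links)
    return sorted(
        (p, pid, t, n)
        for (p, pid, t), n in counts.items()
        if n > upper_bounds.get((p, t), float("inf"))
    )


links = [
    ("well", "W-1", "wellbore", "WB-1"),
    ("well", "W-1", "wellbore", "WB-2"),
    ("well", "W-2", "wellbore", "WB-3"),
    ("well", "W-2", "well_location", "L-1"),
]
print(cardinality_violations(links, SUBSET_UPPER_BOUNDS))  # [('well', 'W-1', 'wellbore', 2)]
```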
Model constraints may also include operations that define subsets of Epicentre based on restricted populations of standard instance data. This technique is referred to as population subsetting.
The population of instances of a reference entity may be constrained to a subset of the POSC standard reference instances. As a simple example of a constrained reference population, consider an entity that specifies the kind of existence of an instance (e.g., planned, actual, required, predicted). One could create a subset that constrains the standard instances of that entity to the one whose identifier is 'actual'. Under this rule, every instance of any entity with a mandatory relationship to the existence kind entity could only be instantiated as an actual object, not as a planned, required or predicted object.
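The constrained reference population described above can be sketched as a filter over the standard instances followed by a check on referring instances. The existence-kind identifiers come from the example in the text; the function names and the (instance id, existence kind) pairs are hypothetical. Because the subset may only restrict the standard population, the sketch rejects any identifier that is not a standard instance.

```python
# Standard existence kinds, taken from the example in the text.
STANDARD_EXISTENCE_KINDS = ("planned", "actual", "required", "predicted")


def constrain_reference_population(standard, allowed):
    """Return the subset's reference population: allowed standard instances only."""
    unknown = set(allowed) - set(standard)
    if unknown:
        # A population subset may not add instances outside the standard set.
        raise ValueError(f"not POSC standard instances: {sorted(unknown)}")
    return [ident for ident in standard if ident in allowed]


def out_of_population(instances, population):
    """Ids of instances whose mandatory existence-kind reference is not allowed."""
    allowed = set(population)
    return [iid for iid, kind in instances if kind not in allowed]


population = constrain_reference_population(STANDARD_EXISTENCE_KINDS, {"actual"})
print(population)  # ['actual']
print(out_of_population([("W-1", "actual"), ("WB-7", "planned")], population))  # ['WB-7']
```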
The population of a non-reference entity may be constrained based on the values of POSC standard instances in the reference entities to which it has mandatory relationships. For example, one might wish to restrict a well entity to contain only instances with an existence kind of actual, while still allowing other entities with a mandatory relationship to the existence kind entity to have instances that are actual, planned, required or predicted. In that case, one can constrain the population of wells with a rule on the well entity specifying that a well can only be instantiated with a relationship to the existence kind instance whose identifier is 'actual'.
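Unlike the previous case, the rule here is attached to one entity rather than to the reference population itself. A minimal sketch, assuming a hypothetical per-entity rule table in which entities without a rule accept every standard existence kind:

```python
STANDARD_EXISTENCE_KINDS = frozenset({"planned", "actual", "required", "predicted"})

# Hypothetical rule table: only wells are restricted, and only to 'actual'.
POPULATION_RULES = {"well": frozenset({"actual"})}


def allowed_kinds(entity, rules=POPULATION_RULES):
    """Existence kinds an instance of `entity` may reference in this subset."""
    return rules.get(entity, STANDARD_EXISTENCE_KINDS)


def rejected(instances, rules=POPULATION_RULES):
    """(entity, id) pairs that violate their entity's population rule."""
    return [(entity, iid) for entity, iid, kind in instances
            if kind not in allowed_kinds(entity, rules)]


instances = [
    ("well", "W-1", "actual"),
    ("well", "W-2", "planned"),        # violates the well rule
    ("wellbore", "WB-9", "planned"),   # allowed: no rule on wellbore
]
print(rejected(instances))  # [('well', 'W-2')]
```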
The consequences of subsetting are discussed below in terms of access to the subset at the logical level.
To discuss the effects of subsetting, it is necessary to understand the ways in which Epicentre will be implemented. The figure below illustrates a potential computing environment that includes complete and subset implementations of Epicentre.
Four scenarios are depicted:
Applying any subsetting operation to the complete Epicentre model causes behavioral changes in the resulting subset by modifying or removing entities. Therefore, regardless of the computing environment in which a subset is implemented, applications that access the subset but were designed to run against the complete Epicentre model or a different Epicentre subset will not be isolated from the effects of subsetting operations on the entities they use.
The compatibility layer will be able to help application developers determine which entities have been modified or removed if it is implemented on the complete model, as in Scenario A[8]. If the compatibility layer were implemented directly on a subset, as in Scenario D, however, it would not be able to help determine the differences between the subset and the complete model. POSC therefore recommends that Epicentre subsets not be implemented as shown in Scenario D.
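A compatibility layer built on the complete model can compare it with the subset definition. The sketch below is a hypothetical illustration of the kind of query such a layer might offer; it is not a DAE function, and it assumes a simplified schema form mapping entity names to attribute names. It reports removed entities and attributes only; detecting constrained relationships would need a richer schema representation.

```python
from typing import Dict, Set

# Hypothetical schema form: entity name -> set of attribute names.
Schema = Dict[str, Set[str]]


def subset_differences(complete: Schema, subset: Schema) -> dict:
    """Summarize what a subset removed relative to the complete model."""
    removed_entities = sorted(set(complete) - set(subset))
    removed_attributes = {
        e: sorted(complete[e] - subset[e])
        for e in sorted(subset)
        if e in complete and complete[e] - subset[e]
    }
    # Anything here would not have come from a valid subsetting operation.
    foreign = sorted(set(subset) - set(complete))
    return {"removed_entities": removed_entities,
            "removed_attributes": removed_attributes,
            "not_in_complete_model": foreign}


complete = {"well": {"identifier", "spud_date", "operator"},
            "wellbore": {"identifier", "well"},
            "seismic_survey": {"identifier"}}
subset = {"well": {"identifier", "spud_date"},
          "wellbore": {"identifier", "well"}}
print(subset_differences(complete, subset))
```

An application written for the complete model could consult such a report before running, to learn whether the entities it relies on are still present.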
Applications designed to access a particular logical subset through the compatibility layer should be portable to other implementations of the exact same logical subset.
Applications designed to work on the complete Epicentre model or on a different subset may or may not interoperate with applications designed to work on a specific subset. The problem is the same as for application portability: if the entities in subset A that an application uses have been modified by subsetting operations, the application may exhibit impaired functionality or not work at all. It is therefore difficult to see how interoperability could consistently be achieved between applications designed for a particular subset and applications designed for the complete model or another subset.
Different applications designed to use the same logical subset should be able to fully interoperate with each other.
When data is transferred from a complete Epicentre data store to a subset data store, the situation generally represents a long transaction[11]. The long transaction problem is not restricted to subsetting; it also applies to transfers between complete Epicentre data stores. Long transaction problems are out of scope for this discussion, as POSC specifications do not address the rules or mechanisms for handling them. It is important to recognize, however, that changes to either the complete data store or the subset data store could cause the transferred data to fall out of sync with the data in the source data store, creating situations where data is lost or its meaning is modified.
Data being transferred from a complete Epicentre data store to a subset data store may have to be reduced in scope, depending on which entities have been modified in or removed from the subset. This reduction in semantics can be relatively benign, e.g., a subset may only allow one well location per well, or it could be significant, e.g., a subset may only allow one wellbore per well. In the latter case, for instance, one could no longer determine if a well had been sidetracked.
Data being populated in a subset data store, whether coming from another Epicentre data store (complete or subset) or some non-Epicentre data store, could be populated in ways which are inconsistent with the complete Epicentre model. For example, in the case of a subset that supports only one wellbore per well, a well having two wellbores could be handled in three ways: 1) the well could be rejected as inappropriate to store in the subset; 2) one of the wellbores could be chosen to be stored, rejecting the other; or 3) two instances of well could be created, one having the first wellbore and the other having the second wellbore.
Method three is clearly incorrect with respect to the complete Epicentre model, even though there is no mechanical problem in storing the subset data in the complete data store. A simple transfer of this data to a complete data store would make its meaning difficult to reconcile: to transfer the data correctly to the complete model, the relationship of the two wellbores to one well would have to be re-established.
Method two is correct with respect to the complete Epicentre model, but demonstrates a significant loss of information about the well.
Method one seems to be the most correct from a theoretical point of view, but may not represent a practical solution in the real world.
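The three ways of loading a two-wellbore well into a one-wellbore-per-well subset can be compared side by side. This is a hypothetical sketch: a well is represented as a dictionary mapping a well id to its list of wellbore ids, and the `#1`, `#2` suffixes used by method three are an invented naming convention.

```python
from typing import Dict, List

# Hypothetical representation: well id -> list of wellbore ids.
# The target subset allows at most one wellbore per well.


def reject(well_id: str, wellbores: List[str]) -> Dict[str, List[str]]:
    """Method 1: refuse to store a well that does not fit the subset."""
    if len(wellbores) > 1:
        raise ValueError(f"{well_id} has {len(wellbores)} wellbores; subset allows 1")
    return {well_id: list(wellbores)}


def keep_one(well_id: str, wellbores: List[str]) -> Dict[str, List[str]]:
    """Method 2: store the first wellbore and discard the rest (information lost)."""
    return {well_id: list(wellbores[:1])}


def split_well(well_id: str, wellbores: List[str]) -> Dict[str, List[str]]:
    """Method 3: one well per wellbore; storable, but semantically wrong.

    The split wells no longer record that they are the same physical well,
    so that relationship must be re-established on transfer back.
    """
    if len(wellbores) <= 1:
        return {well_id: list(wellbores)}
    return {f"{well_id}#{i}": [wb] for i, wb in enumerate(wellbores, start=1)}


print(keep_one("W-1", ["WB-1", "WB-2"]))    # {'W-1': ['WB-1']}
print(split_well("W-1", ["WB-1", "WB-2"]))  # {'W-1#1': ['WB-1'], 'W-1#2': ['WB-2']}
```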
Moving data from a subset to any other environment requires that the complete Epicentre semantics be restored to the data before it is transferred. POSC's recommended solution for this requirement is to build the compatibility layer on the complete logical model with knowledge of the mapping to the logical subset. In this environment, the compatibility layer could be used to build an Epigramme exchange file (which is defined in the context of the complete Epicentre logical model) for moving data from the subset to any other Epicentre implementation.
Logically, moving data from a particular subset to another subset having the exact same specifications could be done without bringing the data back into the context of the complete model. Exchanges of data between like subsets on the physical level must conform to the constraints imposed by the subsetted logical model.
The following are POSC's recommendations regarding Epicentre subsets:
Notes:
1 The GEMSIG group has described a CBO as a single definition of a thing that services the needs of a set of activities. The definition is not specific to a given representation. It is a user-sensible description, not one that arises from the world of computer data representations. The description is also intended to define the state of a thing as opposed to its activities or attributes. For more information on GEMSIG's concept of a CBO, see their World Wide Web page at http://www.sas.com/standards/GEMSIG/GEMSIG.html.
2 Beginning with POSC Specifications for Version 2.1, the Epicentre deliverables will serve as the official documentation for the valid subsetting operations.
3 A subset, as defined by Webster's Ninth New Collegiate Dictionary, is "a set each of whose elements is an element of an inclusive set". With regard to Epicentre, the elements of the set are Epicentre's entities, attributes (which include the relationships between entities, following the rules of EXPRESS), rules and POSC-specified standard reference data. The inclusive set is the complete set of Epicentre entities, attributes, rules and standard reference data. Applying the subsetting operations outlined in this paper may modify the elements of the subset by removing optional attributes, changing attributes, changing model rules, etc. Therefore, some elements of a subset could have modified behavior and would not be considered exact copies of the corresponding elements of the complete set.
4 Recommendations of the POSC Board of Directors Project Work Order Team (PWOT), May 15, 1995.
5 Element subsets of Epicentre can introduce behavior changes to entities by removing optional attributes and, therefore, do not strictly fit the classical definition of subsets.
6 The upper bound of the constrained relationship does not have to be one; it may be any positive integer.
7 In the context of subsetting operations, application isolation/portability refers to the ability to isolate an application from changes in the underlying Epicentre subset relative to the complete model.
8 POSC will need to define new functions in the DAE specifications which will assist users in determining what entities and attributes exist in a subset relative to the complete Epicentre model.
9 In the context of this subsetting operations discussion, application interoperability refers to the ability of different applications to logically interoperate asynchronously through the data store. Other types of interoperability are not excluded; they are simply outside the scope of this discussion.
10 In the context of subsetting operations, data portability refers to the ability to logically move data between Epicentre implementations without loss of information or change of meaning.
11 A long transaction is one in which data may be checked out of one environment for an extended period of time, possibly modified or added to, and then checked back into the original environment. Many things can change in either or both of the environments managing the original and the checked-out data, which could make it difficult to execute the check-in process successfully.
© Copyright 1997-2001 POSC. All rights reserved.