Semantic Modeling In Formal English
Ebook, 406 pages, 3 hours

About this ebook

Semantic Modeling in Formal English describes how information can be modeled in a system-independent way by using standardized, formalized English. This way of modeling enables universal data exchange and interoperability of systems by expressing information, knowledge and requirements in a form that both humans and computers can interpret. That is done by creating semantic information models that include meaning as well as context. The resulting expressions are unambiguous and system independent. The book also describes the Gellish expression format for storage and exchange of information. Formal English is a standardized, structured subset of natural English. Semantic Modeling describes how Formal English is defined, how information models can be composed and how they can be interpreted. It also provides a basis for the creation of universal semantic databases. This reduces the costs of data conversion and misinterpretation and speeds up data communication and data integration processes.
Language: English
Publisher: Lulu.com
Release date: Jun 11, 2014
ISBN: 9781312190214

    Book preview

    Semantic Modeling In Formal English - Dr. Ir. Andries Van Renssen

    Semantic Modeling In Formal English

    Universal Information Exchange and Interoperability of Systems

    Dr. Ir. Andries Van Renssen

    Gellish.net

    Copyright © 2014: Dr. Ir. Andries van Renssen.

    Gellish.net, Zoetermeer, The Netherlands

    www.gellish.net

    All rights reserved.

    No part of this document may be reproduced or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the Author.

    ISBN 978-1-312-19021-4

    1 Introduction

    Semantic modeling is a discipline that combines aspects of data modeling with aspects of linguistics. Semantic modeling delivers semantic information models. Such models can be models about individual things and occurrences, knowledge models, requirements models, definition models, or combinations of these. Semantic modeling belongs to the information science and technology discipline, but is applied in various domain disciplines.

    Data modeling is an information technology discipline that is based on the convention that storage of data in databases requires the definition of data models. A data model defines the storage capabilities of one or more databases for a particular application domain by defining the structure and definitions of the ‘classes’ or entity types and their ‘attributes’ and ‘relationships’ in that application domain, and thus defines a database. The content of a defined database is a matter of instantiating the data model. Data models typically serve two purposes: they define the business concepts (thus defining the language of the application domain) and they specify the business requirements and constraints for entered data.

    Semantic information modeling uses an alternative approach by applying a predefined formal language for the expression of knowledge and requirements as well as for the expression of information that satisfies those requirements. The predefined formal language makes it possible to express nearly any knowledge, requirements, ideas, facts and queries, and can be the same language for many application areas. The formal language is more flexible than a (fixed) data model and has nearly unlimited capabilities for data storage as well as for data exchange. Thus semantic information modeling reduces the modeling effort for database design to the expression of requirements. As knowledge and business requirements are expressed in the formal language, they can themselves be stored in a universal semantic database, and the database definition does not need to be modified when the business requirements grow or change.

    Data model developers typically use tools that are based on dedicated data definition languages (DDLs) for the definition of data models. Examples of such languages are XML-Schema, SQL/DDL and EXPRESS, or their graphical equivalents, such as UML and IDEFx. The data models act as meta-models (meta-languages) for their content.

    This means that conventional data modeling distinguishes three separate languages:

    The highest level languages are the data modeling languages in which data models are written.

    The medium level languages are the data models themselves, because they act as meta-languages for their content. Their vocabularies consist of names of entity types and attribute types or similar things.

    The lowest level is the user language in which the database content is written. This user language typically consists of terms (data) that don’t have their own syntax, because the data are stored in the syntactic structure of the data model (the meta-language).

    The first two (meta) languages are formal languages, as they are precisely defined. The user language is usually not formalized and does not belong to the domain of the data modelers, apart from the definitions of ‘allowed values’ lists or ‘pick lists’. This means that data requirements and storage capabilities are defined in a different language from the actual user language. However, those languages are not independent of each other, because the content of a database in user language can only be interpreted correctly by knowing the semantics of the data model definitions and its syntax.

    Furthermore, it should be noted that data models fix and limit the data storage capabilities of databases by allowing only instantiations of their entities. Thus data models are restrictive (meta) languages without the flexibility to store information outside the scope that the models allow.

    Semantic modeling does not make such distinctions in languages and is flexible, as it does not fixate nor limit data storage capabilities of semantic databases. Semantic modeling uses a single formal language that enables the specification of data requirements and data storage capabilities as well as enables the expression of the content of databases and messages. This means that there is no other (meta) language or data model required for the interpretation of expressions.
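    The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the entity type ‘pump’, its attributes, the tag ‘P-101’ and the phrase ‘shall have as aspect a’ are hypothetical and are not taken from the Gellish dictionary, and the tuple layout is not the Gellish expression format.

```python
# Conventional approach: a row is only interpretable together with the
# data model (meta-language) that says what each position means.
data_model = {"pump": ["tag", "capacity_m3_per_h"]}  # hypothetical entity type
row = ("P-101", 50)


def interpret_row(entity_type, row, model):
    """Pair each stored value with the attribute name from the data model."""
    return dict(zip(model[entity_type], row))


# Semantic approach (sketch): every expression names its own relation, so
# requirements and content share one structure and one language.
expressions = [
    ("pump", "shall have as aspect a", "capacity"),   # a requirement
    ("P-101", "is classified as a", "pump"),          # content
    ("P-101", "has as aspect", "capacity of P-101"),  # content
]


def facts_about(term, expressions):
    """Return every expression in which the term is the left hand object."""
    return [e for e in expressions if e[0] == term]
```

    In the first half the tuple `row` carries no meaning without `data_model`; in the second half each tuple can be read on its own, and adding a new kind of fact requires a new expression rather than a changed schema.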

    A formal language that includes user language should be as close as possible to a natural language. Therefore, for the English language, it should be a formalized version of English, called Formal English. Several variants of Formal English can be envisioned, resulting in formal dialects. However, to enable interoperability of systems, the definition of Formal English requires that particular choices are made and standardized. This book therefore describes the general principles of formal languages, and of Formal English in particular, and it does so by describing the choices that were made to create a particular variant of Formal English, called Gellish Formal English. The vocabulary of this variant of Formal English is specified in the Gellish Formal English Dictionary-Taxonomy. Gellish Formal English is extensible and is intended to be used as a common language to enable interoperability between systems and parties.

    Semantic modeling is a potential successor of data modeling, and more than that, as it also standardizes normal English user terminology, which enables harmonization of terminology in database systems and data exchange messages. Formal English is user extensible and enables the definition of company specific terminology and synonyms.

    1.1 Terminology

    This document uses the following terms:

    Topic: Something about which expressions of ideas can be communicated.

    Fact: Something that is the case.

    Possible fact: Something that might be the case, and perhaps is the case, either in a real or in an imaginary world. People can communicate about it by expressing an idea about it with a particular communicative intention.

    Idea: Something that is the case, is assumed to be the case, is wanted to be the case, or was the case, either in a real or in an imaginary world. It may be uttered with a communicative intention, such as a statement, denial, promise or question, about a topic.

    Expression: A formulation of an idea about a topic, including an intention with which it is formulated.

    Intention: A purpose with which an idea is expressed and communicated; this purpose can be derived from the way in which the idea is expressed. For example, the intention to make a statement, to ask a question, etc.

    Elementary fact: A basic fact from which atomic facts can be composed. There are two basic kinds of elementary facts, which are expressed by two kinds of relations. The first is a relation between a role and a relation; it expresses that the relation requires that role to be played by one of the related things. The second is a relation between a role in a relation and the thing that plays that role; it expresses that the player plays the role.

    Atomic fact: A binary fact about a role that is played by something in a relation. An atomic fact can be composed of two elementary facts about the same role, and it can be expressed as a participation relation between something and a relation, expressing that the thing plays a particular role in the relation.

    Binary fact: A (possible) fact in which two things play their role.

    Unary relation: A relation that expresses one atomic fact. This means that the relation expresses that one thing plays a role in a relation. This does not exclude that other things also play a role in the relation.

    Binary relation: A relation that expresses two atomic facts about the same relation. This means that the relation expresses that there are two things that each play their own role in a relation.

    Higher order fact: A (possible) fact in which more than two things play their role.

    Higher order relation: A relation that expresses more than two atomic facts about the same topic. This means that the relation expresses that there are more than two things that each play their own role in the relation.

    Variable order relation: A relation in which the number of things that are involved varies or can vary over time.

    Unit of communication: An expression of a (possible) fact or an idea about a topic that comprises only one relation and its communicative intention. Such a relation may be composed of atomic and elementary relations.
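    The way elementary facts, atomic facts and the order of a relation build on each other can be sketched as a small data structure. The following Python sketch is an interpretation of the definitions above, not the author's notation: the class names, the relation identifier ‘r1’ and the example of a wheel that is part of a car are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequiresRole:
    """Elementary fact, first kind: a relation requires a role."""
    relation: str
    role: str


@dataclass(frozen=True)
class PlaysRole:
    """Elementary fact, second kind: a player plays a role."""
    player: str
    role: str


@dataclass(frozen=True)
class AtomicFact:
    """A player plays a particular role in a particular relation."""
    player: str
    role: str
    relation: str


def compose_atomic(req: RequiresRole, play: PlaysRole) -> AtomicFact:
    """Combine two elementary facts about the same role into one atomic fact."""
    if req.role != play.role:
        raise ValueError("elementary facts must be about the same role")
    return AtomicFact(play.player, play.role, req.relation)


def order_of(relation: str, facts: list[AtomicFact]) -> int:
    """Count the atomic facts about one relation (1 = unary, 2 = binary, ...)."""
    return sum(1 for f in facts if f.relation == relation)


# Hypothetical example: relation "r1" relates a wheel (part) to a car (whole).
facts = [
    compose_atomic(RequiresRole("r1", "part"), PlaysRole("wheel-1", "part")),
    compose_atomic(RequiresRole("r1", "whole"), PlaysRole("car-1", "whole")),
]
```

    Here `order_of("r1", facts)` counts two atomic facts, so under these definitions ‘r1’ is a binary relation; a third role player in the same relation would make it a higher order relation.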

    1.2 Nomenclature

    This book uses conventions for graphical models as explained below.

    Figure 1, Conventions for graphical models

    The graphical elements in Figure 1 have a meaning as follows:

    1: A box with rounded corners represents a totality or aspect or a high level concept and can represent an individual thing as well as a kind of thing.

    1 and 2: A line in the top left corner of a box indicates that the box represents an individual thing.

    2 and 4: A rectangular box with an arrow passing behind the box represents a relation or a kind of relation that is an expression of a fact or idea.

    = A term or phrase in a rectangular box that denotes a kind of relation requires by definition a particular kind of left hand object and a right hand object.

    = The circle at one end of the arrow indicates the left hand object in the expression. The arrow point indicates the right hand object in the expression.

    Note: In this document the unqualified term ‘object’ is used as synonym for the term ‘anything’. The terms left hand object and right hand object refer to the things denoted by terms at the left hand and right hand in Formal English expressions.

    2: A rectangular box with a line in the top left corner indicates that the relation expresses a fact about an individual thing, being either a relation between individual things or a relation between an individual thing and a kind of thing.

    Furthermore:

    = A shaded rectangular box represents a relation and a classification relation between that relation and a kind of relation.

    = A term or phrase in a shaded rectangular box is a name of the kind of relation that classifies the relation.

    For example, if the phrase in box 2 were ‘is classified as a’, then the arrow behind relation 2 would indicate that 1 is related to 3 by relation 2.

    Furthermore, the line in the top left corner indicates that 2 is an individual relation, whereas the shade indicates that relation 2 is classified as a classification relation (an ‘is classified as a’ relation).

    3: A box with rounded corners without a line in the top left corner represents a particular concept (kind of thing).

    4: A rectangular box without a line in the top left corner represents a relation between concepts.

    5: A hexagonal box in an arrow at the side of the circle represents a first role (role-1) in a relation that is played by a role player. For example, the role that is played by object (3) in relation (4).

    If the hexagonal box is shaded, then the term in the box denotes the kind of role that classifies the individual role.

    Often the roles are not graphically represented as their type can be derived from the definition of the kind of relation.

    6: A hexagonal box in an arrow at the side of the arrow point represents a second role (role-2) that is played by a role player. For example, the role that is played by object (7) in relation (4).

    7 and 8: A thick line with a circle at one end is an equivalent of a specialization relation.

    = The circle indicates the subtype (8) and the other connected box (7) represents the supertype.

    = Thus box (8) represents a particular concept (kind of thing) that is a subtype of (7).

    = The inverse means: (7) is the supertype of (8).

    2 Semantic Modeling

    Semantics is the study of meaning and its expressions by humans.

    Natural language semantics typically takes the actual usage of natural languages as its basic material and studies its meaning. As a result, the pragmatics of natural language expressions is the basic subject of the study of natural language semantics. Such a study derives the apparent rules for expression and interpretation from the practice of language usage. The study also interprets meaning(s) from the expressions in various languages.

    However, the meanings themselves, the semantics, which are expressed in natural languages, are language independent, because the same meaning can be expressed in different languages. So, if there is one meaning then there can be many expressions of that meaning.

    2.1 What is semantic modeling

    Semantic modeling starts with a meaning itself and then develops a method for the expression of that meaning in the form of semantic information models that are interpretable by software in computers.

    A semantic information model is an information model in which the meaning of data can be interpreted from the model itself, without the need to consult a meta-model or external documentation.

    The statement that a particular meaning is expressed in a particular semantic model implies that the semantic model includes everything that is necessary to interpret the meaning from the semantic model. This requires that a semantic model be written in a formal language, such as Formal English. Such a formal language uses a formal vocabulary, as well as formal kinds of expressions (relations) and a formal syntax (sentence structure). The formality implies that the terms or phrases (the vocabulary) that are used in writing in that formal language denote concepts that are defined in a formal dictionary. Furthermore, it is required that the concepts in that dictionary are arranged in a taxonomy structure. The reason why that taxonomy structure is required is explained in chapter 10.

    Semantic information models typically use natural language terminology, which makes the expressions and models natural language dependent. However, if such models use language independent unique identifiers (UIDs) to represent the concepts in the expressions, then the models become natural language independent. This is possible because semantic models reflect general human information and knowledge, which is not dependent on its expression in a particular language. By combining natural language terminology with language independent UIDs we get the best of both worlds: human readability as well as language independent computer interpretability.
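    A minimal sketch of this combination, assuming a simple lookup table from UIDs to terms per language: the UIDs, the terms and the phrase ‘is a kind of’ below are hypothetical and are not copied from the Gellish dictionary.

```python
# Hypothetical UIDs with their names in two natural languages.
names = {
    101: {"en": "pump", "nl": "pomp"},
    102: {"en": "equipment item", "nl": "apparaat"},
    201: {"en": "is a kind of", "nl": "is een soort"},
}

# The stored expression holds only UIDs: (left hand UID, relation UID,
# right hand UID), so it does not depend on any natural language.
expression = (101, 201, 102)


def render(expression, language):
    """Render a UID-based expression with the terms of one language."""
    return " ".join(names[uid][language] for uid in expression)


print(render(expression, "en"))
print(render(expression, "nl"))
```

    The same stored tuple is rendered as ‘pump is a kind of equipment item’ in English and as ‘pomp is een soort apparaat’ in Dutch; software compares the UIDs, while people read the terms.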

    Natural language expressions are not a suitable means to make semantic models, as natural languages allow for too much freedom for making expressions, so that unambiguous interpretation of natural language expressions is not achievable with the current generation of computers and software. Computer interpretability can be achieved by defining a formal language, which is a subset of natural language in which the ambiguity is eliminated and the degrees of freedom are reduced.

    In natural language, meaning or information is typically expressed as statements, questions, commands, etc. about (possible) facts. According to the Gellish Semantic Modeling Methodology, information is expressed as collections of Formal English expressions, in such a way that software can interpret the meaning (semantics) from the expressions, without the need to use a separate meta-model. The methodology also defines a universal data structure (syntax). As a result, databases can have a common data structure to store expressions, can be loaded with an initial vocabulary (dictionary-taxonomy) and language definition, and can thus use the same common language. Software can then interpret the semantic expressions in multiple databases and messages, and different databases can interoperate or be treated as if they were one distributed database. Interoperation of databases enables verification and management of the consistency of their content, as well as the combination of that content.

    This differs from conventional information modeling. In software engineering it is a widespread convention to create semantic meta models as a basis for database designs and for designs of exchange messages (usually called interfaces). (A meta model is a model about an instance model.) Such a meta model defines the database structure or message structure and acts as its documentation. Typically the meta model remains separate from the database instances (its content). To interpret the meaning of the data instances in a database or message, it is then necessary to use the meaning that is contained in the semantic meta model. Typically, each database and message uses its own meta model, so the data structures of all databases and messages are different. These differing meta models are the root cause of the cost and time required to integrate data from different databases and to develop new interfaces.

    Semantic modeling thus means that meaning is included in and can be inferred from the created semantic models. To enable this, it is required that not only the objects are defined and represented in the expressions, but the relations between things shall also be defined and explicitly be represented in the expressions. Therefore, semantic models are fact oriented or expression oriented (as opposed to object oriented). Knowledge as well as information is modeled as expressions of ideas, including ideas about possible facts and real facts.

    2.2 Formal English

    The above description of semantic modeling illustrates that a semantic model is in fact nothing more than a collection of expressions in a computer interpretable formal language. For the definition of such a formal language we need to distinguish between:

    Language definition

    Rules for the creation of expressions

    Language usage (creation of semantic information models)

    Verification of the correctness of expressions

    The definition of a formal language requires the definition of at least three components:

    The Syntax, which consists of:

    = Unambiguously defined syntactic structures and rules on how to express what needs to be communicated, which implies the rules on how to interpret the expressions.

    The Lexicon, which consists of:

    = Unambiguously defined concepts and individual things and their denotations by terms and phrases, being the components from which expressions in syntactic structures can be formed.

    = Unambiguously defined concepts for relations (kinds of relations).

    Semantic patterns, which consist of:

    = A specification of the minimum amount of information and context that shall be expressed in order to enable unambiguous interpretation of meaning.

    In other words: a language definition consists of a definition of words and sentences (statements, questions, etc.), their structure and their context.
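    How the three components could support verification of expressions can be sketched as follows. This is a simplified illustration, not the Gellish verification procedure: the field names, the list of required fields (standing in for a semantic pattern) and the small lexicon are all assumptions made for this example.

```python
# Assumed minimum of information and context per expression (semantic pattern).
REQUIRED_FIELDS = ("left", "relation", "right", "intention")

# Hypothetical lexicon: defined terms and defined kinds of relations.
TERMS = {"P-101", "pump", "equipment item"}
RELATION_PHRASES = {"is classified as a", "is a kind of"}


def verify(expression: dict) -> list[str]:
    """Return a list of problems; an empty list means the expression passes."""
    problems = []
    # Syntax and semantic pattern: every required field must be filled in.
    for field in REQUIRED_FIELDS:
        if not expression.get(field):
            problems.append(f"missing {field}")
    # Lexicon: the terms on both sides must be defined.
    for side in ("left", "right"):
        term = expression.get(side)
        if term and term not in TERMS:
            problems.append(f"unknown term: {term}")
    # Lexicon: the relation phrase must denote a defined kind of relation.
    phrase = expression.get("relation")
    if phrase and phrase not in RELATION_PHRASES:
        problems.append(f"unknown kind of relation: {phrase}")
    return problems


good = {"left": "P-101", "relation": "is classified as a",
        "right": "pump", "intention": "statement"}
bad = {"left": "P-101", "relation": "looks like", "right": "pomp"}
```

    Calling `verify(good)` returns an empty list, whereas `verify(bad)` reports a missing intention, an undefined term and an undefined kind of relation. A fuller check would also test whether the left hand and right hand objects are of the kinds that the relation requires, which is where a taxonomy of concepts would be needed.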

    Some formal languages are defined using artificial terminology, whereas others are based on natural language terminology. Artificial terminology is used, for example, in formal logic notation systems, which use formulae and parameters to make expressions. This book describes how formal expressions can be made using natural language terminology in universal semantic patterns. This means that formal expressions are made using components that are taken from terms and phrases that are also used and defined in natural languages. This defines Formal English as a semantic language that can be used to create computer interpretable semantic models, being collections of expressions in Formal English. Similarly, translated terms and phrases can be used for formalizations of other languages (Formal Dutch, etc.).

    2.3 The communication cycle

    Communication is an interaction between an information creator and one or more addressees that act as information user or replier. An information creator typically is a speaker, an author (writer) or a user of a computer system who enters ‘data’ or creates drawings, whereas an information user typically is a receiver, a hearer, a reader or a user of a computer system who searches and retrieves information (data and documents, including drawings) or (re)uses information in a business process in which additional information is generated and which is possibly part of a reply. In some cases this information exchange is one-way traffic; in other cases there is real communication in the form of a dialogue. Data exchange between computers is traditionally primarily one-way traffic, but is transforming more and more into dialogues, in which receiving systems are expected
