Discover millions of ebooks, audiobooks, and so much more with a free trial

Only $11.99/month after trial. Cancel anytime.

Database Modeling and Design: Logical Design
Database Modeling and Design: Logical Design
Database Modeling and Design: Logical Design
Ebook387 pages

Database Modeling and Design: Logical Design

Rating: 4 out of 5 stars

4/5

()

Read preview

About this ebook

Database Modeling and Design, Fourth Edition, the extensively revised edition of the classic logical database design reference, explains how you can model and design your database application in consideration of new technology or new business needs. It is an ideal text for a stand-alone data management course focused on logical database design, or a supplement to an introductory text for introductory database management.

This book features clear explanations, lots of terrific examples and an illustrative case, and practical advice, with design rules that are applicable to any SQL-based system. The common examples are based on real-life experiences and have been thoroughly class-tested. The text takes a detailed look at the Unified Modeling Language (UML-2) as well as the entity-relationship (ER) approach for data requirements specification and conceptual modeling - complemented with examples for both approaches. It also discusses the use of data modeling concepts in logical database design; the transformation of the conceptual model to the relational model and to SQL syntax; the fundamentals of database normalization through the fifth normal form; and the major issues in business intelligence such as data warehousing, OLAP for decision support systems, and data mining. There are examples for how to use the most popular CASE tools to handle complex data modeling problems, along with exercises that test understanding of all material, plus solutions for many exercises. Lecture notes and a solutions manual are also available.

This edition will appeal to professional data modelers and database design professionals, including database application designers, and database administrators (DBAs); new/novice data management professionals, such as those working on object oriented database design; and students in second courses in database focusing on design.

+ a detailed look at the Unified Modeling Language (UML-2) as well as the entity-relationship (ER) approach for data requirements specification and conceptual modeling--with examples throughout the book in both approaches!
+ the details and examples of how to use data modeling concepts in logical database design, and the transformation of the conceptual model to the relational model and to SQL syntax;
+ the fundamentals of database normalization through the fifth normal form;
+ practical coverage of the major issues in business intelligence--data warehousing, OLAP for decision support systems, and data mining;
+ examples for how to use the most popular CASE tools to handle complex data modeling problems.
+ Exercises that test understanding of all material, plus solutions for many exercises.
LanguageEnglish
Release dateAug 5, 2010
ISBN9780080470771
Database Modeling and Design: Logical Design
Author

Toby J. Teorey

Toby J. Teorey is a professor in the Electrical Engineering and Computer Science Department at the University of Michigan, Ann Arbor. He received his B.S. and M.S. degrees in electrical engineering from the University of Arizona, Tucson, and a Ph.D. in computer sciences from the University of Wisconsin, Madison. He was general chair of the 1981 ACM SIGMOD Conference and program chair for the 1991 Entity-Relationship Conference. Professor Teorey’s current research focuses on database design and data warehousing, OLAP, advanced database systems, and performance of computer networks. He is a member of the ACM and the IEEE Computer Society.

Read more from Toby J. Teorey

Related to Database Modeling and Design

Databases For You

View More

Reviews for Database Modeling and Design

Rating: 4.166666666666667 out of 5 stars
4/5

6 ratings0 reviews

What did you think?

Tap to rate

Review must be at least 10 words

    Book preview

    Database Modeling and Design - Toby J. Teorey

    Publishing Director Michael Forster Publisher Diane Cerra Publishing Services Manager Simon Crump Editorial Assistant Asma Stephan Cover Design Yvo Riezebos Design Cover Image Getty Images Composition Multiscience Press, Inc. Technical Illustration Dartmouth Publishing, Inc. Copyeditor Multiscience Press, Inc. Proofreader Multiscience Press, Inc. Indexer Multiscience Press, Inc. Interior printer Maple-Vail Book Manufacturing Group Cover printer Phoenix Color

    Morgan Kaufmann Publishers is an imprint of Elsevier. 500 Sansome Street, Suite 400, San Francisco, CA 94111

    This book is printed on acid-free paper.

    Designations used by companies to distinguish their products are often clained as trademarks or registered trademarks. In all instances in which morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration

    No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, scanning, or otherwise—without prior written permission of the publisher.

    Permissions may be sought directly, from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: Permissions@elsevier.co.uk. You may also complete your request online via the Elsevier homepage (http://elsevier.com) by selecting Customer Support and then Obaining Permissions.

    Library of Congress Cataloging-in-Publication Data, Application submitted.

    ISBN-13: 978-0-12-685352-0 ISBN-10: 0-12-685352-5

    For information on all Morgan Kaufmann publications, visit our Web site at www.mkp.com or www.books.elsevier.com

    Printed in the United States of America 06 07 08 09 5 4 3 2

    To Matt, Carol, and Marilyn (Toby Teorey) To my wife and children, Elisheva, Hodaya, and Avishai (Sam Lightstone) To Carol, Paula, Mike, and Lagi (Tom Nadeau)

    Preface

    Database systems and database design technology have undergone significant evolution in recent years as business applications have been dominated by the relational data model and relational database systems. The relational model has allowed the database designer to separately focus on logical design (defining the data relationships and tables) and physical design (efficiently storing data onto and retrieving data from physical storage). Other new technologies, such as data warehousing, OLAP, and data mining, as well as object-oriented, spatial, temporal, and multimedia databases, have also had an important impact on database design.

    In this fourth edition, we continue to concentrate on techniques for database design in relational database systems. However, because of the vast and explosive changes in new physical database design techniques in recent years, we have reorganized the topics into two separate books:

    Database Modeling and Design: Logical Design (4th Edition))

    Physical Database Design

    Logical database design is largely the domain of application designers, who design the logical structure of the database to suit application requirements for data manipulation and structured queries. The definition of database tables for a particular vendor is considered to be within the domain of logical design in this book, although many database practitioners refer to this step as physical design.

    Physical database design, in the context of these two books, is performed by the implementers of the database servers, usually database administrators (DBAs), who must decide how to structure the database for a particular machine (server) and optimize that structure for system performance and system administration. In smaller companies, these communities may in fact comprise the same people, but for large enterprises they are very distinct.

    We start the discussion of logical database design with the entity-relationship (ER) approach to data requirements specification and conceptual modeling; then, we take a detailed look at another dominating data modeling approach, the Unified Modeling Language (UML). Both approaches are used throughout the text for all data modeling examples, so the user can select either one (or both) to help follow the logical design methodology. The discussion of basic principles is supplemented with common examples that are based on real-life experiences and have been thoroughly classroom tested.

    Organization

    The database life cycle is described in Chapter 1. In Chapter 2, we present the most fundamental concepts of data modeling and provide a simple set of notational constructs (the Chen notation for the ER model) to represent them. The ER model has traditionally been a popular method of conceptualizing users’ data requirements. Chapter 3 introduces the UML notation for data modeling. UML (actually UML-2) has become a standard method of modeling large-scale systems for object-oriented languages such as C++ and Java, and the data-modeling component of UML is rapidly becoming as popular as the ER model. We feel it is important for the reader to understand both notations and how much they have in common.

    Chapters 4 and 5 show how to use data modeling concepts in the database design process. Chapter 4 is devoted to the direct application of conceptual data modeling in logical database design. Chapter 5 explains the transformation of the conceptual model to the relational model, and to Structured Query Language (SQL) syntax specifically.

    Chapter 6 is devoted to the fundamentals of database normalization through fifth normal form, showing the functional equivalence between the conceptual model (both ER and UML) and the relational model for the higher normal forms.

    The case study in Chapter 7 summarizes the techniques presented in Chapters 1 through 6 with a new problem environment.

    Chapter 8 describes the major logical database design issues in business intelligence—data warehousing, online analytical processing (OLAP) for decision support systems, and data mining.

    Chapter 9 discusses three of the currently most popular software tools for logical design: IBM’s Rational Data Architect, Computer Associates’ AllFusion ERwin Data Modeller, and Sybase PowerDesigner. Examples are given to demonstrate how each of these tools can be used to handle complex data modeling problems.

    The Appendix contains a review of the basic data definition and data manipulation components of the relational database query language SQL (SQL-99) for those readers who lack familiarity with database query languages. A simple example database with three tables is used to illustrate the SQL query capability.

    The database practitioner can use this book as a guide to database modeling and its application to database design for business and office environments and for well-structured scientific and engineering databases. Whether you are a novice database user or an experienced professional, this book offers new insights into database modeling and the ease of transition from the ER or UML model to the relational model, including the building of standard SQL data definitions. Thus, no matter whether you are using IBM’s DB2, Oracle, Microsoft’s SQL Server, or MySQL, the design rules set forth here will be applicable. The case studies used for the examples throughout the book are from real-life databases that were designed using the principles formulated here. This book can also be used by the advanced undergraduate or beginning graduate student to supplement a course textbook in introductory database management, or for a stand-alone course in data modeling or database design.

    Typographical Conventions

    For easy reference, entity and class names (Employee, Department, and so on) are capitalized from Chapter 2 forward. Throughout the book, table names (product, product_count) are set in boldface for readability.

    Acknowledgments

    We wish to acknowledge colleagues and students that contributed to the technical continuity of this book: James Bean, Mike Blaha, Deb Bolton, Joe Celko, Jarir Chaar, Nauman Chaudhry, David Chesney, Pat Corey, John DeSue, Yang Dongqing, Ron Fagin, Carol Fan, Jim Fry, Jim Gray, Bill Grosky, Wei Guangping, Wendy Hall, Paul Helman, Nayantara Kalro, John Koenig, Ji-Bih Lee, Marilyn Mantei Tremaine, Bongki Moon, Robert Muller, Wee-Teck Ng, Dan O’Leary, Kunle Olukotun, Dorian Pyle, Dave Roberts, Behrooz Seyed-Abbassi, Dan Skrbina, Rick Snodgrass, Il-Yeol Song, Dick Spencer, Amjad Umar, and Susanne Yul. We also wish to thank the Department of Electrical Engineering and Computer Science (EECS) at the University of Michigan for providing computer resources for writing and revising. Finally, thanks to Julie for offering Ludington and her unwavering support (TJT), and thanks for the generosity of my wife and children, who have permitted me the time to work on this text (SL).

    Solutions Manual

    A solutions manual to all exercises is available. Contact the publisher for further information.

    Table of Contents

    Cover

    Title

    Copyright

    Dedication

    Preface

    Chapter 1: Introduction

    Chapter 2: The Entity-Relationship Model

    Chapter 3: The Unified Modeling Language (UML)

    Chapter 4: Requirements Analysis and Conceptual Data Modeling

    Chapter 5: Transforming the Conceptual Data Model to SQL

    Chapter 6: Normalization

    Chapter 7: An Example of Logical Database Design

    Chapter 8: Business Intelligence

    Chapter 9: CASE Tools for Logical Database Design

    The Basics of SQL

    Glossary

    References

    Exercises

    Solutions to Selected Exercises

    About the Authors

    Index

    Introduction

    Database technology has evolved rapidly in the three decades since the rise and eventual dominance of relational database systems. While many specialized database systems (object-oriented, spatial, multimedia, etc.) have found substantial user communities in the science and engineering fields, relational systems remain the dominant database technology for business enterprises.

    Relational database design has evolved from an art to a science that has been made partially implementable as a set of software design aids. Many of these design aids have appeared as the database component of computer-aided software engineering (CASE) tools, and many of them offer interactive modeling capability using a simplified data modeling approach. Logical design—that is, the structure of basic data relationships and their definition in a particular database system—is largely the domain of application designers. These designers can work effectively with tools such as ERwin Data Modeler or Rational Rose with UML, as well as with a purely manual approach. Physical design, the creation of efficient data storage and retrieval mechanisms on the computing platform being used, is typically the domain of the database administrator (DBA). Today’s DBAs have a variety of vendor-supplied tools available to help design the most efficient databases. This book is devoted to the logical design methodologies and tools most popular for relational databases today. Physical design methodologies and tools are covered in a separate book.

    In this chapter, we review the basic concepts of database management and introduce the role of data modeling and database design in the database life cycle.

    1.1 Data and Database Management

    The basic component of a file in a file system is a data item, which is the smallest named unit of data that has meaning in the real world—for example, last name, first name, street address, ID number, or political party. A group of related data items treated as a single unit by an application is called a record. Examples of types of records are order, salesperson, customer, product, and department. A file is a collection of records of a single type. Database systems have built upon and expanded these definitions: In a relational database, a data item is called a column or attribute; a record is called a row or tuple; and a file is called a table.

    A database is a more complex object; it is a collection of interrelated stored data that serves the needs of multiple users within one or more organizations, that is, interrelated collections of many different types of tables. The motivations for using databases rather than files include greater availability to a diverse set of users, integration of data for easier access to and updating of complex transactions, and less redundancy of data.

    A database management system (DBMS) is a generalized software system for manipulating databases. A DBMS supports a logical view (schema, subschema); physical view (access methods, data clustering); data definition language; data manipulation language; and important utilities, such as transaction management and concurrency control, data integrity, crash recovery, and security. Relational database systems, the dominant type of systems for well-formatted business databases, also provide a greater degree of data independence than the earlier hierarchical and network (CODASYL) database management systems. Data independence is the ability to make changes in either the logical or physical structure of the database without requiring reprogramming of application programs. It also makes database conversion and reorganization much easier. Relational DBMSs provide a much higher degree of data independence than previous systems; they are the focus of our discussion on data modeling.

    1.2 The Database Life Cycle

    The database life cycle incorporates the basic steps involved in designing a global schema of the logical database, allocating data across a computer network, and defining local DBMS-specific schemas. Once the design is completed, the life cycle continues with database implementation and maintenance. This chapter contains an overview of the database life cycle, as shown in Figure 1.1. In succeeding chapters, we will focus on the database design process from the modeling of requirements through logical design (steps I and II below). The result of each step of the life cycle is illustrated with a series of diagrams in Figure 1.2. Each diagram shows a possible form of the output of each step, so the reader can see the progression of the design process from an idea to actual database implementation. These forms are discussed in much more detail in Chapters 2 through 6.

    I. Requirements analysis. The database requirements are determined by interviewing both the producers and users of data and using the information to produce a formal requirements specification. That specification includes the data required for processing, the natural data relationships, and the software platform for the database implementation. As an example, Figure 1.2 (step I) shows the concepts of products, customers, salespersons, and orders being formulated in the mind of the end user during the interview process.

    II. Logical design. The global schema, a conceptual data model diagram that shows all the data and their relationships, is developed using techniques such as ER or UML. The data model constructs must ultimately be transformed into normalized (global) relations, or tables. The global schema development methodology is the same for either a distributed or centralized database.

    a. Conceptual data modeling. The data requirements are analyzed and modeled using an ER or UML diagram that includes, for example, semantics for optional relationships, ternary relationships, supertypes, and subtypes (categories). Processing requirements are typically specified using natural language expressions or SQL commands, along with the frequency of occurrence. Figure 1.2 [step II(a)] shows a possible ER model representation of the product/customer database in the mind of the end user.

    Figure 1.1 The database life cycle

    Figure 1.2 Life cycle results, step-by-step

    b. View integration. Usually, when the design is large and more than one person is involved in requirements analysis, multiple views of data and relationships result. To eliminate redundancy and inconsistency from the model, these views must eventually be rationalized (resolving inconsistencies due to variance in taxonomy, context, or perception) and then consolidated into a single global view. View integration requires the use of ER semantic tools such as identification of synonyms, aggregation, and generalization. In Figure 1.2 [step II(b)], two possible views of the product/customer database are merged into a single global view based on common data for customer and order. View integration is also important for application integration.

    c. Transformation of the conceptual data model to SQL tables. Based on a categorization of data modeling constructs and a set of mapping rules, each relationship and its associated entities are transformed into a set of DBMS-specific candidate relational tables. We will show these transformations in standard SQL in Chapter 5. Redundant tables are eliminated as part of this process. In our example, the tables in step II(c) of Figure 1.2 are the result of transformation of the integrated ER model in step II(b).

    d. Normalization of tables. Functional dependencies (FDs) are derived from the conceptual data model diagram and the semantics of data relationships in the requirements analysis. They represent the dependencies among data elements that are unique identifiers (keys) of entities. Additional FDs that represent the dependencies among key and nonkey attributes within entities can be derived from the requirements specification. Candidate relational tables associated with all derived FDs are normalized (i.e., modified by decomposing or splitting tables into smaller tables) using standard techniques. Finally, redundancies in the data in normalized candidate tables are analyzed further for possible elimination, with the constraint that data integrity must be preserved. An example of normalization of the Salesperson table into the new Salesperson and SalesVacations tables is shown in Figure 1.2 from step II(c) to step II(d).

    We note here that database tool vendors tend to use the term logical model to refer to the conceptual data model, and they use the term physical model to refer to the DBMS-specific implementation model (e.g., SQL tables). Note also that many conceptual data models are obtained not from scratch, but from the process of reverse engineering from an existing DBMS-specific schema [Silberschatz, Korth, and Sudarshan, 2002].

    III. Physical design. The physical design step involves the selection of indexes (access methods), partitioning, and clustering of data. The logical design methodology in step II simplifies the approach to designing large relational databases by reducing the number of data dependencies that need to be analyzed. This is accomplished by inserting conceptual data modeling and integration steps [steps II(a) and II(b) of Figure 1.2] into the traditional relational design approach. The objective of these steps is an accurate representation of reality. Data integrity is preserved through normalization of the candidate tables created when the conceptual data model is transformed into a relational model. The purpose of physical design is to optimize performance as closely as possible.

    As part of the physical design, the global schema can sometimes be refined in limited ways to reflect processing (query and transaction) requirements if there are obvious, large gains to be made in efficiency. This is called denormalization. It consists of selecting dominant processes on the basis of high frequency, high volume, or explicit priority; defining simple extensions to tables that will improve query performance; evaluating total cost for query, update, and storage; and considering the side effects, such as possible loss of integrity. This is particularly important for Online Analytical Processing (OLAP) applications.

    IV. Database implementation, monitoring, and modification. Once the design is completed, the database can be created through implementation of the formal schema using the data definition language (DDL) of a DBMS. Then the data manipulation language (DML) can be used to query and update the database, as well as to set up indexes and establish constraints, such as referential integrity. The

    Enjoying the preview?
    Page 1 of 1