Artifacts > Analysis & Design Artifact Set > Data Model > Guidelines > Reverse-engineering Relational Databases

Topics

The following procedure can be followed to produce an object model from a relational database schema.

Replicating RDBMS Tables in the Design Model To top of page

Replicating the structure of the database in a class model is relatively straight-forward. The process listed below describes the algorithm for converting a relational database schema to an object model.

Create a Class for each Table To top of page

For each table the you wish to reverse-engineer, create a class to represent the table. For each column, create an attribute on the class with the appropriate data type. Try to match the data type of the attribute and the data type of the associated column as closely as possible.

Example

Consider the database table Customer, with the following structure, shown in the following figure:

Column Name Data Type
Customer_ID Number
Name Varchar
Street Varchar
City Varchar
State/Province Char(2)
Zip/Postal Code Varchar
Country Varchar

Table definition for Customer table

Starting from this point, we create a class, Customer, with the structure shown in the following figure:

Customer class definition

Initial Customer class

In this initial Customer class, there is an attribute for each column in the Customer table. Each attribute has public visibility, since any of the columns in the originating table may be queried.

Note, the publicicon listed to the left of the attribute indicates that the attribute is 'public'; by default, all attributes derived from RDBMS tables should be public, since the RDBMS generally allows any column to be queried without restriction.

Identify Embedded or Implicit Classes To top of page

The class that results from the direct table-class mapping will often contain attributes that can be separated into a separate class, especially in cases where the attributes appear in a number of translated classes. These 'repeated attributes' may have resulted from denormalization of tables for performance reasons, or may have been the result of an oversimplified data model. In these cases, split the corresponding class into two or more classes to represent a normalized view of the tables.

Example

Nearly immediately after defining the Customer class above, however, we can define an Address class which contains all address information (assuming that there will be other things with addresses in our system), leaving us with the following classes:

revised Customer class, with extracted Address class

The association drawn between these two is an aggregation, since the customer's address can be thought of as being part-of the customer.

Handle Foreign-Key Relationships To top of page

For each foreign-key relationship in the table, create an association between the associated classes, removing the attribute from the class which mapped to the foreign-key column. If the foreign-key column was represented initially as an attribute, remove it from the class.

Example

Assume the structure for the Order table listed below:

Column Name Data Type
Number Number
Customer_ID Varchar

Structure for the Order table

In the Order table listed above, the Customer_ID column is a foreign-key reference; this column contains the primary key value of the Customer associated with the Order. We would represent this in the Design Model as shown below:

Representation of foreign-key Relationships in the Design Model

The foreign-key is represented as an association between the classes Order and Item.

Handle Many-to-Many Relationships To top of page

RDBMS data models represent many-to-many relationships with what has been called a join table, or an association table. These tables enable many-to-many relationships to be represented using an intermediate table which contains the primary keys of two different tables which may be joined together. The reason join tables are needed is because a foreign key reference can only contain a reference to a single foreign key value; when a single row may relate to many other rows in another table, a join table is needed to associate them.

Example

Consider the case of Products, which may be provided by any one of a number of Suppliers, and any Supplier may provide any number of Products. The Product and Supplier tables have the structure defined below:

Product Table
 
 
Supplier Table
 
 
Column Name Data Type
 
Column Name Data Type
 
Product_ID Number
 
Supplier_ID Number
 
Name Varchar
 
Name Varchar
 
Description Varchar
 
Street Varchar
 
Price Number
 
City Varchar
 
 
 
 
State/Province Char(2)
 
 
 
 
Zip/Postal Code Varchar
 
 
 
 
Country Varchar
 

Product and Supplier Table Definitions

In order to link these two tables together to find the products offered by a particular supplier, we need a Product-Supplier table, which is defined in the table below.

Product-Supplier Table
Column Name Data Type
Product_ID Number
Supplier_ID Number

Product-Supplier Table Definition

This join table contains the primary keys of products and suppliers, linking them together. An row in the table would indicate that a particular supplier offers a particular product. All rows whose Supplier_ID column matches a particular supplier ID would provide a listing of all products offered by that supplier.

In the Design Model, this intermediate table is redundant, since an object model can represent many-to-many associations directly. The Supplier and Product classes and their relationships is shown in Figure 8, along with the Address class, which is extracted from the Supplier, according to the previous discussion.

Product and Supplier Class Representation

Introduce Generalization To top of page

Often, you will find tables which have some similar structure. In the Relational Data Model, there is no concept of generalization, so there is not a way to represent that two or more tables have some structure in common. Sometimes common structure results from denormalization for performance, such as was the case above with the 'implicit' Address table which we extracted into a separate class. In other cases, tables share more fundamental characteristics which we can extract into a generalized parent class with two or more sub-classes. To find generalization opportunities, look for repeated columns in several tables, where the tales are more similar than they are different.

Example

Consider the following tables, SoftwareProduct and HardwareProduct, as shown below:

SoftwareProduct Table
 
 
HardwareProduct Table
 
Column Name Data Type
 
Column Name Data Type
Product_ID Number
 
Product_ID Number
Name Varchar
 
Name Varchar
Description Varchar
 
Description Varchar
Price Number
 
Price Number
Version Number
 
Assembly Number

SoftwareProduct and HardwareProduct Tables

Notice that the columns highlighted in blue are identical; these two tables share most of their definition in common, and only differ slightly. We can represent this by extracting a common Product class, with SoftwareProduct and HardwareProduct as sub-classes of the Product, as shown in the following figure:

SoftwareProduct and HardwareProduct Classes, showing generalization to the Product class

Putting all of the class definitions together, figure below shows a consolidated class diagram for the Order Entry system (major classes only)

Consolidated Class diagram for the Order Entry System

Replicating RDBMS Behavior in the Design Model To top of page

Replicating behavior is more difficult, since typically relational databases are not object-oriented and do not appear to have anything analogous to operations on a class in the object model. The following steps can help re-construct the behavior of the classes identified above:

  1. Create operations to get and set each attribute. There needs to be a way to set, change and query the values of the attributes of objects. Since the only way to access the attributes of an object is through operations the class provides, such operations must be defined on the class. When creating the operations that set the value of an attribute, be sure to incorporate any validation constraints that may operate on the associated column. If there are no validation constraints, one may choose to simply represent the fact that the attributes can be get and set by marking them as having "public" visibility, as the has been done in the diagrams above (with the icon to the left of the attribute name).
  2. Create an operation on the class for each stored procedure which operates upon the associated table. Stored procedures are executable subroutines which execute within the DBMS itself. This logic needs to be translated into the Design Model. If a stored procedure operates only on one class, create an operation on the class with the same parameters and the same return type as the stored procedure. Document the behavior of the stored procedure in the operation, making sure to note in the method description that the operation is implemented by the stored procedure.
  3. Create operations to manage associations between classes. When there is an association between two classes, there must be a way to create, manage and remove associations. Associations between objects are managed through object references, so to create an association between an Order and a LineItem (i.e. to add the LineItem to the Order), an operation on Order would be invoked, passing the LineItem as an argument (i.e. Order.add(aLineItem)). There must also be ways to remove and update the association as well (i.e. Order.remove(aLineItem) and Order.change(aLineItem,aNewLineItem)).
  4. Handle Object Deletion. If the target language supports explicit deletion, add behavior to the class's destructor which implements referential integrity checking. In cases where there are referential integrity constraints in the database, such as cascade delete, the behavior needs to be replicated in the appropriate classes. For example, the database may define a constraint that says that whenever an Order is deleted, all associated LineItems should be deleted as well. If the target language supports garbage collection, create a mechanism by which rows can be removed from tables when the associated object is garbage-collected. Note that this is harder than it sounds (and it sounds hard), because you will need to implement a mechanism for ensuring that no database client has any references to the object which is to be garbage collected; it is not enough to rely upon the garbage collection capabilities of the execution environment/virtual machine since this is simple one client's view of the world.
  5. Handle Behavior Implied by Queries. Examine Select statements which access the table to see how information is retrieved and manipulated. For each column directly returned by a Select statement, set the public property of the associated attribute to true; all other attributes should be private. For each computed column in a Select statement, create an operation on the associated class to compute and return the value. When considering Select statements, also include the Select statements embedded in View definitions.

Copyright  © 1987 - 2001 Rational Software Corporation


Display Rational Unified Process using frames

Rational Unified Process