Database Structuring Conventions

FIELDS OF STUDY

Software Development; Coding Techniques; Computer Science

ABSTRACT

Database structuring conventions are used to ensure that a database design provides access to the required data quickly and accurately while making efficient use of system resources such as network bandwidth and database storage. Standard conventions are used in defining entities, attributes, keys, views, and triggers.

PRINCIPAL TERMS

ENTITIES AND THEIR ATTRIBUTES

Databases are built on entities. An entity is simply a single item of interest, such as a person, thing, event, or place, about which data is stored. For example, entities in a database might represent employees (people), products (things), concerts (events), or towns (places). Each entity has an associated set of attributes, which constitute the information stored in the database. For example, attributes of employees might include position, salary, and years of employment. Databases can be modeled as tables, with each entity represented by a row in the table and the various attributes represented by columns. Entities of different types are listed in separate tables. If a database contained records of both a company's employees and the different departments within that company, for example, one table would list the employees and their attributes, and the other would list the departments and their attributes.




Database Structuring Conventions

Relationships can be constructed between entities of different types. These relationships take one of the following main forms: one-to-one, one-to-many, or many-to-many. For example, multiple employees (many entities) might each have a relationship with the same department (one entity). To construct relationships, each entity must first be assigned an attribute called a primary key. This is a unique identifier whose value must not be repeated within the same table. An employee number might be used to uniquely identify each employee, and a unique department identification (ID) number could be created for each department. Next, the relationship between entities is defined by making the primary key of an entity in one table an attribute of an entity in the other table. This new attribute is called a foreign key. For example, the foreign key column of the employee table would hold the department ID of each employee's department, establishing a relationship between the employee and their department.

When data is updated in one table, it is often necessary to update data in a related table. For example, if the department ID changes, that information must also be changed in the employee table for all employees of that department. Triggers and constraints can be used to ensure that the correct updates occur.




Common conventions for building a database can be visualized in an entity relationship model. This model shows the relationships between users, blog posts, blogs, hits, and campaigns and lists some of the attributes of each of those entities.





Common conventions for building a database can be visualized in an entity relationship model. This model shows the relationships between users, blog posts, blogs, hits, and campaigns and lists some of the attributes of each of those entities.

Establishing a consistent naming convention is an important part of database design. When naming objects in a database, keep names as short, clear, and descriptive as possible. If abbreviations are needed, use standard ones. Avoid using terms that might change over time, as database objects can be difficult to rename. An organization's naming conventions, such as those used for file naming, should inform database naming. Various database systems also impose their own limits on object naming. In general, underscores, not spaces, are used to separate words. A prefix or suffix may identify views.

USING NORMALIZATION TO OPTIMIZE DATABASES

Once the necessary entities and attributes are determined, the database structure can be optimized through normalization. Normalization maximizes the use of keys to relate tables to reduce duplication. For example, if the database of a US-based company had one table for employees and another for customers, both tables would likely have an attribute for each entity's state of residence. As a result, data would be duplicated between the two tables, taking up unnecessary space. In addition, if the state names were abbreviated differently in the different tables—for example, if Texas were abbreviated as “Tex.” in one table and “TX” in the other—this could cause errors. During normalization, a state table would be created to store the name and abbreviation of each state. This table would then be linked to the employee and customer tables using keys, thus preventing data duplication.

Normalization also ensures that the correct attributes are stored in the correct tables. For instance, it is more logical for an employee's work hours to be stored in the employee table than in the department table. This makes searches, or queries, of the data easier to perform.

Normalization reduces the size of tables, which offers performance advantages. Updates and insertions are faster because there is no need to update or add data in multiple places. Deletions do not risk the loss of other, needed data. In general, normalized databases perform well when data is written to the database or changed more often than it is read.

In contrast, denormalized databases offer performance advantages when data is read from the database more often than it is written. This is because the data required can be retrieved from a single table, which is faster than accessing multiple tables, as is needed in a normalized database. Index usage is also more efficient if a database is denormalized. Normalization and denormalization must therefore balance the read and write performance for a particular database.

SAMPLE PROBLEM

A developer is designing a database to store information used by a university's payroll department. The database will contain at least three tables: one for the university's employees, one for its departments, and one for the different campus locations. The following is a partial list of the names that have been assigned to various database objects.

In addition, two views have been named: vw_humanities_departments and vw_down-town_maintenance_employees. Given these names, describe the naming conventions used by the developer.

Answer:

All names are in lowercase, and multiple words in names are separated by underscores. Few standard abbreviations have been used (e.g., in department_id, “id” is an abbreviation, but “department” is spelled out rather than abbreviated as “dept”). Entity and attribute names are singular. The primary keys for each table are assigned a short name (id), and foreign key names combine the name of the entities in the referenced table with the name of the primary key (e.g., department_ id). Views are identified with a prefix (vw_).

GOOD STRUCTURE MAXIMIZES PERFORMANCE

In order for a database to offer the maximum possible performance while using system resources efficiently, it must be well designed. The foundation of a successful design is a sound database structure. Established database structuring conventions help developers create the most efficient database based on the individual requirements of the end user. Following these conventions has proved to result in improved database design.

—Maura Valentino, MSLIS

Churcher, Clare. Beginning Database Design: From Novice to Professional. 2nd ed., Apress, 2012.

Harrington, Jan L. Relational Database Design and Implementation. 4th ed., Elsevier, 2016.

Hernandez, Michael J. Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design. 3rd ed., Addison-Wesley, 2013.

MacLennan, Bruce J. Principles of Programming Languages: Design, Evaluation, and Implementation. 3rd ed., Oxford UP, 1999.

Mullins, Craig S. Database Administration: The Complete Guide to DBA Practices and Procedures. 2nd ed., Addison-Wesley, 2013.

Scott, Michael L. Programming Language Pragmatics. 4th ed., Elsevier, 2016.

Van Roy, Peter, and Seif Haridi. Concepts, Techniques, and Models of Computer Programming. MIT P, 2004.