In its simplest form, database theory is the study of databases. But let’s back up and define that a database is a set information (or data) that is stored digitally and accessible in a variety of ways. The theory covers multiple aspects of databases, including types of database management systems and levels of data abstraction.
Understanding database theory will give you the foundation to determine which type of database is best for what you’re trying to accomplish with your data.
Concept and Purpose of the Database
A database is a repository of data, stored as a table made up of masses of data that have some connection to each other.
The database is organized in that it uses records and fields so that the data is easier to use – whether it’s to analyze, add, delete or otherwise manipulate it. Additionally, a database is structured so it is clear how the data points within it relate to one another. The structure of the data allows it to be accessible as needed by users and computer programs.
A database schema is a blueprint of how the data will look in a database, the type of database structure, if you will. It doesn’t house the data but describes the shape of the data and how it relates to other tables.
The purpose of using a database is to enforce standards and maintain integrity of the data. It also makes the data shareable and minimizes data redundancy. This is important because a database should be relied upon to be accurate and private. Accuracy is important so multiple applications with different data requirements can use the data.
3 Levels of Data Abstraction
So that databases can be leveraged more efficiently, database developers can hide irrelevant details as necessary. It simplifies the view of the data for users, which reduces complexity and makes it easier to retrieve data. There are three methods or “levels” for hiding this data, also called data abstraction.
1. Physical Level or Internal Schema
This refers to how the data is stored in memory – an external hard drive, files and folders. This is the lowest level in the database architecture. It also includes compression and encryption techniques used in the database.
2. Conceptual or Logical Level
This refers to data types and the relationships among data in the database. There is one conceptual view of the database because it includes all the data contained within the database.
3. External or View Level
This refers to only the relevant data in the database for the end user. This is the highest level of data abstraction, therefore closest to the user. It hides all data that is unnecessary for the user to interact with.
Data Independence
Understanding the importance of data independence allows you to manipulate the data within a database using the three levels of data abstraction previously described.
Database independence means that you can change the database schema at one of the three levels without changing the next higher level. Data independence allows you to keep data separate from all the programs that use it.
There are two types of data independence – physical data independence and logical data independence.
1. Physical Data Independence
Physical data independence is easier to achieve than logical data independence. It helps you separate conceptual or logical levels from the physical levels or internal schema. Examples of physical data independence include using a new storage device like a hard drive or magnetic tapes, changing the access method or change of location of data from a C drive to a D drive.
2. Logical Data Independence
Logical data independence is to change the logical level or conceptual schema without changing external views, API or computer programs. Examples of logical data independence include adding, modifying or deleting a new attribute, entity or relationship without rewriting existing application programs; merging two records into one; or breaking an existing record in multiple.
5 Types of Database Management Systems
A database management system is software that creates and manages databases. Many different types of database systems exist based on how they manage the database structure. Popular types of database management systems include:
1. Hierarchical
This type of database management system is like a tree structure, with one-to-one parent and child relationships. It’s just like the folder structure on your computer. A hierarchical database is not very versatile but has a use case where the primary focus of information gathering is on a hierarchy, like a list of people that all are associated with one department within an organization.
2. Network
The network database management system has a hierarchical structure, but the “child” tables can have more than one “parent.” This can be visualized with an upside-down tree where multiple member records can be linked to multiple owner files. This approach allows for more natural relationships between records and is more flexible than a hierarchical database. An example would be inventory for t-shirts where both sizes and colors can have multiple relationships.
3. Object-Oriented
Rather than being table-oriented, each piece of data information in the object-oriented database system is represented by individual objects, with relationships possible between two or more objects. Objects reference the ability to develop a project and then it’s defined and named once the objects have been combined. An object-oriented database is structured much like a car engine. It has several parts, like the main cylinder block, exhaust system and intake manifold. Each one of these stands on its own as a part but once together, it’s an engine.
4. Relational
As the name suggests, relational databases store “related” data information, with a defined relationship between database tables and these tables used in relation to other datasets. Rows are considered records, and keys are unique identifiers in each of the tables, which allow for the relationship between tables.
Unlike the hierarchal and network database systems relying on the “parent-child” relationship, the table-based relational database allows any file to be related to any other by means of a common field. A relational database system can be “fused” with an object-oriented system, creating an object–relational database with an object-oriented schema housed in a relational database system.
An example of a relational database would be a customer database where one table uses a customer ID number and customer name, while another table uses a customer ID number and order amount. A relational database pulls information from both tables based on the related field, the customer ID.
5. Non-Relational (NoSQL)
This type of database management system stores unstructured, or “non-related” data. Unlike relational databases, non-relational databases are not restricted to data being in tables and can host a range of different types of data in different formats, using specialized frameworks to store large amounts of diverse and complex data. Queries can be done faster, as the system doesn’t need to access several tables for data retrieval. Instead, non-relational databases utilize data manipulation techniques and processes to make the data usable for solutions.
An example of a non-relational dataset would be a retail store that has a unique document or file for each customer with data information, such as their name, email, phone number and credit card. Even with no structure or “relational” format to this information within each customer file, a non-relational database can store, search and pull the information based on criteria.
Database Theory in Health Informatics
Understanding the concept of database theory and related database management systems are vital as greater amounts of data information and their possible applications continue to transform multiple industries, including healthcare.
See how data warehousing, online analytical processing, data quality issues, emerging cloud databases and data mining techniques can impact the effectiveness and efficiency of healthcare.