The Star Schema: Key to Data Modelling in (power) BI

Table of Contents

This blog post will explain the Star Schema, its significance and how it simplifies data nalysis and how its applied to data warehouses, databases, data marts, and other tools.[1]

Introducing Star Schema

The Star Schema is an established data modelling technique adopted for relational data warehouses to streamline the extraction of data, a crucial feature for analytical and reporting functions. Its name derives from the star-like pattern formed in the entity-relationship diagram (ERD), where a central fact table is linked to multiple dimension tables radiating from it.[3]Its design is particularly conducive to Business Intelligence applications such as Power BI, making it a popular choice in contemporary data architecture for its efficiency in querying and reporting.

Star Schema Diagram

Computational Power

The Information Revolution, which started in the mid-1980s, has brought about a significant shift from manual to digital information storage and processing. This shift has primarily been driven by the rise of information-based jobs, the affordability of computing power, and the advent of the Internet.

The technological transformation has dramatically influenced the way organisations function and how people interact with each other. As a result, businesses are now automating several routine tasks and services, such as inventory management, payroll, and banking transactions. Online Transaction Processing (OLTP) databases, specifically designed to handle a high volume of short transactions, are utilised to support these services.

Online Transaction Processing

Imagine OLTP databases as bustling post offices, where each postbox represents a table row and contains various messages or records. To efficiently sort and organise these messages, the post office uses categories or columns, arranging them by postcode, recipient, or size. This system makes data retrieval easy. Oracle, IBM DB2, and Microsoft SQL Server are well-known for their OLTP capabilities.

Normalised Data

OLTP systems are relational databases that have undergone normalisation to allow for efficient cross-referencing and data linking between different tables. As a result, they are ideal for handling a large number of user transactions, especially in real-time scenarios. OLTP systems are excellent at handling simple queries and updates.[6]

Structured Query Language

OLTP databases use SQL (Structured Query Language) to interact with and manage this data. SQL uses queries (set of commands) to instruct the computer where to find, add, or change data within that database.

Data Analysis Needs

As businesses grew, there was an increasing need to analyse the vast amounts of transactional data OLTP systems collected. Although OLTP (Online Transaction Processing)[7]databases efficiently handle real-time, transactional data, they are structured primarily for operational tasks. Considering complex analysis, OLTP's operational focus limits its ability to conduct extensive data analytics.

OLAP Systems

OLAP systems were designed to facilitate data analysis. They are optimised for query and reporting and are particularly suitable for handling complex analytical calculations and multidimensional data analysis. Unlike OLTP systems, OLAP systems can quickly execute analytical and ad hoc queries.

OLAP databases can be built using relational and non-relational models. They often use multidimensional databases (like cubes in SQL Server Analysis Services) or relational databases designed in a star or snowflake schema to facilitate complex queries and aggregations. OLAP systems can utilise relational and non-relational (including multidimensional) data models, depending on the specific technology and requirements.

Data Warehouses

Data warehousing is a concept that emerged alongside OLAP systems. It was created to fulfill the requirements of OLAP systems for comprehensive data analysis. Data warehouses are centralised repositories that aggregate data from various sources, including OLTP databases. This aggregated data is important for the multidimensional analysis capabilities of OLAP.

To store the data in a format optimised for analysis and querying, the data often undergoes processing and transformation through ETL (Extract, Transform, Load). This structure enables organisations to analyse data comprehensively, generate reports, and derive insights from extensive datasets.

While OLAP and OLTP share some similarities, the two have fundamental differences. The table below highlights the differences in usage, database type, nature of queries, speed, modification, storage, and backup frequency. It is important to note that while these are general characteristics, specific implementations may vary based on technology and architecture.

The table below shows the key differences between OLAP and OLTP regarding usage, database type, nature of queries, speed, modification, storage and backup frequency. It is important to note that while these are general characteristics, specific implementations may vary based on technology and architecture.

OLAP vs OLTP Databases
  OLAP - Online Analytical Processing OLTP - Online Transaction Processing
Usage Business Warehouse Fast Transactions, e.g., ATMs
Type Multidimensional Database Relational Database
Queries Complex queries involving Time intelligence (current and past data) Simple queries involving few records
Speed Generally slower, but quick processing is not a requirement Fast, as required for transactional processing
Edits Data rarely gets modified Frequent due to its transactional nature
Workload Primarily read-oriented Both reading and writing data
Storage Large data sets that require substantial storage Relatively modest storage requirements
Backups Backups are less frequent due to read-oriented use and infrequent data modifications Backups are frequent, and concurrent backups are necessary due to high transaction volumes and data sensitivity

Role of the Star Schema

Moving from the specific OLAP and OLTP systems functions, how does the Star Schema fit in? Well, the Star Schema solves the problem of the complex data held in OLTP. It simplifies the data model, making it useable for analytical querying. [8] It achieves this by separating transactional data, held in fact tables, from descriptive data stored in dimension tables. This makes the data more suitable for querying and analysing for business intelligence purposes.

The Star schema consists of a central fact table. What is a fact table?

  • A fact table contains data typically at the most detailed level, often called the "granular" or "transactional" level. (measures like sales, revenue)
  • A fact table usually has one or more date or time-related columns used for analysis.
  • A fact table allows for data aggregation based on numerical values. (sales amount, profit, units sold, etc.)
  • A fact table contains detailed records at the transactional level.
  • A fact table is on the "many" sides of a one-to-many relationship with dimension tables. (see next paragraph)

Dimension Tables

Multiple dimension tables surround the fact table. Dimension Tables in a Star Schema provide context and detailed categorisation for the data within the Fact Table. [1]

  • The dimension tables provide context (descriptive, textual, or categorical data) and detail for data in the fact table.
  • The dimension tables might include attributes like time (dates, weeks, months) or customer details (ID, name, address).
  • The dimension tables are linked to the fact table through foreign keys or unique identifiers.
  • The dimension table facilitates more intuitive and efficient querying for comprehensive data analysis.
  • The dimension table is on the "one" side of a one-to-many relationship with the fact table.

Dimensionality

The dimension tables embody the concept of Dimensionality. [10] Dimensionality reflects the various perspectives from which data can be analysed. It defines the different angles or dimensions (such as time, geography, or product categories) from which you can slice and dice data.

  • Dimensionality refers to the number and complexity of attributes in the dimension tables.
  • Dimensionality defines the different perspectives or dimensions (like time, geography, or product categories) for analysing data.
  • Higher Dimensionality can provide more angles for analysis but may complicate the data model and query execution.

Granularity

Granularity primarily describes the level of detail or depth of data in tables, especially data warehouse fact tables. It indicates the depth or fineness of the data stored in these tables. For example, sales data could be recorded at the level of individual transactions (high granularity) or as daily summaries (lower granularity).

Granularity can also apply to dimension tables, as it influences their attributes' details. For instance, in a time dimension table, granularity can range from years, months, and days to hours or minutes. The chosen level of granularity directly impacts the precision and potential insights that can be derived from the data.

  • High granularity allows for more detailed analysis but increases storage needs and can slow down query performance.
  • Low granularity offers quicker query performance and reduced storage needs but limits the depth of analysis.

Understanding both concepts helps design a data model that balances detail, storage, and performance to meet specific business analysis needs.

Conclusion

The Star Schema data model is a business intelligence and analytics solution. It addresses the limitations inherent to relational databases. The Star Schema presents a concise, easily navigable representation of complex data structures.

This schema presents data in an easily understandable format, enhancing query performance while reducing errors through transparent data relationships.

Compared to conventional relational models, the Star Schema is more efficient in handling large volumes of data. As data architectures continue to evolve and confront new challenges, the adaptability of the Star Schema is likely to maintain its relevance. It could be integrated with emerging technologies like AI and machine learning. It can transform data analysis and business intelligence strategies if that happens.

Diagramming Tools

These tools offer a range of functionalities and support you in database diagramming. Some focus on simplicity and educational use, while others provide more robust features for professional and enterprise environments.

Tool Name URL Summary and Pricing
Diagrams.net app.diagrams.net Free charges for Atlassian add-ons.
dbdiagram dbdiagram.io Free for 10 diagrams, then $9/month.

ERD Plus

erdplus.com

Completely free, designed for educational purposes​​.

Lucidchart

www.lucidchart.com

Free for 3 documents, then $7.95-$9/month​​.

QuickDBD

www.quickdatabasediagrams.com

Free for 1 diagram, Pro plan at $14/month​​.

MySQL Workbench Community Edition

www.mysql.com

Free for MySQL Community Edition​​.

SqlDBM

sqldbm.com

Free trial plans start at $25/month​​.

dbForge Studio for SQL Server

www.devart.com

ERD tool for Microsoft SQL Server, pricing not specified​​.

Visual Paradigm's Database Design Tool

www.visual-paradigm.com

ERD tool for various DBMS, pricing not specified​​.

DeZign for Databases

www.datanamic.com

Database modelling tool, pricing not specified​​.

Dataedo

dataedo.com

Database documentation tool with ERD, pricing not specified​​.

Cacoo

cacoo.com

Free for up to 6 sheets, then $5/user/month​​.

DrawSQL

drawsql.app

Free for one user, public diagrams, 15 tables/diagrams, from $15 for more​​.

Miro

miro.com

Free for unlimited team members, 3 boards, then $8/user/month​​.

Gliffy

www.gliffy.com

Free for Confluence and Jira apps for 10 users, online app free for students for 4 years, then from $8/user/month​​.

Creately

creately.com

dbdiagram dbdiagram.io Free for 10 diagrams, then $9/month.
  1. Fitz Insights, "Understanding the Star Schema in Business Intelligence." Link to article

  2. Oxford Reference, "Data Analytics." Link to article

  3. Microsoft Azure, "Online Analytical Processing (OLAP)." Link to article

  4. IBM, "What is OLTP?" Link to article

  5. SetScholars, "Comprehensive Comparison of OLTP and OLAP: Key Differences, Use Cases, and Choosing the Right Solution." Link to article

  6. LearnVern, "What is a Relational Database Management System? - MySQL Tutorial." Link to article

  7. Microsoft, "Power BI Guidance: Star Schema." Link to article

  8. IBM, "Dimension Tables and Entities." Link to article

  9. IBM, "Step: Identify Grain." Link to article

Vorige
Vorige

Digital literacy for modern employment in the future era

Volgende
Volgende

Case: Procurement Strategies using Microsoft 365