Fact Table Granularity
The first step in designing a fact table is to determine the
granularity of the fact table. By granularity, we mean the lowest
level of information that will be stored in the fact table. This
constitutes two steps:
1. Determine which dimensions will be included.
2. Determine where along the hierarchy of each dimension
the information will be kept.
The determining factor usually goes back to the requirements.
Which Dimensions to include?
Determining which dimensions to include is usually a
straightforward process, because business processes will often
dictate clearly what the relevant dimensions are.
For example, in a retail business scenario, the dimensions for a
sales fact table are usually time, geography, and product. This
list, however, is by no means a complete list for all retailers. A
supermarket with a Rewards Card program, where customers
provide some personal information in exchange for a rewards
card, and the supermarket would offer lower prices for certain
items for customers who present a rewards card at checkout, will
also have the ability to track the customer dimension. Whether
the data warehousing system includes the customer dimension
will then be a decision that needs to be made.
What level within each dimensions to include ?
Determining which part of hierarchy the information is stored
along each dimension is a bit trickier. This is where user
requirement (both stated and possibly future) plays a major role.
In the above example, will the supermarket wanting to do
analysis along at the hourly level? (i.e., looking at how certain
products may sell by different hours of the day.) If so, it makes
sense to use ‘hour’ as the lowest level of granularity in the time
dimension. If daily analysis is sufficient, then ‘day’ can be used
as the lowest level of granularity. Since the lower the level of
detail, the larger the data amount in the fact table, the granularity
exercise is in essence figuring out the sweet spot in the tradeoff
between detailed level of analysis and data storage.
Note that sometimes the users will not specify certain
requirements, but based on the industry knowledge, the data
warehousing team may foresee that certain requirements will be
forthcoming that may result in the need of additional details. In
such cases, it is prudent for the data warehousing team to design
the fact table such that lower-level information is included. This
will avoid possibly needing to re-design the fact table in the
future. On the other hand, trying to anticipate all future
requirements is an impossible exercise, and the data
warehousing team needs to fight the urge of the "dumping the
lowest level of detail into the data warehouse" symptom, and
only includes what is practically needed. Sometimes this can be
more of an art than science, and prior experience will become
Fact Table Granularity