Join us and take your data modeling skills to the next level with the October 26-30 Data Modeling Boot Camp. Attend five extended half-day sessions virtually (1:00-5:00 pm New York time), where each day focuses on different skills taught by a data modeling legend. A small-group setting, lively discussion and Q&A, and exercises that reinforce skills will make this week an unforgettable, must-attend event for data modelers and data architects.
– Monday –
Harness the Power of the Rosedata Stone with the Business Terms Model
– Tuesday –
Semantic Knowledge Graphs and the Meaning-First Modeling Approach
– Wednesday –
Four Steps to Data Architecture for Modern Analytics
– Thursday –
The Unified Star Schema Approach to Data Warehouse Design
Bill Inmon and Francesco Puppini
– Friday –
A Database Professional’s Guide to Normal Forms
C. J. Date
With more and more data being created and used, combined with intense competition, strict regulations, and fast-spreading social media, the financial, liability, and credibility stakes have never been higher, and so the need for a Common Business Language has never been greater.
Follow along with data modeling expert Steve Hoberman to appreciate the value of a common set of business terms and to see how a data model can depict these terms visually and with precision.
No modeling experience necessary or expected!
However, if you are a seasoned data modeling professional, this talk will enable you to position the conceptual data model more strategically within your organization, aligned with other essential business deliverables.
Steve Hoberman has trained more than 10,000 people in data modeling since 1992. Steve is known for his entertaining and interactive teaching style (watch out for flying candy!), and organizations around the globe have brought Steve in to teach his Data Modeling Master Class, which is recognized as the most comprehensive data modeling course in the industry. Steve is the author of nine books on data modeling, including the bestsellers The Rosedata Stone and Data Modeling Made Simple. Steve is also the author of Blockchainopoly. One of Steve’s frequent data modeling consulting assignments is to review data models using his Data Model Scorecard® technique. He is the founder of the Design Challenges group, creator of the Data Modeling Institute’s Data Modeling Certification exam, Conference Chair of the Data Modeling Zone conferences, director of Technics Publications, lecturer at Columbia University, and recipient of the Data Administration Management Association (DAMA) International Professional Achievement Award.
Just imagine what it will be like the following day, when you are able to draw accurate parallels between your current rigid, Structure-First modeling approaches and the flexible, Meaning-First semantic modeling you learned in this course.
- Semantic Technology: Where did it come from?
- Enabling Semantic Technologies: What are they?
- The Semantic Mechanisms: How do they work?
Highly skilled data professional with over three decades of experience developing data strategies and implementing enterprise-level data management solutions. A career that started with relational approaches, leveraging the full data technology stack across many industries, both private and public, has led to the conclusion that semantic technologies own the future. The Meaning-First data management approach is the cornerstone of reducing barriers to semantic technology adoption and the on-ramp to a semantic future.
Nearly every organization today faces the need to rethink and refresh its data architecture, yet most continue to work with turn-of-the-century architecture from the BI era. Patching new components onto the surface of obsolete architecture (a band-aid and duct tape approach) is not sustainable and does a poor job of supporting modern analytics use cases. Still, many avoid stepping up to modern data architecture because it is complex and difficult. Join us to learn a four-step approach to managing the complexities and overcoming the difficulties of modern data architecture.
You Will Learn:
- The pressing need to modernize data architecture
- The problems with old BI architecture in the age of modern analytics
- How to overcome the complexities and challenges of defining data architecture
- How to begin with a look at needed business capabilities
- How to translate business capabilities into representative analytics use cases
- How to apply business capabilities and analytics use cases to identify needed data capabilities
- How to adapt a reference architecture into your custom data management architecture for modern analytics
- How to test and validate your data management architecture
Dave Wells is an experienced data management professional who works in multiple roles as an architect, educator, and industry analyst. More than forty years of information systems experience combined with over ten years of business management give him a unique perspective about the connections among business, information, data, and technology. Knowledge sharing and skills building are Dave’s passions, carried out through consulting, speaking, teaching, and writing.
There are three main approaches today for designing a data warehouse (DWH):
- Inmon (3NF)
- Kimball (Dimensional Modeling)
- Data Vault
If we want to understand the strengths and weaknesses of these three approaches, we need to take a step back and ask ourselves: what do we really expect from a DWH?
Let's view a DWH as a black box with an INPUT and an OUTPUT:
- INPUT: capturing data from the sources (Ingestion)
- OUTPUT: making that data available for consumption (Presentation)
The Kimball approach is designed around Presentation: it prepares a bus of fact tables with conformed dimensions, ready to be consumed. The Inmon and Data Vault approaches, by contrast, focus on Ingestion: they build a solid foundation that captures all the data in its most general, unbiased, and versatile form.
As a result, Kimball is easy to consume, but it is likely to be incomplete and is not resilient to changes in business requirements. Conversely, Inmon can offer completeness of information and more versatility when requirements change, but it is weaker on the data consumption side. The Data Vault approach can be seen as a modernized version of the Inmon approach, adapted to the most recent challenges: it is resilient to the increasingly frequent changes in source data structures, it captures all data changes over time, it is compatible with big data, it handles massive parallel processing, and more. But Data Vault, too, is weak when it comes to data consumption. In the presentation layer, both Inmon and Data Vault need to draw inspiration from Kimball's Dimensional Modeling.
What is the PHANTOM LAYER?
Every DWH architecture has multiple layers (such as the staging area, the core layer, and the presentation layer). But when it comes to data consumption, there is almost always one additional hidden layer: the PHANTOM LAYER. This is true not only for Inmon and Data Vault, but also for Kimball. The presentation layer of a DWH is supposed to be ready for consumption, but this is usually not the case: every data delivery project (reports, dashboards, cubes, machine learning models, etc.) usually requires an additional step of transformations, aggregations, integrations, and adaptations. This step may be implemented as a set of database views, using the transformation capabilities within the BI tools, with an additional ETL tool that reads from the DWH, or in R, Python, and so on. The PHANTOM LAYER is hidden: nobody talks about it. But it becomes very real when a project exceeds its planned duration, resources, and budget.
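As a concrete illustration of the kind of work that ends up in the phantom layer, here is a minimal sketch in plain Python. The table and column names (a sales fact, a customer dimension, a region rollup) are hypothetical, invented purely for illustration: the point is that the join, aggregation, and formatting happen outside the warehouse, once per deliverable.

```python
# Hypothetical rows exported from the DWH presentation layer.
fact_sales = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 50.0},
]
dim_customer = {1: "EMEA", 2: "APAC"}  # customer_id -> region

# "Phantom layer" work: join fact to dimension, aggregate by region,
# then adapt the result to the format one specific report expects.
revenue_by_region = {}
for row in fact_sales:
    region = dim_customer[row["customer_id"]]
    revenue_by_region[region] = revenue_by_region.get(region, 0.0) + row["amount"]

report = {region: f"${total:,.2f}" for region, total in revenue_by_region.items()}
print(report)  # {'EMEA': '$200.00', 'APAC': '$50.00'}
```

Each new report tends to repeat a variation of this step, which is exactly why the phantom layer quietly consumes duration, resources, and budget.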
Today, data delivery projects are expensive and slow. They often follow the traditional modeling path of conceptual, logical, and physical data modeling, and they require the data modeler to choose between relational and dimensional data modeling. But what if there were a better modeling and architectural approach? What if most new business requirements were already satisfied by a pre-built “Unified Star Schema”?
The Unified Star Schema (USS) is a data mart that sits in the presentation layer of a data warehouse. The main advantage of the USS is that it minimizes (or completely eliminates) the need for a PHANTOM LAYER. Whether the deliverable is a report, a dashboard, a cube, or a machine learning model, the need for transformations is drastically reduced with the USS. The USS is compatible with every existing data warehouse approach (Inmon, Kimball, or Data Vault) and with every existing data storage technology, from relational databases to CSV files in the cloud.
The Unified Star Schema is, in fact, a generalization of Dimensional Modeling. It is a “multi-fact” solution, capable of working with “non-conformed dimensions”. Imagine that you have 50 fact tables and a bunch of dimensions: they will all fit into one single star schema!
The USS does not produce duplicates, and its performance is impressively good. It is compatible with every type of database and with every existing BI tool.
You will learn:
- Weak points of full denormalization
- Weak points of traditional dimensional modeling
- Oriented Data Models
- Definition of the “Generalized Fan Trap”
- Definition of the “Chasm Trap”
- How to handle multi-fact queries
- The Union
- Union combined with aggregation
- Why the USS is a universal solution
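The “Union” and “Union combined with aggregation” items above can be sketched in a few lines of plain Python. This is only an illustrative toy, not the course material: the fact tables (`orders`, `shipments`), their columns, and the `stage` tag are all made-up names. The idea shown is that several fact tables are stacked into one table, tagged by origin, with the columns they lack left empty, so non-conformed facts coexist in a single structure and can still be aggregated safely.

```python
# Two hypothetical fact tables that share only the client_id key.
orders = [
    {"client_id": 1, "order_amount": 100.0},
    {"client_id": 2, "order_amount": 40.0},
]
shipments = [
    {"client_id": 1, "shipped_qty": 3},
]

ALL_COLUMNS = ["stage", "client_id", "order_amount", "shipped_qty"]

def unify(rows, stage):
    """Project each row onto the full column list, filling gaps with None."""
    return [{col: {**row, "stage": stage}.get(col) for col in ALL_COLUMNS}
            for row in rows]

# The union: one table holding rows from both facts, tagged by stage.
uss = unify(orders, "Orders") + unify(shipments, "Shipments")

# Union combined with aggregation: total order amount per client.
# Rows from other stages carry None for this measure and are skipped,
# so the mixed table cannot inflate the totals (no fan-trap duplicates).
totals = {}
for row in uss:
    if row["order_amount"] is not None:
        totals[row["client_id"]] = totals.get(row["client_id"], 0.0) + row["order_amount"]
print(totals)  # {1: 100.0, 2: 40.0}
```

In a real warehouse this union would be a view or table built once in the presentation layer, which is precisely what lets the USS absorb work that would otherwise land in the phantom layer.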
The session will be highly interactive, and the live demo will be performed as a game! There will be questions, there will be a score, and there will be a winner! The exercises will be performed on the presenter’s laptop, with Tableau Public.
Bill Inmon – the “father of data warehouse” – has written 57 books published in nine languages. Bill’s latest adventure is the building of technology known as textual disambiguation – technology that reads raw text in a narrative format and allows the text to be placed in a conventional database so that it can be analyzed by standard analytical technology, thereby creating unique business value for Big Data/unstructured data. Bill was named by ComputerWorld as one of the ten most influential people in the history of the computer profession. Bill lives in Castle Rock, Colorado. For more information about textual disambiguation refer to www.forestrimtech.com.
Francesco Puppini is an Italian freelance consultant in Business Intelligence and Data Warehousing. He has worked on more than 30 projects across 10 European countries, for clients in several industry sectors. He currently works as a Qlik specialist, after 18 years spent with Business Objects, SQL, Teradata, and data modeling.
- What’s the difference between 3NF and BCNF?
- Is it true that if a table has just two columns, then it’s in 6NF?
- Is it true that if a table has just one key and just one other column, then it’s in 5NF?
- Is it true that if a table is in BCNF but not 5NF, then it must be all key?
- Is it true that 5NF tables are redundancy free?
- What precisely is denormalization?
- What’s Heath’s Theorem, and why is it important?
- What’s dependency preservation, and why is it important?
All of these questions have to do with (database) design theory—in particular, with normalization, which, though not the whole of that theory, is certainly a large part of it. Design theory is the scientific foundation for database design, just as the relational model is the scientific foundation for database technology in general. And just as anyone professionally involved in database technology in general needs to be familiar with the relational model, so anyone involved in database design in particular needs to be familiar with design theory. But design theory has its problems … and one of those problems, from the practitioner’s point of view at any rate, is that it’s riddled with terms and concepts that are hard to understand and don’t seem to have much to do with design as actually done in practice.
Now, nobody could claim designing databases is easy; but a sound knowledge of the theory can only help. In fact, if you want to do design properly—if you want to build databases that are robust and flexible and accurate—then you really have to come to grips with the theory. There’s just no alternative: at least, not if you want to claim to be a design professional. Proper design is so important! After all, the database lies at the heart of much of what we do in the computing world; so if the database is badly designed, the negative impacts can be extraordinarily widespread.
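To give a small taste of how concrete this theory gets: Heath's Theorem states that if a relation R{A, B, C} satisfies the functional dependency A → B, then R is equal to the join of its projections on {A, B} and {A, C}, i.e. the decomposition is lossless. Here is a toy demonstration in Python, using an invented relation (the attribute names emp, dept, city are purely illustrative):

```python
# R{emp, dept, city} with the functional dependency emp -> dept
# (each employee is in exactly one department).
R = {("alice", "sales", "london"),
     ("alice", "sales", "paris"),
     ("bob",   "hr",    "london")}

# Heath's Theorem: decompose on the FD's determinant, emp.
proj_emp_dept = {(e, d) for (e, d, c) in R}   # projection on {emp, dept}
proj_emp_city = {(e, c) for (e, d, c) in R}   # projection on {emp, city}

# Natural join of the two projections back on emp.
joined = {(e, d, c)
          for (e, d) in proj_emp_dept
          for (e2, c) in proj_emp_city
          if e == e2}

print(joined == R)  # True: the decomposition is lossless, as the theorem promises
```

Were the dependency emp → dept violated (say, alice in two departments), the same join would manufacture spurious tuples, which is exactly why the theorem's precondition matters.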
Attend this presentation, then, and learn the answers to questions like those above, as well as much, much more. To be specific, the presentation will:
- Review, but from a possibly unfamiliar perspective, aspects of normalization you should already be familiar with
- Explore in depth aspects you’re probably not already familiar with
- Provide clear and accurate explanations and definitions of all pertinent concepts
- Not spend a lot of time on well-known material such as 2NF and 3NF
On completion of this class, attendees will:
- Understand, and be able to apply, the scientific principles of normalization that underlie design practice
- Know which normal forms are important, how they differ from one another, and how to achieve them
- Understand dependencies and the concept of dependency preservation
- Generally, understand the contributions (and the limitations) of normalization theory
C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database technology. He is best known for his book An Introduction to Database Systems (eighth edition, Addison-Wesley, 2004), which has sold some 900,000 copies and is used by several hundred colleges and universities worldwide. Mr Date was inducted into the Computing Industry Hall of Fame in 2004. He enjoys a reputation that is second to none for his ability to communicate complex technical subjects in a clear and understandable fashion.