Join us and take your data modeling skills to the next level with the October 26-30 Data Modeling Boot Camp. Attend five extended half-day sessions virtually (1:00-5:00 pm New York time), where each day focuses on different skills taught by a data modeling legend. A small group setting, lively discussions and Q&A, and exercises that reinforce skills will make this week an unforgettable, must-attend event for data modelers and data architects.
– Monday –
Harness the Power of the Rosedata Stone with the Business Terms Model
– Tuesday –
Semantic Knowledge Graphs and the Meaning-First Modeling Approach
– Wednesday –
Object-Role Modeling: Fundamentals and Advanced Features
– Thursday –
The Unified Star Schema Approach to Data Warehouse Design
Bill Inmon and Francesco Puppini
– Friday –
A Database Professional’s Guide to Normal Forms
C. J. Date
With more and more data being created and used, combined with intense competition, strict regulations, and fast-spreading social media, the financial, liability, and credibility stakes have never been higher, and the need for a Common Business Language has never been greater.
Follow along with data modeling expert Steve Hoberman and discover the value of a common set of business terms, and how a data model can visually depict those terms with precision.
No modeling experience necessary or expected!
However, if you are a seasoned data modeling professional, this talk will enable you to position the conceptual data model more strategically within your organization, aligned with other essential business deliverables.
Steve Hoberman has trained more than 10,000 people in data modeling since 1992. Steve is known for his entertaining and interactive teaching style (watch out for flying candy!), and organizations around the globe have brought Steve in to teach his Data Modeling Master Class, which is recognized as the most comprehensive data modeling course in the industry. Steve is the author of nine books on data modeling, including the bestsellers The Rosedata Stone and Data Modeling Made Simple. Steve is also the author of Blockchainopoly. One of Steve’s frequent data modeling consulting assignments is to review data models using his Data Model Scorecard® technique. He is the founder of the Design Challenges group, creator of the Data Modeling Institute’s Data Modeling Certification exam, Conference Chair of the Data Modeling Zone conferences, director of Technics Publications, lecturer at Columbia University, and recipient of the Data Administration Management Association (DAMA) International Professional Achievement Award.
Just imagine what it will be like the following day, when you are able to draw accurate parallels between your current rigid, Structure-First modeling approaches and the flexible, Meaning-First semantic modeling you learned in this course.
- Semantic Technology: Where did it come from?
- Enabling Semantic Technologies: What are they?
- The Semantic Mechanisms: How do they work?
A highly skilled data professional with over three decades of experience developing data strategies and implementing enterprise-level data management solutions. What started with relational approaches leveraging the full data technology stack across many industries, both private and public, has led to the conclusion that semantic technologies own the future. The Meaning-First data management approach is the cornerstone to reducing barriers to semantic technology adoption, and the on-ramp to a semantic future.
In natural language, individual things are typically referenced by proper names or definite descriptions. Data modeling languages differ considerably in their support for such linguistic reference schemes. This seminar concludes with a comparative review of reference scheme modeling within the Unified Modeling Language, the Barker dialect of Entity Relationship modeling, Object-Role Modeling, relational database modeling, the Web Ontology Language, and LogiQL (an extended form of Datalog). We identify which kinds of reference schemes can be captured within these languages, as well as those reference schemes that cannot be captured. Our analysis covers simple reference schemes, compound reference schemes, disjunctive reference schemes, and context-dependent reference schemes.
Dr. Terry Halpin, BSc, DipEd, BA, MLitStud, PhD, is a data modeling consultant and former professor of computer science. His industrial experience includes years of employment in the USA at Asymetrix Corporation, InfoModelers Inc., Visio Corporation, Microsoft Corporation, and LogicBlox, as well as contract work as a data modeling consultant for several industrial organizations, including the European Space Agency. His prior academic experience includes employment as a senior lecturer in computer science at the University of Queensland, as well as professorships in computer science at various universities in the USA, Malaysia, and Australia. His doctoral thesis formalized Object-Role Modeling (ORM/NIAM), and his current research focuses on conceptual modeling and rule-based technology. He has authored over 200 technical publications and nine books, including Information Modeling and Relational Databases and Object-Role Modeling Fundamentals, and has co-edited nine books on information systems modeling research. He is a member of IFIP WG 8.1 (Information Systems), is an editor or reviewer for several academic journals, was a regular columnist for the Business Rules Journal, and is a recipient of the DAMA International Achievement Award for Education and the IFIP Outstanding Service Award.
Today there are three main approaches to designing a data warehouse (DWH):
- Inmon (3NF)
- Kimball (Dimensional Modeling)
- Data Vault
If we want to understand the strengths and weaknesses of these three approaches, we need to take a step back and ask ourselves: what do we really expect from a DWH?
Let’s try to see a DWH as a black box that has an INPUT and an OUTPUT:
- INPUT: Capturing data from the sources (Ingestion)
- OUTPUT: Making such data available for consumption (Presentation)
The Kimball approach is designed around the “Presentation” side: it prepares a bus of fact tables with conformed dimensions, ready to be consumed. The Inmon approach and the Data Vault approach, on the other hand, focus on the “Ingestion” side: they build a solid foundation that captures all the data in its most general, unbiased, and versatile form.
As a result, Kimball is easy to consume, but it is likely to be “incomplete” and not resilient to changes in business requirements. Conversely, Inmon offers completeness of information and more versatility when requirements change, but it is weaker on data consumption. The Data Vault approach can be seen as a modernized version of the Inmon approach, adapted to the most recent challenges: it is resilient to increasingly frequent changes in source data structures, it captures all data changes over time, it is compatible with big data, it handles massive parallel processing, and more. But Data Vault, too, is weak when it comes to data consumption. In the presentation layer, both Inmon and Data Vault need to draw inspiration from Kimball’s Dimensional Modeling.
What is the PHANTOM LAYER?
Every DWH architecture has multiple layers (such as the staging area, the core layer, and the presentation layer). But when it comes to data consumption, there is almost always one additional hidden layer: the PHANTOM LAYER. This is true not only for Inmon and Data Vault, but also for Kimball. The presentation layer of a DWH is supposed to be ready for consumption, but this is usually not the case: every data delivery project (reports, dashboards, cubes, machine learning models, etc.) usually requires an additional step of transformations, aggregations, integrations, and adaptations. This step may be implemented as a set of database views, with the transformation capabilities of the BI tools, with an additional ETL tool that reads from the DWH, or in R, Python, and so on. The PHANTOM LAYER is hidden: nobody talks about it. But it becomes real when a project exceeds its planned duration, resources, and budget.
Today, data delivery projects are expensive and slow. They often follow the traditional modeling path of conceptual, logical, and physical data modeling, and they require the data modeler to choose between relational and dimensional data modeling. But what if there were a better modeling and architectural approach? What if most new business requirements were already satisfied by a pre-built “Unified Star Schema”?
The Unified Star Schema (USS) is a data mart that sits in the presentation layer of a data warehouse. The main advantage of the USS is that it minimizes (or completely eliminates) the need for a PHANTOM LAYER. Whether the deliverable is a report, a dashboard, a cube, or a machine learning model, the need for transformations is drastically reduced with the USS. The USS is compatible with every existing data warehouse approach (Inmon, Kimball, or Data Vault), and with every existing data storage technology: from relational databases to CSV files on the cloud.
The Unified Star Schema is, in fact, a generalization of Dimensional Modeling. It is a “multi-fact” solution, capable of working with “non-conformed dimensions.” Imagine that you have 50 fact tables and a bunch of dimensions: they will all fit into one single star schema!
The USS does not produce duplicates, and it performs impressively well. It is compatible with every type of database and with every existing BI tool.
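To make the “multi-fact” idea concrete, here is a minimal sketch in Python (the table and column names are invented for illustration; they are not taken from the course material). Rows from two fact tables with non-conformed dimensions are appended, UNION-style, into one table, with None padding the columns a given fact does not carry — so a single “star” can serve both facts:

```python
# Two toy fact tables that do NOT share all their dimensions
sales = [
    {"product_id": 1, "client_id": 10, "sales_amount": 100},
    {"product_id": 2, "client_id": 11, "sales_amount": 250},
]
purchases = [
    {"product_id": 1, "supplier_id": 7, "purchase_amount": 60},
]

def union_star(tables):
    """Append rows from every fact table into one, aligned on the union
    of all column names; a '_stage_' tag records which fact each row came from."""
    all_cols = sorted({c for name, rows in tables for r in rows for c in r})
    unified = []
    for name, rows in tables:
        for r in rows:
            row = {"_stage_": name}
            for c in all_cols:
                row[c] = r.get(c)   # None where this fact carries no value
            unified.append(row)
    return unified

uss = union_star([("Sales", sales), ("Purchases", purchases)])
for row in uss:
    print(row)
```

Because each source row appears exactly once, measures are never multiplied by a join, which is why this style of schema avoids duplicates.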
You will learn:
- Weak points of full denormalization
- Weak points of the traditional dimensional modeling
- Oriented Data Models
- Definition of “Generalized Fan Trap”
- Definition of Chasm Trap
- How to handle multi-fact queries
- The Union
- Union combined with Aggregation
- Why the USS is a universal solution
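As a taste of the trap topics above, here is a hedged Python sketch (toy data and invented names, not the presenter's own example) of how joining one order to two child tables at once cross-multiplies rows, so that summing a measure after the join double-counts it — and how keeping each fact in its own rows, union-style, avoids the inflation:

```python
# One order with two shipments and two invoice lines
order = {"order_id": 1, "amount": 100}
shipments = [{"order_id": 1, "ship_id": s} for s in (1, 2)]
invoices  = [{"order_id": 1, "inv_id": i} for i in (1, 2)]

# Naive join: order x shipments x invoices -> 2 * 2 = 4 rows,
# each carrying a copy of the order amount
joined = [
    {**order, **s, **i}
    for s in shipments if s["order_id"] == order["order_id"]
    for i in invoices  if i["order_id"] == order["order_id"]
]
inflated_total = sum(r["amount"] for r in joined)   # 400, not 100

# Union-style alternative: each fact keeps its own rows,
# so the measure appears exactly once
union_rows = [dict(order)] + shipments + invoices
correct_total = sum(r.get("amount", 0) for r in union_rows)  # 100

print(inflated_total, correct_total)
```

The inflated total is exactly the kind of silent error these traps produce in real reports.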
The session will be highly interactive, and the live demo will be performed as a game! There will be questions, there will be a score, and there will be a winner! The exercises will be performed on the presenter’s laptop, with Tableau Public.
Bill Inmon – the “father of the data warehouse” – has written 57 books, published in nine languages. Bill’s latest adventure is building a technology known as textual disambiguation – technology that reads raw text in a narrative format and places it in a conventional database, so that it can be analyzed by standard analytical technology, creating unique business value from Big Data/unstructured data. ComputerWorld named Bill one of the ten most influential people in the history of the computer profession. Bill lives in Castle Rock, Colorado. For more information about textual disambiguation, refer to www.forestrimtech.com.
Francesco Puppini is an Italian freelance consultant in Business Intelligence and Data Warehousing. He has worked on over 30 projects across 10 European countries, for clients in several industry sectors. He currently works as a Qlik specialist, after 18 years spent with Business Objects, SQL, Teradata, and data modeling.
- What’s the difference between 3NF and BCNF?
- Is it true that if a table has just two columns, then it’s in 6NF?
- Is it true that if a table has just one key and just one other column, then it’s in 5NF?
- Is it true that if a table is in BCNF but not 5NF, then it must be all key?
- Is it true that 5NF tables are redundancy free?
- What precisely is denormalization?
- What’s Heath’s Theorem, and why is it important?
- What’s dependency preservation, and why is it important?
All of these questions have to do with (database) design theory—in particular, with normalization, which, though not the whole of that theory, is certainly a large part of it. Design theory is the scientific foundation for database design, just as the relational model is the scientific foundation for database technology in general. And just as anyone professionally involved in database technology in general needs to be familiar with the relational model, so anyone involved in database design in particular needs to be familiar with design theory. But design theory has its problems … and one of those problems, from the practitioner’s point of view at any rate, is that it’s riddled with terms and concepts that are hard to understand and don’t seem to have much to do with design as actually done in practice.
Now, nobody could claim designing databases is easy; but a sound knowledge of the theory can only help. In fact, if you want to do design properly—if you want to build databases that are robust and flexible and accurate—then you really have to come to grips with the theory. There’s just no alternative: at least, not if you want to claim to be a design professional. Proper design is so important! After all, the database lies at the heart of much of what we do in the computing world; so if the database is badly designed, the negative impacts can be extraordinarily widespread.
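To see the flavor of reasoning the questions above involve, here is a small Python sketch (the table is the classic student/course/teacher example, not material from the session itself). A functional dependency X → Y holds in a table when no two rows agree on X but differ on Y; note that checking a finite sample can only refute an FD — in design work, FDs are semantic constraints, not observations. The table below is in 3NF but not BCNF, because teacher → course holds yet teacher is not a key:

```python
def fd_holds(rows, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

teaches = [
    {"student": "Ann", "course": "Math",    "teacher": "Smith"},
    {"student": "Bob", "course": "Math",    "teacher": "Smith"},
    {"student": "Ann", "course": "Physics", "teacher": "Jones"},
    {"student": "Cal", "course": "Math",    "teacher": "Lee"},
]

print(fd_holds(teaches, ["student", "course"], ["teacher"]))  # True: the key FD
print(fd_holds(teaches, ["teacher"], ["course"]))  # True: determinant is not a key -> BCNF violated
print(fd_holds(teaches, ["course"], ["teacher"]))  # False: Math has two teachers
```

Decomposing to BCNF here (into teacher/course and student/teacher) famously loses the key dependency — which is exactly why dependency preservation matters.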
Attend this presentation, then, and learn the answers to questions like those above, as well as much, much more. To be specific, the presentation will:
- Review, but from a possibly unfamiliar perspective, aspects of normalization you should already be familiar with
- Explore in depth aspects you’re probably not already familiar with
- Provide clear and accurate explanations and definitions of all pertinent concepts
- Not spend a lot of time on well known material such as 2NF and 3NF
On completion of this class, attendees will:
- Understand, and be able to apply, the scientific principles of normalization that underlie design practice
- Know which normal forms are important, how they differ from one another, and how to achieve them
- Understand dependencies and the concept of dependency preservation
- Generally, understand the contributions (and the limitations) of normalization theory
C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database technology. He is best known for his book An Introduction to Database Systems (eighth edition, Addison-Wesley, 2004), which has sold some 900,000 copies and is used by several hundred colleges and universities worldwide. Mr Date was inducted into the Computing Industry Hall of Fame in 2004. He enjoys a reputation that is second to none for his ability to communicate complex technical subjects in a clear and understandable fashion.