Written by:
brettright
Wednesday, May 19, 2010 10:54 AM
The concepts of object persistence and object identity have been around for as long as I've been programming (14 years) and probably longer. Unfortunately most systems in the Microsoft world have been built, and most developers have learned to build them, using a database-centric approach. The move to Domain Driven Design (DDD) has thankfully started to flow strong and hard, and I find myself spending less time arguing the merits of DDD with other developers.
However, when consulting I am regularly faced with database designs that use auto-incrementing fields, mutable properties or, even worse, composite mutable properties as the primary key in the database.
Example of an auto-incrementing primary key: Person has PersonID : int as the primary key, and the PersonID is set by the database when the Person is first inserted into the DB.
Example of a composite mutable property as a primary key: Person has FirstName : string and Surname : string as the primary key in the database. If these are mutable, then some sort of cascade update is set up to ensure that the foreign keys (FKs) in all related tables remain valid.
Twelve years ago I used to see a lot more of this type of database design, but I still see quite a lot of it today. The reasons are numerous and sometimes inescapable, e.g. I am developing against a legacy database; I have an .xxx.. DBA who is stuck in the 1960s; the company has programming standards that were written in the 80s; etc.
In this blog I will discuss why I prefer and recommend using GUIDs whenever possible, or an application-generated integer (when the DBA has an unjustified fear of GUIDs), and also how
The term Object Identity arose in the early 1980s as OOPS (Object Oriented Programming Systems) became the hottest new buzzword for the coolest programmers of the 80s.
The realisation that there was a class of objects that needed to be persisted and used across multiple instances of the application gave rise to many a doctoral thesis ;). Today, fortunately, ORMs have removed this mind-twisting complexity from our lives and we hardly ever have to think about the Object-Relational Impedance Mismatch (hot coffee-machine chat in the 90s).
That is, until we suddenly run into a database that is a throwback to those glorious days. Inevitably this database will use auto-incrementing primary keys or, even worse, mutable meaningful properties as the primary key (such as Person.FirstName and Surname).
These types of database design also cause numerous problems when loading and saving business objects. For this reason many (most?) ORMs do not deal with these scenarios. Habanero, however, has arisen out of a practical world and has been used in thousands of projects over 10 years, and in some of these projects we have had to deal with legacy databases that used these types of design.
In this blog I will very briefly describe the limitations of not using a GUID or an application-generated object identity. To understand these, one first needs to consider the issues involved in object persistence. For a more detailed explanation check out Chapter 3 of the Habanero Book, especially the section titled Implementing Business Object Identity.
AutoIncrementing Evil
The limitations of using auto-incrementing primary keys (and hence object identities) revolve around 1) inserting object graphs and 2) managing distributed systems.
1) Inserting Object Graphs: Let's take the example of creating an Invoice which has many Invoice Lines. The Invoice owns its Invoice Lines (composition) and as such is not valid without them. If you persist an Invoice to the database without its Invoice Lines then the Invoice is in an invalid state in the database. If you are creating Invoices for an entire shipment then you may well find that all the Invoices for the shipment, with all their Invoice Lines, must be created and persisted together to be in a valid state. This implies that all the Invoices and their Lines need to be persisted in a single transaction which will pass or fail as a whole. The InvoiceLine in the database will have the InvoiceID as its foreign key. The problem with auto-incrementing now arises. To be able to set the InvoiceID FK on the InvoiceLine you first have to persist the Invoice, because only the database knows which ID it will assign. Only then can you persist the InvoiceLines. Ouch. That hurts.
2) Distributed Systems: This used to be a buzzword 7 years ago and is not such a hot topic anymore, primarily because the big issues with it have been resolved and increasing internet bandwidth has moved things back to centralised databases. To continue with the above example, what happens when you are creating invoices at remote locations that for some reason have their own local database? These databases are then synchronised to a central DB. Now your InvoiceID just got a whole lot worse: you can easily get two different objects with the same identity. Ouch. Crash and burn.
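To make the collision concrete, here is a minimal sketch in Python using SQLite in-memory databases to stand in for two remote sites. The table name, the column names and the helper functions are all made up for illustration; this is not how Habanero or any particular synchronisation tool works, just a demonstration of why independently assigned auto-increment values clash while GUIDs do not.

```python
import sqlite3
import uuid


def make_site_db(use_guid: bool) -> sqlite3.Connection:
    """Create an in-memory database standing in for one remote site."""
    conn = sqlite3.connect(":memory:")
    if use_guid:
        # Identity is generated by the application before the insert.
        conn.execute("CREATE TABLE invoice (invoice_id TEXT PRIMARY KEY, site TEXT)")
    else:
        # Identity is assigned by the database at insert time.
        conn.execute(
            "CREATE TABLE invoice ("
            "invoice_id INTEGER PRIMARY KEY AUTOINCREMENT, site TEXT)"
        )
    return conn


def create_invoices(conn: sqlite3.Connection, site: str, count: int, use_guid: bool) -> None:
    """Insert `count` invoices at one site."""
    for _ in range(count):
        if use_guid:
            conn.execute(
                "INSERT INTO invoice (invoice_id, site) VALUES (?, ?)",
                (str(uuid.uuid4()), site),
            )
        else:
            conn.execute("INSERT INTO invoice (site) VALUES (?)", (site,))
    conn.commit()


def colliding_ids(use_guid: bool) -> set:
    """Return the invoice ids that both sites generated independently."""
    site_a = make_site_db(use_guid)
    site_b = make_site_db(use_guid)
    create_invoices(site_a, "A", 3, use_guid)
    create_invoices(site_b, "B", 3, use_guid)
    ids_a = {row[0] for row in site_a.execute("SELECT invoice_id FROM invoice")}
    ids_b = {row[0] for row in site_b.execute("SELECT invoice_id FROM invoice")}
    return ids_a & ids_b


if __name__ == "__main__":
    print("auto-increment collisions:", colliding_ids(use_guid=False))
    print("GUID collisions:", colliding_ids(use_guid=True))
```

With auto-incrementing keys both sites hand out 1, 2 and 3, so every invoice collides when the two databases are merged. With GUIDs the intersection is empty and the rows can simply be copied to the central DB.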
So I don't have a distributed system, you argue, and I don't have a compositional relationship, so I can afford to always persist the parent prior to any of its children. Are you sure? Will you never run into any of these scenarios as the application grows over the years?
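For comparison, here is a rough sketch of the invoice example when the application generates the identity itself. It is written in Python against SQLite purely for illustration; the class names, table names and the save_all helper are my own assumptions and not Habanero's API. The point is only that the foreign key is known in memory before anything touches the database, so a whole shipment can be saved in one all-or-nothing transaction.

```python
import sqlite3
import uuid
from dataclasses import dataclass, field


@dataclass
class InvoiceLine:
    invoice_id: str
    description: str
    line_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Invoice:
    customer: str
    invoice_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    lines: list = field(default_factory=list)

    def add_line(self, description: str) -> InvoiceLine:
        # The foreign key is already known; nothing has been persisted yet.
        line = InvoiceLine(invoice_id=self.invoice_id, description=description)
        self.lines.append(line)
        return line


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical invoice schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE invoice (invoice_id TEXT PRIMARY KEY, customer TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE invoice_line ("
        "line_id TEXT PRIMARY KEY, "
        "invoice_id TEXT NOT NULL REFERENCES invoice (invoice_id), "
        "description TEXT NOT NULL)"
    )
    return conn


def save_all(conn: sqlite3.Connection, invoices: list) -> None:
    """Persist every invoice and line in one transaction that passes or fails together."""
    invoice_rows = [(inv.invoice_id, inv.customer) for inv in invoices]
    line_rows = [
        (line.line_id, line.invoice_id, line.description)
        for inv in invoices
        for line in inv.lines
    ]
    with conn:  # commits on success, rolls back if any insert raises
        conn.executemany("INSERT INTO invoice VALUES (?, ?)", invoice_rows)
        conn.executemany("INSERT INTO invoice_line VALUES (?, ?, ?)", line_rows)
```

With a database-assigned key, save_all would instead have to insert each invoice, read back the generated ID, and only then build the rows for its lines.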
Mutable Composite Keys Evil
The limitations of mutable composite primary keys (and hence object identities) revolve around updating an object graph.
The problem is as follows: the Person is uniquely identified by FirstName and Surname, so all the foreign keys (FKs) in all the related business objects are composite foreign keys. Now for some reason you need to change the person's surname (from Powll to Powell). Maybe you spelt it incorrectly the first time. This implies that you have to update every single object in the database that has a foreign key related to this Person. It is usually not sensible to do this in the domain, since a simple thing like correcting the spelling of a person's surname may involve loading hundreds of thousands of objects and updating all of their foreign key properties. The other option is to let the database do this update for you; most modern RDBMSs have this cascading functionality.
So we have a solution; why do I hate this so much? Quite simply, neither solution works very well. The object's identity is how the in-memory object interacts with the persisted object (DB). In a single-user application Habanero handles all this for you fluently. However, if you have a multi-user system (how many of us do not, in this era?) then you have to consider that other users may have this person loaded, may in fact have edited the person, and hence may need to save their edits.
What will happen? Quite simply, you have an object loaded in memory that is identified as Brett Powll. It wants to persist itself using its object ID (Brett Powll). If you are using a decent enterprise application framework such as Habanero, the business object will know that it is dirty and not new, and will therefore know that it needs to update its dirty properties. It will try to update the existing Person in the database. Unfortunately, since some other user changed the object identity (from Brett Powll to Brett Powell), you have lost this object and Habanero will intelligently throw an ObjectDeleted concurrency control exception.
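The lost-object scenario can be reproduced in a few lines. The following is only a simulation in Python and SQLite, with a made-up person/phone schema and made-up function names; it does not show Habanero's internals. It shows that after the cascading rename, an update issued with the stale identity matches zero rows, which is the situation a framework has to detect and report. The second function repeats the same sequence with a GUID key for contrast.

```python
import sqlite3
import uuid


def stale_update_with_natural_key() -> tuple:
    """Return (surname cascaded to the child row, rows matched by the stale update)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for ON UPDATE CASCADE in SQLite
    conn.execute(
        "CREATE TABLE person (first_name TEXT, surname TEXT, email TEXT, "
        "PRIMARY KEY (first_name, surname))"
    )
    conn.execute(
        "CREATE TABLE phone (number TEXT PRIMARY KEY, first_name TEXT, surname TEXT, "
        "FOREIGN KEY (first_name, surname) REFERENCES person (first_name, surname) "
        "ON UPDATE CASCADE)"
    )
    conn.execute("INSERT INTO person VALUES ('Brett', 'Powll', 'old@example.com')")
    conn.execute("INSERT INTO phone VALUES ('555-0100', 'Brett', 'Powll')")

    # User B loads the person and keeps the identity it was loaded with.
    user_b_identity = ("Brett", "Powll")

    # User A corrects the spelling; the database cascades it to the child row.
    conn.execute(
        "UPDATE person SET surname = 'Powell' "
        "WHERE first_name = 'Brett' AND surname = 'Powll'"
    )
    cascaded_surname = conn.execute("SELECT surname FROM phone").fetchone()[0]

    # User B saves an edit using the stale identity: no row matches any more.
    cur = conn.execute(
        "UPDATE person SET email = 'new@example.com' "
        "WHERE first_name = ? AND surname = ?",
        user_b_identity,
    )
    return cascaded_surname, cur.rowcount


def stale_update_with_guid_key() -> int:
    """Same sequence of edits, but the identity is an immutable GUID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (person_id TEXT PRIMARY KEY, "
        "first_name TEXT, surname TEXT, email TEXT)"
    )
    person_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO person VALUES (?, 'Brett', 'Powll', 'old@example.com')",
        (person_id,),
    )
    # User A corrects the spelling; the identity does not change.
    conn.execute("UPDATE person SET surname = 'Powell' WHERE person_id = ?", (person_id,))
    # User B saves an edit using the identity it loaded; the row is still found.
    cur = conn.execute(
        "UPDATE person SET email = 'new@example.com' WHERE person_id = ?",
        (person_id,),
    )
    return cur.rowcount
```

In the natural-key version the cascade does its job on the child table, yet user B's save silently matches nothing. In the GUID version the rename is just another property change and user B's save still finds its row.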
None of this may seem so bad to you, but why bother when you have this wonderful mechanism called a Globally Unique Identifier (GUID)? If you use a GUID as your object identity you will never have any of these problems, and you will not have to consider them. You will not have to wonder "what if?" or "when?". If you have a DBA who has an irrational fear of GUIDs (I have run into a few GUID-phobes), then Habanero has a couple of integer and non-integer number generation options with all sorts of locking mechanisms. You can also create your own NumberGenerator to suit your needs for that particular application. Search for 'NumberGenerator' in Habanero.
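As a rough idea of what an application-generated integer identity can look like, here is a small Python sketch that hands out numbers from a counter table. To be clear, this is not Habanero's NumberGenerator; the class name, the number_counter table and the locking approach are all assumptions made for illustration, and a real multi-process system would also need database-level locking.

```python
import sqlite3
import threading


class TableNumberGenerator:
    """Hypothetical generator that hands out integers from a counter table."""

    def __init__(self, conn: sqlite3.Connection, name: str) -> None:
        self._conn = conn
        self._name = name
        self._lock = threading.Lock()  # guards callers sharing this generator
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS number_counter ("
                "name TEXT PRIMARY KEY, last_value INTEGER NOT NULL)"
            )
            # Create the counter row once; later generators reuse it.
            self._conn.execute(
                "INSERT OR IGNORE INTO number_counter VALUES (?, 0)", (name,)
            )

    def next_number(self) -> int:
        """Increment the stored counter and return the new value."""
        with self._lock, self._conn:
            self._conn.execute(
                "UPDATE number_counter SET last_value = last_value + 1 WHERE name = ?",
                (self._name,),
            )
            row = self._conn.execute(
                "SELECT last_value FROM number_counter WHERE name = ?",
                (self._name,),
            ).fetchone()
            return row[0]
```

The business object asks the generator for its ID when it is created in memory, so, as with a GUID, the identity is known before the first insert.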
Hope this blog makes some sense
Brett
3 comment(s) so far...
Re: To Guid or not to Guid the question of Object Identity
There is quite a cool article and another kinda introductory article about the principles of object persistence. Of course, Fowler's Patterns of Enterprise Application Architecture is a must-read for an in-depth study of the issues, concerns and patterns involved here.
brett
By brettright on
Wednesday, May 19, 2010 2:56 PM
Re: To Guid or not to Guid the question of Object Identity
Mitchell responded to a forum question with an excellent link to some of the religious GUID vs auto-incrementing vs natural key wars. My summary is simple: use what is easiest in your scenario. If you are using business objects (DDD) you need to consider the fact that your in-memory objects are not kept in sync with the database, so it is simplest (apply Occam's razor) to use GUIDs, since this eliminates a whole set of problems. If you use an existing database with natural keys or an auto-incrementing identity, use these but be aware of the potential pitfalls. If you are a hardcore SQL/ADO coder then use what suits you. If you have performance issues with GUIDs and are doing DDD then use a number generator, but only go there once you have tested and proven that you have a performance issue. I have been involved in projects with huge transactional volume and have not had to move away from GUIDs due to unacceptable performance.
By brettright on
Thursday, May 20, 2010 12:53 PM