* On p.191 of the book "The Data Warehouse Lifecycle Toolkit" it says
  "All data warehouse keys must be meaningless surrogate keys.
   You must not use the original production keys."
  
  There is a data warehouse schema, sh, that comes with ORACLE 9i. The schema contains the followings:
  Fact tables:
       SALES, COSTS
  Dimension tables
       CUSTOMERS: 50000 rows
	   PRODUCTS: 10000 rows
	   TIMES: 365*3 + 366 = 1461 rows
	   CHANNELS: 5 rows
	   PROMOTIONS: 501 rows
	   COUNTRIES: 19 rows
  SALES and the 6 dimension tables form a snow flake schema, with CUSTOMERS referencing COUNTRIES.
  - There is NO primary key on the fact tables.
  - Fact tables are partitioned on the TIME_ID.
  - CUSTOMERS, PRODUCTS, PROMOTIONS have surrogate primary key.
  - TIMES has natural key.
  - COUNTRIES, CHANNELS have intelligent key.
  
  Let's take a closer look...
  (One must be aware of the impact data partitioning has on this issue)

* Surrogate key for the dimension table
  There is little doubt that all dimension tables must have surrogate primary keys.
  Refer to Kimball's book for the reasons.
  
* Surrogate key for the Time dimension table
  Should surrogate key be used for the TIMES dimension? The ORACLE sample schema does not use 
  surrogate key, but uses the natural key instead. Kimball's book is against this.
  
  Reasons for NOT using surrogate date key:
    The only reason seems to be that, by denormalizing date key into the fact table, you can
	directly query the fact table with date constraint without joining to the TIMES dimension.
	All other situations when other attributes in the TIMES dimension are needed, you still need
	to do a join.
	
  Reasons for using surrogate date key:
    Keeping historical dimension records (or multi-version dimension records) still applies here.
	Each record in the TIMES dimension has calendar attirbutes which can change. Different companies
	may have different fiscal calendar system. If companies merge, 1 natural date key can have
	several different interpretations. So, TIMES dimension is really no different from any other
	dimensions.
	
  So, TIMES dimension with "date" (not time) key is really no different than any other dimensions.
  Surrogate key should be created. Why was surrogate time key not created in this ORACLE sample
  DW schema? I believe the reason is the partitioning factor discussed below.
  
  It is critically important to note that "date" is different from "time". "Time of day" should be
  split from the "date" key into its own fact attribute. "AM", "PM", exact hour of the day should be
  treated as "facts" and stored as attributes in the fact table.
  
* Surrogate key for the COUNTRY dimension table
  The COUNTRY dimension table has a sort of 'intelligent key', meaning that you can actually infer
  country name from the country code. If there is an international standard for country codes, this
  might be OK. The chance of changing an international standard on country code is fairly remote.
  Hard to imagine there will be more than 200-300 countries in the world. So JP probably will always
  be referring to Japan. If companies use proprietory country codes and the companies merge, data 
  transformation during ETL can be implemented to conform the dimension attributes.
  
* Surrogate key, primary key for the fact table?
  Should there be a surrogate, primary key in the fact table?
  No. No versioning issue here. The attributes are "facts" and by nature will never change.
  
  Should there be a primary key in the fact table? 
  From performance point of view, no. You will not be querying against the fact table with the primary
  key in the where clause, that's for sure. "select ... where PK=1" just does not make sense. You will
  not have "natural" primary key in the fact table. So you will NOT have queries like
  "select sum(fact) ... where cutomers=... and products=...".
  
  From constraint enforcement point of view, PK has some merits. You would create PK or UK on the 
  combination of all the FKs. This prevents the same fact record from being inserted twice into the
  fact table. But if the data loading process is already doing the cehcking and you have confidence
  in the loading process, PK or UK do not nee to be created, since these constraints and their
  associated indexes have negative impact on the data loading performance.
  
* The data partitioning factor
  Fact table
  ----------
  The fact table should be partitioned. What should be used as the partitioning key?
  If the fact table contains only facts and meaningless surrogate foreign keys, which key should
  be used as the partitioning key? In the ORACLE sample schema, TIME_ID is used as the partitioning
  key. This may be why ORACLE did not use surrogate key for the ITMES dimension. It is difficult
  to partition on the surrogate key in a meaningful way, since the surrogate key itself does not
  have any meaning. What to do then?
  
  The most important benefits of partitioing are:
	Partition pruning during query.
	Partition independence during partition maintainence (data loading).
  
  Types of partitioing:
    Range partitioning
	Hash Partitioing
	List partitioing
	Range/hash composite partitioing
	Range/list composite partitioing
  
  How does partitoning affect these 2 types of queries?
	Queries involve just the fact table.
	Queries require joining fact table with dimension tables.

  To take advantage of star transformation, all FK columns in the fact table must have bitmap indexes.
  All dimension table must have primary keys.
  
  Consider the above aspects of partitioning, let's take a closer look:
  Assuming that there is no primary key in the fact table, which is a reasonable assumption
  1. If all the FKs are surrogate keys, how do you partition the fact table? List partitioing is a 
  rather special type of partitioing and will not be considered here. Hash partitioning has some
  serious drawbacks, for example you will not be able to merge, split partitions. And you can not 
  exchange a hash partition with a table. Most of the time, you will be dealing with range partitioning.
  Same argument can be applied to range/hash and range/list partiitoing. If HASH partitioning is to be 
  used, this surrogate key issue really does not exist. You have no control over the ORACLE hashing 
  routine, so it does not really matter if partitioning column constains surrogate key or natural key.
  
  For the most oftenly used range partitioning, what happens if you partition on one of the surrogate
  FKs? First of all, it can not be partition in a meaningful way. And, as long as this is true, you
  will have problems creating a temp table and try to exchange it with a partition in the table.
  Second, you will not be able to run single table query on the fact table - all meaning less FKs here.
  When you join to the diemnsion table, partition pruning may not even happen (example to come...).
  
  2. If one of the natural keys is used in the fact table and it is used as the partitioning key,
  all the problems listed above disappear. The partitioning is meaningful, partition DDL can be
  carried out and partition pruning during query is more likely to happen.
  
  Conclusions:
  It is necessary to exame this surrogate key and partitioning issue on a case by case basis.
  For the most common scenario, when the fact table is range partitioned on the date key, there is
  trade-off between using surrogate and natural date key. The pros of using surrogate key is the
  ability to handle the possility of multi-version date key in the future. The cons come from the
  data partitioning factor. Unless HASH partitioning, a not very desiable partitioning method, is used,
  partitioning on surrogate key is problemetic at the very least.
  
  
  Dimension table
  ----------------
  How should the dimension table be partitioned, if it gets too big? Take the CUSTOMERS table for 
  example. For a big company, this dimension can be huge.
  
  The CUSTOMERS dimension table in the sample DW schema is not partitioned.
  This is a snow flake schema.
  Partitioning it on the country code seems to be reasonable, when partitioning is needed.
  The argument against partitioning on the surrogate key in the fact table applies here also.
  Partitioning on the 'intelligent key' of country code makes more sense.
  The PK index will be a global index or local non-prefixed index. For definition of these indexes
  refer to "Properties of an index".
  Refer to "Partitioned PK index" on how to create local prefixed index.
  
* What happen to all the indexes?
  From the above discussions, we can draw conclusions on the indexes in a data warehouse environment.
  1. For star tranformation to happen, bitmap indexes must be created on the FKs in the fact table.
     If the fact table is partitioned, most of these bitmap indexes will be local non-prefixed.
	 In the ORACLE sample DW schema, the bitmap index on the time_id would have been the only index
	 that is local prefixed.
  2. The PK on the dimension table would have been either a non-partitioned global index (a bad choice)
     or a local non-prefixed index - if the dimension table is partitioned.