Planning a Data Warehouse - Systems Planning
The Systems Planning phase of the life cycle communicates an overall vision for the data warehouse activity and its role in the organization's day-to-day operations. Decisions made during this phase have a significant impact on the implementation, scope, and size of the effort. The phase begins with the identification of a need, then defines timelines, tasks, and deliverables, and then proceeds through the following key planning decisions:
Select an Implementation Strategy
Generally, a Top Down approach is useful for projects where the technology is mature and well understood, and where the business problems to be solved are equally clear. With this approach, the business requirements to be met by the proposed data warehouse solution are identified first. These requirements are the primary drivers for the implementation of the data warehouse.
A Bottom Up approach, on the other hand, is useful in making technology assessments and is a good technique for organizations that are not leading-edge technology implementers. This approach is used when the business objectives that are to be met by the data warehouse are unclear, or when the current or proposed business process will be affected by the data warehouse.
Select a Development Methodology
A Development Methodology describes the expected evolution and management of the engineering system. One of the most important principles of Systems Engineering is evaluating a system from a Life-Cycle perspective. Establishing a methodology will also provide a strategy for the project manager and the project team as they execute the data warehouse project throughout all phases of development.
- Waterfall Model
The waterfall model is a linear sequence composed of the following basic stages:
* Requirements Definition
* System Design
* Detailed Design
* Integration and Testing
* Operations and Maintenance
This model is used when the system requirements and objectives are known and clearly specified.
- Spiral Model
The spiral model is a sequence of waterfall models, each corresponding to a risk-oriented iterative enhancement. It recognizes that requirements are not always available and clear when the system is first implemented.
Since designing and building a data warehouse is an iterative process, the spiral model is usually the better fit of the two.
Develop Business Objectives
Develop a list of business objectives that the system must fulfill using the following questions as a checklist:
- Who is the potential audience?
- What are the immediate uses of planned platforms?
- What are the planned capabilities in terms of features and functions?
- What data sources can and/or must be integrated into the data warehouse?
- When is the system needed?
- What is the expected life-span of the data warehouse?
This step is often the most difficult in the planning phase because the potential users of the data warehouse often cannot describe specifically what information they will want out of it. A good approach in this circumstance is to determine what kind of ad hoc analytical processing they do now and what data sources they use to generate those analytical reports. Using that information as a starting point, ask the decision-makers what additional information they would find useful. With these "wish-lists" from the users, the physical design of the warehouse becomes clearer, because certain data elements are requested more often than others.
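The wish-list tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the user roles, the data element names, and the `rank_requested_elements` helper are all hypothetical and do not come from any particular warehouse tool.

```python
from collections import Counter

# Hypothetical wish-lists: each decision-maker names the data elements
# they would like to see in the warehouse.
wish_lists = {
    "sales_manager": ["revenue", "region", "product", "customer_segment"],
    "finance_analyst": ["revenue", "cost", "region"],
    "marketing_lead": ["customer_segment", "campaign", "revenue"],
}


def rank_requested_elements(wish_lists):
    """Return (element, number_of_users) pairs, most requested first."""
    counts = Counter()
    for elements in wish_lists.values():
        # Use a set so an element is counted once per user,
        # even if that user listed it twice.
        counts.update(set(elements))
    # Sort by descending count, then alphabetically to keep ties stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_requested_elements(wish_lists)
for element, requests in ranking:
    print(f"{element}: requested by {requests} user(s)")
```

Elements near the top of the ranking are reasonable candidates to model first. The count is a starting point for discussion with the users, not a substitute for it.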
Collect Metadata
The final part of the planning phase is an initial capture of the design information that makes up the metadata. Metadata serves as a blueprint for constructing the data warehouse. It is collected during the planning phase from the following sources:
- Enterprise Models based on E-R (Entity Relationship) Diagrams
- Repositories and data dictionaries of the data sources
- Syndicated data (e.g. Dow Jones or other third-party information sources)
Collecting metadata early in the planning phase is important for building a data warehouse. A typical database usually has one coherent, homogeneous data source structured by input rules and integrity constraints. A data warehouse, by contrast, combines many different data sources, each with its own set of "business rules" governing the data. It is at this stage that you begin to establish the patterns that trace data from the Source, to the Warehouse, to the Applications. Capturing and documenting this metadata is the only way to track these patterns logically, which becomes critical later in the life cycle when data elements at the sources change. When such changes occur, the links from Source to Warehouse to Application allow the warehouse to be maintained and updated more readily, preserving its viability as a reliable OLAP resource.
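To make the Source-to-Warehouse-to-Application trace concrete, here is a minimal Python sketch of lineage metadata and an impact lookup. Everything in it is an assumption made for illustration: the `LineageLink` record, the `impact_of_source_change` helper, and all system, table, and report names are hypothetical. A real project would normally keep this information in a metadata repository rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageLink:
    """One documented path: source element -> warehouse column -> applications."""
    source: str          # element in an operational source system
    warehouse: str       # warehouse column it is loaded into
    applications: tuple  # reports or tools that read that column


# Hypothetical lineage metadata captured during planning.
LINEAGE = [
    LineageLink("crm.customers.cust_id", "dw.dim_customer.customer_key",
                ("churn_report", "sales_dashboard")),
    LineageLink("erp.orders.order_total", "dw.fact_sales.amount",
                ("sales_dashboard",)),
    LineageLink("erp.orders.order_date", "dw.fact_sales.date_key",
                ("sales_dashboard", "forecast_model")),
]


def impact_of_source_change(source_element, lineage=LINEAGE):
    """List the warehouse columns and applications fed by a source element."""
    matches = [link for link in lineage if link.source == source_element]
    warehouse_cols = sorted({link.warehouse for link in matches})
    apps = sorted({app for link in matches for app in link.applications})
    return warehouse_cols, apps


cols, apps = impact_of_source_change("erp.orders.order_date")
print("Warehouse columns affected:", cols)
print("Applications affected:", apps)
```

When a source element changes, a lookup like this shows which warehouse columns and downstream applications need attention. That is the maintenance benefit the paragraph above attributes to documented lineage.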
More By Jagadish Chaterjee