The recent Windows Azure release (June) introduced a new Web Sites feature, but there is some confusion in the community about the differences between the new Web Sites and the Web Roles we have traditionally used to host web sites. In this post I'm going to describe the key differences and application scenarios for Web Sites. Special thanks to my colleague Dennis Mulder for providing information materials.
These days claims are becoming more and more popular, and I reckon that in a few years all Microsoft products will support claims as out-of-the-box (OOTB) functionality and will be built on a claims foundation.
However, I've noted that claims are still not well understood, and many people are confused about what claims actually are. In this post I would like to describe in simple language what claims are and how to use them with SharePoint.
A statement that one subject makes about itself or another subject. For example, the statement can be about a name, identity, key, group, privilege, or capability. Claims provide a powerful abstraction for identity.
When a digital identity is transferred across a network, it's just a bunch of bytes. It's common to refer to a set of bytes containing identity information as a security token, or just a token. In a claims-based world, a token contains one or more claims, each of which carries some piece of information about the user it identifies.
Claims can represent pretty much anything about a user. In this example, for instance, the first three claims in the token contain the user’s name, an identifier for a role she belongs to, and her age.
Claims are issued by a provider, given one or more values, and then packaged in security tokens that are issued by an issuer, commonly known as a security token service (STS). For a full list of definitions of terms associated with claims-based identity, see "Claims-Based Identity Term Definitions" at http://msdn.microsoft.com/en-us/library/ee534975.aspx.
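To make the idea concrete, here is a minimal sketch of claims as typed statements packaged into a token. This is an illustrative model only, not the real WIF/.NET API; the `Claim` and `issue_token` names are my own.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    claim_type: str   # what kind of statement this is (name, role, age, ...)
    value: str        # the asserted value
    issuer: str       # who made the statement (the STS)

def issue_token(issuer: str, claims: dict) -> list:
    """Package a set of claims into a simple 'token': a list of Claim objects
    all stamped with the same issuer."""
    return [Claim(claim_type=t, value=v, issuer=issuer) for t, v in claims.items()]

# A token carrying a user's name, a role she belongs to, and her age,
# as in the example above (the STS URL is made up)
token = issue_token("https://sts.contoso.com", {
    "name": "alice",
    "role": "managers",
    "age": "34",
})
```

A relying party would then read these claims instead of authenticating the user itself.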
An STS can be owned by an identity provider (IdP).
An Identity Provider STS (IP-STS) is a service that handles requests for trusted identity claims. An IP-STS uses a database called an identity store to store and manage identities and their associated attributes. The identity store for an identity provider may be simple, such as a SQL database table, or complex, such as Active Directory Domain Services (AD DS) or Active Directory Lightweight Directory Services (AD LDS).
Set of applications, URLs, domains, or sites for which a token is valid. Typically a realm is defined using an Internet domain such as "microsoft.com", or a path within that domain such as "microsoft.com/practices/guides". A realm is sometimes described as a "security domain" because it encompasses all applications within a specified security boundary.
Receiving tokens that were generated outside your own realm, and accepting them if you trust the issuer. It allows users to sign on to applications in different realms without needing to enter realm-specific credentials: users sign on once to access multiple applications in different realms.
This is essentially another name for "single sign-on" in the claims world.
Relying party application
An application that accepts and relies on claims issued by an STS (also known as a claims-aware application).
- Decouple your applications from the details of identity - It's no longer the application's responsibility to authenticate users
- Authentication flexibility
- Single sign-on
- No VPN access
- Federating with other companies
- Federating with non AD services
A claim consists of the following elements:
- SharePoint STS
Claims Protocols / Tokens
That's probably the area of the biggest confusion: when people talk about claims, they often miss that there are two different types of tokens used with claims, and not all applications support both of them. You should be very clear about which protocol you are going to use.
Security tokens that are passed over the Internet typically take one of two forms:
- Security Assertion Markup Language (SAML) tokens are XML-encoded structures that are embedded inside other structures such as HTTP form posts and SOAP messages.
- Simple Web Token (SWT) tokens are stored in the HTTP headers of a request or response (WS-Federation)
(An advantage of the SAML-P protocol over WS-Federation is that it supports initializing the authentication process from the IdP instead of the RP, which avoids the requirement for either the relying party (RP) or the FP to perform home realm discovery)
Different types of tokens are created depending on the authentication mechanism. For example, if you are using claims with Windows sign-in, SharePoint 2010 converts the UserIdentity object into a ClaimsIdentity object, performs claims augmentation and handles the resulting token. Take into account that this type of token is not a SAML token.
The only way to get a SAML token is to use a designated SAML provider (for example, the Windows Live ID claims provider, or Active Directory Federation Services). Take into account that a SAML token is issued every time you log on by using a security token service (STS), regardless of the provider used.
For more info demystifying SAML versus non-SAML claims tokens, see:
Claims-based Application Architecture
Direct Hub Model
Direct Trust Model
- It's easier to manage multiple trust relationships in ADFS than in SharePoint.
- It's simpler to manage a single trust relationship in SharePoint and it avoids the requirement for multiple custom claims providers.
- You can reuse the trust relationships in the FP with other relying parties.
- You can leverage ADFS features such as integration with auditing tools to track token issuing.
- ADFS supports the SAMLP protocol in addition to WS-Federation
- ADFS issuer can extract LDAP attributes from AD
- ADFS can use rules with arbitrary SQL statements to extract data from a custom SQL database.
- Requires a user account in AD or trusted stores
Recently, an old friend of mine asked me what mandatory documentation should be created for a project. He was confused by the way projects are run in his organization and wanted to improve the situation and make the entire process more mature.
In our organization we use MSF extensively (MSF at a glance), but I personally think that there is no "silver bullet" for project documentation, regardless of the process methodology you are using (Agile, MSF, RUP, CMMI or combinations of them), and the right answer is "it depends" on your organization and requirements.
However, I strongly believe that the purpose of documentation is to provide traceability for your project – financial and requirements traceability – and in this post I will provide an example that demonstrates the value of creating project documentation.
When you have a project opportunity and are working on the business case, you usually operate with the following artifacts:
- Project budget
And all of this translates in the following artifacts, when you finally start the project:
where each artifact clearly captures the information required for its project step.
But just capturing information is not enough, because you need to validate each document to confirm that:
- the captured documentation is correct and sufficient
- the right information is captured (what the business/client wants)
One solution is to create a counterpart that validates each artifact. For example, you can create tests as described in the following picture, where for each artifact on the left there is a linked element on the right that validates it (e.g., Business Requirements can be validated using User Acceptance Tests, etc.)
This approach allows you to trace your requirements and understand the impact of your changes on other elements: when a change to a specific business requirement is approved for the scope, it is very difficult to estimate the scope and cost of that change without traceability.
In essence, traceability reduces the cost of research and analysis because the information is already organized into an immediate hierarchy.
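The artifact-to-validation mapping can be sketched as a simple lookup. Only the Business Requirements -> User Acceptance Tests pair comes from the text above; the other pairs are illustrative assumptions.

```python
# Each artifact on the left is validated by the linked element on the right.
# Only the first pair is from the post; the rest are hypothetical examples.
validates = {
    "Business Requirements": "User Acceptance Tests",
    "Functional Requirements": "System Tests",
    "Design": "Integration Tests",
    "Tasks": "Unit Tests",
}

def impacted_validations(changed_artifacts):
    """Given the artifacts touched by a change, list the validations to revisit."""
    return [validates[a] for a in changed_artifacts if a in validates]

print(impacted_validations(["Business Requirements", "Tasks"]))
# ['User Acceptance Tests', 'Unit Tests']
```

With the mapping in place, estimating the impact of a scope change becomes a direct lookup instead of a research exercise.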
And I reckon that the combination of MS Project + Project Server + TFS Server is the right stack to get things done properly :)
PS: For those who are interested in in-depth information on this topic, I suggest researching the "Traceability strategy" topic.
PPS: You might ask where the architectural documents are in this diagram. The answer is that I deliberately omitted them, because architecture is just a transition phase between requirements and tasks. However, validating architecture is quite a huge task and is outside the scope of this blog post (research the "Enterprise Quality Attributes" topic if you are interested in approaches to validating architecture). For simplicity, consider creating a "High-Level Architecture" document first, and then "Business/Application/Infrastructure" architecture documents (you might use TOGAF for this).
Today I found quite a good document about the dreams and realities of the paper-free office (http://www.aiim.org/Research/Industry-Watch/Paper-free-capture-2012), and it's a good read.
Regardless of the facts provided there (you can find some of them below), I reckon that turning an office into a paper-free area is quite challenging for two reasons, neither of which is technical:
- Using paper documents is very handy, and I find working with them more productive than with digital copies – you can make notes, you can highlight the text, etc. All of this can be achieved with technology, but you lose productivity because you are breaking the natural habits people are used to (however, this might not be a problem for Generation Y).
- Psychological nature – when we read documents, our brain makes mental notes linking images, titles and paragraphs to their place and position in the document we are reading. You might remember situations when you read something and cannot recall the exact place in the book, but you do remember that it was at the top of the left-hand page, and you leaf through the book trying to find that place. This behaviour is not possible with electronic documents, because your brain cannot make that mental anchor to the pages.
I haven't seen any solutions to these problems yet, and I reckon that digitizing your office should be carefully planned, leveraging the benefits of technology while respecting the natural habits we are used to.
Some facts to support the paper-free office:
- Electronic-only filing would halve the storage space needed for paper in 5 years. The average proportion of office space taken up by paper is now 15.3%, and it would drop to 7.4% with an all-electronic filing policy, a saving of nearly 8% in overall office costs.
- On average, 45% of documents that are scanned are 100% "born digital" – just as they came from the printer. And many of the rest would be all-digital if not for the added signatures.
- 77% of invoices that arrive as PDF attachments get printed. 31% of faxed invoices get printed and scanned back in. On average, 30% of invoices arrive as PDF attachments, and 15% as faxes.
- 20% of organizations scan half or more of their inbound mail at or before entry. A further 20% are more likely to scan at the point-of-process, and 29% scan-to-archive after the process.
- Improved sharability and searchability is the biggest driver for investment in scanning and capture, followed by improved productivity and reduced storage space.
- On average, respondents using scanning and capture consider that it improves the speed of response to customers, suppliers, citizens or staff by 6-times or more. 70% estimate an improvement of at least 3-times, and nearly a third (29%) see an improvement of 10-times or more.
- 42% of users have achieved a payback period of 12 months or less from their scanning and capture investments. 57% are posting a payback of 18-months or less.
I was upgrading my working machine to Windows 8 recently and did an HDD cleanup, when I found an old document I was working on a few years ago, trying to visualise the Solution Delivery Framework. I'm sharing its content here. I hope it will be useful for those who are trying to establish this process in their organisations (IT Consulting).
This post summarises some considerations that should be taken into account for onboarding a SQL Server Private Cloud environment.
- Energy efficient hardware
- Hardware consolidation
- Intelligent allocation of hardware resources
- Energy efficient data center design
- Intelligent placement of data center facilities
- Elimination of unnecessary hardware
- Enabling power management where practical
- Reduce operating and capital expenses. New hardware has far greater computing power than hardware that is nearing end-of-life. Newer hardware generally requires less power and cooling.
- Environmental sustainability. For example, lower power and cooling requirements are the primary areas where the SQL Server Utility addresses environmental concerns, but an overall reduction in data center space will also contribute over the long term.
- Provide business continuity, scalability, and availability. Requirements do not change for the SQL Server Utility; the goal is to find opportunities to standardize and to improve best practices in these areas.
- Provide a standardized server build library. New technologies such as Hyper-V open new opportunities for standardization. Part of the vision of the SQL Server Utility is to eliminate or streamline many of the manual steps needed to build an application environment. This can be achieved by establishing Hyper-V guests which are built with standard software configuration, including the operating system, SQL Server, tools, and approved configurations which can be provided for use in different phases of the software development life cycle.
- Increase server utilization. Data center servers are underutilized, with an overall average CPU utilization of ~9.75 percent.
- Assess the existing environment and analyse its performance using the MAP toolkit
- Plan server consolidation
- A maximum of 4 virtual IDE disks are supported on each virtual machine
- A maximum of 256 SCSI disks are supported on each virtual machine (a maximum of 4 virtual SCSI controllers, with 64 disks per controller)
- Separate LUNs or spindles for the Host Operating System, guest OS VHDs and SQL data
- The guest OS must reside on a fixed size VHD.
- SQL storage must be configured as Fixed VHD, SCSI pass-through or iSCSI.
- Dedicated management NIC for each Hyper-V host
- For mirroring – principal and mirror virtual nodes have one additional dedicated link used for the transport of the synchronization data
- Synthetic network adapters must be used instead of the Legacy network adapters for Virtual NICs
- High Availability
- iSCSI Migration
- Disaster Recovery
- Use backup software that supports a SQL Server-aware, software-based VSS solution (the Hyper-V Volume Shadow Copy Service (VSS) writer allows backing up virtual machines and their state information, but it is not integrated with the SQL Server VSS writer)
- VSS in Hyper-V does not support host-based backups for pass-through disks or iSCSI disks that are connected to an iSCSI initiator inside the guest virtual machine, so VSS backups of a SQL Server guest virtual machine performed from the host machine are not supported in this scenario. Therefore, hardware VSS or VDS solutions are not able to back up SQL Server in a Hyper-V guest in this configuration.
- Private Cloud Onboarding Strategy
- Virtual Conversions
- New Virtualized SQL Server Instance
- Onboarding SQL Server Private Cloud Environment (http://download.microsoft.com/download/7/A/3/7A3278D7-C660-4DFB-91FD-0414E3E0E022/Onboarding_SQL_Server_Private_Cloud_Environments.docx)
- Green IT in Practice: SQL Server Consolidation in Microsoft IT (http://msdn.microsoft.com/en-us/architecture/dd393309.aspx)
New McKinsey research estimates the impact of Internet search in the global economy, pinpointing the sources of value and the beneficiaries.
August 2011. Source: McKinsey Global Institute
Searching the Internet has become an almost reflexive act. Each day, tens of millions of global citizens fire up their personal computers and handheld devices to troll for information on products or to glean insights that will help guide business decisions. Yet the economic value of this enormous current of search activity remains largely unknown. Current estimates rely mainly on brute measures, such as the number of searches performed or advertising revenues reported by search companies themselves. These estimates fail to take account of how trillions of clicks combine to boost productivity, open new pathways to problem solving, or simply make life easier.
A new McKinsey study, The impact of Internet technologies: Search, takes a more comprehensive view of this phenomenon and its rising value. We looked at five key developed and developing economies—Brazil, France, Germany, India, and the United States—identifying nine activities that are primary sources of search value, as well as 11 private, public, and individual constituencies that reap the benefits.
Some interesting facts from that report:
- The return on investment (ROI) for those who deploy search is between 7:1 and 17:1 on average.
- Some 90 percent of online users use search engines—that means 1.7 billion people.
- Search represents 10 percent of the time spent by individuals on the Web, totaling about four hours each month.
- Approximately 25 percent of the traffic to the websites of mainstream content creators is referred by search engines.
- Knowledge workers in enterprises spend on average five hours per week, or 12 percent of their time, searching for content
- 30 percent of total queries on Web search engines are for such topics as entertainment, adult content, games, or sports.
I'm glad to announce that two new SharePoint 2010 books have been successfully reviewed and will be published in October 2011.
PS: I do technical reviewing of SharePoint 2010 books for Packt Publishing in my spare time.
Recently we had a discussion about search strategies in large organizations, where customers don't know what information is out there or how to start discovering the right content.
I have previously used the following approach to introduce discoverability of unstructured data, which I would like to share:
- Define the organisation taxonomy (there are a lot of predefined standards that can be used, depending on the organisation type: http://wandinc.com/prod_tax.aspx)
- Use pattern matching (FAST Search) to extract the common elements from the content
- Find duplicates to perform content cleansing (FAST Search feature)
- Identify all content owners
- An interactive approach to tagging content (according to the defined taxonomy) by its owners
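As a toy illustration of the duplicate-finding step (which FAST Search provides out of the box), a naive sketch can hash normalized content to group exact duplicates; real engines use fuzzier techniques, and the file names below are made up.

```python
import hashlib
from collections import defaultdict

def find_duplicates(documents: dict) -> list:
    """Group document ids whose normalized content hashes to the same digest.
    documents maps id -> text; only groups with more than one member are returned."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        # Collapse whitespace and ignore case before hashing
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode()).hexdigest()
        groups[digest].append(doc_id)
    return [ids for ids in groups.values() if len(ids) > 1]

docs = {
    "a.docx": "Quarterly results for FY2011",
    "b.docx": "quarterly   results for fy2011",  # same content, different formatting
    "c.docx": "Project plan",
}
print(find_duplicates(docs))  # [['a.docx', 'b.docx']]
```

Groups like this are candidates for content cleansing before tagging begins.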
Feel free to add your ideas and suggestions to this process.
In this post I decided to summarise all the findings that can help estimate the ROI of SharePoint and understand its true cost, which can be summarised as:
- Server software
- Virus protection & Backup
- Hardware and Infrastructure
which is separated across the following categories:
- Hard ROI (tangible outcome) - the cost savings that will result
- Soft ROI (less tangible outcome) – efficiency improvements
- Risk Mitigation
which will allow you to:
- Seek to reduce the total cost of ownership (TCO) of your IT infrastructure.
- Seek to maximize the return on investment (ROI) from your enterprise solutions
- reducing IT costs and complexity
- reducing development costs,
- simplifying management and training
- improving employee productivity
- enhancing the effectiveness of customer service and sales teams
- How to Determine the True Cost of Microsoft SharePoint
- How to Build a Business Case For SharePoint
- Office SharePoint Server 2007: Five Ways SharePoint Can Save You Money
The following table summarizes how BPOS Subscriptions will translate to Microsoft Online Services offerings after transition.
| Current BPOS Subscription | Microsoft Online Services Offering |
| --- | --- |
| BPOS Standard Suite | Office 365 (Plan E1) |
| BPOS Deskless Worker Suite | Office 365 (Plan K1) |
| Exchange Online | Exchange Online (Plan 1) |
| Exchange Online Deskless Worker | Exchange Online Kiosk |
| SharePoint Online | SharePoint Online (Plan 1) |
| SharePoint Online Deskless Worker | SharePoint Kiosk (K1) |
| Live Meeting Standard | Lync Online (Plan 2) |
| Office Communications Online | Lync Online (Plan 1) |
More details are in this document: Transition Guide.
As enterprises continue to grow, information volume is growing as well at a constant rate. Forrester Research asserts that content volume is growing at a rate of 200% annually. With such a growth rate, content easily reaches the "too much information" point, where finding information without an analytical tool is impossible. Giving people the right information is a key factor in making employees who work with business information productive. However, a large amount of content leads to an "information overload" situation where people cannot find the content, and this impacts organizational productivity. In such a situation, organizations should consider implementing Enterprise Search to discover all the information the company has produced and continues to produce.
However, the cost of implementing search (information analysis, hardware, software and licensing) is usually quite large, so executives need to see the justification for investing in search, which is where the ROI of Enterprise Search comes in. Enterprise Search ROI is factored by a number of parameters that depend on the collaborative scenarios established in the organization.
A simple calculation that can help organizations see the potential ROI (based on effective Google-style search) is:
(# of workers) X (average annual salary/2,080 hours) X (hours saved/worker) = $ total productivity savings/year
This calculation estimates ROI from saved time. IDC estimates that effective enterprise search could save about 53% of that time (based on research showing that the average employee spends 10 hours per week searching for necessary information).
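The formula above can be sketched directly in code; the numbers in the example are hypothetical.

```python
def search_roi(num_workers: int, avg_annual_salary: float,
               hours_saved_per_worker: float) -> float:
    """Annual productivity savings: workers x hourly rate x hours saved per worker.
    2,080 is the standard number of working hours in a year (52 weeks x 40 hours)."""
    hourly_rate = avg_annual_salary / 2080
    return num_workers * hourly_rate * hours_saved_per_worker

# Hypothetical example: 1,000 workers, $83,200 average salary ($40/hour),
# saving 50 hours per worker per year
savings = search_roi(1000, 83200, 50)
print(f"${savings:,.0f}/year")  # $2,000,000/year
```

Plugging in your own headcount and salary data gives a first-order estimate to put in front of executives.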
To take into account in the ROI the potential revenue increase, delivering products to market faster and other business aspects, the following cost factors should be added to the formula:
- Information duplication (cost of re-creating the document that was created by someone else before)
- Lost opportunity of not knowing right information at right time (for example employees skills matrix to support sales pipeline)
- Knowledge that is not shared (leading practices, known issues, etc.)
- Time spent by customer to find the right product
Sources: 1, 2
These days Internet search has become an inherent element of our lives. We are all "bing-ing" or "googling" stuff dozens of times per day, and we know how to search and what we get as a result. However, when we start talking about Enterprise Search (for example, SharePoint Search), there is a large fundamental difference from Internet search.
Enterprise Search provides capabilities to discover and deliver intranet content, but it discovers content very differently. The following list summarizes the Enterprise Search differences:
- The right document exists – Internet search returns thousands of documents for anything you search, and hundreds of them can be skipped because they are irrelevant. Enterprise Search returns fewer, highly relevant documents, but nothing must be missed. "If the Internet misses a few hundred documents, few people notice; if your Enterprise Search misses one, users may consider it a failure."
- Security is critical – Internet search results are public for everybody, while Enterprise Search must apply security trimming to meet corporate requirements about exposing confidential, legal and restricted documents only to specific users, or only before/after a specific date.
- Taxonomy and vocabulary – corporations usually use different taxonomies and vocabularies to organise content, with specific metadata that is classified differently among user types. Enterprise Search should retrieve results according to taxonomies and classification schemes aligned to the user's context.
- Metadata – Internet search lacks metadata support due to the nature of the content (no author, date, etc.), while corporate data is very well tagged, and Enterprise Search takes all discovered metadata into account.
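The security-trimming idea can be sketched in a few lines. This is an illustrative model only, not how SharePoint's query pipeline actually implements trimming; the ACL representation and document titles are assumptions.

```python
def security_trim(results, user_groups):
    """Return only the results the user is allowed to see.
    Each result carries an ACL: the set of groups granted read access."""
    return [r for r in results if r["acl"] & user_groups]

results = [
    {"title": "Public handbook", "acl": {"everyone"}},
    {"title": "Legal hold memo", "acl": {"legal"}},
    {"title": "Board minutes",   "acl": {"executives", "legal"}},
]

# A user in only the 'legal' group does not see the public handbook group-wise,
# but does see both restricted documents
visible = security_trim(results, {"legal"})
print([r["title"] for r in visible])  # ['Legal hold memo', 'Board minutes']
```

In a real engine the trimming happens against the source system's ACLs at query time or index time, but the principle is the same: results are filtered per user before being returned.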
This post is a summary of the LightSwitch webcast http://bit.ly/fagND9
What is it?
LightSwitch is a "business applications generator" – a tool that provides an easy way to build business applications for the desktop and the cloud.
It’s based on Visual Studio 2010 and uses Silverlight, MVVM and RIA Services.
What is it for?
- LightSwitch is a tool for technically savvy business people (those who are not developers, but still build apps)
- LightSwitch is for developers building form/data-centric applications.
The LightSwitch Development Experience
Components of LightSwitch
External Data Support
- WCF Services
Where to start?
According to research, 91% of companies are looking to increase their usage of Microsoft SharePoint in 2011.
The hottest areas are:
- increase user adoption
- better management/administration
- better visibility of SharePoint ROI
- integrate with Facebook, LinkedIn and YouTube
Unfortunately, cloud hosting adoption is quite poor; respondents only "would consider using SharePoint in the cloud".
Last year I used PerformancePoint 2010 with a few clients, and in this post I have decided to summarize some limitations and tricks I discovered:
- You can use the SharePoint Lists data source, but you will lose the Analytic Charts and Decomposition Tree functionality. Consider using OLAP cubes to get all the features of PerformancePoint.
- You can't have KPIs with hyperlinks; however, you can add a custom column with the link: http://radekle.wordpress.com/2007/12/01/performancepoint-server-creating-hyperlinks-from-scorecards-kpi-to-other-related-reports/
- You can use textual columns for a KPI as a FACT, but you need to use the "First Occurrence" aggregation only, and thus you will lose all data roll-ups.
- PerformancePoint can only see SharePoint list columns that exist in the default view.
- A PerformancePoint workspace (.ddwx) is just an XML file, so it's easy to redeploy a workspace across environments. All you need to do is open the file in a text editor and search/update the server URLs in the ddwx file.
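Since the .ddwx workspace is plain XML, the search/update step can even be scripted. A minimal sketch, with made-up file content and environment URLs:

```python
def retarget_workspace(ddwx_text: str, old_url: str, new_url: str) -> str:
    """Point every server reference in the workspace XML at the new environment."""
    return ddwx_text.replace(old_url, new_url)

# Hypothetical workspace fragment and environment URLs (not a real .ddwx schema):
source = '<Workspace Server="http://bi-dev/pps" DataSource="http://bi-dev/cubes" />'
print(retarget_workspace(source, "http://bi-dev", "http://bi-prod"))
# <Workspace Server="http://bi-prod/pps" DataSource="http://bi-prod/cubes" />
```

In practice you would read the .ddwx file, run the replacement, and write the result back before importing it into the target environment.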
According to SDPS (https://iwsolve.partners.extranet.microsoft.com/sdps/), the key business challenges SharePoint is designed to address are:
- Islands of information and applications
- Slow responsiveness to business and user needs
- Costly custom development and maintenance
- Poor sharing inside and outside the organization
- Difficult to find the right content, data, and people
- Increasing information management risk
The SharePoint 2010 Planning guide contains a good sample SharePoint SLA.
Establish service-level agreements for each service level. A service-level agreement should include the following items at a minimum:
- The length of time and approvals that are necessary to create a site (What's the process to create a site? Who's involved?)
- Information about the costs for the service to users or departments (Who gets charged, and for what?)
- Operational-level agreements that specify the teams that perform operations and the frequency (Which team applies updates? Which team performs backups? How frequently do these occur?)
- Policies about problem resolution through a help desk (When a user gets stuck, who do they call? What's the escalation path?)
- Negotiated performance targets for the first load of a site, subsequent loads, and performance at remote locations (Recovery, load balancing, and failover strategies)
- Customization policies for the service
- Storage limits for content and sites
- Multi-language support (Which languages will be installed and supported?)
For ECM systems, many organizations (especially government) require solutions to be compliant with the AS ISO 15489 standard, which defines Records Management requirements for the following procedures:
- Records creation and capturing
- Active records management (centralized/decentralized, filing records, maintenance, barcoding)
- Records appraisal and disposal planning
- Policy and procedures documentation
- Training programmes
- Management of inactive records
- Vital records protection
In general, each Records Management system consists of the following components:
- Individuals who create or maintain the records
- policies, procedures and practices
- documentation presenting policies, procedures and practices, including procedures manuals and guidelines
- records themselves
- specialised information and records systems used to control the records
- software, hardware, and other equipment and stationery used in recordkeeping
with the following record lifecycle:
- Active record storage
- Record transferring
- Inactive record storage
- Record disposition
where a "record" is characterized by:
- Authenticity (can be proven)
- Reliability (can be trusted, accurate and attested)
- Integrity (complete and unaltered)
- Useability (can be located, retrieved, presented and interpreted)
Designing and Implementing a Records Management System
One of the most critical aspects of creating a Records Management system is requirements gathering and analysis to produce a Business Classification Schema. Requirements gathering for RM consists of the following steps:
- Gather information on the business activities of the organisation
- Map them into a classification scheme (detailing functions, activities and transactions)
- Establish what records of business activity need to be created
- Consider legislative and other external record keeping requirements
all of which focus on business functions, activities and transactions, with the analysis phase consisting of:
- Analyse and summarise collected data to develop and evaluate alternative solutions
- Techniques range from flowcharting to sophisticated modelling
- Requires an objective fresh approach to the task
- Choice of ‘best’ alternative depends on wide range of factors
- cost-benefit analysis
The overall process of designing and implementing a Records Management system is a very formalized approach, called "DIRS", and is summarized in the following diagram:
Business Classification Schema
The AS ISO 15489 standard relies on classification schemas to improve records retrieval and to narrow the search process. The recommended schema is the Business Classification Schema, which assists records management by:
- Providing linkages between individual records which accumulate to provide a continuous record of activity
- Ensuring records are named in a consistent manner over time
- Assisting in the retrieval of all records relating to a particular function or activity
- Determining security protection and access appropriate for sets of records
- Allocating user permissions for access to, or action on, particular groups of records
- Distributing records for action
- Determining appropriate retention periods and disposition actions for records
Such a classification provides a basis for arranging and retrieving records, where records are organized in a hierarchy of Series -> Sub-series -> Files -> Documents and have a minimum set of metadata (unique ID, date and time of registration, title or description, author), with additional properties such as date and time of communication, sender, recipient, a link to the related document, the system where the record was captured, standard, access, retention period, etc.
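As an illustration of the hierarchy and the minimum metadata set, here is a toy sketch; the field names, series names and values are my own, not mandated by the standard.

```python
from datetime import datetime

def make_record(record_id, title, author):
    """The minimum metadata set: unique ID, registration date/time, title, author."""
    return {"id": record_id,
            "registered": datetime.now().isoformat(),
            "title": title,
            "author": author}

# Records arranged in the Series -> Sub-series -> File -> Document hierarchy
# (all names below are hypothetical examples)
classification = {
    "Finance": {                          # series
        "Accounts Payable": {             # sub-series
            "Invoices 2012": [            # file
                make_record("FIN-0001", "Invoice #4711", "A. Smith"),
            ],
        },
    },
}
```

Navigating from series down to document then mirrors how the classification scheme is used for retrieval and for applying retention rules at each level.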
Creating a Business Classification Schema should follow these steps:
- Gather documentary information and conduct interviews
- Understand overall mission/objectives of the organisation
- Derive and list the functions needed to achieve objectives
- Identify hierarchies of activities which support each function
- Identify the transactions which operationalize each activity
- Identify processes/activities common across functions
- Produce a map of the hierarchies for each function
Effective retrieval requires knowledge of classification and indexing techniques and a thorough understanding of the organisation's activities.
Most Common Problems of Records Management
1) The aspect that affects a Records Management system most is legislative requirements, which demand compliance with:
- Statutes of limitations
- Tax legislation
- Corporations legislation
- Financial institutions legislation
- Trade practices legislation
- Health legislation
- Evidence legislation
- Privacy and freedom of information legislation
- Electronic transactions legislation
- Legislation governing the disposal of public records
2) Cultural problems
- People are not inclined to share information
- People think relying on others reduces their own reputation
- People see themselves as experts and refuse to collaborate
3) Records Maintenance
- A lack of understanding of the cost of storing useless records, or inappropriate storage procedures
- Locating lost records
- Locating incorrectly filed records
- Duplication of records
- Records not available when needed
- Inconsistency of filing, retrieval, charge out or follow up of records
For those who are starting out in Business Intelligence, words such as ETL and OLAP might already be familiar, but it's quite hard to find a really good description of WHY we need these activities in Business Intelligence and what the conceptual process of building a BI system looks like.
Let's review the BI artefacts and the relationships between them. The most common BI system evolves through the following steps:
This can be represented by the picture below (taken from “Microsoft SharePoint 2010 PerformancePoint Services Unleashed” book)
Step 1 – Database storage. All information is saved into a relational database in normalized format (usually), which allows the data to be manipulated effectively. Information stored in the database is represented in a flat format, which is very efficient for data storage and retrieval.
Step 2 – Measures and Dimensions. Database systems store terabytes of data, which is just raw data and not usable from a business perspective. At the business level we need "information" instead of data. There is quite a significant difference between "data" and "information": information is data, but data *is not* necessarily information. For analysis purposes we need to convert data into information.
Definition: "Information is organised data, in a form relevant to us". It means that to get "information" we need to extract the most valuable data and transform it into a usable form (this process is called ETL – extract, transform and load). As a result of this transformation we get "Measures". Measures are exactly the information you want to analyse. Examples of measures: sales, defective products, staff retention. However, measures need to be qualified by a range of something. We need to create "Dimensions" to group all our measures by relevant values. For example, a dimension can be a period of time, a region, or a category; thus dimensions allow us to group measures by context (in other words, to group similar values to filter your information). Take into account that dimensions can be organized in hierarchies. For example, sales measures can be dimensioned by country, then by state, then by city and then by price.
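The idea of rolling a measure up by a dimension can be sketched in a few lines; this is a toy example with made-up sales rows, not a real ETL pipeline.

```python
from collections import defaultdict

# Raw transactional rows (the "data" level); values are made up
rows = [
    {"country": "US", "city": "Seattle",  "amount": 120.0},
    {"country": "US", "city": "Portland", "amount":  80.0},
    {"country": "DE", "city": "Berlin",   "amount": 200.0},
]

def aggregate(rows, dimension):
    """Roll the 'amount' measure up by the chosen dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["amount"]
    return dict(totals)

print(aggregate(rows, "country"))  # {'US': 200.0, 'DE': 200.0}
print(aggregate(rows, "city"))     # the same measure rolled up per city
```

The same measure ("amount") grouped by different dimensions is exactly what a cube precomputes at scale in the next step.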
Step 3 – Cube. Organizing data into measures that are categorized by dimensions results in an OLAP (online analytical processing) cube, created from a specific schema (star or snowflake). The cube representation allows data to be analysed using different approaches – drilling, aggregating, decomposing and reporting – for forecasting, budgeting, planning and other purposes.