August 2006 - Posts

Difficult time

I will be unavailable for some days. Going through difficult times. I had to postpone my US travel due to family situation so those who were expecting me there, I am sorry to say I will be late in arriving.

Posted by omar with no comments
Filed under:

Cost effective, failsafe, 2 server load balanced web & DB servers

We were very poor when we first started Pageflakes so we needed a really cheap but fully reliable deployment configuration. The requirement was:

  1. Fail safe. If a server goes down, another one can take over without users noticing
  2. Database must be mirrored in multiple servers
  3. Web servers will be in Load Balanced mode so that incase one goes down, the other takes over immediately
  4. Do it with least hardware and cost

So, we thought a lot and could not make a reliable solution without at least three servers. Then we found one with two servers only. Here's the plan:

We made the above plan for our hosting service provider.

We have 2 windows server both with IIS 6.0 and SQL Server 2005. Let's call these servers Web server and DB server.

Web server gets 60% web traffic configured via Network Load Balancing. We used Windows Network Load Balancing in order to save the cost of buying a separate Load balancer. We also used Windows Firewall instead of external firewall. You can imagine how poor we were. DB Server gets 40% web traffic and hosts the database in its SQL Server 2005. We started with SQL Server 2005 Workgroup Edition because it was the only version we could afford. So, we could not use the new DB Mirroring feature instead we had to use Log Shipping. The web server also had SQL Server 2005 but it was used for a standby database so we did not have to pay for licensing fee. Remember: You do not need to pay for SQL server license if the server is hosting standby databases only. The DB Server also runs the web site in its IIS.

So, we have 2 servers running the website in NLB. So, web server is properly load balanced and failsafe. Then we have database on the DB Server which is log shipped to the web server. So, incase DB server goes down, we can divert all traffic to the web server and bring up web server's standby database in order to run the site solely from web server. When DB server comes back online, we configure log shipping the opposite way and divert most of the traffic to the DB server. Thus DB server becomes the web server and previously web server becomes the DB server. It requires some manual work and not fully automated. But the cheapest solution for a reliable configuration.

So when you add more servers in future, it's very easy to add new ones.

Posted by omar with 1 comment(s)
Filed under:

Things you must do for ASP.NET 2.0 Membership Provider before going live

Some tweakings you should do on your web.config if you are using ASP.NET 2.0 Membership Provider before you go live on your production server:

  1. Add "applicationname" attribute in Profile Provider. IF you do not add a specific name here, Profile provider will use a GUID. So, on your local machine you will have one GUID and on production server you will have another GUID. If you copy your local DB to production server, you won't be able to reuse the records available in your local DB and ASP.NET will create a new application on production. Here's where you need to add it:
  1. < profile enabled ="true" > < providers > < clear /> < add name ="..." type ="System.Web.Profile.SqlProfileProvider" connectionStringName ="..." applicationName ="YourApplicationName" description ="..." /> </ providers >
  2. Profile provider will automatically save the profile whenever a page request completes. So, this might result in unnecessary UPDATE on your DB which has significant performance penalty. So, turn off automatic save and do it explicitly from your code using Profile.Save();
    < profile enabled ="true" automaticSaveEnabled ="false" >
  3. Role Manager always queries database in order to get the user roles. This has significant performance penalty. You can avoid this by letting Role Manager cache role information on cookie. But this will work for users who do not have a lot of roles assigned to them which exceeds the 2 KB limit of Cookie. But it's not a common scenario. So, you can safely store role info on cookie and save one DB rountrip on every request to .aspx and .asmx.
    < roleManager enabled ="true" cacheRolesInCookie ="true" >
  4. See my previous post on optimizing two important SPs used by Profile Provider.
Posted by omar with 4 comment(s)
Filed under:

Redirect Traffic from old to new server when you change hosting service

How do you redirect your users when you change hosting service? Currently your domain maps to the IP which is with the current hosting provider. When you change to a new hosting provider, you get a new IP range for your new servers. So, even if you change the DNS mapping, it will take at least 4 days to get refreshed all over the world. So, during these 4 days, some user will go to new IP and some will go to old IP.

The problem here is redirecting all users to the new IP without letting themknow anything.

At Pageflakes, We have done this many times. We had all sorts of problem with our hosting providers and have changed servers almost once every month. So, we had to come up with a solution which works transparently and without any downtime.

Here's what we do. First we map a new subdomain like new.pageflakes.com to the new server IP. Then we create a new web site (not virtual directory) on old web server called "Redirectors". It maps to a folder which has nothing but a global.asax and web.config. Then we go to Web site Properties -> Home Directory -> Configuration and map ASP.net to receive all web requests. This includes all URLs including .html, .gif, .css. Js etc.

Next, we write the following code in Global.asax which redirects all traffic to the new server.

    
protected 
void Application_BeginRequest(Object
sender, EventArgs e)
    {     
        
        
string url =
HttpContext.Current.Request.Url.AbsolutePath;
        
string QueryParams =
HttpContext.Current.Request.QueryString.ToString();
   
if (QueryParams != 
"") {
          Response.Redirect(
"http://new.pageflakes.com" + url + 
"?"+ QueryParams);
   } 
else {
          Response.Redirect(
"http://new.pageflakes.com" + url );
   }
    }

So, anyone trying to go to www.pageflakes.com/aboutus.html gets redirected to new.pageflakes.com/aboutus.html. The redirector keeps the query string and logical path intact. So, complicated URLs like www.pageflakes.com/flakes/amit/notepad/notepad.html?haha=huhu&hehe=hoho gets converted nicely.

After doing this, we stop the default web site which is listening to port 80 and turn on the redirector. By this time, new server is already up and running on the new subdomain.

Now we do the DNS change and map the new server's IP to www.

So, those users who still have the old IP in their DNS cache goes to the old server and then gets redirected to the new server. But after a while when their DNS cache gets refreshed and they get the new IP, all their requests go to the new server, so we have nothing to do here. After 4 or 5 days, we can safely wipe out the old server and start kicking old hosting providers backend.

Anyone has a better idea?

 

Posted by omar with 4 comment(s)
Filed under:

Pageflakes - the AJAX king

Here's what a Digg article says:

Pageflakes and YouOS - not Google - are the Ajax Kings

InformationWeek took a close look at Google and about 20 startups to find out who's the Ajax King. In the "webtop" or "personalized startpage" category - arguably the most important of all categories - the winners are Pageflakes and YouOs, ahead of Google and Microsoft.

 

I am very happy to see we again got recognized as the best Webtop ahead of Microsoft and Google. The first one was on Web 2.0 Awards. Pleeeeeeeeeease give us a Digg and don't miss the article on InformationWeek.

Posted by omar with 2 comment(s)
Filed under:

Forms authentication failed for the request. Reason: The ticket supplied was invalid. (Solution)

When you turn on web garden (multiple process per application pool) or go into multi server load balance deployment where servers are serving the same website, you will have Forms Authentication problem. Users will get automatically logged out or see the Yellow screen of death (ASP.NET error page) frequently. This happens because ASP.NET encrypts the login information in cookie. But the encryption key is unique for each machine and for each process. So, if one user hits Server #1 and gets an ecnrypted key, and then the next hit goes to Server #2, it will fail to decrypt the cookie and log user out or throw user the asp.net general error message.

In order to prevent this on your production server, you need to remember this before you go live:

The reasons for a forms auth ticket failing are normally that either the validation key or the decryption key are not in sync across all servers in a web farm. Another potential reason can be if both ASP.NET 1.1 and ASP.NET 2.0 applications are issuing forms auth tickets with the same domain and path.

For the first case, setting the validationKey and decryptionKey attributes explicitly on <machineKey /> on each web server will solve the problem.

For the second case, setting the validationKey and decryptionKey attributes explicitly in <machineKey /> for *both* the ASP.NET 1.1 and ASP.NET 2.0 applications is necessary. Additionally on the ASP.NET 2.0 apps, the "decryption" attribute in <machineKey /> should be set to "3DES".

This is what I learned from Stephan Schackow (Microsoft Atlas team). Here's how the machine.config should look like:

< system .web > < processModel autoConfig ="true" /> < machineKey validationKey ="..." decryptionKey ="..." validation ="SHA1" />

<machineKey> is the node that you need to introduce inside the <system.web> node if it does not exist.

Catch No 1:

How do you generate the machine key? You need to use a utility to produce the key for your PC. I have made a .exe which can generate such keys for you. Here's how you run it:

SecurityKey.exe 24 64

Download here

It takes two parameters and these are the exact values you have to provide.

For super cautious IT guys like me, don't worry, it's a .NET 2.0 binary, no virus. Test the strength of your anti-virus software on this .exe if you like. If it can't find any virus then either I'm smarter than you and have hidden a virus successfully or there's no virus at all and I am not that smart as I sound like.

Catch No 2:

You have put the machine keys in both machine. Restarted IIS. Even restarted your server. But you still see lots of Event Log error entries which shows users are still getting the dreaded "Forms authentication failed for the request. Reason: The ticket supplied was invalid". So, what did you do wrong? You call Microsoft support. Go to Forums and make post. Everyone says what you did is correct.

Here's what you need to do: wait. Wait for 2 or 3 days until all those users come back to your website at least once. Those users will have cookie encrypted with previously assigned encryption key pair. Naturally, it will fail to decrypt with the new key pair you have just specified in machine.config. So, until all those users get a new key, you will keep on having the error message. Don't be alarmed if you see this even after one week or month. This just means some user visited you after a long time and you are not doing much good in attracting users to your site. So, if you see such event log entries after a week or two, call you marketing team and ask what kind of marketing they are doing.

It would be really good if there was something like "Checklist for Going Live with ASP.NET" which stated all these issues.

Posted by omar with 12 comment(s)
Filed under:

How to change user name in ASP.NET 2.0 Membership Provider

Profile.UserName is a readonly field. So, how do you change a user's name? Many use email address as user name (like us, Pageflakes) and we want to allow users to change their email address which in turn changes their user name. So, how do you do it?

It looks like there's no easy way to do it. You have to do the following:

  1. Create a new user using the new email address
  2. Get the password of the old account and set it to the new account. If you can't get the old password via Membership provider, then ask user.
  3. Create a new profile for the new user account
  4. Copy all the properties from the old profile to the new profile object.
  5. Log out user from old account
  6. Auto sign in to the new account so that user does not notice what an incredible thing just happened.

Here's how I do it:

if (Profile.UserName !=
newUserName)
{
    
// Changing email address of user.
Delete current user account and create
    
// a new one using the new email address
but the same password
    
if (
null != 
Membership.GetUser(newUserName))
        
throw 
new 
ApplicationException(
"There's another user with the same
email. Please enter a different email.");
 
    
MembershipUser newUser = 
Membership.CreateUser(newUserName,
currentPassword);
 
    
// Create profile for the new user and
copy all values from current profile
    
// to new profile
    
ProfileCommon newProfile = 
ProfileCommon.Create(newUserName, 
true) 
as 
ProfileCommon;
 
    newProfile.IsInvited = Profile.IsInvited;
    newProfile.IsRealUser = Profile.IsRealUser;
    newProfile.Name = newUserName;
 
    newProfile.Save();
 
    
if (
Membership.ValidateUser(newUserName,
currentPassword))
    {
        
FormsAuthentication.SignOut();
        Session.Abandon();
 
        
// Delete the old profile and
user
        
ProfileManager.DeleteProfile(Profile.UserName);
        
Membership.DeleteUser(user.UserName);
 
        
//FormsAuthentication.SetAuthCookie(newUserName,
true);
        
FormsAuthentication.RedirectFromLoginPage(newUserName,

true);
    }                    
}
Posted by omar with 74 comment(s)
Filed under:

How to use ASP.NET 2.0 Profile object from web service code

You have started to use ASP.NET 2.0 Profile Provider and you are very happy with it. All your .aspx pages are full of "Profile.Something". You also introduced a lot of new properties on the profile object via web.config. Then you added a new web service. There you want to access the Profile object. You realize, you are doomed.

You cannot access the Profile object from Web Service.

At runtime, ASP.NET generates a class looking at the specification provided in web.config, which becomes the "Profile" object in .aspx pages. But this object is not available in Web service (.asmx.cs) and you cannot see the custom properties you have added in the profile object. Although HttpContext.Current.Profile will give you reference to Profile object, but it's type is ProfileBase which does not show your custom properties. Makes sense, because the class is generated at runtime. But if it can be made available in .aspx.cs, then it should also be available in .asmx.cs.

In order to overcome this problem, you have to hand code that profile class in your App_Code folder and then configure web.config so that it does not auto generate the class instead use your one. Here's what you do in web.config:

<profile enabled="true" inherits="UserProfile">

I have added a new attribute UserProfile. Now go to App_Code and make a UserProfile class like this:

public class UserProfile : System.Web.Profile.ProfileBase
{
[SettingsAllowAnonymousAttribute(true)]
public virtual int Timezone
{
get
{
return ((int)(this.GetPropertyValue("Timezone")));
}
set
{
this.SetPropertyValue("Timezone", value);
}

}

Declare all the properties like this. Don't forget to add the [SettingsAllowAnonymousAttribute(true)] on the properties which you want to be made available to anonymous users.

At the end of the class, add this method:

public virtual ProfileCommon GetProfile(string username)
{
return ((ProfileCommon)(ProfileBase.Create(username)));

}

Here's an easy way to avoid hand coding this class and generating it automatically. Before you make the changes in web.config and create the UserProfile class, run your web project as it was before. But before running it, turn off SQL Server. This will make Asp.net execution to break on first call to some Profile object's property. For ex, if you have a custom property TimeZone in the Profile object, execution will break on this line:

public virtual int Timezone
{
get
{

return ((int)(this.GetPropertyValue("Timezone")));

It will fail to load the profile object values from database because database is down. If you scroll up, you will see this is the class that ASP.NET generates at run time. You will see all the properties are declared on this class already. So, you can just copy & paste it in your own class easily!

But after copying, you will realize there's no [SettingsAllowAnonymousAttribute(true)] attribute. So, you will have to put them manually. Also after making your own custom class, you will have to remove all the custom properties declared inside <properties> node in the web.config.

Now that you have your own Profile class, inside web service, you can cast (HttpContext.Current.Profile as UserProfile) and then you can use all the custom properties.

If you don't want to enjoy strongly typed coding on web service, then you can always use the old way of accessing Profile properties via: Profile.GetPropertyValue("TimeZone"). But it's no fun.

Posted by omar with 30 comment(s)
Filed under:

Optimize ASP.NET 2.0 Profile Provider before you go live

Do you know there are two important stored procedures in ASP.NET 2.0 Profile Provider that you can optimize significantly? If you use them without doing the necessary optimization, your servers will sink taking your business down with you during heavy load. Here're a story:

During March, Pageflakes was shown on MIX 2006. We were having a glamorous time back then. We were right on Showcase of Atlas web site. The first company, not the second or third. Number of visits per day were rising sky high. One day we noticed, the server was no more. We restarted the server, brought it back, again it died within an hour. After doing a lot of postmortem analysis on the remaining of server's body parts, we found that it was having 100% CPU and super high IO usage. The hard drives were over heated and turned themselves off in order to save themselves. This was quite surprising to us because we were very intelligent back then and we profiled every single web service function. So, we went through hundreds of megabytes of logs hoping to find which webservice function was taking the time. We suspected one. It was the first function that loads a user's page setup. We broke it up into smaller parts in order to see which part is taking most of the time.


    
private


GetPageflake(
string

source, 
string

pageID, 
string

userUniqueName
)

    {

if( Profile.IsAnonymous ) {
using ( new TimedLog (Profile.UserName, "GetPageflake" ))

      {

You see, the entire function body is timed. If you want to learn how this timing works, wait until I make a post on it. We also timed smaller parts which we suspected were taking the most resource. But we could find not a single place in our code which was taking any significant time. Our codebase is always super optimized (after all, you know who is reviewing it, me).

Meanwhile, users were shouting, management was screaming, support staff was complaining on the phone. Developers were fainting one by one. Remaining ones were furiously sweating and blood vessels on their forehead were coming out etc. Nothing special, just a typical situation we have couple of times every month.

Now you must be shouting, "You could have used SQL Profiler, you idiot!" We were using SQL Server workgroup edition. It does not have SQL Profiler. So, we had to hack our way through to get it running on a server somehow. Don't ask how. After running the SQL Profiler, boy, were we surprised! The name of the honorable SP which were giving us so much pleasure was none but the famous and the most pretigious one, which we all know by name due to its incredible contribution to the development of human civilization. Please welcome, the great stored procedure "dbo.aspnet_Profile_GetProfiles"!

We used (and still use) Profile provider extensively.

Here's the SP:


CREATE PROCEDURE
        
[dbo].[aspnet_Profile_GetProfiles]

    @ApplicationName   
                
nvarchar
(256),

    @ProfileAuthOptions   
        
int
,

    @PageIndex   
                      

                int
,

    @PageSize   
                       

                int
,

    @UserNameToMatch   
                
nvarchar
(256)
= 
NULL
,

    @InactiveSinceDate   
              
datetime


  
                    = 
NULL

AS

BEGIN

    
DECLARE

@ApplicationId


                        uniqueidentifier

    
SELECT

@ApplicationId
= 
NULL

    
SELECT

@ApplicationId
= ApplicationId
                
FROM

aspnet_Applications
                        

WHERE LOWER (@ApplicationName) = LoweredApplicationName

    
IF

(@ApplicationId

IS
NULL
)

       
        
RETURN
 
 

   
        
--
Set the page bounds

    
DECLARE

@PageLowerBound

int

    
DECLARE

@PageUpperBound

int

    
DECLARE

@TotalRecords
  
int

    
SET

@PageLowerBound
= @PageSize
                    * @PageIndex

    
SET

@PageUpperBound
= @PageSize
                    - 1 + @PageLowerBound
 

    
--
Create a temp table TO store the select results

    
CREATE
TABLE 
#PageIndexForUsers

    (

       
            IndexId 
int
IDENTITY 
(0,
1) 
NOT
NULL
,

       
            UserId 
uniqueidentifier

    )
 

    
--
Insert into our temp table

    
INSERT
INTO 
#PageIndexForUsers
                    (UserId)

       
        
SELECT

u.UserId

       
        
FROM


  dbo.aspnet_Users
                u, dbo.aspnet_Profile p

       
        
WHERE


 ApplicationId = @ApplicationId

       
                
AND

u.UserId
                    = p.UserId

       
                
AND

(@InactiveSinceDate
                
IS
NULL OR 
LastActivityDate
                        <= @InactiveSinceDate)

       
                
AND

(
  
                    (@ProfileAuthOptions = 2)

       
                       

                OR 
(@ProfileAuthOptions
= 0 

                        AND 
IsAnonymous
= 1)

       
                       

                OR 
(@ProfileAuthOptions
= 1 

                        AND 
IsAnonymous
= 0)

       
                    )

       
                
AND

(@UserNameToMatch
                
IS
NULL OR 
LoweredUserName
                    
LIKE
LOWER
(@UserNameToMatch))

       
        
ORDER
BY 
UserName
 

    
SELECT

u.UserName,
u.IsAnonymous, u.LastActivityDate,
                    p.LastUpdatedDate,

       
                
DATALENGTH
(p.PropertyNames)
                    + 
DATALENGTH
(p.PropertyValuesString)

+
DATALENGTH (p.PropertyValuesBinary)

    
FROM


  dbo.aspnet_Users
                    u, dbo.aspnet_Profile p, #PageIndexForUsers
i

    
WHERE


 u.UserId = p.UserId 
AND

p.UserId
= i.UserId 
AND
                                

i.IndexId >= @PageLowerBound AND i.IndexId <= @PageUpperBound
 
 

    
DROP
TABLE 
#PageIndexForUsers
 

    
END
 

END 

First it looks up for ApplicationID.


    
DECLARE

@ApplicationId


                    uniqueidentifier

    
SELECT

@ApplicationId
= 
NULL

    
SELECT

@ApplicationId
= ApplicationId
            
FROM

aspnet_Applications
                    

WHERE LOWER (@ApplicationName) = LoweredApplicationName

    
IF

(@ApplicationId

IS
NULL
)

       
    
RETURN

Then it creates a temporary table (why not table data type?) in order to store profiles of users.

    -- Create a temp table TO store the select results


    
CREATE
TABLE 
#PageIndexForUsers

    (

        IndexId 

        int IDENTITY 
(0,
1) 
NOT
NULL
,

        UserId 

        uniqueidentifier

    )
 

    
--
Insert into
        our temp table

    
INSERT
INTO 
#PageIndexForUsers
(UserId)

If it gets called very frequently, there will be too high IO due to the temporary table creation. It also runs through 2 very big tables - aspnet_Users and aspnet_Profile. The SP is written in such a way that if one user has multiple profiles, it will return all profiles of the user. But normally we store one profile per user. So, there's no need for creating a temporary table. Moreover, there's no need for doing LIKE LOWER (@UserNameToMatch). It is always called with a full user name which we can match directly using the equal operator.

So, we opened up the stored proc and did a open heart bypass surgery like this:


IF 
@UserNameToMatch

IS
NOT NULL 

   
        
BEGIN
 

   
        
SELECT

u.UserName,
u.IsAnonymous, u.LastActivityDate, p.LastUpdatedDate,

   
                    
DATALENGTH
(p.PropertyNames)
                    + 
DATALENGTH
(p.PropertyValuesString)
+ 
DATALENGTH
(p.PropertyValuesBinary)

   
        
FROM


  dbo.aspnet_Users u

   
        
INNER
JOIN 
dbo.aspnet_Profile
p 
ON

u.UserId
= p.UserId
                

   
        
WHERE

   
            u.LoweredUserName = 
LOWER
(@UserNameToMatch)
 

   
        
SELECT

@@ROWCOUNT
 

   
        
END

   
        
ELSE

    
BEGIN

-- Do the original bad
things

It ran fine locally. Now it was time to run it on the server. This is an important SP which is used by the ASP.NET 2.0 Profile Provider, heart of ASP.NET framework. If we do something wrong here, we might not be able to see the problem immediately, but may be after one month we will realize users profile is mixed up and there's no way to get it back. So, it was a pretty hard decision to run this on a live production server directly without doing enough test. We did not have time to do enough test anyway. We are already down. So, we all gathered, said our prayer and hit the "Execute" button on SQL Server Management Studio.

The SP ran fine. On the server we noticed from 100% CPU usage it came down to 30% CPU usage. IO usage also came down to 40%.

We went live again.

Here's another SP which gets called on every page load and webservice call on our site because we use Profile provider extensively.


CREATE PROCEDURE 
[dbo].[aspnet_Profile_GetProperties]

    @ApplicationName      
nvarchar
(256),

    @UserName           

nvarchar
(256),

    @CurrentTimeUtc       
datetime

AS

BEGIN

    
DECLARE

@ApplicationId

uniqueidentifier

    
SELECT

@ApplicationId
= 
NULL

    
SELECT

@ApplicationId
= ApplicationId 

                FROM 
dbo.aspnet_Applications


                        WHERE LOWER
(@ApplicationName)
= LoweredApplicationName

    
IF

(@ApplicationId

IS
NULL
)

        

        RETURN
 

    
DECLARE

@UserId

uniqueidentifier

    
SELECT

@UserId
= 
NULL
 

    
SELECT

@UserId
= UserId

    
FROM


 dbo.aspnet_Users

    
WHERE

ApplicationId
= @ApplicationId 

                AND 
LoweredUserName
= 

                        LOWER
(@UserName)
 

    
IF

(@UserId

IS
NULL
)

        

        RETURN

    
SELECT
TOP 
1
PropertyNames, PropertyValuesString, PropertyValuesBinary

    
FROM


       dbo.aspnet_Profile

    
WHERE


      UserId = @UserId
 

    
IF

(@@ROWCOUNT
> 0)

    
BEGIN

        

        UPDATE 
dbo.aspnet_Users

        

        SET 

  LastActivityDate=@CurrentTimeUtc

        

        WHERE 
UserId
= @UserId

    
END

END

When you run the SP, see the statistics:


Table 
'aspnet_Applications'
.
Scan 
count

1,
logical reads 2, physical reads 0, 

                        read
-ahead
reads 0, lob logical reads 0, lob physical
                            reads 0, lob 
read
-ahead
reads
                                    0.
 

(1 row(s) affected)

Table 
'aspnet_Users'
.
Scan 
count

1,
logical reads 4, physical reads 0, 

                        read
-ahead
reads 0, lob logical reads 0, lob physical
                            reads 0, lob 
read
-ahead
reads
                                    0.
 

(1 row(s) affected)
 

(1 row(s) affected)

Table 
'aspnet_Profile'
.
Scan 
count

0,
logical reads 3, physical reads 0, 

                        read
-ahead
reads 0, lob logical reads 0, lob physical
                            reads 0, lob 
read
-ahead
reads
                                    0.
 

(1 row(s) affected)

Table 
'aspnet_Users'
.
Scan 
count

0,
logical reads 27, physical reads 0, 

                        read
-ahead
reads 0, lob logical reads 0, lob physical
                            reads 0, lob 
read
-ahead
reads
                                    0.
 

(1 row(s) affected)
 

(1 row(s) affected)

This stored proc is called whenever you access some property on Profile object.

First it does a SELECT on aspnet_application to find out the application ID from application name. You can easily replace this with a hard coded application ID inside the SP and save some effort. Normally we run only one application on our production server. So, there's no need to lookup application ID on every single call.

From client statistics, you can see how expensive it is:

 

Then look at the last block where aspnet_users table is updated with LastActivityDate. This is the most expensive one.

This is done in order to ensure Profile provider remembers when was the last time a user's profile was accessed. We do not need to do this on every single page load and web service call. May be we can do it when user first logs in and logs out. In our case, a lot of web service is called while user is on the page. There's only one page anyway. So, we can easily remove this in order to save a costly update on the giant aspnet_users table on every single web service call.

Posted by omar with 6 comment(s)
Filed under:

100% CPU, 100% IO, a near death experience for SQL Server 2005 and us

For last two weeks we were having pretty hard time at Pageflakes. The database server was having 100% CPU, and 100% IO usage most of time. It was running hot, almost about to go to a coma. It's a 64 bit Dual Core Dual Xeon Dell server with 2 GB RAM running around 30 GB database on 4 SCSI drives. So, it's more or less the best hardware money can buy (except for RAM ofcourse). But still the performance counter looked like this:

Pretty horrible situation isn't it? Users are having connection timeout on their browser. Event Log is full of "SQLConnection: Timeout". User's can't see their page. Those who can see their pages, have serous poor performance and slow response from the server. You can imagine the rest. Email flood, phone calls, management screaming on your ears etc etc.

After a lot of diagnostics, we came to conclusion that SQL Server 2005 is the culprit. So, we ran SQL Profiler on a different server (don't ever run SQL Profiler on the same server where you database is running). We saw there's an SP which is taking thousands of read and cpu cycles.

So, we were pretty much sure that SP was the culprit. So, we took one of the long executing ones and ran it through Execution Plan to see what goes wrong.

So, we first did this in order to see IO usage during the SP execution:


set statistics IO on

GO

Here's the output after we run the SP:

Table 'RSSItem'. Scan count 1, logical reads 40, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Nothing special. Runs pretty quick, very low read count. Looks like no read count at all, completely cached and returned from memory. So, SQL Server 2005 is doing pretty good caching on it. After this, we verified whether the index was being used properly or not:

'Clustered Index Seek' the best possible thing. If it was Clustered Index Scan, then we would have thought there's something wrong. So, we were making best use of Index also. We were pulling our hairs out.

So, I decided to look at the SP in order to see if I can find out something. I stared at this for half an hour:


ALTER PROCEDURE 
[dbo].[prcRSSItemGetByChannelIDPageSize]

    @ChannelID 
int
,

    @StartIndex 
int
,

    @PageSize 
int

AS
 

WITH 
SelectedRSSItems(
[ID],[ChannelID],[Hash],...)

AS

(

    
SELECT


        [ID],

        [ChannelID],

        [Hash],

        [Title],

        [Guid],

        [Description],

        [EncodedContent],

        [Link],

        [PublishDate],

        [XML],

        [SavedCount],

        ROW_NUMBER() 
OVER

(
ORDER
BY 
[PublishDate]

DESC
)

AS

[RowNumber]

    
FROM

        dbo.[RSSItem]

    
WHERE

        [ChannelID] = @ChannelID 

)
 

SELECT 

        *

FROM

        SelectedRSSItems

WHERE

        RowNumber 
BETWEEN

(@StartIndex
+ 1) 
AND

(@StartIndex
+ @PageSize)

ORDER BY 
PublishDate

desc

Those who are wondering what's with the "WITH" block. This is called Common Table Expression (CTE). This is a new feature in SQL Server 2005 which is the best so far for paging. Previously we had to page rows by doing any of the following:

  1. Copy primary key in a temporary table after sorting and use IDENTITY column to generate row number. Then select rows by joining primary key from temporary table to the actual table. The paging is done based on row number on the temporary table. Very expensive. You end up creating temporary table on every call to the SP. Too many IO on tempdb.
  2. Use subquery. Very complicated SQL. Not so good result.
  3. Use cursor to skip rows and select only those which fall within the page. It was the best solution so far.

Now you have the mighty ROW_NUMBER() function which we all have been waiting for 10 years!

Back to our disaster, you see the SQL is nice and clean, nothing suspicious. But SQL Server 2005 is drowning taking us and a thousands of users all over the world with it.

Finally, it struck my mind! SQL Server is doing something on the PublishDate field because it needs to sort the rows based on it.

        ROW_NUMBER() OVER ( ORDER BY [PublishDate] DESC ) AS [RowNumber]

What if I make a new Non-Clustered index on ChannelID and PublishDate? The query filters using ChannelID and sorts using Publishdate. So, it should get everything it needs from the index.

I created a brand new index on ChannelID and PublishDate. It took 12 mins to create the index. Then the CPU usage looked like this:

We are saved! We are back in business!

So, what we learn from this experience? We learn nothing.

This blog post is dedicated to my friend Shahed who wrote the above SP two weeks ago and gave us a lot of fun all these days.
Posted by omar with 3 comment(s)
Filed under:

Blogging Tools

Today I am trying to use Windows Live Writer Beta. Previously I have tried using One Note 2007, MS Word 2007, InfoPath template. None of these were good enough. Although MS Word 2007 is really good for writing and quick publishing, but it generates bloated html, makes you page heavy and also codes look horrible. None of these have the feature to upload pictures with post. So, you have to go to the blog website, upload pictures and then edit the post using the web editor and insert pictures in it. If you do all these from the web, then the desktop blogging experience does not improve at all.

Now Windows Live Writer seems to have solved all these problems. Let's try publishing a picture:

It's a pretty interesting picture. For last one week, our servers running Pageflakes were almost dying with super high CPU and IO usage. Here you see a 64bit Dual Core Dual Xeon DELL server is at its knees. We finally resolved the problem last night. It was a near death experience for us. I will soon write about it. I tried a long time to put some picture. Did not find any. So, just testing with this.

Let's try the Google Map embedding. This is my country and Dhaka is the city where I live.

Let's try some codes from Visual Studio:

    [WebMethod]
    [WebOperationAttribute(true, ResponseFormatMode.Json, false)]
    public int GetPageVersionNo(int pageId)
    {
        using (TheFacade facade = new TheFacade())
        {
            return facade.ThePageflakes.GetPageVersionNo(pageId, Profile.UserName);
        }
    }

OK, here's a catch. You cannot paste html to this editor. You need to switch to Html View in order to paste the html. But this is much better than going to web editor and doing it.

So, far it really felt the best editing experiencing. Going to publish it now. Hope it works... 

Posted by omar with no comments
Filed under:

Reply to all emails from unique sender in an Outlook folder

You have a folder full of mails from your friends. Now you want to reply to each of them sending some fixed message. One person will get only one reply no matter how many mails are there from that user.  Here's how you can do this:
 
  1. First write an email in Text Format.
  2. Save the email. It will go to "Drafts" folder. Now drag that email to the folder which contains all the emails you want to reply to.
  3. Now press ALT+F11 and paste the macro at the end of the post.
  4. Hit F5 and wait for a long time. You will see, Outbox has all the reply emails.
 
The macro runs through each and every email and then checks if the sender is already replied to. If not replied to, then it gets the body of the message and then makes a reply by combining your message with the body of the message.
 
The benefit of this approach is you are replying to mails sent from someone which has low probability of getting filtered out by spam filters.

Sub ReplytoALLEmail()
    Dim objFolder As Folder


Dim objTemplate As MailItem
Set objTemplate = Application.ActiveExplorer.Selection.Item(1)

Set objFolder = objTemplate.Parent

Dim objItem As MailItem
Dim objReply As MailItem

Dim dic As New Dictionary

Dim strEmail As String

Dim strEmails As String

Dim strBody As String
strBody = objTemplate.Body

For Each objItem In objFolder.Items
If Not (objTemplate = objItem) Then

strEmail = objItem.SenderEmailAddress

If Not dic.Exists(strEmail) Then
strEmails = strEmails + strEmail + ";"

dic.Add strEmail, ""

Set objReply = objItem.Reply()

objReply.Body = strBody + vbCrLf + vbCrLf + objItem.Body

On Local Error Resume Next
objReply.Send
End If
End If

Next

Debug.Print strEmails
End Sub
 
Posted by omar with 6 comment(s)
Filed under:

Web Application performance optimization tips

Yesterday I took a class on website performance optimization. I would like to share what I did in the class.
 
I took the class in a company which has a local blog site. It has serious performance issue because of both hardware and software limitations. The homepage takes around 3 sec to prepare on the server side which is absolutely unacceptable. So, it was an ideal candidate to digg into the architecture and code of the site and try to improve part by part.
 
The blog site has the following tables: Blog, Post, Comment, Category. The database is mySQL.
 
Blog table is like this which represents one blog of a user:
ID
BIGINT
Title
text
UserID
BIGINT FK to user table
 
Post table contains posts in a blog and looks like this
ID
BIGINT
BlogID
BIGINT FK to Blog table
Title
Text
Section1
Char(1)
Section2
Char(1)
Section3
Char(1)
.
.
.
 
 
 
Here Section1…9 are global sections for blog posts like Politics, Jokes, Entertainment etc. Each blog belongs to some section.
If Section1 contains '1', then the post belongs to Politics. (I know what you are thinking, hold on)
 
Comment table contains comments submitted against posts:
ID
BIGINT
PostID
BIGINT FK to Blog table
Title
Text
 
Category table contains category of a blog where posts belong. Each blog can have their own category list. Each post belongs to one or more category.
ID
BIGINT
BlogID
BIGINT FK to Blog table
Title
Text
 
Optimization step 1 - Table design and data type
First step is to change the table design. You see all those BIGINTs? Why do you need BIGINT instead of INT? INT is 32 bit which gives you a range from 0 to 4294967295. So, if you make 13 posts per second, in 10 years this number will run out. It's a good plan to think ahead of 10 years, unfortunately we don't have 10 years advanced hardware. When you have 32bit processor, every time you make you processor work with a 64bit number like comparing numbers (e.g. WHERE BlogID=1121233388765543246) it has to compare two 32 bit chunks and combine their results in order to reach to a decision. So, your 32bit processor is doing double work. If your CPU is 60% busy with such computation, converting to INT will reduce CPU usage to nearly 40% easily because comparing 32bit with another 32bit for a 32bit processor is just one operation for the processor. 32bit data structure is the fastest data structure for 32bit processors. But if you have a 64bit processor hardware, OS is 64bit, Database engine is 64bit, then you can use 64bit data structure and then 64bit data structure will be the fastest one. However, if you have 64bit hardware, but your OS is 32bit and Database Engine is also 32bit, you do not gain much speed improvement.
 
So, we converted all the BIGINTs to INT. 64bit to 32bit.
 
We also need to think about storage optimization. 64bit things will take double space than 32bit things. But that's a negligible addition. We need to think about bigger stuffs like Text data types. Sometimes we use the common "Text" data type of variable length every where in order to avoid future database changes. So, trying to save efforts in changing database design in next 10 years ultimately results in poor database performance and storage problems for next 10 years. Here are some tips on choosing the right field type and size:
  • If you don't need unicode support (multiple language) in fields, use the  non-unicode types. For ex, use varchar instead of nvarchar. All the text data types that start with "n" support unicode and thus take 2 bytes per character. That's double the size of the non-"n" counter parts. Normally we need unicode only for fields which somehow gets into the UI e.g. First Name, Title, Description etc. But fields like "FileName", "Path", "Url" etc do not need unicode support at all (when your server is in some English speaking country, not in Japan, China etc). So, you can use varchar in these cases and save 50% space.
  • Try to avoid using text data types as primary key. Searching over a text data type is lot more expensive than searching on integer data types. I have seen tables with primary key on "Url", "FileName", "Full Name" type fields. Think about what happens when you make a text field primary key. Database Engine has to match characters in the field data whenever you do queries like "WHERE FileName='something'". It has to match "all" the characters in order to find a complete match. That's way more expensive than just doing "WHERE ID=10" which only takes one 32bit comparison which the processor can do billions of times per second. You can easily get around using text fields as primary key by adding an integer "ID" column which is an auto number and using that ID column on hyperlinks and page navigations. For example, instead of using hyperlinks like www.pageflakes.com/downloadfile.aspx?filename=TestFile.html you can do it as www.pageflakes.com/downloadfile.aspx?id=10
  • Fixed length text data types are much faster to compare and store than variable length data types. Whenever you use fixed length data types like char(255), database engine allocates fixed 255 bytes in the row and it knows there's no way it's going to increase. So, it need not maintain a 'length' indicator, nor does it need to increase/decrease row size whenever the content changes. This results in less fragmentation. Less fragmentation means less hard drive usage and less moving around in different parts of the hard drive. So, use fixed length data types whenever possible and especially to those fields which appear in WHERE clause. But beware that when field length fields are compared, you need to account for the trailing spaces. If you are comparing WHERE FirstName = 'Omar' and first name field is defined as char(15) then you actually need to do WHERE FirstName = 'Omar' + (11 spaces).
  • Do not use expression on the left side of the WHERE clause. For example, you could do the above like this: WHERE Rtrim(FirstName) = 'Omar'. But that makes the database engine to trim every single first name it runs through. Similarly, if you do WHERE Age+10 > 20 it has to do the sum for every row it runs through. The idea is to use the expression on the right side so that database engine can run that expression once and then compare the resultant with the field data on the left. So, do this: WHERE Age > 10
 
Optimization Step 2 - Using proper index
Normally rows are added sequentially and the last row is added at the end of the table. So, blog posts are stored in this way in database:
 
Post table sample data
ID
BlogID
Title
1
1
2
1
3
2
4
1
5
2
6
4
7
2
8
4
...
Now if you ask your database engine to find all the posts that belong to BlogID = 2, it has to run through the entire table to find which rows have BlogID = 2 and select those rows. So, if you have 1 million rows, it runs through 1 million rows to select 10 or 20 rows from it. If you see your server's CPU usage is very high and hard drive activity is also very high, then database is running throughout your hard drive in order to find rows. Here's an interesting way to find how busy your hard drive is:
High disk usage
Go to Control Panel->Administrative Tools->Performance
Add the above performance counters. You will see hard drive usage pretty well. See how busy both hard drive is from the above picture? Read time is more than it can handle.
% Disk Read Time is way high. Which means it's trying its best to find rows from DB and going out of its limit. Also the alarming counter is "Current Disk Queue Length". This indicates that there are 22 requests waiting for reading something from your hard drive and they are currently waiting for the last request to complete. This means your hard drive is so busy running here and there that request for fetching data from hard drive is getting piled up continuously.
 
By using Index, you can resolve this. Index are like Binary Trees. Binary Tree stores data in such a way that, they can be searched very quickly. If you have 1M rows in a table, it can find an entry in say 100 steps. In order to learn details about Binary Tree, please search in Google.
 
However, you need to do proper indexing in order to get the correct result. Improper indexing will make this situation worse. Here's what I discovered in an index in the "Post" table:
 
Index
BlogID, ID
 
The index is on BlogID field and ID field. We definitely need BlogID field to be indexed because we need to find rows using BlogID. But what happens when you make one index with both BlogID and ID? It creates each entry in the index using a combination of BlogID and ID. So, an entry in the index only matches when you query rows with both BlogID "AND" ID. Something like "WHERE BlogID=1 AND ID=12312". But no one will do it for sure because if you know the ID, you know a particular row already. You don't need BlogID at all. Usually the query is "WHERE BlogID=something". So, you need one index which has BlogID only. However, sometimes you need to find an individual post using ID. So, you also need "another" index which has ID field only. So, there should be 2 separate indexes, one on BlogID and one on ID.
 
After doing this, table scan stopped, indexes were used properly and performance counter looked like this:
Low Disk Usage
 
Peace at last.
 
Some handy tips for making proper index:
 
  • You MUST NOT run a query which has fields in WHERE clause but those fields are not part of any index. As soon as you run a query which has some field in the WHERE cause not part of any index, database engine has to run through ENTIRE table to find a match. That's when CPU and Hard drive usage goes sky high.
  • Look at your WHERE clause fields. If you have AND, then make an index which contains all the fields in ANDs. For ex, WHERE A=1 AND B=2 AND C=3, then make an index based on A, B, C and it must be in this exact order.
  • If you have multiple queries on the same table and each has different fields in WHERE clause, create different indexes. For ex, if you have a query which does WHERE A=1 and then there's another query which does WHERE B=1, create two separate index. One with field A and one with field B.
  • The above applied for clauses which has OR. For example, WHERE A=1 OR B=2
 
Optimization step 3 - Caching rows in memory or text file
Let's look at how the page is rendered, we need to decide on performance optimization strategy based on how things really look:
 
Logo
Page title
Date
Top Blogger list
Blog posts
Recent comment list
Blogger1
Blogger2
Blogger3
Blogger4
Post 1 Title
Post 1 description…….
# of comments
 
Post 2 Title
Post 2 description…….
# of comments
 
Post 3 Title
Post 3 description…….
# of comments
 
Post 4 Title
Post 4 description…….
# of comments
 
<Prev 10      Next 10>
Comment 1
Comment 2
Comment 3
Comment 4
 
There are 3 major parts, list of Top bloggers who has the highest number of posts, show the last 10 blogs per page at the middle and at the right side, show last 10 comments
 
Left part blogger list is generated based on this query:
  • Find number of posts made by each user
  • Order them in descending order. Highest number of posts become Rank 1.
  • Get the top 10 rows.
 
You can guess, it's a complicated query and requires the database engine to do a lot of running here and there on comments table, post table, blog table and user table. So, when we run this query on every visit to homepage, it makes the entire server pretty slow. There are 10 to 15 requests per second on the homepage which makes the database engine run this query 10 to 15 times per second! That's around 864,000 per day! It's a miracle that the server is still alive don't you think? Now think about the scenario, when does bloggers' rank change? When a blogger makes a post. So, how often does a blogger make a post? Say twice or thrice per day. So, why do we run this query 10 times per second instead of twice or thrice per day? Now that is the right question, Neo.
 
Remember this, the frequency of changing data is much less than the frequency of reading data.
 
So, what does this mean? INSERT, UPDATE and DELETE happens much less than SELECT. Normally SELECT happens 100s or 1000s times more than INSERT/UPDATE/DELETE. So, we do not need to run the same SELECT again and again when the underlying rows are not changing at all. We can SELECT once, and store the result somewhere so that when we SELECT the same thing again and the data in the table has not changed, we can serve the same result from memory instead of going to the database.
 
So, here's the plan, we run the complicated query which takes some time to find out the top 10 bloggers and then get the result. Then we store the result either in memory or in some text file in the web server. The homepage will ALWAYS read from that text file and render the list. It will never go to the database and fetch the list. So, this already saves the database from running a complicated query and reduces the execution time of the homepage.
 
The same thing can be done for the "Recent Comments" list. This list gets updated only when a new comment is posted. So, it can also be cached and stored either in memory or in a text file.
 
Making these 2 most expensive queries cached, we can save 40% of the execution time of the home page. Now that's some improvement!
 
Now comes the most interesting part, the center part which shows 10 posts at a time. Post table is a pretty big table. It can have millions of rows within couple of months. So, doing queries on this table will become pretty expensive unless you do the right indexing. Even indexing won't help you much if there's too much traffic on your site. So, we need to do intelligent caching here.
 
We apply the principle of updating cache only when something change in this place too. The post list shows recent posts in paging mode which shows 10 posts at a time. Normally people see the top 10 posts every time they visit the homepage. So, that's something we need to cache seriously. Also, this cache gets refreshed frequently too. May be once every 20 seconds when someone adds or edits a post. So, the lifetime of this cache is pretty short also.
 
But if you cache top 100 posts in the cache, you can provide cached content to a user for up to 10 clicks on the 'Next 10 posts' link. Normally 80% users will click 'Next 10 posts' once, 60% will click twice, 40% will click thrice. Some persistent ones will click it 10 times but that's pretty rare. So, you can get around by caching top 100 or 200 posts easily. This saves you from doing a SELECT on the large "Post" table and then ordering it by "PostDate" and then selecting the top 10. You do it only once when you build the "Top100Posts" cache table or text file and store the data in the table in sorted order. This way, when paging occurs, you only select Top 10 rows from Nth position from the "Top100Posts" table or the text file. This happens blazingly fast compared to doing this on the actual "Post" table.
 
Optimization Step 4 - Page output rendering
Here's a nice formula you can use for caching:
 
Output = code(a) + code(b)
Where, code(a) = db(a) + db(b)
code(b) = db(c) + db(d)
 
This means, some code uses some part of the database and some other part of the code uses some other part of the database. The final output that user sees is from both codes' combined hard effort.
 
So, if db(a) and db(b) gets cached, the output of code(a) can also be cached. Their is no way result of code(a) is going to change until either db(a) or db(b) changes. Think about the blog example. If top 10 bloggers are calculated by code(a) and it uses two database operations to prepare the list db(a) and db(b), then until db(a) or db(b) returns a different value, the result of code(a) is never going to change. So, when we cache it, the equation becomes:
 
x = code(a) = db(a) + db(b)
 
So, the formular is: output = x + code(b)
 
Let's say code(b) generates the 'Recent comments' list. If that is also cached then the formula becomes:
 
y = code(b) = db(c) + db(d)
 
So, the formula is: output, z = x + y
 
Now you see, the right side is entirely cached. So, until either x or y changes, z is always the same. That means, if you and I hit the home page of the blog site, we will see the exact output until someone posts a blog or comment.
 
Now think, why do we even need to execute code(a) and code(b) and generate the output at all? If we do it once in a while and store the entire output in a static html file, then we can just deliver that static html file to the browser. Is there any need to execute .php code, or .aspx code at all for the homepage? The entire homepage can become just a static html file which web server can cache it in memory. Thus the entire page can be served from memory without requiring a single IO operation or database call!
 
Conclusion
We started with super high CPU and IO usage, thousands of database calls, super slow web site and came down to zero database call, zero IO usage and a super fast homepage.
 
Posted by omar with 16 comment(s)

Blogging from InfoPath

I'm trying to Blog from InfoPath. There's a fantastic InfoPath template available here:
 
 
I have tried Word 2007's new blogging feature. It works fine for posts without code. But if you insert code, the code looks super horrible when the blog gets published. Even the "Copy as HTML" can't give you good output. Also no image upload support.
 
I am thinking what if there was a nice addin for MS Word which can make a document publish as blog by automatically uploading pictures in it and nicely formatting code blocks, it could make us so productive. Mankind has invented so many great tools, but a really good Blogging tool that works for all is still missing.
 
 
Posted by omar with no comments
Filed under:

Client side Page Fragment Output cache, reduce page download time significantly

When you have a page full of lots of HTML, it will be best if the whole page can be cached on the browser. You can do this using HTTP Response caching header either by injecting them manually or by using @OutputCache tag directive on ASPX pages.

<% @ OutputCache Location ="Client" Duration ="86400" VaryByParam ="*" VaryByHeader ="*" %>

But if part of the page is dynamic and part of the page is static (like header, left side menu, foother, right side menu, bottom part) etc where static parts of the page occupy a significant amount of html, then if you could cache those parts on the browser, you could save a lot of bytes that gets downloaded every time the page downloads. On most of the websites, you will find the header, navigation menu, footer, bottom part, advertisements are mostly static and thus easily cacheable. If the whole page size is say 50KB, at least 20KB is static and 30KB might be dynamic. If you can use client side caching of page fragments (not ASP.NET’s server side page output cache), you can save 40% download time easily.

ASP.NET offers you page fragment caching using @Outputcache which is good, but that caching is on server side. It cache the output of user controls and serves them from server side cache. You cannot eliminate the download of those costly bytes. It just saves some CPU process. Nothing much for users in it.

The only way to cache part of page is by allowing browser to download those parts separately and making those parts cacheable just like images/CSS/javascripts get cached. So, we need to download page fragments separately and make them cached on the browser’s cache. IFRAME is an easy way to do this but IFRAME makes the page heavy and thus not follow CSS of the page. There are many reasons why IFRAME can’t work. So, we have a better way, we will use Javascript to render content of the page and javascript will get cached on the browser’s cache.

So, here's the idea:

  • We will split the whole page into multiple parts
  • We will generate page content using Javascript. Each cacheable part comes from javascript and javascript renders the HTML of it.
  • The parts which are cachable gets cached by the browser and thus never downloaded again (until you want them to be). But those parts which are non-cachable and highly dynamic, does not get cached by browser.

So, let's think of a page setup like this:  

Logo

Header

Left navigation Menu

Dynamic part of the page

Footer


Here only one part is dynamic and the rest is fully cacheable. So, the Default.aspx which renders this whole page looks like this:

<%@PageLanguage="VB"AutoEventWireup="false"%>
<%@OutputCacheNoStore="true"Location="None"%>
<!DOCTYPEhtmlPUBLIC"-//W3C//DTDXHTML 1.0 Transitional//EN""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<htmlxmlns="http://www.w3.org/1999/xhtml">
<headrunat="server">
    <title>MyBig Fat Page</title>
</head>
<body>
 
<formid="form1"runat="server">
<tablewidth="100%"border="1">   
<tr>
<td>Somelogo here</td>
<td><scriptid="Script1"src="Header.aspx"type="text/javascript"></script></td>
</tr>       
<tr>
<td><scriptid="LeftMenu"src="LeftMenu.aspx"type="text/javascript"></script></td>
<tdbgcolor="lightgrey"><div>
This is the dynamic part which gets changed on every load. Checkout the time when
it was generated: <%=DateTime.Now %></div></td>
</tr>       
<tr>
<tdcolspan="2"><scriptid="Footer"src="Footer.aspx"type="text/javascript"></script></td>
</tr>
</table>
</form>
 
</body>
</html>

The page looks like this:

You see, the cached parts are 30 mins older. Browser has not downloaded those parts at all and thus saved a significant amount of data transfer. The only part that was downloaded was the dynamic part.

When you load the page first time, all 4 files are downloaded. But the last 3 files get cached and never downloaded until browser’s cache expires. So, on second visit, only one file downloaded and thus saves a significant amount of data transfer.

Let’s look at one of the files Header.aspx which gets cached. Nothing fancy here, it’s a regular ASPX page:

 

The interesting thing here is the “ContentType” which I have set to “text/html/javascript”. This is not something built-in, I have introduced this type.

When you put an ASPX inside a Script tag, it surely does not work because < script id ="Script1" src ="Header.aspx" type ="text/javascript"> expects javascript output, not html output. If html output is provided, browsers simply ignores it. So, we need to convert the output of Header.aspx into Javascript which when downloaded and executed by the browser, emits the original html that was generated when ASP.NET executed the page.

We use HTTP Module to intercept all .aspx calls and when the page is about to be written to the output, we check if the content type is “text/html/javascript”. If it is, this is our cue to convert the page output to javascript representation.

If you want to know details about HTTP Module and how to use Response Filter to modify page output, please read this wonderful article:

http://www.aspnetresources.com/articles/HttpFilters.aspx

It really explains all the things. I would recommend you read this article first and then continue with the rest.

We have made a response filter named Html2JSPageFilter.js (available in the code download), which overrides the Write method of Stream and converts the entire HTML of the page to javascript representation:

    publicoverridevoidWrite(byte[]buffer, intoffset, intcount)
    {
        stringstrBuffer = System.Text.UTF8Encoding.UTF8.GetString(buffer, offset, count);
 
        //---------------------------------
        //Wait for the closing </html> tag
        //---------------------------------
        Regexeof = newRegex("</html>",RegexOptions.IgnoreCase);
 
        if(!eof.IsMatch (strBuffer))
        {
            responseHtml.Append (strBuffer);
        }
        else
        {
            responseHtml.Append (strBuffer);
            string finalHtml = responseHtml.ToString ();
 
            //extract only the content inside the form tag tag ASP.NET generatesin all .aspx
            intformTagStart = finalHtml.IndexOf("<form");
            intformTagStartEnd = finalHtml.IndexOf('>',formTagStart);
            intformTagEnd = finalHtml.LastIndexOf("</form>");
 
            stringpageContentInsideFormTag = finalHtml.Substring(formTagStartEnd + 1,formTagEnd - formTagStartEnd - 1);

First we get the entire page output, then we get only what is inside the <form> tag that ASP.NET generates for all .aspx pages.

Next step is to remove the viewstate hidden field because this will conflict with the view state on the default.aspx.

            //Remove the __VIEWSTATE tag because page fragments don't needviewstate
            //Note this will make all ASP.NET controls in the page fragments gomad which 
            //needs viewstate to do their work.
            Regexre = newRegex("(<input.*?__VIEWSTATE.*?/>)",RegexOptions.IgnoreCase);
            pageContentInsideFormTag =re.Replace(pageContentInsideFormTag, string.Empty);

Now we convert the entire html output to javascript string format:

            ///Convert the HTML to javascript string representation
            stringjavascript2Html = 
                pageContentInsideFormTag.Replace("\r","")
                .Replace("\n","")
                .Replace("   ","")
                .Replace(" ","")
                .Replace("  ","")
                .Replace("\\","\\\\")
                .Replace("'","\\'");
 

Final touch is to put that javascript string inside a “document.write(‘...’);” call. When you call document.write to emit html, it gets part of the page html:

            //Generate the document.write('...') which adds the content in thedocument
            stringpageOutput = "document.write('"+ javascript2Html + "');";
 

This is basically the trick. Use a Response filter to get the .aspx output, and then convert it to Javascript representation.

For convenience, I have used a HttpModule to hook into ASP.NET pipeline and wait for .aspx files which try to emit content type of “text/html/javascript”. Again this content type is nothing special, you could use “text/Omar Al Zabir”.

    voidIHttpModule.Init(HttpApplicationcontext)
    {
        context.ReleaseRequestState += newEventHandler(InstallResponseFilter);
    }
 
    privatevoidInstallResponseFilter(objectsender, EventArgse) 
    {
     HttpResponseresponse = HttpContext.Current.Response;
 
     if(response.ContentType == "text/html/javascript")
     {
         response.ContentType = "text/javascript";
         response.Filter = newHtml2JSPageFilter(response.Filter);
     }
    }

And finally in web.config, we have to register the HttpModule so that it gets called:

        <httpModules>
            <addname="Html2JSModule"type="Html2JavascriptModule"/>
        </httpModules>

The entire source code is available in this URL:

Download Source code of: Client side Page Fragment Output cache, reduce page download time significantly

Enjoy. Use this approach in your aspx and html files and save significant amount of download time on users end. Although it slightly increases first time visit download time (200+ms for each script tag), but it makes second time visit a breeze. See the performance difference yourself. First visit www.pageflakes.com. Then close your browser, open it again and enter www.pageflakes.com. See how fast it loads. If you use a HTTP debugger to monitor how much data is transferred, you will see it's only 200 bytes!

Posted by omar with no comments
Filed under:

IIS 6 Compression - quickest and effective way to do it for ASP.NET compression

IIS 6 has builtin gzip compression ability which can compress output of dynamic webpages (.aspx) and webservices (.asmx). The compression is really good and can easily reduce 60% download time. You should always turn this feature on in your production server. The CPU usage is not that high compared to the reduction of download time for users. Your users will love the significant download time reduction when you turn it on.

Now, on internet you will find a lot of solutions. I tried all of them which appears in first 30 Google search results. But failed to make any of them work properly. Finally I was able to make it work, but I realized you have to do it in a very specific way and in specific order. Here it goes:

  • Go to IIS Manager from Administrative Tools
  • Right click on your computer name (not on websites or Default Web Site)
  • Choose All Tasks-> Restart IIS
  • From the drop down, choose "Stop IIS" and click OK.
  • IIS is not stopped. Make sure it's not still running
  • Now go to C:\WINDOWS\SYSTEM32\INETSRV
  • Make a copy of the file Metabase.xml. This is a dangerous file, don't play around with it. Make sure you have a backup before you do what I am going to tell you now.
  • Open the metabase.xml in Notepad. Don't use any other editor besides Notepad, Notepad2 or Visual Studio.
  • Find using "<IIsCompressionScheme"
  • You will find a match which looks like this: "<IIsCompressionScheme Location ="/LM/W3SVC/Filters/Compression/deflate"
  • Now t
  • here are 2 nodes which are <IISCompressionsScheme> and one node which is <IISCompressionsScheme s > Note: the 3rd one is plural.
  • Delete these nodes. After deleting you will see the next node is "<IIsWebInfo Location ="/LM/W3SVC/Info"
    Once you have deleted the 3 nodes, paste the text from this link in in their position:
Now start IIS and hit your site once. When it runs for the first time, it will send uncompressed output, but it will compress it behind the scene. So, next hit will give you the compressed output.

Go to www.pipeboost.com and enter the URL to ensure you are getting compressed content. Before you do so, make sure you have visited your site for a while in your local browser so that the pages got the chance to get themselves compressed.

Posted by omar with 39 comment(s)

Send a message individually to all recipients using Outlook

Say you have written a message which has lots of users in the recipient list. Now you don't want to send the mail to all those users who can see each others email address. You want to send the message one by one. Here's what you do:

  • After composing the message save it to make it go to Drafts folder.
  • Select the message
  • Press ALT+F11 to bring the Visual Basic Editor.
  • Now paste the following code:

Sub SendIndividually()   

Dim objItem As MailItem
Set objItem = Application.ActiveExplorer.Selection.Item(1)

Dim objTo As Recipient
Dim objClonedMail As MailItem

On Local Error Resume Next

For Each objTo In objItem.Recipients

Set objClonedMail = objItem.Copy()
objClonedMail.To = ""
objClonedMail.Recipients.Add objTo
objClonedMail.Send

Next

End Sub

  • Hit F5 and wait for a while. You will see Outbox is full of copy of the message with individual recipient in the "To"

Read my previous post on how to generate a message with all the sender's email address from all the message in a folder. You can combine both approach to send everyone important notice.

Posted by omar with 5 comment(s)
Filed under:

Get email address of all users from all mails in Outlook Folder

Sometimes you want to send some important notice to everyone who has ever mailed you. Let's say you have a folder named "Friends" in Outlook where you store all the emails from your friends. Now you want to get all of their email addresses. Pretty difficult work if you have thousands of such mails. Here's an easy way.

  • Select the folder in Outlook and press ALT+F11. It will open Visual Basic Editor.
  • Double click on ThisOutlookSession from the Project tree.
  • Paste the following function:

Sub GetALLEmailAddresses()    

Dim objFolder As Folder
Set objFolder = Application.ActiveExplorer.Selection

Dim dic As New Dictionary
Dim strEmail As String
Dim strEmails As String

Dim objItem As MailItem
For Each objItem In objFolder.Items

strEmail = objItem.SenderEmailAddress
If Not dic.Exists(strEmail) Then
strEmails = strEmails + strEmail + ";"
dic.Add strEmail, ""
End If

Next

Debug.Print strEmails
End Sub

Hit F5 and it will run for a while. Then press Ctrl+G. You will see the email addresses in the "Immediate Window".    

Copy the whole string and you have all the email addresses from all the emails in the selected Outlook folder. There will be no duplicate address in the list.

Posted by omar with 14 comment(s)
Filed under:

Mac OSX vs. Windows Vista

Today we purchased an iMac (Intel Dual Core) in order to test Safari 2.0 support for Pageflakes. My first impression, I am totally blown away by its performance. The UI is unbelievable well done and the visual effects you see are 10 years ahead of time. I was comparing OS features of Mac OS X 10.4.X.X with Windows Vista. To be honest, I am suffering from inferiority complex now. Windows Vista is so lame compared to Mac OS X. Vista copied all OS X's concepts for which at least I feel ashamed whether Microsoft feels or not. But it did not make anything better after copying. Normally Microsoft copies others ideas and makes them really better in all possible ways. But Mac OS X is so seriously better than any other OS. The widgets you see with the OS X are mind blowing. Vista side bar goes nowhere near to it. Also the concept of switching to regular mode and widget mode is by far the best idea. We normally don't want to occupy desktop space with widgets. That's why in OS X, the whole screen switches to widget view when you press middle button.

OSX kernel is made from Unix. So, it's seriously powerful and stable. The command line tools are very feature rich. Those who love *nix command line, will feel at home in OSX. Vista still offers those poor command line tools from Windows XP/2003. Nothing improved much.

If somehow OSX can run Windows apps like Visual Studio, MS Word etc, I will undoubtedly switch to OSX. Even Windows Vista won't be able to compromise me.

Best of all, you can buy an iMac with Intel Processor and use dual boot to run both Windows and Mac OSX. So, when you work, boot on Windows. When you want to have fun, go to OSX.

Posted by omar with 3 comment(s)
Filed under:

Blog from One Note

I am blogging from One Note 2007. This is so super cool. By the way, those who haven't seen One Note 2007 should definitely try it. It's really good!

Posted by omar with no comments
More Posts Next page »