Building the Data and Business Layers Using .NET 3.5 - Cleaning Up Inactive User and Related Data
An Ajax web portal has a unique challenge when it comes to cleaning up unused data that is generated by anonymous users who never return. Every first visit creates one anonymous user, a page setup, widgets, etc. If the user doesn’t come back, that information remains in the database permanently. It is possible that the user might come back within a day, or a week or a month, but there’s no guarantee. Generally, sticky users—users who return to your site frequently—make up 30 to 50 percent of the total users who come to an Ajax web portal. So, you end up with 50 to 70 percent unused data. Dropthings requires daily data cleanup to keep the database size down—user accounts expire, RSS feeds get old, anonymous sessions expire, and users never come back.
This is a huge cleanup operation once a web portal becomes popular and starts receiving thousands of users every day. Think about deleting millions of rows from 20 or 30 tables, one after another, while maintaining foreign key constraints. Also, the cleanup operation needs to run while the site is running, without hampering its overall performance. The whole operation leaves heavily fragmented indexes and unused space in the MDF file. The log file also becomes enormous because it has to keep track of the large transactions. Hard drives get really hot and sweat furiously. Although the CPU keeps going, it’s really painful to watch SQL Server go through this every day. But there is no alternative if you want to keep SQL Server’s RAM and disk I/O requirements under control. Most importantly, the cleanup keeps users who are no longer valid out of the monthly reports.
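To see how much fragmentation a cleanup run leaves behind, a query along the following lines can help. It is a minimal sketch, assuming SQL Server 2005 or later, where the `sys.dm_db_index_physical_stats` function is available. The 30 percent cutoff is only an illustrative threshold, not a figure from this chapter.

```sql
-- Sketch: list the indexes in the current database that are heavily fragmented.
-- The 30 percent threshold is an assumption chosen for illustration.
DECLARE @DbId int
SET @DbId = DB_ID()

SELECT OBJECT_NAME(ps.object_id) AS TableName,
       i.name AS IndexName,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(@DbId, NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id
WHERE ps.avg_fragmentation_in_percent > 30
ORDER BY ps.avg_fragmentation_in_percent DESC
```

Running it before and after a cleanup shows which tables suffer the most from the mass deletes.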
When a user visits the site, the ASP.NET membership provider updates the LastActivityDate of the aspnet_users table. From this field, you can find out how long the user has been idle. The IsAnonymous bit field shows whether the user account is anonymous or registered. If it is registered, then there is no need to worry. But if it is anonymous and more than 30 days old, you can be sure that the user will never come back because the cookie has already expired. However, we can’t avoid creating an anonymous user because the user might want a fresh start (see the “Implementing Authentication and Authorization” section in Chapter 3). Another scenario is a user logging out on a shared computer (e.g., a cyber café) and the next person using it as an anonymous user.
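Before deleting anything, it can be useful to check how many accounts match this rule. The following read-only query is a minimal sketch that uses only the two columns described above. The 30-day value mirrors the paragraph and is an assumption you should adjust to your own cookie lifetime.

```sql
-- Sketch: count anonymous users that have been idle longer than @Days.
-- Nothing is modified; this only reports the size of the backlog.
DECLARE @Days int
SET @Days = 30   -- assumed to match the anonymous cookie lifetime

SELECT COUNT(*) AS InactiveAnonymousUsers
FROM aspnet_Users
WHERE IsAnonymous = 1
  AND LastActivityDate < DATEADD(day, -@Days, GETDATE())
```

If the count is far larger than the number of users a single cleanup run deletes, the cleanup is falling behind.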
Here’s how the whole cleanup process works:
- Find out the users that are old enough to be discarded and are anonymous
- Find out the pages the user has
- Delete all of the widget instances on those pages
- Delete those pages
- Remove rows from child tables related to aspnet_users like aspnet_profile, aspnet_UsersInRoles, and aspnet_PersonalizationPerUser
- Remove rows for users to be deleted
- Remove the users from aspnet_users
Example 4-16 is the giant DB script that does it all. I have added inline comments to explain what the script is doing.
Example 4-16. Cleaning up old anonymous users and their related data
-- Number of days after which we give users the 'bye bye'
DECLARE @Days int
SET @Days = 29
-- Number of users to delete per run. If it's too high, the database will get stuck
-- for a long time. If it's too low, you will end up having more trash than
-- you can clean up. Decide this number based on how many anonymous users are
-- created per day and how frequently you run this query. The correct formula
-- for this number is: @NoOfUsersToDelete > AnonUsersPerDay / FrequencyOfRun
DECLARE @NoOfUsersToDelete int
SET @NoOfUsersToDelete = 1000
-- Create tables that hold the users and pages to delete. The user and page IDs
-- are used to find rows in other tables, so having them in a table is better
-- than repeatedly running SELECT ID FROM ...
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[PagesToDelete]') AND type in (N'U'))
DROP TABLE [dbo].[PagesToDelete]
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[AspnetUsersToDelete]') AND type in (N'U'))
DROP TABLE [dbo].[AspnetUsersToDelete]
create table PagesToDelete (PageID int NOT NULL PRIMARY KEY)
create table AspnetUsersToDelete (UserID uniqueidentifier NOT NULL PRIMARY KEY)
-- Find inactive anonymous users and store the UserID in the temporary
-- table
insert into AspnetUsersToDelete
select top(@NoOfUsersToDelete) UserID from aspnet_Users where
(isAnonymous = 1) and (LastActivityDate < (getDate()-@Days))
order by UserID -- Saves SQL Server from sorting in clustered index again
print 'Users to delete: ' + convert(varchar(255),@@ROWCOUNT)
GO
-- Get the user pages that will be deleted
insert into PagesToDelete
select ID from Page where UserID in
(
select UserID from AspnetUsersToDelete
)
print 'Pages to delete: ' + convert(varchar(255),@@ROWCOUNT)
GO
-- Delete all widget instances on the pages to be deleted
delete from WidgetInstance where PageID IN ( SELECT PageID FROM PagesToDelete )
print 'Widget Instances deleted: ' + convert(varchar(255), @@ROWCOUNT)
GO
-- Delete the pages
delete from Page where ID IN
( SELECT PageID FROM PagesToDelete )
GO
-- Delete UserSetting
delete from UserSetting WHERE UserID IN ( SELECT UserID FROM AspnetUsersToDelete )
GO
-- Delete profile of users
delete from aspnet_Profile WHERE UserID IN ( SELECT UserID FROM AspnetUsersToDelete )
GO
-- Delete from aspnet_UsersInRoles
delete from aspnet_UsersInRoles WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO
-- Delete from aspnet_PersonalizationPerUser
delete from aspnet_PersonalizationPerUser WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO
-- Delete the users
delete from aspnet_users where userID IN ( SELECT UserID FROM AspnetUsersToDelete )
PRINT 'Users deleted: ' + convert(varchar(255), @@ROWCOUNT)
GO
drop table PagesToDelete
drop table AspnetUsersToDelete
GO
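The batch-size rule in the script's comments, @NoOfUsersToDelete > AnonUsersPerDay / FrequencyOfRun, can be made concrete with a little arithmetic. The sketch below reads FrequencyOfRun as the number of runs per day. The traffic figures and the variable names @AnonUsersPerDay, @RunsPerDay, and @MinUsersPerRun are hypothetical, chosen only for illustration.

```sql
-- Hypothetical traffic figures, for illustration only
DECLARE @AnonUsersPerDay int
DECLARE @RunsPerDay int
DECLARE @MinUsersPerRun int

SET @AnonUsersPerDay = 30000  -- assumed: new anonymous users created per day
SET @RunsPerDay = 24          -- assumed: the cleanup job runs once an hour

-- Integer division truncates, so adding 1 guarantees a value strictly
-- greater than AnonUsersPerDay / FrequencyOfRun.
SET @MinUsersPerRun = (@AnonUsersPerDay / @RunsPerDay) + 1

PRINT 'Delete at least ' + CONVERT(varchar(20), @MinUsersPerRun) + ' users per run'
```

With these assumed figures the result is 1251, so the script's default of 1000 users per run would leave roughly 6,000 stale accounts behind every day.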