Home.NET Grouping and Aggregating When Querying LIN...
Grouping and Aggregating When Querying LINQ to SQL
This is the second article in a series focusing on querying using “LINQ to SQL.” It covers grouping and aggregating data with both SQL and respective LINQ examples in VB.NET and C#.
For the convenience of this article, I created a LINQ model using the “LINQ to SQL designer” as follows:
All of the queries in this article will focus on the above model.
The entire source code for this article is available in the form of a free downloadable zip file. The solution was developed using Microsoft Visual Studio 2008 Team Edition on Microsoft Windows Server 2003 Standard Edition with Microsoft SQL Server 2005 Developer Edition. I didn’t really test it in any other environment. I request that you post in the discussion area if you have any problems in execution.
Working with Aggregation functions (or group functions)
SQL has several group functions (aggregate functions) to operate on a set of values. MAX, MIN, AVG, COUNT are a few of them. Now, we will try to write LINQ queries based on those functions.
The following SQL SELECT statement:
SELECT MAX(Freight) FROM Orders
Can be written using VB.NET LINQ as follows:
Dim q = (From p In db.Orders _
Select p.Freight).Max
Me.lblResult.Text = q
And in C# LINQ as follows:
var q = (from p in db.Orders
select p.Freight).Max();
this.lblResult.Text = q.ToString();
Similarly, all of the following (in VB.NET LINQ) are valid:
One of the most frequently used functions is COUNT. It comes in different flavors. The following three have different meanings:
SELECT COUNT(*) FROM Orders
SELECT COUNT(ShipRegion) FROM Orders
SELECT COUNT(DISTINCT ShipRegion) FROM Orders
The first returns the total number of rows available in the Orders table. The second returns the number of non-null values in the “ShipRegion” column of the Orders table. The third returns the number of values in the “ShipRegion” column without counting the repeated and null values.
From the point of view of LINQ, it always counts objects (and not values) in the collection. So, both of the following would return the same count values (i.e., total number of objects in Orders).
Dim q = (From p In db.Orders _
Select p).Count
Dim q = (From p In db.Orders _
Select p.ShipRegion).Count
In both of the above cases, the SQL SELECT which gets generated from the above LINQ would be same (with COUNT(*)). There exists no direct way in LINQ to achieve the above type of SQL SELECT query. But we can do it in an indirect manner.
I would like to rewrite the following query:
SELECT COUNT(ShipRegion) FROM Orders
in the following manner:
SELECT COUNT(*) FROM Orders WHERE ShipRegion IS NOT NULL
The same can be represented in LINQ (VB.NET) as follows:
Dim q = (From p In db.Orders _
Where p.ShipRegion IsNot Nothing _
Select p.ShipRegion).Count
You can observe that I am explicitly filtering nulls in the above LINQ query. The same can be achieved in C# LINQ as follows:
var q = (from p in db.Orders
where p.ShipRegion != null
select p.ShipRegion).Count();
And now, the following SQL SELECT:
SELECT COUNT(DISTINCT ShipRegion) FROM Orders
can be written in VB.NET LINQ as follows:
Dim q = (From p In db.Orders _
Where p.ShipRegion IsNot Nothing _
Select p.ShipRegion).Distinct.Count
And of course, in C# LINQ as follows:
var q = (from p in db.Orders
where p.ShipRegion != null
select p.ShipRegion).Distinct().Count();
Note that during the execution of above LINQ queries, the SQL SELECT which gets generated would be entirely different from what I have posted above. But, finally, the result would be same.
Now, let us consider the following SQL SELECT query:
SELECT CustomerID, SUM(Freight)
FROM Orders
GROUP BY CustomerID
HAVING SUM(Freight) > 200
The above does the aggregation first and then filters the output, applying the HAVING condition. The same can be achieved in VB.NET LINQ as follows:
Dim q = From p In db.Orders _
Group p By p.CustomerID _
Into FreightSum = Sum(p.Freight) _
Where FreightSum > 200 _
Select _
CustomerID, _
FreightSum
And in C# LINQ as follows:
var q = from p in db.Orders
group p.Freight by p.CustomerID
into g
where g.Sum() > 200
select new { CustomerID = g.Key, FreightSum = g.Sum() };
Further flavors for the same in VB.NET LINQ are as follows:
Dim q = From p In db.Orders _
Group p By p.CustomerID _
Into Group _
Where Group.Sum(Function(p) p.Freight) > 200 _
Select _
CustomerID, _
FreightSum = Group.Sum(Function(p) p.Freight)
Dim q = From p In db.Orders _
Group By p.CustomerID _
Into FreightSum = Sum(p.Freight) _
Where FreightSum > 200
Dim q = From p In db.Orders _
Group FreightValue = p.Freight By p.CustomerID _
Into FreightSum = Sum(FreightValue) _
Where FreightSum > 200
In this article, I gave many examples in VB.NET than on C#. Several of them can be written in C# syntax also. If you have any doubts, please post in the discussion area.
We will go further in-depth in my upcoming articles. I hope you enjoyed the article and any suggestions, bugs, errors, enhancements etc. are highly appreciated at http://jagchat.spaces.live.com