If you're looking for a way to simplify data manipulation in .NET, you may want to check out what LINQ has to offer. It adds some very useful capabilities to .NET. Keep reading for a full overview. This article is the first of two parts.
Programming involves data manipulation: pulling different pieces of information from different sources. One might pull headlines from an RSS feed, query a relational database, or loop over a collection of objects, extracting elements that meet a given condition. Each of these tasks involves querying a data source and then turning the results into something useful, but the exact method can vary widely. Looping over a collection is simple; querying a database is more involved. Still, the overall goal is the same, so why not unify the means to that end?
This is what LINQ, which stands for Language Integrated Query, aims to do. LINQ adds query syntax (similar to SQL, but native) and capabilities to .NET. Using LINQ, one can query a variety of data sources in a unified, easy way. Moreover, LINQ contributes to readability, since someone browsing code can easily identify a query and determine what it does. This article will provide an overview of LINQ, its structure and its capabilities, using both C# and VB.NET.
Query Syntax
LINQ adds queries right into the language through new syntax. The syntax is similar to SQL, so it should be familiar to most developers. At the risk of making LINQ seem simplistic, a query gives the developer a concise way to say, "select all the elements where a certain condition is met."
For example, let's say that we have a class, Person, that represents, well, a person, and that for each person, we need to know a name, an age and a phone number. In Visual Basic, we can represent this as follows:
Public Class Person
    Private _name As String
    Private _age As Integer
    Private _phone As String
    Public Sub New(ByVal name As String, ByVal age As Integer, ByVal phone As String)
        _name = name
        _age = age
        _phone = phone
    End Sub
    Public ReadOnly Property Name() As String
        Get
            Return _name
        End Get
    End Property
    Public ReadOnly Property Age() As Integer
        Get
            Return _age
        End Get
    End Property
    Public ReadOnly Property Phone() As String
        Get
            Return _phone
        End Get
    End Property
End Class
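A C# equivalent of the Visual Basic class above might look like the following sketch. It assumes the same three read-only properties and the same constructor argument order (name, age, phone), which is what the later C# examples rely on. The `Program` class and its `Main` method are only there to make the snippet runnable.

```csharp
using System;

// Sketch of a C# Person class mirroring the Visual Basic version.
public class Person
{
    private readonly string _name;
    private readonly int _age;
    private readonly string _phone;

    public Person(string name, int age, string phone)
    {
        _name = name;
        _age = age;
        _phone = phone;
    }

    // Read-only properties, matching the ReadOnly properties in VB.
    public string Name { get { return _name; } }
    public int Age { get { return _age; } }
    public string Phone { get { return _phone; } }
}

public static class Program
{
    public static void Main()
    {
        Person bob = new Person("Bob", 35, "555-292-3044");
        // Prints: Bob, 35, 555-292-3044
        Console.WriteLine("{0}, {1}, {2}", bob.Name, bob.Age, bob.Phone);
    }
}
```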
Now, let's say that we have various Person objects, all arranged in an array:
Person bob = new Person("Bob", 35, "555-292-3044");
Person henry = new Person("Henry", 43, "555-292-5312");
Person joe = new Person("Joe", 22, "555-232-7222");
Person chuck = new Person("Chuck", 29, "555-292-1134");
Person[] people = { bob, henry, joe, chuck };
Dim bob As Person = New Person("Bob", 35, "555-292-3044")
Dim henry As Person = New Person("Henry", 43, "555-292-5312")
Dim joe As Person = New Person("Joe", 22, "555-232-7222")
Dim chuck As Person = New Person("Chuck", 29, "555-292-1134")
Dim people() As Person = {bob, henry, joe, chuck}
Suppose we wanted to extract a subset of our array. For example, we can create a collection containing only people over the age of thirty. To do this, we could loop over the array, checking each element's Age property. This approach, however, is made unnecessary by LINQ and, furthermore, is ugly by comparison. So, let's get started with LINQ. Here is how to query the array, obtaining all elements whose Age property is more than thirty:
IEnumerable<Person> overThirty = from p in people
                                 where p.Age > 30
                                 select p;
Dim overThirty As IEnumerable(Of Person) = From p In people _
                                           Where p.Age > 30 _
                                           Select p
Above, we simply look through each Person object, represented as p, in the array and select, or pull out, each element whose Age property is over thirty. To do this, we use the From, Where and Select operators. From identifies where we're looking, Where sets the conditions we're testing, and Select pulls out the results in the form we want (here, we just pull out the results as Person objects).
The resulting code using LINQ is, obviously, much more concise and elegant than the alternative, and it's a lot more readable as well. From a casual glance, one can tell exactly what's being done.
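For comparison, the loop-based alternative mentioned earlier could be sketched as follows. The helper name `OverThirty` and the compact `Person` class with auto-implemented properties are illustrative assumptions, not code from this article.

```csharp
using System;
using System.Collections.Generic;

// Compact stand-in for the article's Person class (assumed shape).
public class Person
{
    public Person(string name, int age, string phone)
    {
        Name = name;
        Age = age;
        Phone = phone;
    }

    public string Name { get; private set; }
    public int Age { get; private set; }
    public string Phone { get; private set; }
}

public static class LoopFilter
{
    // Hypothetical helper: collects everyone over thirty without LINQ.
    public static List<Person> OverThirty(IEnumerable<Person> people)
    {
        List<Person> result = new List<Person>();
        foreach (Person p in people)
        {
            if (p.Age > 30)
            {
                result.Add(p);
            }
        }
        return result;
    }

    public static void Main()
    {
        Person[] people =
        {
            new Person("Bob", 35, "555-292-3044"),
            new Person("Henry", 43, "555-292-5312"),
            new Person("Joe", 22, "555-232-7222"),
            new Person("Chuck", 29, "555-292-1134")
        };

        // Prints Bob, then Henry.
        foreach (Person p in OverThirty(people))
        {
            Console.WriteLine(p.Name);
        }
    }
}
```

The loop needs a temporary list, an explicit test and an explicit add, all of which the three-line query expresses directly.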
Note that the query syntax is just shorthand for the following:
IEnumerable<Person> overThirty = people
    .Where(p => p.Age > 30)
    .Select(p => p);
Dim overThirty As IEnumerable(Of Person) = people _
    .Where(Function(p) p.Age > 30) _
    .Select(Function(p) p)
Above, the operators are translated into methods. For each operator, we need a function that will perform the relevant task. For example, with Where, we need a function that will determine if p.Age is more than thirty. Lambda expressions are used to provide functions here. They are a concise way of writing anonymous functions and were introduced in C# 3.0 and Visual Basic 9.0 alongside LINQ.
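To make the role of the function argument concrete, the sketch below passes Where an ordinary named method (through a `Func<Person, bool>` delegate) and then an equivalent lambda, and checks that both produce the same sequence. The names `IsOverThirty` and `LambdaDemo`, and the compact `Person` class, are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compact stand-in for the article's Person class (assumed shape).
public class Person
{
    public Person(string name, int age, string phone)
    {
        Name = name;
        Age = age;
        Phone = phone;
    }

    public string Name { get; private set; }
    public int Age { get; private set; }
    public string Phone { get; private set; }
}

public static class LambdaDemo
{
    // A named method with the shape Where expects: Person in, bool out.
    public static bool IsOverThirty(Person p)
    {
        return p.Age > 30;
    }

    public static void Main()
    {
        Person[] people =
        {
            new Person("Bob", 35, "555-292-3044"),
            new Person("Henry", 43, "555-292-5312"),
            new Person("Joe", 22, "555-232-7222"),
            new Person("Chuck", 29, "555-292-1134")
        };

        // Wrap the named method in a delegate and pass it to Where.
        Func<Person, bool> predicate = IsOverThirty;
        IEnumerable<Person> viaMethod = people.Where(predicate);

        // The lambda states the same test inline.
        IEnumerable<Person> viaLambda = people.Where(p => p.Age > 30);

        // Both queries yield the same objects in the same order.
        Console.WriteLine(viaMethod.SequenceEqual(viaLambda)); // True
    }
}
```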
LINQ is where implicitly typed variables shine. Notice how we stored the results in an IEnumerable<Person>. This, of course, works fine, but we're able to shorten it a bit:
var overThirty = from p in people
                 where p.Age > 30
                 select p;
Dim overThirty = From p In people _
                 Where p.Age > 30 _
                 Select p
Notice how we return the results as Person objects. This may be appropriate in our example, but consider the case of a relational database or an XML file. We need to represent each piece of data as an object, and we also may not need all of the fields. While in the above example we already had an obvious type handy to represent our data, we could also have used an anonymous type to represent our data.
Anonymous types are used like this:
var mike = new { Name = "Mike", Age = 51, Phone = "555-232-2341" };
Dim mike = New With {.Name = "Mike", .Age = 51, .Phone = "555-232-2341"}
Anonymous types make sense within the context of LINQ. The following code yields the same results (or, rather, approximately the same, as we'll soon see) as the previous code, but it uses an anonymous type:
var overThirty = from p in people
                 where p.Age > 30
                 select new { Name = p.Name, Age = p.Age, Phone = p.Phone };
Now, instead of getting a collection of Person objects as a result, we get a collection of objects of an anonymous type. These have to be treated a bit differently. For example, suppose we wanted to loop over the results and print them out. Before, we could do something like this:
foreach (Person p in overThirty)
{
    Console.WriteLine("Name: {0}", p.Name);
    Console.WriteLine("Age: {0}", p.Age);
    Console.WriteLine("Phone: {0}", p.Phone);
}
For Each p As Person In overThirty
    Console.WriteLine("Name: {0}", p.Name)
    Console.WriteLine("Age: {0}", p.Age)
    Console.WriteLine("Phone: {0}", p.Phone)
Next
Using the anonymous type, though, we can't treat each element as a Person object. We can't match it to a specific type, but that's okay because we know the type's properties and can access them in the same way:
foreach (var p in overThirty)
{
    Console.WriteLine("Name: {0}", p.Name);
    Console.WriteLine("Age: {0}", p.Age);
    Console.WriteLine("Phone: {0}", p.Phone);
}
For Each p In overThirty
    Console.WriteLine("Name: {0}", p.Name)
    Console.WriteLine("Age: {0}", p.Age)
    Console.WriteLine("Phone: {0}", p.Phone)
Next
The only difference is that in C#, Person has been replaced with var, and in VB.NET, the type declaration has been taken out entirely.
In the above example, we chose to make our anonymous type's properties mirror those of the real type, Person. However, we could have made them different. For example, if we only need the name and number, then we're free to leave the Age property out completely, or if we want to add a new property, such as the year in which the person was born, then we're free to add that and then compute the value of it. Anonymous types are especially useful when the data source provides numerous fields, of which you only need to use a few.
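As a sketch of that idea, the query below drops Age from the anonymous type and adds a computed BirthYear property instead. The helper `ApproximateBirthYear`, the class name `ProjectionDemo` and the fixed reference year of 2008 are assumptions chosen to keep the output reproducible; real code would more likely use the current year.

```csharp
using System;
using System.Linq;

// Compact stand-in for the article's Person class (assumed shape).
public class Person
{
    public Person(string name, int age, string phone)
    {
        Name = name;
        Age = age;
        Phone = phone;
    }

    public string Name { get; private set; }
    public int Age { get; private set; }
    public string Phone { get; private set; }
}

public static class ProjectionDemo
{
    // Hypothetical helper: estimates a birth year from an age.
    public static int ApproximateBirthYear(Person p, int referenceYear)
    {
        return referenceYear - p.Age;
    }

    public static void Main()
    {
        Person[] people =
        {
            new Person("Bob", 35, "555-292-3044"),
            new Person("Henry", 43, "555-292-5312"),
            new Person("Joe", 22, "555-232-7222"),
            new Person("Chuck", 29, "555-292-1134")
        };

        // Assumed fixed year so the printed values do not change over time.
        int referenceYear = 2008;

        // Keep Name and Phone, leave Age out, and add a computed property.
        var contacts = from p in people
                       where p.Age > 30
                       select new
                       {
                           p.Name,
                           p.Phone,
                           BirthYear = ApproximateBirthYear(p, referenceYear)
                       };

        // Prints:
        // Bob (555-292-3044), born around 1973
        // Henry (555-292-5312), born around 1965
        foreach (var c in contacts)
        {
            Console.WriteLine("{0} ({1}), born around {2}", c.Name, c.Phone, c.BirthYear);
        }
    }
}
```

Writing `p.Name` without an explicit `Name =` lets the compiler reuse the source property's name for the anonymous type's property.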
The From, Where and Select operators aren't the only ones provided by LINQ. Other useful operators are available, some of which we'll take a look at now. First, we'll look at the orderby operator, which, as its name suggests, orders the results by a certain field. For example, say we want a list of everyone in our array with a phone number that starts with 555-292. Additionally, say we want this list to be sorted alphabetically by the person's first name. Sorting can be done simply by providing the orderby operator with a field by which to sort:
var number292 = from p in people
                where p.Phone.StartsWith("555-292")
                orderby p.Name
                select p;
Dim number292 = From p In people _
                Where p.Phone.StartsWith("555-292") _
                Order By p.Name _
                Select p
Be sure to take note of the space in the Visual Basic version of the operator.
We can also order the names in reverse alphabetical order (that is, in descending order) with the addition of a single word:
var reverse292 = from p in people
                 where p.Phone.StartsWith("555-292")
                 orderby p.Name descending
                 select p;
Dim reverse292 = From p In people _
                 Where p.Phone.StartsWith("555-292") _
                 Order By p.Name Descending _
                 Select p
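Just as where and select correspond to the Where and Select methods, the two ordering queries correspond to the OrderBy and OrderByDescending methods. The sketch below shows both in method form; the helper `Names292`, the class name `OrderingDemo` and the compact `Person` class are illustrative assumptions.

```csharp
using System;
using System.Linq;

// Compact stand-in for the article's Person class (assumed shape).
public class Person
{
    public Person(string name, int age, string phone)
    {
        Name = name;
        Age = age;
        Phone = phone;
    }

    public string Name { get; private set; }
    public int Age { get; private set; }
    public string Phone { get; private set; }
}

public static class OrderingDemo
{
    // Hypothetical helper: names of people with a 555-292 number,
    // sorted by name in the requested direction.
    public static string[] Names292(Person[] people, bool descending)
    {
        var matches = people.Where(p => p.Phone.StartsWith("555-292"));

        var ordered = descending
            ? matches.OrderByDescending(p => p.Name)
            : matches.OrderBy(p => p.Name);

        return ordered.Select(p => p.Name).ToArray();
    }

    public static void Main()
    {
        Person[] people =
        {
            new Person("Bob", 35, "555-292-3044"),
            new Person("Henry", 43, "555-292-5312"),
            new Person("Joe", 22, "555-232-7222"),
            new Person("Chuck", 29, "555-292-1134")
        };

        Console.WriteLine(string.Join(", ", Names292(people, false))); // Bob, Chuck, Henry
        Console.WriteLine(string.Join(", ", Names292(people, true)));  // Henry, Chuck, Bob
    }
}
```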
Similar to the OrderBy operator is the GroupBy operator. Say we want to break our results into groups: those with 555-292 numbers, and those with 555-232 numbers (actually, this latter group contains only one person). Additionally, we want to alphabetize each group. To do all of this, we use both the OrderBy operator and the GroupBy operator. Let's query the array and then display the results by group. We'll start with C#:
var numberGrouped = from p in people
                    orderby p.Name
                    group p by p.Phone.Substring(0, 7) into g
                    select g;
foreach (var g in numberGrouped)
{
    Console.WriteLine("{0} Numbers:", g.Key);
    foreach (var p in g)
    {
        Console.WriteLine(p.Name);
    }
}
The code is fairly straightforward. We specify what is to be grouped – p – and what it is to be grouped by – the first seven characters of the phone number. We also assign our group a variable name, g. Then, we loop through each group, printing the key (in this case, the first part of the phone number) and then the members of the group.
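In method form, the same grouping can be written roughly as an OrderBy call followed by a GroupBy call. The sketch below also shows the type hiding behind var: each group is an `IGrouping<string, Person>`, which exposes the Key property and can itself be enumerated. The helper `GroupByPrefix`, the class name `GroupingDemo` and the compact `Person` class are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compact stand-in for the article's Person class (assumed shape).
public class Person
{
    public Person(string name, int age, string phone)
    {
        Name = name;
        Age = age;
        Phone = phone;
    }

    public string Name { get; private set; }
    public int Age { get; private set; }
    public string Phone { get; private set; }
}

public static class GroupingDemo
{
    // Hypothetical helper: sorts by name, then groups by the first
    // seven characters of the phone number.
    public static IEnumerable<IGrouping<string, Person>> GroupByPrefix(IEnumerable<Person> people)
    {
        return people
            .OrderBy(p => p.Name)
            .GroupBy(p => p.Phone.Substring(0, 7));
    }

    public static void Main()
    {
        Person[] people =
        {
            new Person("Bob", 35, "555-292-3044"),
            new Person("Henry", 43, "555-292-5312"),
            new Person("Joe", 22, "555-232-7222"),
            new Person("Chuck", 29, "555-292-1134")
        };

        // Prints:
        // 555-292 Numbers:
        // Bob
        // Chuck
        // Henry
        // 555-232 Numbers:
        // Joe
        foreach (IGrouping<string, Person> g in GroupByPrefix(people))
        {
            Console.WriteLine("{0} Numbers:", g.Key);
            foreach (Person p in g)
            {
                Console.WriteLine(p.Name);
            }
        }
    }
}
```

Because the sort runs before the grouping, the groups appear in the order in which their keys are first encountered in the sorted sequence, and the members of each group stay alphabetized.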
Now let's move on to Visual Basic:
Dim numberGrouped = From p In people _
                    Order By p.Name _
                    Group p By Phone = p.Phone.Substring(0, 7) Into Group _
                    Select New With {Phone, Group}
For Each g In numberGrouped
    Console.WriteLine("{0} Numbers:", g.Phone)
    For Each p In g.Group
        Console.WriteLine(p.Name)
    Next
Next
The code is about the same, but a few very important differences are present. First, we have to explicitly assign a variable name to the key. Second, we don't give the group a variable name. Though it is possible (g = Group), the resulting code is longer. Third, we don't just select the group. If we do, then we won't be provided with a Key property as in C#. Instead, we work with an anonymous type. So, as a result, the loop is different. The output, however, is identical to the output in the C# version.