LINQ Tutorail Part4
Posted On Sunday, July 19, 2009 at at 5:03 PM by test
This is the forth of a series of articles exploring new features in C# 3. The first three have covered:
Imagine we have a list of orders. For this example, we will imagine they are stored in memory, but they could be in a file on disk too. We want to get a list of the costs of all orders that were placed by the customer identified by the number 84. If we set about implementing this in C# before version 3 and a range of other popular languages, we would probably write something like (assuming C# syntax for familiarity):
If we had the orders stored in a table in a database and we used SQL to query it, we would write something like:
Linq brings declarative programming features into imperative languages. It is not language specific, and has been implemented in the Orcas version of VB.Net amongst other languages. In this series we are focusing on C# 3.0, but the principles will carry over to other languages.
We then run across a new C# 3.0 keyword: "from".
Following this is another new keyword: "where".
The final new keyword is "select"., though.
You may be thinking at this point, "hey, this looks like SQL but kind of backwards and twisted about a bit". That is a pretty good summary. I suspect many who have written a lot of SQL will find the "select comes last" a little grating at first; the other important thing to remember is that all of the conditions are to be expressed in C# syntax, not SQL syntax. That means "==" for equality testing, rather than "=" in SQL. Thankfully, in most cases that mistake will lead to a compile time error anyway.
While I said earlier that the ordering condition is based on fields in the objects involved in the query, it actually doesn't have to be. Here's a way to get the results in a random order.
When the query is evaluated, an object in the Customers collection is located to match each object in the Orders collection. Note that if there were many customers with the same ID, there may be more than one matching Customer object per Order object. In this case, we get extra results. For example, change Vladimir to also have an OrderID of 84. The output of the program would then be:
Of course, knowing about something and doing it yourself are two entirely different things; if you haven't already done so, grab yourself the Visual Studio 2008 trial or the free Express Edition. Only then will you become comfortable with the new features and be able to use them effectively in your own development. Happy hacking, and have fun!
- Implicitly typed variables and arrays
- Extension methods and lambda expressions
- Object and collection initializers and anonymous types
Introducing Linq
Linq is short for Language Integrated Query. If you are used to using SQL to query databases, you are going to have something of a head start with Linq, since they have many ideas in common. Before we dig into Linq itself, let's step back and look at what makes SQL different from C#.Imagine we have a list of orders. For this example, we will imagine they are stored in memory, but they could be in a file on disk too. We want to get a list of the costs of all orders that were placed by the customer identified by the number 84. If we set about implementing this in C# before version 3 and a range of other popular languages, we would probably write something like (assuming C# syntax for familiarity):
ListHere we are describing how to achieve the result we want by breaking the task into a series of instructions. This approach, which is very familiar to us, is called imperative programming. It relies on us to pick a good algorithm and not make any mistakes in the implementation of it; for more complex tasks, the algorithm is more complex and our chances of implementing it correctly decrease.Found = new List ();
foreach (Order o in Orders)
if (o.CustomerID == 84)
Found.Add(o.Cost);
If we had the orders stored in a table in a database and we used SQL to query it, we would write something like:
SELECT Cost FROM Orders WHERE CustomerID = 84Here we have not specified an algorithm, or how to get the data. We have just declared what we want and left the computer to work out how to do it. This is known as declarative or logic programming.
Linq brings declarative programming features into imperative languages. It is not language specific, and has been implemented in the Orcas version of VB.Net amongst other languages. In this series we are focusing on C# 3.0, but the principles will carry over to other languages.
Understanding A Simple Linq Query
Let's jump straight into a code example. First, we'll create an Order class, then make a few instances of it in a List as our test data. With that done, we'll use Linq to get the costs of all orders for customer 84.class OrderThe output of running this program is:
{
private int _OrderID;
private int _CustomerID;
private double _Cost;
public int OrderID
{
get { return _OrderID; }
set { _OrderID = value; }
}
public int CustomerID
{
get { return _CustomerID; }
set { _CustomerID = value; }
}
public double Cost
{
get { return _Cost; }
set { _Cost = value; }
}
}
class Program
{
static void Main(string[] args)
{
// Set up some test orders.
var Orders = new List{
new Order {
OrderID = 1,
CustomerID = 84,
Cost = 159.12
},
new Order {
OrderID = 2,
CustomerID = 7,
Cost = 18.50
},
new Order {
OrderID = 3,
CustomerID = 84,
Cost = 2.89
}
};
// Linq query.
var Found = from o in Orders
where o.CustomerID == 84
select o.Cost;
// Display results.
foreach (var Result in Found)
Console.WriteLine("Cost: " + Result.ToString());
}
}
Cost: 159.12Let's walk through the Main method. First, we use collection and object initializers to create a list of Order objects that we can run our query over. Next comes the query - the new bit. We declare the variable Found and request that its type be inferred for us by using the "var" keyword.
Cost: 2.89
We then run across a new C# 3.0 keyword: "from".
from o in OrdersThis is the keyword that always starts a query. You can read it a little bit like a "foreach": it takes a collection of some kind after the "in" keyword and makes what is to the left of the "in" keyword refer to a single element of the collection. Unlike "foreach", we do not have to write a type.
Following this is another new keyword: "where".
where o.CustomerID == 84This introduces a filter, allowing us to pick only some of the objects from the Orders collection. The "from" made the identifier "o" refer to a single item from the collection, and we write the condition in terms of this. If you type this query into the IDE yourself, you will notice that it has worked out that "o" is an Order and intellisense works as expected.
The final new keyword is "select".
select o.CostThis comes at the end of the query and is a little like a "return" statement: it states what we want to appear in the collection holding the results of the query. As well as primitive types (such as int), you can instantiate any object you like here. In this case, we will end up with Found being a List
You may be thinking at this point, "hey, this looks like SQL but kind of backwards and twisted about a bit". That is a pretty good summary. I suspect many who have written a lot of SQL will find the "select comes last" a little grating at first; the other important thing to remember is that all of the conditions are to be expressed in C# syntax, not SQL syntax. That means "==" for equality testing, rather than "=" in SQL. Thankfully, in most cases that mistake will lead to a compile time error anyway.
A Few More Simple Queries
We may wish our query to return not only the Cost, but also the OrderID for each result that it finds. To do this we take advantage of anonymous types.var Found = from o in OrdersHere we have defined an anonymous type that holds an OrderID and a Cost. This is where we start to see the power and flexibility that they offer; without them we would need to write custom classes for every possible set of results we wanted. Remembering the projection syntax, we can shorten this to:
where o.CustomerID == 84
select new { OrderID = o.OrderID, Cost = o.Cost };
var Found = from o in OrdersAnd obtain the same result. Note that you can perform whatever computation you wish inside the anonymous type initializer. For example, we may wish to return the Cost of the order with an additional sales tax of 10% added on to it.
where o.CustomerID == 84
select new { o.OrderID, o.Cost };
var Found = from o in OrdersConditions can be more complex too, and are built up in the usual C# way, just as you would do in an "if" statement. Here we apply an extra condition that we only want to see orders valued over a hundred pounds.
where o.CustomerID == 84
select new {
o.OrderID,
o.Cost,
CostWithTax = o.Cost * 1.1
};
var Found = from o in Orders
where o.CustomerID == 84 && o.Cost > 100
select new {
o.OrderID,
o.Cost,
CostWithTax = o.Cost * 1.1
};
Ordering
It is possible to sort the results based upon a field or the result of a computation involving one or more fields. This is achieved by using the new "orderby" keyword.var Found = from o in OrdersAfter the "orderby" keyword, we write the expression that the objects will be sorted on. In this case, it is a single field. Notice this is different from SQL, where there are two words: "ORDER BY". I have added the keyword "ascending" at the end, though this is actually the default. The result is that we now get the orders in order of increasing cost, cheapest to most expensive. To get most expensive first, we would have used the "descending" keyword.
where o.CustomerID == 84
orderby o.Cost ascending
select new { o.OrderID, o.Cost };
While I said earlier that the ordering condition is based on fields in the objects involved in the query, it actually doesn't have to be. Here's a way to get the results in a random order.
Random R = new Random();
var Found = from o in Orders
where o.CustomerID == 84
orderby R.Next()
select new { OrderID = o.OrderID, Cost = o.Cost };
Joins
So far we have just had one type of objects to run our query over. However, real life is usually more complex than this. For this example, let's introduce another class named Customer.class CustomerIn the Main method, we will also instantiate a handful of Customer objects and place them in a list.
{
private int _CustomerID;
private string _Name;
private string _Email;
public int CustomerID
{
get { return _CustomerID; }
set { _CustomerID = value; }
}
public string Name
{
get { return _Name; }
set { _Name = value; }
}
public string Email
{
get { return _Email; }
set { _Email = value; }
}
}
var Customers = new ListWe would like to produce a list featuring all orders, stating the ID and cost of the order along with the name of the customer. To do this we need to involve both the List of orders and the List of customers in our query. This is achieved using the "join" keyword. Let's replace our query and output code with the following.{
new Customer {
CustomerID = 7,
Name = "Emma",
Email = "emz0r@worreva.com"
},
new Customer {
CustomerID = 84,
Name = "Pedro",
Email = "pedro@cerveza.es"
},
new Customer {
CustomerID = 102,
Name = "Vladimir",
Email = "vladimir@pivo.ru"
}
};
// Query.The output of running this program is:
var Found = from o in Orders
join c in Customers on o.CustomerID equals c.CustomerID
select new { c.Name, o.OrderID, o.Cost };
// Display results.
foreach (var Result in Found)
Console.WriteLine(Result.Name + " spent " +
Result.Cost.ToString() + " in order " +
Result.OrderID.ToString());
Pedro spent 159.12 in order 1We use the "join" keyword to indicate that we want to refer to another collection in our query. We then once again use the "in" keyword to declare an identifier that will refer to a single item in the collection; in this case it has been named "c". Finally, we need to specify how the two collections are related. This is achieved using the "on ... equals ..." syntax, where we name a field from each of the collections. In this case, we have stated that the CustomerID of an Order maps to the CustomerID of a Customer.
Emma spent 18.5 in order 2
Pedro spent 2.89 in order 3
When the query is evaluated, an object in the Customers collection is located to match each object in the Orders collection. Note that if there were many customers with the same ID, there may be more than one matching Customer object per Order object. In this case, we get extra results. For example, change Vladimir to also have an OrderID of 84. The output of the program would then be:
Pedro spent 159.12 in order 1Notice that Vladimir never featured in the results before, since he had not ordered anything.
Vladimir spent 159.12 in order 1
Emma spent 18.5 in order 2
Pedro spent 2.89 in order 3
Vladimir spent 2.89 in order 3
Getting All Permutations With Multiple "from"s
It is possible to write a query that gets every combination of the objects from two collections. This is achieved by using the "from" keyword multiple times.var Found = from o in OrdersEarlier I suggested that you could think of "from" as being a little bit like a "foreach". You can also think of multiple uses of "from" a bit like nested "foreach" loops; we are going to get every possible combination of the objects from the two collections. Therefore, the output will be:
from c in Customers
select new { c.Name, o.OrderID, o.Cost };
Emma spent 159.12 in order 1Which is not especially useful. You may have spotted that you could have used "where" in conjunction with the two "from"s to get the same result as the join:
Pedro spent 159.12 in order 1
Vladimir spent 159.12 in order 1
Emma spent 18.5 in order 2
Pedro spent 18.5 in order 2
Vladimir spent 18.5 in order 2
Emma spent 2.89 in order 3
Pedro spent 2.89 in order 3
Vladimir spent 2.89 in order 3
var Found = from o in OrdersHowever, don't do this, since it computes all of the possible combinations before the "where" clause, which goes on to throw most of them away. This is a waste of memory and computation. A join, on the other hand, never produces them in the first place.
from c in Customers
where o.CustomerID == c.CustomerID
select new { c.Name, o.OrderID, o.Cost };
Grouping
Another operations that you may wish to perform is categorizing objects that have the same value in a given field. For example, we might want to categorize orders by CustomerID. The result we expect back is a list of groups, where each group has a key (in this case, the CustomerID) and a list of matching objects. Here's the code to do the query and output the results.// Group orders by customer.The output that it produces is as follows:
var OrdersByCustomer = from o in Orders
group o by o.CustomerID;
// Iterate over the groups.
foreach (var Cust in OrdersByCustomer)
{
// About the customer...
Console.WriteLine("Customer with ID " + Cust.Key.ToString() +
" ordered " + Cust.Count().ToString() + " items.");
// And what they ordered.
foreach (var Item in Cust)
Console.WriteLine(" ID: " + Item.OrderID.ToString() +
" Cost: " + Item.Cost.ToString());
}
Customer with ID 84 ordered 2 items.This query looks somewhat different to the others that we have seen so far in that it does not end with a "select". The first line is the same as we're used to. The second introduces the new "group" and "by" keywords. After the "by" we name the field that we are going to group the objects by. Before the "by" we put what we would like to see in the resulting per-group collections. In this case, we write "o" so as to get the entire object. If we had only been interested in the Cost field, however, we could have written:
ID: 1 Cost: 159.12
ID: 3 Cost: 2.89
Customer with ID 7 ordered 1 items.
ID: 2 Cost: 18.5
// Group orders by customer.Which produces the output:
var OrdersByCustomer = from o in Orders
group o.Cost by o.CustomerID;
// Iterate over the groups.
foreach (var Cust in OrdersByCustomer)
{
// About the customer...
Console.WriteLine("Customer with ID " + Cust.Key.ToString() +
" ordered " + Cust.Count().ToString() + " items.");
// And the costs of what they ordered.
foreach (var Cost in Cust)
Console.WriteLine(" Cost: " + Cost.ToString());
}
Customer with ID 84 ordered 2 items.You are not restricted to just a single field or the object itself; you could, for example, instantiate an anonymous type there instead.
Cost: 159.12
Cost: 2.89
Customer with ID 7 ordered 1 items.
Cost: 18.5
Query Continuations
At this point you might be wondering if you can follow a "group ... by ..." with a "select". The answer is yes, but not directly. Both "group ... by ..." and "select" are special in so far as they produce a result. You must terminate a Linq query with one or the other. If you try to do something like:var CheapOrders = from o in Orders
where o.Cost <>Then it will lead to a compilation error. Since both "select" and "group ... by ..." terminate a query, you need a way of taking the results and using them as the input to another query. This is called a query continuation, and the keyword for this is "into".
In the following example we take the result of grouping orders by customer and then use a select to return an anonymous type containing the CustomerID and the number of orders that the customer has placed.var OrderCounts = from o in OrdersNotice the identifier "g", which we introduce after the keyword "into". This identifier represents an item in the collection containing the results of the previous query. We use in the select statement. Remember that each element of the collection we are querying in this second query is actually a collection itself, since this is what "group ... by ..." produces. Therefore, we can call Count() on it to get the number of elements, which is the number of orders per customer. We grouped by the CustomerID field, so that is our Key.
group o by o.CustomerID into g
select new {
CustomerID = g.Key,
TotalOrders = g.Count()
};
Query continuations can be used to chain together as many selection and grouping queries as you need in whatever order you need.Under The Hood
Now we have looked at the practicalities of using Linq, I am going to spend a little time taking a look at how it works. Don't worry if you don't understand everything in this section, it's here for those who like to dig a little deeper.
Throughout the series I have talked about how all of the language features introduced in C# 3.0 somehow help to make Linq possible. While anonymous types have shown up pretty explicitly and you can see from the lack of type annotations we have been writing that there is some type inference going on, where are the extension methods and lambda expressions?
There's a principle in language design and implementation called "syntactic sugar". We use this to describe cases where certain syntax isn't directly compiled, but is first transformed into some other more primitive syntax and then passed to the compiler. This is exactly what happens with Linq: your queries are transformed into a sequence of method calls and lambda expressions.
The C# 3.0 specification goes into great detail about these transformations. In practice, you probably don't need to know about this, but let's look at one example to help us understand what is going on. Our simple query from earlier:var Found = from o in OrdersAfter transformation by the compiler, becomes:
where o.CustomerID == 84
select o.Cost;var Found = Orders.Where(o => o.CustomerID == 84)And this is what actually gets compiled. Here the use of lambda expressions becomes clear. The lambda passed to the Where method is called on each element of Orders to determine whether it should be in the result or not. This produces another intermediate collection, which we then call the Select method on. This calls the lambda it is passed on each object and builds up a final collection of the results, which is then assigned to Found. Beautiful, huh?
.Select(o => o.Cost);
Finally, a note on extension methods. Both Where and Select, along with a range of other methods, have been implemented as extension methods. The type they use for "this" is IEnumerable, meaning that any collection that implements that interface can be used with Linq. Without extension methods, it would not have been possible to achieve this level of code re-use.DLinq and XLinq
In this article I have demonstrated Linq working with objects instantiated from classes that we implemented ourselves and stored in built-in collection classes that implement IEnumerable. However, the query syntax compiles down to calls on extension methods. This means that it is possible to write alternative implementations of Linq that follow the same syntax but perform different operations.
Two examples of this, which will ship with C# 3.0, are DLinq and XLinq. DLinq enables the same language integrated query syntax to do queries on databases by translating the Linq into SQL. XLinq enables queries on XML documents.Conclusion
Linq brings declarative programming to the C# language and will refine and unify the way that we work with objects, databases, XML and whatever anyone else writes the appropriate extension methods for. It builds upon the language features that we have already seen in the previous parts of the series, but hiding some of them away under syntactic sugar. While the query language has a range of differences to SQL, there are enough similarities to make knowledge of SQL useful to those who know it. However, its utility is far beyond providing yet another way to work with databases.Closing Thoughts On C# 3.0
This brings us to the end of this four part series on C# 3.0. Here is a quick recap on all that we have seen.
- Type inference removes much of the tedium of writing out type annotations again and again.
- Lambda expressions make higher order programming syntactically light.
- Extension methods provide another path to better code re-use when correctly applied.
- Object and collection initializers along with anonymous types make building up large data structures much less effort.
- Linq gives us declarative programming abilities over objects, databases and XML documents.
Of course, knowing about something and doing it yourself are two entirely different things; if you haven't already done so, grab yourself the Visual Studio 2008 trial or the free Express Edition. Only then will you become comfortable with the new features and be able to use them effectively in your own development. Happy hacking, and have fun!