LINQ Tutorial Part 2

Introducing C# 3 – Part 2

This is the second of a series of articles exploring new features in C# 3. You should read the first one if you have not already done so. In this article we will cover extension methods, a powerful new language feature to increase abstraction and code re-use, and lambda expressions, which take advantage of type inference to deliver a much cleaner syntax for anonymous methods.

Extension Methods

In the beginning – that is, in C# 1.0 – the methods belonging to a class were defined within the body of the class. C# 2.0 changed that somewhat. By introducing partial classes, the methods making up a class could be defined in more than one place. However, they were all collected together at compile time, so it was nothing particularly new. Anonymous methods were something new, however. Conceptually, anonymous methods are associated with no class at all. C# 3.0 continues the journey with extension methods.

Extension methods are associated with a class (either one specific class or, if they are generic, any class that fits their type parameter). What makes them different from normal instance methods is that they are not defined within the class itself. Instead, they are defined in some other static class. This all sounds rather strange, so let’s take an example.

Imagine that we had got some third party class library for computing a serial number for a product from a product ID and some user details. The method may take the arguments ProductID, CustomerName, CustomerDateOfBirth, and CustomerCountry. In the class library, its implementation may look something like this:
public sealed class SerialGenerator
{
    public long MakeCode(int ProductID, string CustomerName,
                         DateTime CustomerDOB,
                         string CustomerCountry)
    {
        return (ProductID % 42) + CustomerName.GetHashCode() *
               (CustomerDOB.Ticks - CustomerCountry.GetHashCode());
    }
}
In our application, however, we pass around user data in a structure.
struct UserInfo
{
    public int UserID;
    public string Name;
    public DateTime DOB;
    public string CountryCode;
}
What we’d really like is to be able to add a wrapper method to SerialGenerator that takes this structure as a parameter. However, we can’t modify the class since it’s a third-party component, and its authors sealed it so we can’t inherit from it. Sealed types show up more than you might imagine; the Int32 and String types, for example, cannot be inherited from.

Extension methods aim to solve these issues. Before we dig into exactly what they are, I’d like to introduce a couple of different ways of thinking about things. First of all, consider the special variable "this", which magically exists in all instance methods. Where does its value come from? Well, look at the syntax for calling a method.
Obj.Method();
A v-table is a table of methods belonging to a class. It contains overridden methods, but in the same position in the table as where the first class to define them placed them. This enables subclassing and polymorphism to work as expected.
Obj is used in two ways. The first is that it refers to the v-table. This can be used to look up the method to call (in the case of method overriding). The second use is that it is passed as the first parameter to the method. This parameter is taken and becomes the "this" variable – this is what the CLR does under the hood. So essentially, all of your instance methods have an implicit first parameter that the C# compiler inserts for you and provides access to through the "this" variable.
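
To make the implicit first parameter concrete, here is a minimal sketch. The Counter class and the CounterFunctions helper are made up for this illustration; the static version simply writes out, as an ordinary parameter, the object that the instance version receives as "this".

```csharp
using System;

// Hypothetical class, used only for this illustration.
class Counter
{
    public int Total;

    // Instance method: "this" is supplied implicitly by the call c.Add(5).
    public int Add(int amount)
    {
        this.Total += amount;
        return this.Total;
    }
}

static class CounterFunctions
{
    // The same logic with the invocant written as an explicit first parameter.
    public static int Add(Counter self, int amount)
    {
        self.Total += amount;
        return self.Total;
    }
}

class Program
{
    static void Main()
    {
        Counter c = new Counter();
        Console.WriteLine(c.Add(5));                    // 5
        Console.WriteLine(CounterFunctions.Add(c, 5));  // 10
    }
}
```

Both calls update the same object; the only difference is whether the object is passed implicitly or explicitly.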

Second, there is an alternative way of thinking about object orientation: rather than having a class-based system, you can use multi-methods. Multi-methods are very similar to method overloading in C# - you can have methods with the same name but different signatures, and the right one is invoked depending on the parameters that are used in the call. Now imagine that the implicit first parameter that we just discussed is not implicit, but instead you place it at the start of every method's parameter list. The type of that first parameter can be a class name. When deciding which method to call, the full signature (including the type of the invocant – the first parameter) is taken into account. That means that you can write methods for a particular class anywhere you like in your program, or extend existing classes with your own methods.

Sometimes the object a method is being called on is referred to as the invocant. For example, in obj.method(), obj is the invocant.
If you can get your head around those ideas, then you will find extension methods fairly straightforward. An extension method is a static method, but with the first parameter it takes being explicitly marked as receiving the object that the method was called on. Extension methods may only appear in static classes. Returning to our example, we can implement an extension method like this:
static class Extensions
{
    public static long MakeCode(this SerialGenerator SG,
                                int ProductID, UserInfo User)
    {
        return SG.MakeCode(ProductID, User.Name, User.DOB,
                           User.CountryCode);
    }
}
Notice the this modifier on the first parameter. This states that it is going to hold the object the method was called on, and therefore can be called using method invocation syntax. To try this out, create a console application with the code shown so far in this article and with the following Main method.
static void Main(string[] args)
{
    SerialGenerator SG = new SerialGenerator();

    UserInfo User = new UserInfo();
    User.UserID = 453;
    User.Name = "Fred";
    User.DOB = DateTime.Now;
    User.CountryCode = "UK";
    long Code = SG.MakeCode(5181, User);

    Console.WriteLine(Code);
}
Here, the call to MakeCode will call the extension method. We have been able to extend the sealed class.
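
Since an extension method is still just a static method, it can also be called the old-fashioned way. The sketch below repeats the types from this article (with a fixed date of birth, my own change, so the two calls see identical input) and checks that the method-style call and the plain static call give the same result.

```csharp
using System;

public sealed class SerialGenerator
{
    public long MakeCode(int ProductID, string CustomerName,
                         DateTime CustomerDOB, string CustomerCountry)
    {
        return (ProductID % 42) + CustomerName.GetHashCode() *
               (CustomerDOB.Ticks - CustomerCountry.GetHashCode());
    }
}

struct UserInfo
{
    public int UserID;
    public string Name;
    public DateTime DOB;
    public string CountryCode;
}

static class Extensions
{
    public static long MakeCode(this SerialGenerator SG,
                                int ProductID, UserInfo User)
    {
        return SG.MakeCode(ProductID, User.Name, User.DOB, User.CountryCode);
    }
}

class Program
{
    static void Main()
    {
        SerialGenerator SG = new SerialGenerator();

        UserInfo User = new UserInfo();
        User.UserID = 453;
        User.Name = "Fred";
        User.DOB = new DateTime(1980, 1, 1); // fixed date, chosen for the example
        User.CountryCode = "UK";

        long viaExtension = SG.MakeCode(5181, User);          // extension syntax
        long viaStatic = Extensions.MakeCode(SG, 5181, User); // ordinary static call

        Console.WriteLine(viaExtension == viaStatic); // True
    }
}
```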

Another example: Push and Pop for List<T>

I love the generic List<T> collection, but I miss a couple of methods that would be useful now and then. Being able to pretend that a List<T> is a stack and have Push and Pop methods, for example, would lead to clearer code at times.

Now with extension methods we can add Push and Pop support to List<T>.
static class StackOps
{
    public static void Push<T>(this List<T> TheList, T Value)
    {
        TheList.Add(Value);
    }

    public static T Pop<T>(this List<T> TheList)
    {
        if (TheList.Count == 0)
            throw new Exception("Nothing to pop.");

        int LastPos = TheList.Count - 1;
        T Result = TheList[LastPos];
        TheList.RemoveAt(LastPos);

        return Result;
    }
}
Note that we have implemented it for a generic List<T>, using the type variable T in the method. We can now use these new methods on any List<T>; in the following example, we use them on a List<int>.
static void Main(string[] args)
{
    var Stack = new List<int>();

    // Put stuff onto the stack.
    for (int i = 0; i <= 10; i++)
        Stack.Push(i);

    // And now pop stuff off it.
    while (Stack.Count > 0)
        Console.WriteLine(Stack.Pop());
}
Here I have implemented Push and Pop just for lists, but I could instead have used IList<T> in place of List<T> when declaring the extension methods. (IEnumerable<T> would not do, since it offers no way to add or remove items.) This means that every resizable collection that implements this interface can have Push and Pop called on it. This brings up another very powerful feature of extension methods: they allow us to attach implementation to interfaces. We have never been able to do this before.
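
As a rough sketch of the interface-based variant, the version below hangs Push and Pop off IList<T>, which I am assuming is the right interface here because it provides Add, RemoveAt and an indexer. The class name ListStackOps is my own. The same two methods then work on a List<int> and on a Collection<string>.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

static class ListStackOps
{
    // Works for any IList<T> implementation that allows adding items.
    public static void Push<T>(this IList<T> list, T value)
    {
        list.Add(value);
    }

    public static T Pop<T>(this IList<T> list)
    {
        if (list.Count == 0)
            throw new InvalidOperationException("Nothing to pop.");

        int lastPos = list.Count - 1;
        T result = list[lastPos];
        list.RemoveAt(lastPos);
        return result;
    }
}

class Program
{
    static void Main()
    {
        var numbers = new List<int>();
        numbers.Push(1);
        numbers.Push(2);
        Console.WriteLine(numbers.Pop()); // 2

        // A different collection type that also implements IList<T>.
        var names = new Collection<string>();
        names.Push("a");
        names.Push("b");
        Console.WriteLine(names.Pop());   // b
    }
}
```

Fixed-size implementations such as arrays also implement IList<T> but reject Add at runtime, so the interface alone does not guarantee that Push will succeed.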

Thinking about extension methods

Now we've seen what extension methods are, let's spend a little time thinking about the issues they raise. One question that comes to mind is precedence. What if you have an extension method and the class itself implements a method of the same name? The short answer that works for most cases is that the instance method in the class will win. The real answer requires us to consider overloading.

When trying to locate the method to call, first the instance methods are checked. A method is considered a candidate if it has the correct name and a matching parameter list. By matching we mean that it has the same number of parameters as are being passed and the types of the parameters are compatible. If we find a candidate amongst the instance methods, the search ends. If not, the compiler will start looking for extension methods, starting in the innermost namespace and working its way outwards.
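
The lookup order can be seen in a small sketch. The Printer class and its extensions are invented for this example. The extension Print(int) is an exact match for the call, yet the instance Print(object) is applicable and so the search stops there; Describe has no instance counterpart, so the extension is found.

```csharp
using System;

// Hypothetical class, used only for this illustration.
class Printer
{
    // Accepts anything, so it is a candidate for a call with an int argument.
    public string Print(object value)
    {
        return "instance Print(object)";
    }
}

static class PrinterExtensions
{
    // An exact match for an int argument, but extension methods are only
    // consulted when no instance method is applicable.
    public static string Print(this Printer printer, int value)
    {
        return "extension Print(int)";
    }

    // Printer has no instance method called Describe, so this one is used.
    public static string Describe(this Printer printer)
    {
        return "extension Describe()";
    }
}

class Program
{
    static void Main()
    {
        Printer p = new Printer();
        Console.WriteLine(p.Print(5));    // instance Print(object)
        Console.WriteLine(p.Describe());  // extension Describe()
    }
}
```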

This will probably not result in any surprising behavior, but it's worth mentioning the one place it may just catch you. Suppose you have the classes Puppy and Dog, where Puppy is a subclass of Dog. In the Dog class you have an instance method Chase. If you were to write an extension method for the Puppy class called Chase, it would never be called. This is because the inherited method in the class is found as a candidate, so the extension method is never considered, even though in some ways it may be a "better" choice. The following code demonstrates this.
class Dog
{
    public void Chase()
    {
        Console.WriteLine("Chase method of Dog called.");
    }
}
class Puppy : Dog
{
}
static class PuppyThings
{
    public static void Chase(this Puppy Pup)
    {
        Console.WriteLine("Chase extension method for Puppy called.");
    }
}
class Program
{
    static void Main(string[] args)
    {
        Dog d = new Dog();
        Puppy p = new Puppy();
        // This calls the method from class Dog, as we expect.
        d.Chase();
        // So does this, which we might not have expected.
        p.Chase();
    }
}
Output:
Chase method of Dog called.
Chase method of Dog called.
Another question that comes up is that of performance. Does the method to call have to be located at runtime, or is it worked out at compile time? The answer is that it is decided at compile time, so there is no runtime dispatch overhead. In fact, the call doesn't even go through the v-table, so it may well be faster than some instance method calls. (For those who like to have the details, the .NET CLR provides an instruction, named "call", for calling a method without consulting the v-table, which the compiler can use instead of the standard one when it knows that it is safe to do so.)
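
One visible consequence of compile-time binding is that the extension method is chosen from the declared type of the variable, not from the object it happens to hold. The sketch below uses two made-up classes, Animal and Cat, to show the same object getting two different extension methods.

```csharp
using System;

// Hypothetical classes, used only for this illustration.
class Animal { }
class Cat : Animal { }

static class AnimalExtensions
{
    public static string Describe(this Animal animal)
    {
        return "Describe(Animal)";
    }

    public static string Describe(this Cat cat)
    {
        return "Describe(Cat)";
    }
}

class Program
{
    static void Main()
    {
        Cat cat = new Cat();
        Animal sameCat = cat; // same object, different compile-time type

        Console.WriteLine(cat.Describe());     // Describe(Cat)
        Console.WriteLine(sameCat.Describe()); // Describe(Animal)
    }
}
```

A virtual instance method would have given the Cat behavior in both cases, which is worth keeping in mind when choosing between the two.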

Perhaps the most important question, though, is when extension methods should be used. The C# designers themselves say that they are not something you want to be using all of the time, but rather in situations where instance methods are unable to provide what you need.

If you have a class and want to extend it with some special functionality that is specific to your usage of it, you now have two options.
  • Inherit from it and put the special functionality in the sub class
  • Write extension methods
The first option is preferable. Inheritance is well understood by other programmers, it is clear what method will be called and you won't have to repeatedly write the class name when writing the methods. So why the second option? I can think of a few cases.
  • If the class is sealed, you will be unable to inherit from it. In this case, you have no option but to use extension methods.
  • You may be using some kind of object factory that instantiates objects of a given class, but you do not have the ability to modify it so that your subclass is instantiated instead. Therefore extension methods are the only way to extend the functionality of these objects.
  • You want to implement a method that can be invoked on all classes implementing a given interface; before, we had no way to attach a method implementation to an interface.
  • You may want to add related things to a number of other classes, and from a software engineering point of view it may be better to collect those together in one place rather than spreading them amongst many classes.
If you find yourself writing extension methods every day, then that's probably a bad sign. It's a myth that every language feature is intended to be used equally often, in both natural and programming languages.

Lambda Expressions

The term "lambda expression" sounds somewhat frightening at first, but there’s no reason to be sheepish. In fact, the lambda calculus – a very simple language where everything is expressed in terms of functions – dates back to the days before we had computers, making it some of the earliest theoretical Computer Science work.

A lambda expression simply defines an anonymous function. A function is something that takes one or more parameters (just as a method does) and uses them in computing some value. That value becomes the return value for the function. In C# 3, the "=>" syntax is used to write a lambda expression. You place the parameters to the left of the arrow and the expression to compute to the right.

For example, here is a function that adds one to the value it is provided with:
x => x + 1
How does this work? Well, it takes one parameter x and then returns the result of doing "x + 1". You can write a lambda expression that multiplies two numbers quite easily too:
(x, y) => x * y
Here we have taken two parameters, x and y, and the result is the multiplication of them. Note that if we have more than one parameter, we have to place them in parentheses. How about if you do not wish to take any parameters? In this case, you put an empty set of parentheses in place of the parameter.
() => new Beer()
The above function takes nothing and returns beer; this is rarely implemented in the real world. If you want to do something more complex, you can supply a block to the right of the arrow. In this case, you should write a return statement, unless you do not wish to return a value (which is allowable, though not the common case).
(x, y) => {
    var result = x + y;
    return result;
}
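
On their own these expressions are just fragments; to run them they need to be stored in (or passed as) a delegate. Here is a small sketch that assigns each of the forms above to one of the Func and Action delegate types from the System namespace and then calls it. The Beer class and the LambdaExamples holder are placeholders of my own.

```csharp
using System;

// Hypothetical class, standing in for the Beer type mentioned above.
class Beer { }

static class LambdaExamples
{
    // One parameter, expression body.
    public static readonly Func<int, int> AddOne = x => x + 1;

    // Two parameters need parentheses.
    public static readonly Func<int, int, int> Multiply = (x, y) => x * y;

    // No parameters: empty parentheses.
    public static readonly Func<Beer> Brew = () => new Beer();

    // Block body with an explicit return statement.
    public static readonly Func<int, int, int> Add = (x, y) =>
    {
        var result = x + y;
        return result;
    };

    // Block body with no return value fits an Action delegate.
    public static readonly Action<string> Greet = name =>
    {
        Console.WriteLine("Hello, " + name);
    };
}

class Program
{
    static void Main()
    {
        Console.WriteLine(LambdaExamples.AddOne(41));        // 42
        Console.WriteLine(LambdaExamples.Multiply(6, 7));    // 42
        Console.WriteLine(LambdaExamples.Brew() != null);    // True
        Console.WriteLine(LambdaExamples.Add(2, 3));         // 5
        LambdaExamples.Greet("world");                       // Hello, world
    }
}
```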


Using Lambda Expressions

At this point you could be forgiven for thinking, "well that's neat, but why?" C# 2.0 added support for anonymous methods. However, the syntax was rather verbose for a feature that, at least amongst some programmers, is used quite often. Let's look at a couple of examples where anonymous methods were used before and see the improvement that we get by using lambda expressions instead.

In this first example, we will take a list of strings, sort them by length and then display the output. We use an anonymous method to give the comparisons.
// Some words.
var Words = new List<string> { "amazingly", "my", "badger", "exploded" };
// Sort them by word length.
Words.Sort(delegate(string a, string b)
{
    return a.Length.CompareTo(b.Length);
});
// Show results.
foreach (string Word in Words)
    Console.Write(Word + " ");
This prints "my badger exploded amazingly" on the console. We can re-write the sort using a lambda expression.
// Sort them by word length.
Words.Sort((a, b) => a.Length.CompareTo(b.Length));
Which is a lot neater. For a second example, suppose we are rendering some forum markup tags to HTML. We are going to match a tag with a regex, check if the tag is in a list of allowed tags and, if it is, render it to HTML. Otherwise, we'll just leave it unrendered. Here is the original implementation.
// List of tags we accept.
var AcceptedTags = new List<string> { "b", "u", "br" };
// Text to render.
var ToRender = "[b]Bold, [i]bold italic[/i], just bold again.[/b][br]";
// Regex to match tags.
var FindTags = new Regex(@"\[(/?)(\w+)\]");
// Render it.
string Output = FindTags.Replace(ToRender,
    delegate (Match m) {
        return AcceptedTags.Contains(m.Groups[2].Value) ?
            "<" + m.Groups[1].Value + m.Groups[2].Value + ">" :
            m.Value;
    });
Here we are using an anonymous method to specify code to generate the replacement string. We can replace that with a lambda expression too.
string Output = FindTags.Replace(ToRender,
    m => AcceptedTags.Contains(m.Groups[2].Value) ?
        "<" + m.Groups[1].Value + m.Groups[2].Value + ">" :
        m.Value);


Lambda Expressions And Type Inference

One difference you may have spotted between lambda expressions and the original anonymous method syntax is the absence of types on the parameters. You actually can write the types in if you wish:
(int x, int y) => x + y
Be aware that you need the parentheses for a single parameter if you're going to write a type annotation:
(int x) => x + 1
Even here, there is something more special going on, since nowhere have we declared the type of value that will be returned by the lambda expression. Anonymous methods did not state a return type either, but whenever they listed parameters, the parameter types had to be written out.

In the previous article, I talked about type inference. As a very quick recap, this involves working out the types of variables based on information available in the code rather than making the programmer write them in. This is exactly what is happening here. The interesting question, then, is where is the type information coming from this time?

When a method expects to be passed an anonymous method as a parameter, it uses a delegate type. This delegate type contains the types of the parameters. When a lambda expression is used, it is often being passed as a parameter. Therefore, the delegate type of the parameter will, in turn, enable the compiler to work out what the types of the lambda expression's parameters are.
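
A quick way to see that the types come from the delegate is to give the same lambda text to two different delegate types. In the sketch below, the Transformer delegate and the InferenceDemo class are invented for the example; "x => x + x" means integer addition in one place and string concatenation in the other.

```csharp
using System;

// Hypothetical delegate type, declared only for this example.
delegate int Transformer(int input);

static class InferenceDemo
{
    // The parameter type Transformer tells the compiler that x is an int.
    public static int Apply(Transformer transform, int value)
    {
        return transform(value);
    }

    // Same lambda text, two different delegate types, two different meanings.
    public static readonly Func<int, int> DoubleNumber = x => x + x;
    public static readonly Func<string, string> DoubleText = x => x + x;
}

class Program
{
    static void Main()
    {
        Console.WriteLine(InferenceDemo.Apply(x => x * 3, 14)); // 42
        Console.WriteLine(InferenceDemo.DoubleNumber(21));      // 42
        Console.WriteLine(InferenceDemo.DoubleText("ab"));      // abab

        // var f = x => x + x;  // would not compile: nothing says what type x is
    }
}
```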

If you're wide awake, you might be wondering what happens when you have a generic delegate type as a parameter of a method and pass a lambda expression there. And the answer is that yes, you can do this, generic types will be inferred and it should all work out just fine. In fact, some of what LINQ does depends on it working.
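
Here is a rough sketch of that generic case. The Map method is a made-up helper of my own, not part of the framework: it takes a generic Func<TSource, TResult>, and the compiler works out both type arguments from the list and the lambda, without my writing either of them.

```csharp
using System;
using System.Collections.Generic;

static class Sequences
{
    // Hypothetical helper: applies selector to every item of source.
    public static List<TResult> Map<TSource, TResult>(
        this List<TSource> source, Func<TSource, TResult> selector)
    {
        var result = new List<TResult>();
        foreach (TSource item in source)
            result.Add(selector(item));
        return result;
    }
}

class Program
{
    static void Main()
    {
        var words = new List<string> { "amazingly", "my", "badger", "exploded" };

        // TSource is inferred as string from the list; TResult is then
        // inferred as int from the body of the lambda.
        List<int> lengths = words.Map(w => w.Length);

        foreach (int length in lengths)
            Console.Write(length + " "); // 9 2 6 8
        Console.WriteLine();
    }
}
```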

Conclusion

In this article we've seen extension methods and lambda expressions. I've taken the time to dive into some of the ugly details, but don't worry if some of them haven't sunk in just yet.

Extension methods offer some powerful new possibilities, but we need to take care in how we use them from a software engineering angle. Don't expect to be using them every day, but remember them for those times when they really are the right thing to use.

Lambda expressions, on the other hand, are for regular use. Even if you aren't doing much higher order programming today, if you plan on using LINQ you soon will be. The biggest hurdle most people have to get over is realizing that it is possible (conceptually, at least) to treat code the same as data. Once you get comfortable with that idea, using anonymous methods or lambda expressions doesn't feel so unusual. Practice and experience help. I'd recommend trying to learn a functional programming language, but if you're reading this you're probably wanting to get C# 3.0 cracked first.

In the next article in the series we'll look at object initializers and anonymous types. These make it easier to build up data structures and set initial values for fields in objects and structures. With that, we will have seen all of the language features that act as the building blocks for LINQ, which will be covered in the final part in the series.
