LINQ

LINQ (Language Integrated Query)

Three distinct steps
The datasource
The query

Sequence
The range variable

Query execution
LINQ query operations are strongly-typed
Type relationship rules in a LINQ query operation.
Query clauses and keywords

The from clause
The join clause
The group join
Simulating a left outer join
The group..by clause
The let clause

Standard query operators
Query execution time
Operations not available to query syntax

Aggregation operations
Data type convertion operations
Set operations
Quantifier operations
Generation operations
Element operations
Data partitioning operations
Miscellaneous operations

LINQ-ready datasources
LINQ Providers
LINQ to Objects
LINQ to DataSet
Overcoming the constraint of the CopyToDataTable() method
Sample database
LINQ to DataSet queries

The from, where, orderby, select clauses
The join clause
The group join
Simulating a left outer join
The group..by clause
The let clause

LINQ (Language Integrated Query)

For a programmer to fully understand and efficiently use LINQ queries, an experience is required in various areas such as the following:

SQL query syntax
Generics
Implicitly typed variables
Object and collection initializers
Extension methods
Lambda expressions
Anonymous types

LINQ is an innovative technology. It brings the power of SQL-like query syntax to application code.

Here is a first example

    int[] numbers = { 8, 5, 2, 9, 3, 0, 7, 1, 6, 4 };

    var q = from n in numbers
            where n < 5
            orderby n
            select n;

    string S = "";

    foreach (int num in q)
        S += num.ToString() + Environment.NewLine;

    MessageBox.Show(S);

The output of the above code is

A LINQ query starts with a from clause and ends with a select or group..by clause.

Three distinct steps

A LINQ query operation is performed in three steps.

1. define the datasource
2. construct the query
3. execute the query

Here is the above example again

    // 1. define the datasource
    int[] numbers = { 8, 5, 2, 9, 3, 0, 7, 1, 6, 4 };

    // 2. construct the query 
    var q = from n in numbers
            where n < 5
            orderby n
            select n;

    string S = "";

    // 3. execute the query 
    foreach (int num in q)
        S += num.ToString() + Environment.NewLine;

    MessageBox.Show(S);

The datasource

Any type which implements the generic IEnumerable<T>, or a derived interface, is called queryable type and can be used as the datasource of a LINQ query operation.

    public interface IEnumerable<T> : IEnumerable
    {
        IEnumerator<T> GetEnumerator();
    }

Some types, such as types which implement the non-generic IEnumerable interface, can be cast to IEnumerable<T> and thus can be used as datasources with LINQ query operations. The ArrayList and Array classes are IEnumerable objects.

    // 1. get the datasource
    int[] numbers = { 8, 5, 2, 9, 3, 0, 7, 1, 6, 4 };

The query

The query is used to obtain, filter, sort, group and transform data. The query is actually an expression, called LINQ query expression, written in a SQL-like syntax.

The query expression, or the query variable, returns no data at all. It just represents the query commands. The actual query execution takes place at a later time.

The query expression may be stored to a local variable. That variable is of type IEnumerable<T>. It is easier and safer to use an implicitly typed local variable by using the keyword var and let compiler infer the actual data type.

    // 2. construct the query 
    var q = from n in numbers
            where n < 5
            orderby n
            select n;

A LINQ query expression starts with the from clause. The from clause contains a reference to a datasource, which is a sequence, and a range variable which references an object of that sequence in any successive iteration.

Sequence

A sequence is an object which implements the IEnumerable<T> generic interface, or it can be somehow presented as an IEnumerable<T>. In the above example the numbers array is a sequence.

The range variable

The range variable of a query expression is declared in the from clause and it serves as a reference to each successive element of a sequence when the query is executed. Additional range variables may be declared in a let clause. In the above example the n variable is the range variable. The range variable can be implicitly typed for generic datasources and must be explicitly typed for non-generic datasources such as the ArrayList class.

    var q = from int n in numbers
            where n < 5
            orderby n
            select n;

Query execution

The execution of a LINQ query expression is done by using a foreach statement. The foreach statement is where the query results are retrieved from the datasource. A foreach statement requires an IEnumerable or an IEnumerable<T> object.

    // 3. execute the query 
    foreach (int num in q)
        S += num.ToString() + Environment.NewLine;

LINQ query operations are strongly-typed

Consider the following example

    // 1. datasource
    List<Country> countries = new List<Country>
    {
        new Country {Code = "ES", Name = "Spain"},
        new Country {Code = "CZ", Name = "Czech Republic"},
        new Country {Code = "GR", Name = "Greece"},
        new Country {Code = "DK", Name = "Denmark"},
        new Country {Code = "CY", Name = "Cyprus"}
    };


    // 2. query
    var q = from c in countries
            where c.Code.StartsWith("C")
            select c.Name;

    string S = "";


    // 3. execution
    foreach (var countryName in q)
        S += countryName + Environment.NewLine;

    MessageBox.Show(S);

Any variable which participates in any of the three steps of a LINQ query operation is strongly-typed even if its type is not explicitly defined by the programmer. The compiler infers the actual type.

The following is equal to the previously used query expression.

    IEnumerable<string> q = from c in countries
            where c.Code.StartsWith("C")
            select c.Name;

It is possible to even declare the type of the range variable, although in cases like the above is of a little value.

    var q = from Country c in countries
            where c.Code.StartsWith("C")
            select c.Name;

The following, also, is equal to the above query execution.

    foreach (string countryName in q)
        S += countryName + Environment.NewLine;

Type relationship rules in a LINQ query operation.

1. The element type of the datasource sequence determines the type of the range variable.

    List<Country> countries = new List<Country>
    {
        new Country {Code = "ES", Name = "Spain"},
        new Country {Code = "CZ", Name = "Czech Republic"},
        new Country {Code = "GR", Name = "Greece"},
        new Country {Code = "DK", Name = "Denmark"},
        new Country {Code = "CY", Name = "Cyprus"}
    };

In the above example the element type is Country.

2. The type returned by the select clause determines the type of the whole query expression and thus the type of the query variable.

    var q = from c in countries
            where c.Code.StartsWith("C")
            select c.Name;

In the above example the type returned by the select clause is string. So the type of the query variable and the type of the whole query expression is IEnumerable<string>, although it is declared as an implicitly typed variable.

3. The type of the query expression determines the type of the loop variable in the foreach statement.

    foreach (var countryName in q)
        S += countryName + Environment.NewLine;

In the above example the type of the loop variable is string, although it is declared as an implicitly typed variable.

As a direct consequence of the above rules any type error is caught at compile-time.

Query clauses and keywords

A query expression is written using special contextual keywords introduced in C# 3.0. The contextual keywords used in a LINQ query are: from, join, where, group..by, orderby, select, into, in, on, let, equals, ascending and descending.

The from, join, where, group..by, orderby, select, into and let keywords introduce a clause in the expression. The from clause is the first clause in a LINQ query. The last clause should be a group..by or select clause.

Apparently LINQ queries use the same clauses as a typical SQL query syntax. Additionaly there is a let clause.

The following data are used in the discussion that follows

    class Country
    {
        public string Code;
        public string Name;
        public List<string> Cities;
    }

    class Customer
    {
        public string Name;
        public string Country;
        public double Debit;
    }

    List<Country> countries = new List<Country>
    {
        new Country {Code = "ES", Name = "Spain", Cities = new List<string>{"Madrid", "Barcelona", "Valencia"} },
        new Country {Code = "CZ", Name = "Czech Republic", Cities = new List<string>{"Prague", "Brno"} },
        new Country {Code = "GR", Name = "Greece", Cities = new List<string>{"Athens", "Thessaloniki"} },
        new Country {Code = "DK", Name = "Denmark", Cities = new List<string>{"Copenhagen", "Aarhus"} }, 
        new Country {Code = "UK", Name = "United Kingdom", Cities = new List<string>{"London", "Edinburgh"} }, 
    };

    List<Customer> customers = new List<Customer>
    {
        new Customer {Name = "Cheap Software Ltd", Country = "GR", Debit = 35.6},
        new Customer {Name = "Fast Internet Co", Country = "DK", Debit = 51.7},
        new Customer {Name = "Neat Hardware", Country = "ES", Debit = 12.1},
        new Customer {Name = "Cool Tech", Country = "DK", Debit = 23.5},
        new Customer {Name = "Steady Logic", Country = "CZ", Debit = 49.4},
        new Customer {Name = "Nasty Data", Country = "GR", Debit = 62.2}  
    };

The from clause

It is possible to have a compound from clause in order to access inner data.

    var q = from country in countries
                from city in country.Cities
                where city.StartsWith("A") || city.StartsWith("B")
                orderby city descending
                select city; 
           
    ...                
                
    var q = from country in countries
            where (country.Code == "GR") || (country.Code == "DK")
                from customer in customers
                where (customer.Country == country.Code) && (customer.Debit > 40)
                orderby customer.Name
                select customer;

Subsequent from clauses resemble nested foreach statements

    foreach (country in countries)
    {
        foreach (city in country.Cities)
        {
            ...
        }
    }

Queries containing compound from clauses could be written analytically.

    var q = from country in countries
            where (country.Code == "GR") || (country.Code == "DK")
            select country.Code;

    var q2 = from customer in customers
             where q.Contains(customer.Country) && (customer.Debit > 40)
             select customer;

The join clause

Joins in LINQ are always inner (equijoins) joins. A LINQ join joins two sequences in an equii-join based on some equality condition.

    var q = from customer in customers
            join country in countries on customer.Country equals country.Code
            orderby country.Code
            select new { Name = customer.Name, Country = country.Name };

Since a join is always an inner join the order of the datasources in the expression has no impact at all. Thus the next example results in the same output as the above one.

    var q = from country in countries
            join customer in customers on country.Code equals customer.Country
            orderby country.Code
            select new { Name = customer.Name, Country = country.Name };

The group join

The use of keyword into has a special effect when used with LINQ joins.

For each element of the left sequence it creates an IEnumerable<T> list, where T is the element type of the right sequence. That list contains those elements of the right sequence which pass the equality condition. This is grouping actually.

So joins using the keyword into are called group joins. There is no SQL equivalent for LINQ group join.

In general the keyword into is used to create an identifier (which is another implicitly typed local variable) which refers to the results of a group, join or select clause. That identifier can then be used further as a datasource for additional query commands.

    var q = from country in countries
            join customer in customers on country.Code equals customer.Country into customersPerCountry
            orderby country.Code
            select new { Country = country.Name, Customers = customersPerCountry };

Simulating a left outer join

The left join simulation is done by using a second iteration (from clause) over the sequence produced by the group join and referenced by the keyword into, and forcing ALL the elements of that produced sequence into the final sequence, by using the DefaultIfEmpty() extension method .

    var q = from country in countries
            join customer in customers on country.Code equals customer.Country into customersPerCountry
            orderby country.Code
                from v in customersPerCountry.DefaultIfEmpty()
                select new { Country = country.Name, Customer = (v == null ? string.Empty : v.Name) };

The group..by clause

The result of a group..by clause is actually a

    IEnumerable<IGrouping<TKey, TElement>>

that is a sequence of IGrouping<TKey, TElement> objects. Each IGrouping<TKey, TElement> object may contain zero or more elements that match the key condition for the group.

    /* here, the implicitly typed variable q would be defined as 
       IEnumerable<IGrouping<string, Customer>> q = ...  */
    var q = from customer in customers
            orderby customer.Country
            group customer by customer.Country;

Here is another group..by example using an into clause. The Key is a property of the IGrouping<TKey, TElement> interface.

    var q = from customer in customers
            group customer by customer.Name[0] into g
            orderby g.Key
            select g;

A very frequent need is to group a datasource and then apply some aggregate function. One way to achieve this is to group data using an into clause and then call the appropriate aggregate extension method, possibly using a lambda expression.

    var q = from customer in customers
            group customer by customer.Country into cc
            orderby cc.Key
            select new { Country = cc.Key, Debit = cc.Sum(c => c.Debit) };

The let clause

The let keyword can be used inside a query expression in order to create and initialize additional range variables. Those range variables are considered as constants inside the expression. It may be used in further queries though, if they store a queryable type.

    var q = from country in countries
            let DebitList = (
                from customer in customers
                where (customer.Country == country.Code)
                select customer.Debit )
            orderby country.Code
            select new { Country = country.Code, Debit = DebitList.Sum() };

In the above example the IEnumerable<T>.Sum() method is used through a range variable created with the let keyword.

Standard query operators

A LINQ query expression is written using special contextual keywords, such as from, select, orderby etc. At compile-time the compiler converts those keywords to special method calls known as Standard Query Operators.

Standard query operators are methods defined as extension methods in special static classes and have names such as Select(), Where(), OrderBy() etc.

The Enumerable class is the container of those extension methods related to IEnumerable<T>. The Queryable class is the container of those extension methods related to IQueryable<T>.

Most of those standard query operators operate on sequences, that is objects implementing the IEnumerable<T> generic interface or a derived interface such as the IQueryable<T>, and provide capabilities similar to a SQL SELECT statement, that is filtering, sorting, grouping, ordering, aggregation and the like.

As a consequence a query expression can be expressed using standard query operators with exactly the same results.

    int[] numbers = { 8, 5, 2, 9, 3, 0, 7, 1, 6, 4 };

    var q = from n in numbers
            where n < 5
            orderby n
            select n;
            
    ...            
            
    var q = numbers.Where(n => n < 5)
                   .OrderBy(n => n);

Query Syntax is the syntax that uses contextual keywords while Method Syntax is the syntax that uses standard query operators. Although query syntax is the recommended way there is no semantic difference between the two. Furthermore query syntax has its limits. For example there is no contextual keyword for aggregate functions such as Sum, Min or Max.

Query execution time

A query written in query syntax returns no data at all and remains un-executed. The query is executed and data is retrieved from the datasource when a variable of that query is used with a foreach statement. This is called Deferred Execution.

Immediate Execution happens when the query is executed by means other than a foreach statement. Some standard query operators such as the Count(), Min(), Max(), Average(), ToList(), ToArray() etc force an immediate execution.

    var q = numbers.Where(n => n < 5)
                   .OrderBy(n => n)
                   .ToArray();

The above can be written in two parts as following.

    var q = numbers.Where(n => n < 5)
                   .OrderBy(n => n);

    var q2 = q.ToArray();

Calling ToList() and ToArray() creates a strongly-type IEnumerable<T> collection object which then contains the data obtained from the datasource.

Operations not available to query syntax

A number of LINQ operations have no equivalent query syntax. Those methods are callable from inside a query expression though as any other available method.

Aggregation operations

    Min()        returns the minimum value in the sequence.
    Max()        returns the maximum value in the sequence.
    Sum()        returns the sum of the values in the sequence.
    Count()      returns the number of elements in the sequence.
    LongCount()  returns the number of elements in the sequence.
    Average()    returns the average value in the sequence.
    Aggregate()  performs a custom aggregation operation in the sequence.

Data type convertion operations

    AsEnumerable()    returns the sequence as IEnumerable<T>.
    AsQueryable()     returns the sequence as an IQueryable<T>.  
    Cast()            casts the elements of the sequence to a specified type.
    OfType()          returns a new sequence containing only those elements that can be cast to the specified type.
    ToList()          returns a List<T> containing all the elements of the sequence.
    ToArray()         returns a Array<T> containing all the elements of the sequence.
    ToLookup()        returns a Lookup<TKey, TElement>, an one-to-many dictionary (key-to-elements), based on a key condition. 
    ToDictionary()    returns a Dictionary<TKey, TValue>, an one-to-one dictionary (key-to-element), based on a key condition.

Set operations

    Distinct()   returns a new sequence removing duplicate values from the sequence.  
    Except()     returns a new sequence containing the elements of the sequence that do not appear in a specified sequence (set difference).
    Intersect()  returns a new sequence containing only the elements common to the sequence and to a specified sequence (set intersection). 
    Union()      returns a new sequence containing elements of the sequence and a specified sequence, excluding duplicates.

Quantifier operations

    All()        returns true if all the elements in the sequence satisfy a specified a condition.
    Any()        returns true if any of the elements in the sequence satisfy a specified a condition.
    Contains()   returns true if the sequence contains a specified element.

Generation operations

    DefaultIfEmpty()  returns a new sequence with a single element of the specified type and with a default value if the sequence is empty.  
    Empty()           returns an new empty collection.
    Range()           returns a new sequence that contains a sequence of numbers.
    Repeat()          returns a new sequence that contains one repeated value.

Element operations

    First()             returns the first element in the sequence or the first element that satisfies a specified condition.
    Last()              returns the last element in the sequence or the last element that satisfies a specified condition.
    ElementAt()         returns the element at a specified index in the sequence.
    Single()            returns the only element in the sequence or the only element that satisfies a specified condition.
    
    FirstOrDefault()      as the First(). Alternatively  it returns a default value if no such element exists.
    LastOrDefault()       as the Last(). Alternatively  it returns a default value if no such element exists.
    ElementAtOrDefault()  as the ElementAt(). Alternatively it returns a default value if no such element exists.
    SingleOrDefault()     as the Single(). Alternatively  it returns a default value if no such element exists.

Data partitioning operations

    Skip()     returns a new sequence containing elements from the specified index until the end of the sequence.  
    Take()     returns a new sequence containing elements from index zero to the specified index in the sequence.  
    
    SkipWhile()  skips elements in the sequence while the specified condition is true and returns the remaining elements
    TakeWhile()  takes elementes in the sequence while the specified condition is true and ignores the rest.

Miscellaneous operations

    Reverse()           returns a new sequence containing the elements of the sequence in reverse order.
    SequenceEqual()     returns true if this sequence equals to a specified sequence by comparing elements in a pair-wise manner.
    Concat()            returns a new sequence by concatenating the sequence and a specified sequence.

Here is an example using the Lookup() method.

    // the key is a bool value, odd and even numbers.
    // The Lookup() method creates a sub-list for each key
    ILookup<bool, int> lk = numbers.ToLookup(n => n % 2 == 0);
    

    // get the sub-list for odd numbers
    IEnumerable<int> list = lk[false];

    string S = "odd numbers" + Environment.NewLine;

    foreach (int n in list)
        S += "   " + n.ToString() + Environment.NewLine;


    // get the sub-list for even numbers
    list = lk[true];

    S += "even numbers" + Environment.NewLine;

    foreach (int n in list)
        S += "   " + n.ToString() + Environment.NewLine;

    MessageBox.Show(S);

Here is another example using the Intersect() and Last() methods.

    int[] a = { 2, 9, 6, 4, 7};
    int[] b = { 7, 1, 8, 5, 9 };


    var v = a.Intersect(b);

    string S = "";

    foreach (int n in v)
    {
        S += n.ToString();

        if (n != v.Last())
            S += ", ";
    }

    MessageBox.Show(S);

LINQ-ready datasources

LINQ can be extended to use a number of different datasources, such as a CLR data structure (CLR type), an SQL database of some kind, a Web or WCF service, the the file system, etc.

To use LINQ against a specific type of datasource, an implementation of Standard Query Operators is required for that datasource.

CLR already provides an implementation of Standard Query Operators for any CLR data structure (CLR type), that is for in-memory-data. The System.Linq.Enumerable static class contains a full set of Standard Query Operators, already implemented as extension methods, for querying objects that implement IEnumerable<T>.

The only additional requirements for a type to make it LINQ-ready is to implement the IEnumerable<T> generic interface.

Here is an example.

    public class CustomList: IEnumerable<int>
    {
        private List<int> list = new List<int> { 0, 8, 5, 2, 9, 3, 7, 1, 6, 4 }; 

        public IEnumerator<int> GetEnumerator()
        {
            for (int i = 0; i < list.Count; i++)
                yield return list[i];
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            for (int i = 0; i < list.Count; i++)
                yield return list[i];
        }
    }
    
    ...    

    CustomList list = new CustomList();

    var q = from n in list
            where n >= 5
            orderby n
            select n;

    string S = "";

    foreach (int n in q)
        S += n.ToString() + Environment.NewLine;

    MessageBox.Show(S);

If the predefined functionality provided by the Enumerable class extension methods is not the desirable functionality, then the programmer has to implenent a full custom set of Standard Query Operators either as instance methods to the type or as extension methods for that type.

To make a remote datasource, such as an SQL database of some kind or a Web-WCF service, LINQ-ready, the implementation of a LINQ Provider is required.

LINQ Providers

A LINQ provider is a library or API which provides an implementation of the Standard Query Operators for a specific datasource. The actual translation of query expressions to standard query operators is done by a LINQ provider.

There is already a predefined set of LINQ providers in BCL

LINQ to Objects (not strictly a provider, but an API)
LINQ to ADO.NET (in two parts: LINQ to SQL and LINQ to DataSet)
LINQ to XML

A LINQ Provider for a remote datasource has to implement the IQueryable<T> generic interface, at least, and possibly the IOrderedQueryable<T> and IQueryProvider interfaces. A provider which implements the IQueryable<T> receives LINQ queries in the form of Expression Trees which it then executes in a way best suited to the specific datasource. A LINQ provider can also define operations for inserting, updating and deleting data from/to the datasource it represents.

Expreesion Trees are described elsewhere.

LINQ to Objects

LINQ to objects means the use of LINQ queries with any datasource which is an IEnumerable or an IEnumerable<T> object, directly and without any specialized LINQ provider intervention. LINQ to objects is used to query datasources like an array, an ArrayList, a List<T>, a Dictionary<TKey, TValue> and the like.

All examples so far were LINQ to Objects applications.

Here is another LINQ to Objects example, which queries the file system.

    /* returns a list of FileInfo objects for files found in path */
    List<FileInfo> GetFileListOf(string path)
    {
        List<FileInfo> Result = new List<FileInfo>();

        string[] files = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);

        foreach (string s in files)
            Result.Add(new FileInfo(s));

        return Result;
    }
    
    ...    
    
    /* groups and presents file names by extension */
    
    string path =  Path.GetFullPath(@"..\..\");

    List<FileInfo> fileList = GetFileListOf(path);

    var q = from FI in fileList
            group FI.Name by FI.Extension.ToUpper() into g
            orderby g.Key
            select g;
            

    string S = "";

    foreach (var g in q)
    {
        S += g.Key + Environment.NewLine;
        foreach (string fileName in g)
            S += "     " + fileName + Environment.NewLine;
    }

    MessageBox.Show(S);

LINQ to DataSet

LINQ to DataSet provides the means for querying DataTable objects. LINQ to DataSet functionality is provided by the DataTableExtensions and DataRowExtensions classes which contain a set of extension methods relative to DataTable and DataRow classes.

A DataTable can be converted to an IEnumerable<DataRow> sequence with a call to AsEnumerable() method

    public static EnumerableRowCollection<DataRow> AsEnumerable(this DataTable source);

and an IEnumerable<DataRow> sequence can be converted back to a DataTable with a call to CopyToDataTable() method

    public static DataTable CopyToDataTable<T>(this IEnumerable<T> source) where T : DataRow;

A column value is accessed by using the overloaded Field() generic method provided by the DataRowExtensions class

    public static T Field<T>(this DataRow row, string columnName);
    public static T Field<T>(this DataRow row, int columnIndex);

Overcoming the constraint of the CopyToDataTable() method

There is a problem with that implementation of the CopyToDataTable() though, since it defines a constraint, which means that only IEnumerable<DataRow> sequences may be passed to the method. But frequently when processing a join for example, the easiest way is to create sequences of anonymous type objects. Unfortunatelly the CopyToDataTable() restricts the element type to DataRow.

There is a solution though found at internet. Here it is, slightly modified by the author of this tutorial.

    /*  taken from:     http://msdn.microsoft.com/en-us/library/bb669096.aspx 
        and             http://blogs.msdn.com/aconrad/archive/2007/09/07/science-project.aspx 
     
        This class provides the code needed to implement overloads of the CopyToDataTable() 
        extension method for the IEnumerable<T> without a generic constraint
        as the other versions, those contained in the DataTableExtensions class, do.
        So these new overloads of the CopyToDataTable() can be used with any object
        returned by a query expression, even anonymous types.
     */
    public class ObjectShredder<T>
    {
        private System.Reflection.FieldInfo[] _fi;
        private System.Reflection.PropertyInfo[] _pi;
        private System.Collections.Generic.Dictionary<string, int> _ordinalMap;
        private System.Type _type;

        // ObjectShredder constructor.
        public ObjectShredder()
        {
            _type = typeof(T);
            _fi = _type.GetFields();
            _pi = _type.GetProperties();
            _ordinalMap = new Dictionary<string, int>();
        }

        /// <summary>
        /// Loads a DataTable from a sequence of objects.
        /// </summary>
        /// <param name="source">The sequence of objects to load into the DataTable.</param>
        /// <param name="table">The input table. The schema of the table must match that 
        /// the type T.  If the table is null, a new table is created with a schema 
        /// created from the public properties and fields of the type T.</param>
        /// <param name="options">Specifies how values from the source sequence will be applied to 
        /// existing rows in the table.</param>
        /// <returns>A DataTable created from the source sequence.</returns>
        public DataTable Shred(IEnumerable<T> source, DataTable table, LoadOption? options)
        {
            // Load the table from the scalar sequence if T is a primitive type.
            if (typeof(T).IsPrimitive)
            {
                return ShredPrimitive(source, table, options);
            }

            // Create a new table if the input table is null.
            if (table == null)
            {
                table = new DataTable(typeof(T).Name);
            }

            // Initialize the ordinal map and extend the table schema based on type T.
            table = ExtendTable(table, typeof(T));

            // Enumerate the source sequence and load the object values into rows.
            table.BeginLoadData();
            using (IEnumerator<T> e = source.GetEnumerator())
            {
                while (e.MoveNext())
                {
                    if (options != null)
                    {
                        table.LoadDataRow(ShredObject(table, e.Current), (LoadOption)options);
                    }
                    else
                    {
                        table.LoadDataRow(ShredObject(table, e.Current), true);
                    }
                }
            }
            table.EndLoadData();

            // Return the table.
            return table;
        }

        public DataTable ShredPrimitive(IEnumerable<T> source, DataTable table, LoadOption? options)
        {
            // Create a new table if the input table is null.
            if (table == null)
            {
                table = new DataTable(typeof(T).Name);
            }

            if (!table.Columns.Contains("Value"))
            {
                table.Columns.Add("Value", typeof(T));
            }

            // Enumerate the source sequence and load the scalar values into rows.
            table.BeginLoadData();
            using (IEnumerator<T> e = source.GetEnumerator())
            {
                Object[] values = new object[table.Columns.Count];
                while (e.MoveNext())
                {
                    values[table.Columns["Value"].Ordinal] = e.Current;

                    if (options != null)
                    {
                        table.LoadDataRow(values, (LoadOption)options);
                    }
                    else
                    {
                        table.LoadDataRow(values, true);
                    }
                }
            }
            table.EndLoadData();

            // Return the table.
            return table;
        }

        public object[] ShredObject(DataTable table, T instance)
        {

            FieldInfo[] fi = _fi;
            PropertyInfo[] pi = _pi;

            if (instance.GetType() != typeof(T))
            {
                // If the instance is derived from T, extend the table schema
                // and get the properties and fields.
                ExtendTable(table, instance.GetType());
                fi = instance.GetType().GetFields();
                pi = instance.GetType().GetProperties();
            }

            // Add the property and field values of the instance to an array.
            Object[] values = new object[table.Columns.Count];
            foreach (FieldInfo f in fi)
            {
                values[_ordinalMap[f.Name]] = f.GetValue(instance);
            }

            foreach (PropertyInfo p in pi)
            {
                values[_ordinalMap[p.Name]] = p.GetValue(instance, null);
            }

            // Return the property and field values of the instance.
            return values;
        }

        public DataTable ExtendTable(DataTable table, Type type)
        {
            // Extend the table schema if the input table was null or if the value 
            // in the sequence is derived from type T.            
            foreach (FieldInfo f in type.GetFields())
            {
                if (!_ordinalMap.ContainsKey(f.Name))
                {
                    // Add the field as a column in the table if it doesn't exist
                    // already.
                    DataColumn dc = table.Columns.Contains(f.Name) ? table.Columns[f.Name]
                        : table.Columns.Add(f.Name, f.FieldType);

                    // Add the field to the ordinal map.
                    _ordinalMap.Add(f.Name, dc.Ordinal);
                }
            }
            foreach (PropertyInfo p in type.GetProperties())
            {
                if (!_ordinalMap.ContainsKey(p.Name))
                {
                    // Add the property as a column in the table if it doesn't exist
                    // already.
                    DataColumn dc = table.Columns.Contains(p.Name) ? table.Columns[p.Name]
                        : table.Columns.Add(p.Name, p.PropertyType);

                    // Add the property to the ordinal map.
                    _ordinalMap.Add(p.Name, dc.Ordinal);
                }
            }

            // Return the table.
            return table;
        }
    }


    /* serves just as a container for the CopyToDataTable() extension methods */
    public static class CustomLINQtoDataSetMethods
    {
        public static DataTable CopyToDataTable<T>(this IEnumerable<T> source)
        {
            if (source is IEnumerable<DataRow>)
                return DataTableExtensions.CopyToDataTable<DataRow>(source as IEnumerable<DataRow>);

            return new ObjectShredder<T>().Shred(source, null, null);
        }

        public static DataTable CopyToDataTable<T>(this IEnumerable<T> source, DataTable table, LoadOption options)
        {
            if (source is IEnumerable<DataRow>)
            {
                 DataTableExtensions.CopyToDataTable<DataRow>(source as IEnumerable<DataRow>, table, options);
                 return table;
            }

            return new ObjectShredder<T>().Shred(source, table, options);
        }

    }

Sample database

All LINQ to DataSet examples use a MS Access database with the following schema and data.

    create table CITY
    (
       Country varchar(4),
       Name varchar(32)
    );
    
    create table COUNTRY
    (
       Code varchar(4),
       Name varchar(32)
    );
    
    create table CUSTOMER
    (   
       Name varchar(32),
       Country varchar(4),
       Debit float
    ); 
    
        
    insert into CITY values ('ES', 'Madrid');
    insert into CITY values ('ES', 'Barcelona');
    insert into CITY values ('ES', 'Valencia');
    insert into CITY values ('CZ', 'Prague');
    insert into CITY values ('CZ', 'Brno');
    insert into CITY values ('GR', 'Athens');
    insert into CITY values ('GR', 'Thessaloniki');
    insert into CITY values ('DK', 'Copenhagen');
    insert into CITY values ('DK', 'Aarhus');
    insert into CITY values ('UK', 'London');
    insert into CITY values ('UK', 'Edinburgh');

    insert into COUNTRY values ('ES', 'Spain');
    insert into COUNTRY values ('CZ', 'Czech Republic');
    insert into COUNTRY values ('GR', 'Greece');
    insert into COUNTRY values ('DK', 'Denmark');
    insert into COUNTRY values ('UK', 'United Kingdom');
     
    insert into CUSTOMER values ('Cheap Software Ltd', 'GR', 35.6); 
    insert into CUSTOMER values ('Fast Internet Co', 'DK', 51.7); 
    insert into CUSTOMER values ('Neat Hardware', 'ES', 12.1); 
    insert into CUSTOMER values ('Cool Tech', 'DK', 23.5); 
    insert into CUSTOMER values ('Steady Logic', 'CZ', 49.4); 
    insert into CUSTOMER values ('Nasty Data', 'GR', 62.2);

Data are loaded when the sample application starts and they are available as DataTable objects.

    public partial class MainForm : Form
    {
    
        ...
        ...

        /* static fields */
        static private readonly string connectionString = @"";
        static private DbProviderFactory factory = null;


        /* static constructor */
        static MainForm()
        {
            string S = Path.GetFullPath(@"..\..\..\LINQ.MDB");
            connectionString = string.Format(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=""{0}"";User Id=admin;Password=;", S);
            factory = DbProviderFactories.GetFactory("System.Data.OleDb");
        }


        /* a static method which executes SQL and returns a DataTable object */
        static DataTable Select(string SQL)
        {
            DataTable Result = null;

            if (factory != null)
            {
                DbConnection con = factory.CreateConnection();
                con.ConnectionString = connectionString;

                try
                {
                    con.Open();

                    using (DbCommand cmd = con.CreateCommand())
                    {
                        using (DbDataAdapter adapter = factory.CreateDataAdapter())
                        {
                            cmd.CommandText = SQL;
                            adapter.SelectCommand = cmd;
                            Result = new DataTable();
                            adapter.Fill(Result);
                        }
                    }
                }
                finally
                {
                    if (con.State == ConnectionState.Open)
                        con.Close();
                }

            }

            return Result;
        }


        /* instance fields */
        DataTable tblCity = Select("select * from CITY");
        DataTable tblCountry = Select("select * from COUNTRY");
        DataTable tblCustomer = Select("select * from CUSTOMER");
        
    
        ...
        ...    
        
    }

LINQ to DataSet queries

The following examples are based on the examples presented earlier for the LINQ to Objects and properly modified.

The from, where, orderby, select clauses

The query variable is of type IEnumerable<DataRow>. The AsEnumerable() is used to convert a DataTable to an IEnumerable<DataRow> object. Thus the range variable, city in this example, is a DataRow object. Finally the CopyToDataTable() is used to convert the IEnumerable<DataRow> back to a DataTable which then can be used as a DataSource for a DataGridView.

    var q = from city in tblCity.AsEnumerable()
            where city.Field<string>("Name").StartsWith("A") || city.Field<string>("Name").StartsWith("B")
            orderby city.Field<string>("Name") descending
            select city;

    Grid.DataSource = q.CopyToDataTable();

A sub-query based on a result list coming from the first query. Each from clause can be thought of as a separated foreach statement.

    var q = from country in tblCountry.AsEnumerable()
            where (country.Field<string>("Code") == "GR") || (country.Field<string>("Code") == "DK")
                from customer in tblCustomer.AsEnumerable()
                where (customer.Field<string>("Country") == country.Field<string>("Code")) && (customer.Field<double>("Debit") > 40)
                orderby customer.Field<string>("Name") 
                select customer;

    Grid.DataSource = q.CopyToDataTable();

Queries containing compound from clauses could be written analytically.

    var q = from country in tblCountry.AsEnumerable()
            where (country.Field<string>("Code") == "GR") || (country.Field<string>("Code") == "DK")
            select country.Field<string>("Code");

    var q2 = from customer in tblCustomer.AsEnumerable()
             where q.Contains(customer.Field<string>("Country")) && (customer.Field<double>("Debit") > 40)
             orderby customer.Field<string>("Name")
             select customer;

    Grid.DataSource = q2.CopyToDataTable();

The join clause

Joins in LINQ are always inner (equijoins) joins. A LINQ join joins two sequences in an equii-join based on some equality condition.

The DataTableExtensions.CopyToDataTable<T>() extension method is a generic with a constraint for DataRow class. Meaning that it can be used with DataRow objects only. However in the next example the element type is an anonymous type. For this and similar examples to work a solution is provided taken from

    http://msdn.microsoft.com/en-us/library/bb669096.aspx

which adds another CopyToDataTable() overloaded version without any type constraint.

    var q = from customer in tblCustomer.AsEnumerable()
            join country in tblCountry.AsEnumerable() on customer.Field<string>("Country") equals country.Field<string>("Code")
            orderby country.Field<string>("Code")
            select new { Name = customer.Field<string>("Name"), Country = country.Field<string>("Name") };

    Grid.DataSource = q.CopyToDataTable();

Since a join is always an inner join the order of the datasources in the expression has no impact at all. Thus the next example results in the same output as the above one.

    var q = from country in tblCountry.AsEnumerable()
            join customer in tblCustomer.AsEnumerable() on country.Field<string>("Code") equals customer.Field<string>("Country")
            orderby country.Field<string>("Code")
            select new { Name = customer.Field<string>("Name"), Country = country.Field<string>("Name") };

    Grid.DataSource = q.CopyToDataTable();

The group join

This and the next examples which group data, create a DataSet which holds a DataTable for each group. Here are the helper methods

    /* helper method: creates and returns a DataTable based on TableName and Rows */
    DataTable ConstructTable(string TableName, IEnumerable<DataRow> Rows)
    {
        DataTable Result;
        if (Rows.Count() > 0)
            Result = Rows.CopyToDataTable();
        else
            Result = new DataTable();

        Result.TableName = TableName;
        return Result;
    }


    /* helper method: returns a DataSet after placing all tables into its Tables  
       and TableNames into cboTables.Items */
    DataSet ConstructDataset(IEnumerable<DataTable> tables)
    {
        DataSet Result = new DataSet();
        cboTables.Items.Clear();

        foreach (DataTable table in tables)
        {
            Result.Tables.Add(table);
            cboTables.Items.Add(table.TableName);
        }

        cboTables.SelectedIndex = 0;
        return Result;
    }

The next example joins the two tables into customersPerCountry, an IEnumerable<DataRow> object, and then it calls the helper ConstructTable() method in order to create a DataTable for the current group. Here the q implicitly typed local variable is a IEnumerable<DataTable> object.

    var q = from country in tblCountry.AsEnumerable()
            join customer in tblCustomer.AsEnumerable() on country.Field<string>("Code") equals customer.Field<string>("Country") 
                into customersPerCountry
            orderby country.Field<string>("Code")
            select ConstructTable(country.Field<string>("Name"), customersPerCountry);

    Grid.DataSource = ConstructDataset(q).Tables[0];

Simulating a left outer join

As with the LINQ to Objects, the left join simulation is done by using a second iteration (from clause) over the sequence produced by the group join and referenced by the keyword into, and forcing ALL the elements of that produced sequence into the final sequence, by using the DefaultIfEmpty() extension method .

    var q = from country in tblCountry.AsEnumerable()
            join customer in tblCustomer.AsEnumerable() on country.Field<string>("Code") equals customer.Field<string>("Country")
                into customersPerCountry
            orderby country.Field<string>("Code")
                from v in customersPerCountry.DefaultIfEmpty()
                select new { Country = country.Field<string>("Name"), Customer = (v == null ? string.Empty : (string)v["Name"]) };
     
    Grid.DataSource = q.CopyToDataTable();

The group..by clause

This example is slightly different from the corresponding one in the LINQ to Objects examples. An into clause is added and the grouping result is passed to the ConstructTable().

The q implicitly typed local variable is a IEnumerable<DataTable> object, while the g is a IGrouping<string, DataRow>.

    /*  here, the implicitly typed variable q would be defined as 
        IEnumerable<DataTable> q = ...  */
    var q = from customer in tblCustomer.AsEnumerable()
            orderby customer.Field<string>("Country")
            group customer by customer.Field<string>("Country") into g
                select ConstructTable(g.Key, g.ToList());
                
    Grid.DataSource = ConstructDataset(q).Tables[0];

Another group..by example using an into clause.

    var q = from customer in tblCustomer.AsEnumerable()
            group customer by customer.Field<string>("Name")[0] into g
            orderby g.Key
            select ConstructTable(g.Key.ToString(), g.ToList());

    Grid.DataSource = ConstructDataset(q).Tables[0];

Grouping and using aggregate functions (sum).

    var q = from customer in tblCustomer.AsEnumerable()
            group customer by customer.Field<string>("Country") into cc
            orderby cc.Key
            select new { Country = cc.Key, Debit = cc.Sum(c => c.Field<double>("Debit")) };

    Grid.DataSource = q.CopyToDataTable();

The let clause

Almost the same as with the LINQ to Objects.

    var q = from country in tblCountry.AsEnumerable()
            let DebitList = (
                from customer in tblCustomer.AsEnumerable()
                where (customer.Field<string>("Country") == country.Field<string>("Code"))
                select customer.Field<double>("Debit"))
            orderby country.Field<string>("Code")
            select new { Country = country.Field<string>("Code"), Debit = DebitList.Sum() };

    Grid.DataSource = q.CopyToDataTable();

LINQ

Copyright � 2009 Theodoros Bebekis, Thessaloniki, Greece (teo point bebekis at gmail point com)