How do I – Filter values in two lists of custom classes using Linq

Written by Cornelius J. van Dyk. Posted in How Do I...

I originally titled this post “Linq – So wonderful and oh so damn frustrating”, but then decided that the post probably warrants its current name since someone might actually be trying to do the same thing and search for examples of how to achieve that.  There are tons of examples of just how cool Linq is and I’m not saying that it isn’t, but man it can be so frustrating to work with at times.  Now I’m not going to pretend that I’m a Linq expert.  I’m not.  I know just enough to be dangerous.  Linq has been useful to me in the past, but this week I stumbled across an issue that just drove me batty!

<rant>
Just this weekend I was telling Jess (my wonderful SciFi, geeky wife) that 80% of a developer’s time is spent figuring out why code (methods & APIs) does NOT work the way it’s supposed to.  Not how we THINK it’s supposed to work, but how it was PUBLISHED and advertised to work.  
</rant>

OK, off my soapbox and back to the Linq issue…

I have a class defined as SystemFile.  I’m basically trying to compare a folder with all its files and sub folders to another folder with its files and sub folders.  In the process I have two lists of SystemFile containing the info about the two folders I wish to compare.  Using Linq to compare the lists, we can define a comparer class of type IEqualityComparer<T> to do the comparison of our custom class and assist in the filtering.  My comparer class is defined thus:

 

public class SystemFileGACComparor : IEqualityComparer<SystemFile>
{
    public bool Equals(SystemFile source, SystemFile target)
    {
        if (source.FullPath.ToLower().Contains(@"c:\windows\assembly\temp"))
        {
            return true;
        }
        else
        {
            if (source.FullPath.ToLower().Contains(@"c:\windows\assembly\tmp"))
            {
                return true;
            }
            else
            {
                if (source.FullPath == target.FullPath)
                {
                    return true;
                }
                else
                {
                    return false;
                }
            }
        }
    }

    public int GetHashCode(SystemFile source)
    {
        return source.FullPath.GetHashCode();
    }
}

 

A note about my logic.  The simplest implementation of the Equals() method could have been just

   return (source.FullPath == target.FullPath);

but since I am actually comparing GAC folders in this case, there are the “Temp” and “Tmp” folders to consider (and exclude) in my comparison.  Since “Tmp” is used during installation and “Temp” during uninstallation of assemblies, they will have temporary GUID values in their path names which will always be different between systems.  As a result, we have to exclude anything from these folders in our comparison.  For that reason, I added the checks for these folders in the path of the source being checked.

With the comparer class in place, let’s put it to use.  The code is straightforward:

 

SystemFileGACComparor compare = new SystemFileGACComparor();
IEnumerable<SystemFile> sfSource = LoadFileListFromXML("1WEB.xml");
IEnumerable<SystemFile> sfTarget = LoadFileListFromXML("1APP.xml");
List<SystemFile> lstOnSourceNotTarget = sfSource.Except(sfTarget, compare).ToList();

First we define an instance of the comparer class for use.
Then we load the XML dump of our two lists into an IEnumerable<SystemFile> structure so that Linq can work on them.
Now simply ask Linq to compare the two lists and produce a list with the differences.  Per the published MSDN documentation, the Except() method will take our sfSource and remove any records that it finds in sfTarget that match, finally returning what remains.
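To make the Except() behaviour concrete, here is a minimal sketch on plain strings rather than SystemFile objects.  The ExceptDemo class and its OnlyInFirst() helper are names I made up for illustration; StringComparer.OrdinalIgnoreCase is a built-in IEqualityComparer<string> that stands in for a custom comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, not part of the original project.
public static class ExceptDemo
{
    // Returns the items of 'first' that have no match in 'second',
    // where "match" is decided by the supplied equality comparer.
    public static List<string> OnlyInFirst(IEnumerable<string> first, IEnumerable<string> second)
    {
        return first.Except(second, StringComparer.OrdinalIgnoreCase).ToList();
    }

    public static void Main()
    {
        var web = new[] { @"C:\GAC\ADODB", @"C:\GAC\EnvDTE" };
        var app = new[] { @"c:\gac\adodb" };

        foreach (var path in OnlyInFirst(web, app))
        {
            Console.WriteLine(path); // prints C:\GAC\EnvDTE
        }
    }
}
```

The ADODB entry drops out because the comparer treats the two spellings as equal; EnvDTE stays because nothing in the second list matches it.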

For the record, here is what I’m comparing:

1WEB.xml

<?xml version="1.0" encoding="utf-8"?>
<GAC>
  <File FullPath="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a\adodb.dll" FileName="adodb.dll" Size="110592" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a" CreatedAt="2010-07-07T14:27:58.103-04:00" LastModifiedAt="2010-07-07T14:27:58.119-04:00" LastAccessedAt="2010-07-07T14:27:58.103-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a\__AssemblyInfo__.ini" FileName="__AssemblyInfo__.ini" Size="196" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a" CreatedAt="2010-07-07T14:29:40.671-04:00" LastModifiedAt="2010-07-07T14:29:40.703-04:00" LastAccessedAt="2010-07-07T14:29:40.671-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a" FileName="7.0.3300.0__b03f5f7f11d50a3a" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC\ADODB" CreatedAt="2010-07-07T14:27:58.103-04:00" LastModifiedAt="2010-07-07T14:29:40.671-04:00" LastAccessedAt="2010-07-07T14:29:40.671-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\ADODB" FileName="ADODB" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC" CreatedAt="2010-07-07T14:29:40.703-04:00" LastModifiedAt="2010-07-07T14:29:40.718-04:00" LastAccessedAt="2010-07-07T14:29:40.718-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a\envdte.dll" FileName="envdte.dll" Size="245760" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a" CreatedAt="2010-07-13T11:42:59.953-04:00" LastModifiedAt="2010-07-13T11:42:59.968-04:00" LastAccessedAt="2010-07-13T11:42:59.953-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a\__AssemblyInfo__.ini" FileName="__AssemblyInfo__.ini" Size="194" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a" CreatedAt="2010-07-13T11:43:16.625-04:00" LastModifiedAt="2010-07-13T11:43:16.625-04:00" LastAccessedAt="2010-07-13T11:43:16.625-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a" FileName="8.0.0.0__b03f5f7f11d50a3a" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC\EnvDTE" CreatedAt="2010-07-13T11:42:59.953-04:00" LastModifiedAt="2010-07-13T11:43:16.625-04:00" LastAccessedAt="2010-07-13T11:43:16.625-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE" FileName="EnvDTE" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC" CreatedAt="2010-07-13T11:43:16.64-04:00" LastModifiedAt="2010-07-13T11:43:16.64-04:00" LastAccessedAt="2010-07-13T11:43:16.64-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\30ROE94YXW\One.MasterPages.dll" FileName="One.MasterPages.dll" Size="6144" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\temp\30ROE94YXW" CreatedAt="2010-10-11T10:43:21.996-04:00" LastModifiedAt="2010-10-11T10:43:21.996-04:00" LastAccessedAt="2010-10-11T10:43:21.996-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\30ROE94YXW" FileName="30ROE94YXW" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\temp" CreatedAt="2010-10-11T15:27:36.922-04:00" LastModifiedAt="2010-10-11T15:27:36.922-04:00" LastAccessedAt="2010-10-11T15:27:36.922-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\5ZY4HSUIYK\One.EVMS.Dashboard.dll" FileName="One.EVMS.Dashboard.dll" Size="837632" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\temp\5ZY4HSUIYK" CreatedAt="2010-10-04T08:56:07.183-04:00" LastModifiedAt="2010-10-04T08:56:07.277-04:00" LastAccessedAt="2010-10-04T08:56:07.183-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\5ZY4HSUIYK" FileName="5ZY4HSUIYK" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\temp" CreatedAt="2010-10-11T10:51:38.965-04:00" LastModifiedAt="2010-10-11T10:51:38.965-04:00" LastAccessedAt="2010-10-11T10:51:38.965-04:00" />
  <File FullPath="C:\Windows\Assembly\temp" FileName="temp" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly" CreatedAt="2009-07-14T00:58:28.892-04:00" LastModifiedAt="2010-10-11T17:33:53.016-04:00" LastAccessedAt="2010-10-11T17:33:53.016-04:00" />
  <File FullPath="C:\Windows\Assembly\tmp" FileName="tmp" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly" CreatedAt="2010-07-07T14:18:45.924-04:00" LastModifiedAt="2010-10-11T17:35:50.969-04:00" LastAccessedAt="2010-10-11T17:35:50.953-04:00" />
</GAC>

1APP.xml

<?xml version="1.0" encoding="utf-8"?>
<GAC>
  <File FullPath="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a\adodb.dll" FileName="adodb.dll" Size="110599" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a" CreatedAt="2010-07-07T14:27:58.103-04:00" LastModifiedAt="2010-07-07T14:27:58.119-04:00" LastAccessedAt="2010-07-07T14:27:58.103-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a\__AssemblyInfo__.ini" FileName="__AssemblyInfo__.ini" Size="196" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a" CreatedAt="2010-07-07T14:29:40.671-04:00" LastModifiedAt="2010-07-07T14:29:40.703-04:00" LastAccessedAt="2010-07-07T14:29:40.671-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a" FileName="7.0.3300.0__b03f5f7f11d50a3a" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC\ADODB" CreatedAt="2010-07-07T14:27:58.103-04:00" LastModifiedAt="2010-07-07T14:29:40.671-04:00" LastAccessedAt="2010-07-07T14:29:40.671-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\ADODB" FileName="ADODB" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC" CreatedAt="2010-07-07T14:29:40.703-04:00" LastModifiedAt="2010-07-07T14:29:40.718-04:00" LastAccessedAt="2010-07-07T14:29:40.718-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.1__b03f5f7f11d50a3a\envdte.dll" FileName="envdte.dll" Size="245760" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a" CreatedAt="2010-07-13T11:42:59.953-04:00" LastModifiedAt="2010-07-13T11:42:59.968-04:00" LastAccessedAt="2010-07-13T11:42:59.953-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.1__b03f5f7f11d50a3a\__AssemblyInfo__.ini" FileName="__AssemblyInfo__.ini" Size="194" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a" CreatedAt="2010-07-13T11:43:16.625-04:00" LastModifiedAt="2010-07-13T11:43:16.625-04:00" LastAccessedAt="2010-07-13T11:43:16.625-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.1__b03f5f7f11d50a3a" FileName="8.0.0.0__b03f5f7f11d50a3a" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC\EnvDTE" CreatedAt="2010-07-13T11:42:59.953-04:00" LastModifiedAt="2010-07-13T11:43:16.625-04:00" LastAccessedAt="2010-07-13T11:43:16.625-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE" FileName="EnvDTE" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC" CreatedAt="2010-07-13T11:43:16.64-04:00" LastModifiedAt="2010-07-13T11:43:16.64-04:00" LastAccessedAt="2010-07-13T11:43:16.64-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\33ROE94YXW\One.MasterPages.dll" FileName="One.MasterPages.dll" Size="6144" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\temp\30ROE94YXW" CreatedAt="2010-10-11T10:43:21.996-04:00" LastModifiedAt="2010-10-11T10:43:21.996-04:00" LastAccessedAt="2010-10-11T10:43:21.996-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\33ROE94YXW" FileName="30ROE94YXW" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\temp" CreatedAt="2010-10-11T15:27:36.922-04:00" LastModifiedAt="2010-10-11T15:27:36.922-04:00" LastAccessedAt="2010-10-11T15:27:36.922-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\53Y4HSUIYK\One.EVMS.Dashboard.dll" FileName="One.EVMS.Dashboard.dll" Size="837632" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\temp\5ZY4HSUIYK" CreatedAt="2010-10-04T08:56:07.183-04:00" LastModifiedAt="2010-10-04T08:56:07.277-04:00" LastAccessedAt="2010-10-04T08:56:07.183-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\53Y4HSUIYK" FileName="5ZY4HSUIYK" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\temp" CreatedAt="2010-10-11T10:51:38.965-04:00" LastModifiedAt="2010-10-11T10:51:38.965-04:00" LastAccessedAt="2010-10-11T10:51:38.965-04:00" />
  <File FullPath="C:\Windows\Assembly\temp" FileName="temp" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly" CreatedAt="2009-07-14T00:58:28.892-04:00" LastModifiedAt="2010-10-11T17:33:53.016-04:00" LastAccessedAt="2010-10-11T17:33:53.016-04:00" />
  <File FullPath="C:\Windows\Assembly\tmp" FileName="tmp" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly" CreatedAt="2010-07-07T14:18:45.924-04:00" LastModifiedAt="2010-10-11T17:35:50.969-04:00" LastAccessedAt="2010-10-11T17:35:50.953-04:00" />
</GAC>

When we run through the code and break after the Except() method, this is what we see for sfSource and sfTarget:

 

image

As we expected, we see all the files in the sfSource.  Now let’s look at the value of the results list lstOnSourceNotTarget:

 

image

Hmm… that’s curious… I expected the three EnvDTE records to be there, but the \temp\ files SHOULD have been filtered out by our comparer class’ Equals() method, right?

Confused, I set a breakpoint inside our comparer class’ Equals() method and reran the code to see what is actually being compared and filtered.  This is what we see:

Breaking here:

 

image

First break

 

image

Second break

 

image

Third break

 

image

Fourth break

 

image

Fifth break

 

image

Sixth break

 

image

Seventh break

 

image

And ???

 

image

Hmm… Seven breaks for 14 files and NONE of them were the 4 that show up at the end with \temp\ in the name.  WTF???!!!

Are you confused?  I sure am!!!

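My first “rationalization” was that it has something to do with Linq’s LAZY nature, but the ToList() call at the end forces the whole query to run, so that can’t be it.  The more likely culprit is my own comparer.  Except() is a set operation: it buckets items by GetHashCode() first and only calls Equals() on items whose hash codes match.  My GetHashCode() hashes the FullPath, so two \temp\ records with different GUID folder names get different hash codes and Equals() is never even asked about them.  That also explains the seven breaks: Equals() only fired for records whose FullPath, and therefore hash code, already had a match in the other list.  An IEqualityComparer<T> has to return the same hash code for any two items its Equals() calls equal, and mine doesn’t.

OK, enough time spent on something that DOESN’T WORK THE WAY I EXPECTED!!!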
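Here is a small sketch of how Except() consults GetHashCode() before Equals().  AlwaysEqualComparer and HashFirstDemo are hypothetical names for this demo only.  The comparer claims every pair of strings is equal, and a constructor flag decides whether its hash codes agree with that claim.  The mismatched mode hashes on string length (an assumption made purely so the two hash codes differ deterministically).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer for illustration: Equals() always answers "equal".
public class AlwaysEqualComparer : IEqualityComparer<string>
{
    private readonly bool _constantHash;

    public int EqualsCalls { get; private set; }

    public AlwaysEqualComparer(bool constantHash)
    {
        _constantHash = constantHash;
    }

    public bool Equals(string a, string b)
    {
        EqualsCalls++;
        return true;
    }

    public int GetHashCode(string s)
    {
        // Constant hash: consistent with Equals(). Length hash: inconsistent.
        return _constantHash ? 0 : s.Length;
    }
}

public static class HashFirstDemo
{
    // Counts how many source items survive Except() with the given comparer.
    public static int CountRemaining(AlwaysEqualComparer comparer)
    {
        var source = new[] { @"temp\A" };   // length 6
        var target = new[] { @"temp\BB" };  // length 7
        return source.Except(target, comparer).Count();
    }

    public static void Main()
    {
        var mismatched = new AlwaysEqualComparer(false);
        int left = CountRemaining(mismatched);
        Console.WriteLine(left + " left, Equals calls: " + mismatched.EqualsCalls);
        // prints: 1 left, Equals calls: 0

        var consistent = new AlwaysEqualComparer(true);
        left = CountRemaining(consistent);
        Console.WriteLine(left + " left, Equals calls: " + consistent.EqualsCalls);
        // prints: 0 left, Equals calls: 1
    }
}
```

With mismatched hash codes the item survives and Equals() is never called, which matches what the breakpoints showed for the \temp\ records.  A constant hash code is only there to make the point; it would be slow on large lists.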

Let’s get a workaround in place…

We can use the Where() method in Linq to get the subset of records that actually contain the string we’re trying to filter out.  Then, reversing our logic, we can pass that set of records to the Except() method to exclude them from the original list.  Our new code looks like this:

 

SystemFileGACComparor compare = new SystemFileGACComparor();
IEnumerable<SystemFile> sfSource = LoadFileListFromXML("1WEB.xml");
sfSource = sfSource.Except(
    sfSource.Where(filter => filter.FullPath.ToLower().Contains(@"c:\windows\assembly\temp")), compare).Except(
    sfSource.Where(filter => filter.FullPath.ToLower().Contains(@"c:\windows\assembly\tmp")), compare);
IEnumerable<SystemFile> sfTarget = LoadFileListFromXML("1APP.xml");
sfTarget = sfTarget.Except(
    sfTarget.Where(filter => filter.FullPath.ToLower().Contains(@"c:\windows\assembly\temp")), compare).Except(
    sfTarget.Where(filter => filter.FullPath.ToLower().Contains(@"c:\windows\assembly\tmp")), compare);
List<SystemFile> lstOnSourceNotTarget = sfSource.Except(sfTarget, compare).ToList();

 

We start by taking the list and applying the Where() method against it.  Inside the Where() method expression, we use the .FullPath property and convert its value to all lower case using the ToLower() method.  Once in all lower case, we use the Contains() method to check for the “temp” reference.  This will produce a list of only the records that actually contain the “temp” values.  Passing that off to the Except() method leaves us with the original list MINUS the records containing “temp”.

Wash, rinse, repeat…

We simply drop in a second Except() with the same code and a reference to “tmp” instead and tada!  We have a list that doesn’t contain either “temp” or “tmp”.
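For comparison, here is a more compact sketch of the same idea: filter the temp paths out with a negated Where() and then run a single Except().  Everything here is an assumption for illustration.  SystemFile is a stand-in with only a FullPath property, and FullPathComparer, GacDiff, IsGacTemp() and OnSourceNotTarget() are hypothetical names, not code from my project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real SystemFile class; only FullPath is modelled.
public class SystemFile
{
    public string FullPath { get; set; }
}

// Equals() and GetHashCode() both depend on FullPath alone, so they agree.
public class FullPathComparer : IEqualityComparer<SystemFile>
{
    public bool Equals(SystemFile a, SystemFile b)
    {
        return a.FullPath == b.FullPath;
    }

    public int GetHashCode(SystemFile f)
    {
        return f.FullPath.GetHashCode();
    }
}

public static class GacDiff
{
    // True for anything under the GAC "temp" or "tmp" folders.
    public static bool IsGacTemp(SystemFile f)
    {
        string path = f.FullPath.ToLower();
        return path.Contains(@"c:\windows\assembly\temp")
            || path.Contains(@"c:\windows\assembly\tmp");
    }

    // Drops temp/tmp records from both lists, then returns what is only on source.
    public static List<SystemFile> OnSourceNotTarget(
        IEnumerable<SystemFile> source, IEnumerable<SystemFile> target)
    {
        return source.Where(f => !IsGacTemp(f))
                     .Except(target.Where(f => !IsGacTemp(f)), new FullPathComparer())
                     .ToList();
    }

    public static void Main()
    {
        var source = new[]
        {
            new SystemFile { FullPath = @"C:\Windows\Assembly\GAC\EnvDTE" },
            new SystemFile { FullPath = @"C:\Windows\Assembly\temp\30ROE94YXW" }
        };
        var target = new[]
        {
            new SystemFile { FullPath = @"C:\Windows\Assembly\temp\33ROE94YXW" }
        };

        foreach (var f in OnSourceNotTarget(source, target))
        {
            Console.WriteLine(f.FullPath); // prints C:\Windows\Assembly\GAC\EnvDTE
        }
    }
}
```

The filtering lives in one predicate instead of four Where() calls, and the comparer no longer needs to know anything about temp folders.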

Finally, we can move on to the next problem that doesn’t work as published…

 



Cheers
C





