C# numeric sorting revisited

Introduction

This is part two of my previous post. Sadly, yesterday I didn’t know I was writing part one.

Yesterday evening, having taken my ported StrCmpLogicalW code home, I set out to put it through some more rigorous testing. But alas, it didn’t even take 10 seconds to find issues:

  1. It didn’t sort the same as Windows Explorer, because it included spaces in the comparison.
  2. Integers can overflow quite easily – it doesn’t require an unreasonably long numeric file name to do so.

So today I share the fixed version.

The updated implementation

Despite my comments yesterday about keeping the same function name as the C code I converted, I decided to do the more sensible and standard thing for managed C# code this time. This is now an extension method on the managed string type, called CompareNumeric. The logic is essentially the same as yesterday’s code, and I will describe my changes below…

C# Numeric Sort Implementation
// Requires: using System; using System.Text;
// As an extension method, this must be declared inside a static class.
public static int CompareNumeric(this string s, string other)
{
    // Only compare when both strings are non-null and non-empty once spaces are removed.
    if (s != null && other != null &&
        (s = s.Replace(" ", string.Empty)).Length > 0 &&
        (other = other.Replace(" ", string.Empty)).Length > 0)
    {
        int sIndex = 0, otherIndex = 0;

        while (sIndex < s.Length)
        {
            // other ran out first, so s sorts after it.
            if (otherIndex >= other.Length)
                return 1;

            if (char.IsDigit(s[sIndex]))
            {
                // Digits sort before non-digits.
                if (!char.IsDigit(other[otherIndex]))
                    return -1;

                // Compare the numbers: collect each run of digits, advancing the indexes as we go.
                StringBuilder sBuilder = new StringBuilder(), otherBuilder = new StringBuilder();

                while (sIndex < s.Length && char.IsDigit(s[sIndex]))
                {
                    sBuilder.Append(s[sIndex++]);
                }

                while (otherIndex < other.Length && char.IsDigit(other[otherIndex]))
                {
                    otherBuilder.Append(other[otherIndex++]);
                }

                long sValue = 0L, otherValue = 0L;

                // Values too large for a long are clamped to Int64.MaxValue.
                try
                {
                    sValue = Convert.ToInt64(sBuilder.ToString());
                }
                catch (OverflowException) { sValue = Int64.MaxValue; }

                try
                {
                    otherValue = Convert.ToInt64(otherBuilder.ToString());
                }
                catch (OverflowException) { otherValue = Int64.MaxValue; }

                if (sValue < otherValue)
                    return -1;
                else if (sValue > otherValue)
                    return 1;
            }
            else if (char.IsDigit(other[otherIndex]))
                return 1;
            else
            {
                // Neither character is a digit: compare them, ignoring case.
                int difference = string.Compare(s[sIndex].ToString(), other[otherIndex].ToString(), StringComparison.InvariantCultureIgnoreCase);

                if (difference > 0)
                    return 1;
                else if (difference < 0)
                    return -1;

                sIndex++;
                otherIndex++;
            }
        }

        // s ran out first, so it sorts before other.
        if (otherIndex < other.Length)
            return -1;
    }

    return 0;
}

 

Changes

  1. The first thing it now does, after checking that the strings are not null, is remove spaces. (And thus it now sorts exactly the same as Windows Explorer.)
  2. Following removing characters, out of habit I then check that the strings’ lengths are greater than zero. Of course zero length file names (or file names consisting of only spaces) are illegal anyway. (Still, it’s a good habit.)
  3. Yesterday I made a joke of deliberately not using a StringBuilder to build strings. This time, I did it properly.
  4. Yesterday’s code used 32-bit integers. Now it uses 64-bit integers. Any number in a file name that is too large for a long integer is now treated as Int64.MaxValue. And if you have more than one such file name in the same directory (Why would anybody do that?) then those numbers will all be considered equal. (This may differ from the Microsoft StrCmpLogicalW implementation, and hence the Windows Explorer sorting. Who knows? Not me. Nor do I care.)
  5. When parsing the numbers to compare, yesterday’s code included an unnecessary side-effect of converting the C code. That is, the C code used StrToIntEx, which converts all valid characters following the pointer. But the C developer still needed to then increment their original pointer past the already converted characters. (I converted the characters in a for loop, then incremented my index just like they did their pointer.) Looking at it again, this time I just used a simple while loop, incrementing my index at the same time – thus no need for the code denoted by the “Skip” comment of yesterday’s code.
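
As an aside to item 4: if treating all oversized numbers as equal ever becomes a problem, one possible alternative (not what the code in this post does) is to compare the digit runs as text instead of parsing them. The sketch below assumes both arguments contain only the ASCII digits 0 to 9; the class name DigitRunComparer and the method name CompareDigitRuns are made up for illustration.

```csharp
using System;

public static class DigitRunComparer
{
    // Hypothetical helper, not part of the original code: compares two strings
    // of ASCII digits by numeric value without converting them to a number,
    // so the length of the digit run can never cause an overflow.
    public static int CompareDigitRuns(string a, string b)
    {
        // Leading zeros do not change the value, so ignore them.
        a = a.TrimStart('0');
        b = b.TrimStart('0');

        // With leading zeros gone, the run with more digits is the larger number.
        if (a.Length != b.Length)
            return a.Length < b.Length ? -1 : 1;

        // Same length: ordinal order of the digit characters matches numeric order.
        return Math.Sign(string.CompareOrdinal(a, b));
    }
}
```

With this, DigitRunComparer.CompareDigitRuns(new string('9', 20), "1" + new string('0', 20)) returns -1, even though both values are too large for a long. It would slot in where the two Convert.ToInt64 calls are now, at the cost of drifting a little further from the original C code.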

Btw, as mentioned in item 3 (referring to the previous code), I reserve the right to do stupid non-standard things in my code, such as building a string from characters using a List<char> instead of a StringBuilder. I always do such things deliberately, and call it out. I trust that anyone who uses my code has enough brains to be aware of such things and be able to modify it if needed, where needed. (I hate when people just copy and paste code, with zero understanding. The fun in programming is in figuring things out yourself, and when you can’t do it yourself, in learning how.)

I hope that this code may be useful to someone…

One thing I must comment on, after this exercise… I was surprised that I didn’t find anyone else who, when trying to implement it in C#, simply converted the C code. (OK – I don’t know how long the Wine implementation has been online.) What also surprised me was the lack of decent C to C# conversion tools online. Two that I found that promised to convert the code failed miserably, and this was a simple function. Just look at what it does (read the original C code posted yesterday). You won’t find simpler C code than that! Sure, there is no such function as isDigitW in managed code, but you don’t need to be a super-genius to figure it out. Uncle Google will happily tell you, if you can’t just guess the obvious all by yourself. Then again, converting such code will be much harder for a program than for a human being. That is, where the C code merely checks for valid character pointers, it was obvious to me that when it first does this, it means to check for valid string pointers, whereas every subsequent check is actually that the pointers are still valid after incrementing them – which translates nicely to checking if my indexes are still within each string in C#. But how would one tell the difference when creating a conversion program? I don’t know.

FYI, this is probably the first and last time that I’ll ever write so much tedious and unnecessary detail. Having been retrenched, but required to work my notice period, I’m sitting here bored out of my mind, waiting for the month to end so that I can start my new job.

The result of sorting this way (in my Windows Forms application)

And lastly, to give this post some colour and show the results of this code, here are a couple of screenshots. First Windows Explorer, then the browser from my RomyView application (the download has been updated with the code from this post) – both showing thumbnail view while sorting by type on the same directory…

[Screenshot: Windows Explorer]

[Screenshot: RomyView browser]

About Jerome

I am a senior C# developer in Johannesburg, South Africa. I am also a recovering addict, who spent nearly eight years using methamphetamine. I write on my recovery blog about my lessons learned and sometimes give advice to others who have made similar mistakes, often from my viewpoint as an atheist, and I also write some C# programming articles on my programming blog.