merge sort

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

merge sort

acwest
I have written an implementation of merge sort, which I am using to
sort a list of polygon vertex pairs by the distance between each pair.
The problem I am having is that when the list goes over about 350
pairs, I blow up the stack. The problem isn't the recursion in the
sort function, it is the merge function. It seems like there is a
better way to do it, that I am missing.
Here is a simplified version of my code This only sorts direct
comparables, the version I am using is fancier, but this version blows
up at about 5000 values.
Does anybody know a good way of implementing the merge function to be
less recursive? I am limited to the 2015.03 version of OpenSCAD, as my
final code has to work in the Thingiverse customiser

COUNT = 100;

function fixStartIndex(v, index) = index < 0 ? len(v) + index : index;

function fixEndIndex(v, index) = index < 0 ? len(v) + index + 1 : index;

function sublist(list, start, end = -1) =
  let(
    startIndex = fixStartIndex(list, start),
    endIndex = fixEndIndex(list, end)
  )
  (len(list) < startIndex) ?
    undef
  : (endIndex > len(list)) ?
    undef
  : (endIndex < startIndex) ?
    undef
  : startIndex == endIndex ?
    []
  :
    [
      for (i = [startIndex : endIndex - 1])
        list[i]
    ];

function merge(left, right, l = 0, r = 0) =
  l >= len(left) && r >= len(right) ?
    []
  : l >= len(left) ?
    sublist(right, r)
  : r >= len(right) ?
    sublist(left, l)
  : left[l] <= right[r]  ?
    concat(
      [ left[l] ],
      merge(left, right, l = l + 1, r = r)
    )
  :
    concat(
      [ right[r] ],
      merge(left, right, l = l, r = r + 1)
    );

function mergeSort(v) =
  len(v) < 2 ?
    v
  :
    let (
      pivot = ceil(len(v) / 2),
      left = sublist(v, 0, pivot),
      right = sublist(v, pivot)
    )
    merge(mergeSort(left), mergeSort(right));

module test(v) {
  echo("unsorted", v);
  echo("sorted", mergeSort(v));
}

v = rands(0, 1000, COUNT);
test(v);

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Reply | Threaded
Open this post in threaded view
|

Re: merge sort

adrianv
Do you have a specific reason not to just use quicksort, which is easily
implemented?  Like do you need your sort to be stable?  

I can't think of a nonrecursive way to write the merge function, but it
seems like you could rewrite it using tail recursion, which would solve the
recursion depth problem.   To get the advantage your function has to have
the form

function merge(...) = test ? merge(....) : alternative;

or

function merge(...) = test ? alternative : merge(...);

So you need to accumulate the answer in a parameter, something like

function merge(left,right,l=0, r=0, result=[]) = l >= len(left) && r>=
len(right) ? result :
     merge(left,right,l+(r>=len(right) || left[l]<=right[r]?1:0),
r+(l>=len(left) || left[l]>right[r]?1:0), concat(result, [r>=len(right) ||
left[i]<=right[r]?left[l]:right[r]));

The above code was just a stab to get you started.  Totally untested.  But
the basic method should work.  You can't use "let", so everything has to be
crammed into the recursive function call.  

Actually it crosses my mind that I don't know when tail recursion unwrapping
was added.  If it's not in the old version you need to use then I think
you're out of luck and you need to change algorithms.  

 



--
Sent from: http://forum.openscad.org/

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Reply | Threaded
Open this post in threaded view
|

Re: merge sort

acwest
I tend to avoid quick sort because of the stability problem, as at
least partially sorted data is rather more common than not in the real
world.
I didn't think tail recursion unrolling was implemented in openSCAD 2015.03?

On Sat, Apr 13, 2019 at 9:29 AM adrianv <[hidden email]> wrote:

>
> Do you have a specific reason not to just use quicksort, which is easily
> implemented?  Like do you need your sort to be stable?
>
> I can't think of a nonrecursive way to write the merge function, but it
> seems like you could rewrite it using tail recursion, which would solve the
> recursion depth problem.   To get the advantage your function has to have
> the form
>
> function merge(...) = test ? merge(....) : alternative;
>
> or
>
> function merge(...) = test ? alternative : merge(...);
>
> So you need to accumulate the answer in a parameter, something like
>
> function merge(left,right,l=0, r=0, result=[]) = l >= len(left) && r>=
> len(right) ? result :
>      merge(left,right,l+(r>=len(right) || left[l]<=right[r]?1:0),
> r+(l>=len(left) || left[l]>right[r]?1:0), concat(result, [r>=len(right) ||
> left[i]<=right[r]?left[l]:right[r]));
>
> The above code was just a stab to get you started.  Totally untested.  But
> the basic method should work.  You can't use "let", so everything has to be
> crammed into the recursive function call.
>
> Actually it crosses my mind that I don't know when tail recursion unwrapping
> was added.  If it's not in the old version you need to use then I think
> you're out of luck and you need to change algorithms.
>
>
>
>
>
> --
> Sent from: http://forum.openscad.org/
>
> _______________________________________________
> OpenSCAD mailing list
> [hidden email]
> http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Reply | Threaded
Open this post in threaded view
|

Re: merge sort

adrianv
The question with stability is do you care that items equal in the input list
might get permuted in the output list.  If the answer is yes, then stability
matters.  If the answer is no, then it does not.  Quicksort performance on
partially sorted data should be very good, because pivot selection will be
reasonable and the data will be divided in half.  If you implement mergesort
(with tail recursion unrolling) I'd be interested in seeing if it can
compete with quicksort.   The advantage of mergesort is a bound on worst
case run time instead of just average run time.  (Quicksort could fail if
the data were perversely organized.)  




--
Sent from: http://forum.openscad.org/

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Reply | Threaded
Open this post in threaded view
|

Re: merge sort

acwest
The biggest problem with Quicksort is that partially sorted data is
the particular case which qualifies as "perversely organised". On the
other hand, I was looking at some example timing runs, and it isn't
that bad, and the quicksort implementatio  is very simple....
I am working on a tail-recursive version of the merge function, bit I
have the problem of the call to concat. I think that wil always break
tail recursion, at least by OpenSCAD's limited ability to optimise it.

On Sat, Apr 13, 2019 at 10:45 AM adrianv <[hidden email]> wrote:

>
> The question with stability is do you care that items equal in the input list
> might get permuted in the output list.  If the answer is yes, then stability
> matters.  If the answer is no, then it does not.  Quicksort performance on
> partially sorted data should be very good, because pivot selection will be
> reasonable and the data will be divided in half.  If you implement mergesort
> (with tail recursion unrolling) I'd be interested in seeing if it can
> compete with quicksort.   The advantage of mergesort is a bound on worst
> case run time instead of just average run time.  (Quicksort could fail if
> the data were perversely organized.)
>
>
>
>
> --
> Sent from: http://forum.openscad.org/
>
> _______________________________________________
> OpenSCAD mailing list
> [hidden email]
> http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Reply | Threaded
Open this post in threaded view
|

Re: merge sort

acwest
I appear to have been misguided, in that Quicksort performance even
with completely sorted data isn't that bad. I am still curious if
there is a better way to write the merge function, though

On Sat, Apr 13, 2019 at 11:04 AM A. Craig West <[hidden email]> wrote:

>
> The biggest problem with Quicksort is that partially sorted data is
> the particular case which qualifies as "perversely organised". On the
> other hand, I was looking at some example timing runs, and it isn't
> that bad, and the quicksort implementatio  is very simple....
> I am working on a tail-recursive version of the merge function, bit I
> have the problem of the call to concat. I think that wil always break
> tail recursion, at least by OpenSCAD's limited ability to optimise it.
>
> On Sat, Apr 13, 2019 at 10:45 AM adrianv <[hidden email]> wrote:
> >
> > The question with stability is do you care that items equal in the input list
> > might get permuted in the output list.  If the answer is yes, then stability
> > matters.  If the answer is no, then it does not.  Quicksort performance on
> > partially sorted data should be very good, because pivot selection will be
> > reasonable and the data will be divided in half.  If you implement mergesort
> > (with tail recursion unrolling) I'd be interested in seeing if it can
> > compete with quicksort.   The advantage of mergesort is a bound on worst
> > case run time instead of just average run time.  (Quicksort could fail if
> > the data were perversely organized.)
> >
> >
> >
> >
> > --
> > Sent from: http://forum.openscad.org/
> >
> > _______________________________________________
> > OpenSCAD mailing list
> > [hidden email]
> > http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Reply | Threaded
Open this post in threaded view
|

Re: merge sort

adrianv
For quicksort, "perverse" is whatever order of data causes the pivot to be
selected so that all the data is on one side at every iteration.  Then
quicksort turns into an n^2 algorithm.   There are different ways of
selecting the pivot, so the exact definition of the perverse input may vary.
But if you pick the pivot to be the middle data value then already sorted
data is the ideal input.  So for example, if you picked the pivot to be the
first entry in the input then sorted data would in fact be the perverse
input.  

The call to concat will not break tail recursion as long as it happens
*within* the recursive call.  That's why you have to organize the code like
I showed in my example.   Consider this simpler case:

function sum_bad(v,i=0) = i==len(v) ? 0 : v[i]+sum_bad(v,i+1);

This function will *not* get tail recursion optimization.  How can it be
fixed?  Like this:

function sum_good(v,i=0,total=0) i==len(v) ? total :
sum_good(v,i+1,total+v[i]);

In this case, the accumulating operation happens within the function
argument, so that is fine.  You can do the same thing with the mergesort,
following the example code I showed to rewrite your merge() function so that
the concatenation happens in the argument and the list is accumulating as an
argument instead of as a return value.   It's not pretty to pack all the
computation into the function arguments, and it kind of seems inefficient to
have to repeat your conditions over and over again, but I think it will
work.  

This will work for the function that combines two lists (the merging
operation).  You cannot do this for the recursion that calls mergesort once
on each half of the data.  But that should be OK because you only need
log(n) depth on that recursion.  




--
Sent from: http://forum.openscad.org/

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Reply | Threaded
Open this post in threaded view
|

Re: merge sort

Ronaldo
That is the implementation of merge sort I have done some years ago. The merge function is tail recursive.

// ********************** Merge sort ****************************** //
    // sort the array arr in ascending order (only) of the arr[] values 
    // if keypos==undef and of the key arr[][keypos] otherwise.
    // if keypos==undef, arr is supposed to be an array of numbers
    // otherwise, arr[i] is dealed as an array of length greater or equal to keypos
               
function merge_sort(arr, keypos=undef) = 
    !( len(arr) > 1) ? arr :
        let ( left  = [ for(i=[0:floor(len(arr)/2)-1]) arr[i] ] ,
              right = [ for(i=[floor(len(arr)/2):len(arr)-1]) arr[i] ] ,
              mlft  = merge_sort(left, keypos) ,
              mrgt  = merge_sort(right, keypos) )
        merge(left=mlft, right=mrgt, kp=keypos);
// merge the lists left and right from position j[0] and j[1] respectively; 
// _acc is the list with the merge already done
function merge(left, right, kp, j=[0, 0], _acc=[]) =
    j[0] >= len(left) && j[1]>= len(right) ? 
        _acc:
        merge( left  = left ,
               right = right ,
               kp    = kp,
               j     = next_js(j, left, right, kp),
               _acc  = concat(  _acc, 
                                next_entry(j, left, right, kp)
                             )
             );
 // choose the next value of indices j
 function next_js(j, lft, rgt, kp) =
    let( jl = j[0], jr = j[1] )
    jl >= len(lft) || jr >= len(rgt) ? // any partition is done ? 
        [jl+1, jr+1] :   // advance both pointers
        kp == undef ?    // is there a key ? // select the partition to take from
            lft[jl]     > rgt[jr] ?     [jl, jr+1]: [jl+1, jr] : 
            lft[jl][kp] > rgt[jr][kp] ? [jl, jr+1]: [jl+1, jr] 
        ;
        
// choose whose partition the next entry will be taken from 
function next_entry(j, lft, lrt, kp) =
    let( jl = j[0], jr = j[0] )
    jl >= len(lft) ?     [ lrt[jr] ] :   // partition left is done ? no, take from right
        jr >= len(lrt) ? [ lft[jl] ] :   // partition right is done ? no, take from left
            kp == undef ?      // is there a key ?
                lft[jl]     > lrt[jr] ?     [ lrt[jr] ] : [ lft[jl] ] : 
                lft[jl][kp] > lrt[jr][kp] ? [ lrt[jr] ] : [ lft[jl] ] ;

I have not used this code since then because quick_sort is much, much faster. For an array with 10000 random entries, merge_sort spends 9sec. while quick_sort nearly spends nothing.

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Reply | Threaded
Open this post in threaded view
|

Re: merge sort

Ronaldo
It seems I have picked a wrong version. The correct version has a different next_entry() function:

function next_entry(j, lft, lrt, kp) =
    let( jl = j[0], jr = j[1] )
    jl >= len(lft) ?     [ lrt[jr] ] :   // partition left is done ? no, take from right
        jr >= len(lrt) ? [ lft[jl] ] :   // partition right is done ? no, take from left
            kp == undef ?      // is there a key ?
                lft[jl]     > lrt[jr] ?     [ lrt[jr] ] : [ lft[jl] ] : 
                lft[jl][kp] > lrt[jr][kp] ? [ lrt[jr] ] : [ lft[jl] ] ;
                


_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org