Quantcast

Performance test - 'manyballs'

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Performance test - 'manyballs'

cacb
I have been playing a bit with performance testing and came up with this
very simple test that I call "manyballs" because it simply creates a
grid of NxNxN slightly overlapping balls (implicit unions here).

By setting n to the desired value, you get a small or large problem to
solve.

OpenSCAD script 'manyballs.scad':
---------------------------------
module manyballs(n)
{
    $fn=25;
    delta = 45;
    for(i=[0:1:n-1]) {
       x = i*delta;
       for(j=[0:1:n-1]) {
          y = j*delta;
          for(k=[0:1:n-1]) {
             z = k*delta;
             translate([x,y,z])sphere(25);
          }
       }
    }
}

manyballs(n=7);
--------------------------------

Some results on my new Linux Kubuntu 17.04 machine with 32GB ram and 4
CPUs, OpenSCAD 2017.05.03.nightly


N    sec     nTri    nTri/sec
1      2      602    301
2      8     4200    525
3     44    13590    308
4    120    31520    262
5    299    60750    203
6    526   104040    197
7    900   164150    182
8 ...

'sec' is the number of seconds reported by OpenSCAD and nTri is the
number of facets (triangles) created. As can be seen, the number is
around 200 triangles per sec for the larger ones.

I have replicated more or less equivalent tests for 'xcsg' (
https://github.com/arnholm/xcsg ) ref. the 'sample_files' folder there
for relevant input files. This is after implementing "basic"
multi-threading in booleans. I think the numbers are interesting. Only
whole seconds counted so the first tests are a bit inaccurate.


N    sec    nTri     nTri/sec
1      0       728    ~
2      1      5088    5088.00
3      1     16344   16344.00
4      4     37760    9440.00
5      7     72600   10371.43
6     13    124128    9548.31
7     22    195608    8891.27
8     32    290304    9072.00
9     47    411480    8754.89
10    69    562400    8150.72
11    99    746328    7538.67
12   125    966528    7732.22
13   169   1226264    7256.00
14   208   1528800    7350.00
15   269   1877400    6979.18
16   335   2275328    6792.02


Carsten Arnholm

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Ivo
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Performance test - 'manyballs'

Ivo
xcsg seems *very* interesting.

Is there a way to get openscad models to xcsg ?

Can xcsg import stl ?

Kind regards,
Ivo
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Performance test - 'manyballs'

cacb
On 13. mai 2017 22:36, Ivo wrote:
> xcsg seems *very* interesting.
>
> Is there a way to get openscad models to xcsg ?
>
> Can xcsg import stl ?

xcsg requires .xcsg files which is an XML protocol, ref. the wiki
https://github.com/arnholm/xcsg/wiki

.xcsg files can contain polyhedron objects, but processing STL "triangle
soup" is not part of the scope. It must be done elsewhere.

Carsten Arnholm

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Performance test - 'manyballs'

wolf
In reply to this post by cacb
First of all, congratulations for demonstrating that boolean operations can be done much faster than what OpenSCAD currently achieves, forty times as fast. This compares very favorably with the two times improvement that Michael@Oz has reported for parallelizing CGAL.
Though: My expectations for what can be achieved by switching to a different way of doing boolean operations (1000..100 000 times as fast) have not been fulfilled. Preliminary tests show that this speed increase ought to be possible, but my own code is in a very early stage, so nothing can be said with certainty. What I can say with reasonable certainty is that parallelization is essentially a waste of time and effort. The problem is that boolean operations are, at heart, a search-and-find operation. They contain far to many comparison operations (if statements) to effectively choke the supply pipe for the CPU. That's why Michael could only achieve a two-times speed increase in spite of throwing four CPU's (or was it eight?) at the task.

What I am missing in your test code is testing for robustness.
The critical part of boolean operations is dealing with the intersection of near-parallel faces, i.e. when rounding errors become the major consideration for code stability. CGAL has been designed to cope well with this situation. CGAL makes no rounding errors, and that is the reason why it is so slow. OpenSCAD, unfortunately and, because it does not consider its own rounding errors, does a rather poor job at converting floating point numbers into CGAL's gmpq number format, and back again into floating point. These poor number format conversions are fundamentally why OpenSCAD users experience issues with degenerated triangles, "non-manifoldness", and other CGAL assertion violations, and why users need to anticipate unwarranted OpenSCAD behavior to stay clear of assertion violations.
In section 8.2 of his report, Peter Hachenberger, one of the principal authors of CGAL, describes a test (build a cylinder as an n-sided polyhedron, rotate it by half the width of its side, and union it with the original cylinder). How many sides does xcsg permit the cylinder to have before the union operation fails? Hachenberger has tested CGAL to up to ten thousand sides and reports no failure.

Example: try this little program:
x=1/3;
y=2/3;
z=1/9;
echo((y-2*x)/(y-6*z));
echo(x,y,z);
echo(( 0.666667-2* 0.333333)/(0.666667-6*0.111111));
and see how far rounding errors can falsify your code. We had quite a few examples on these pages of users baffled by the output of their programs, due to rounding errors they presumed would not happen.

Another question I have is the time you report for manyballs(1). Two seconds is way too slow, that should have been zero seconds. Will you confirm?

wolf
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Performance test - 'manyballs'

cacb
On 15. mai 2017 02:26, wolf wrote:
> First of all, congratulations for demonstrating that boolean operations can
> be done much faster than what OpenSCAD currently achieves, forty times as
> fast.

Thank you for the thoughtful reply, most appreciated! I will answer to
the points you bring up separately.

> Another question I have is the time you report for manyballs(1). Two seconds
> is way too slow, that should have been zero seconds. Will you confirm?

Yes, when repeated I get that value now. It is of little importance on
the smallest values of N and zero for the timing is certainly not exact :-)

Carsten Arnholm




_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Performance test - 'manyballs'

cacb
In reply to this post by wolf
On 15. mai 2017 02:26, wolf wrote:
> Though: My expectations for what can be achieved by switching to a different
> way of doing boolean operations (1000..100 000 times as fast) have not been
> fulfilled. Preliminary tests show that this speed increase ought to be
> possible, but my own code is in a very early stage, so nothing can be said
> with certainty.

I will be interested when you can demonstrate a factor 1000..100 000
speedup. Such numbers require a radically different approach, processing
on the GPU or whatever. Please demonstrate it with the manyballs(n)
example :-)

> What I can say with reasonable certainty is that
> parallelization is essentially a waste of time and effort. The problem is
> that boolean operations are, at heart, a search-and-find operation. They
> contain far to many comparison operations (if statements) to effectively
> choke the supply pipe for the CPU. That's why Michael could only achieve a
> two-times speed increase in spite of throwing four CPU's (or was it eight?)
> at the task.

On the issue of "parallelization is essentially a waste of time and
effort", I don't necessarily agree when taking into account who's time
is being wasted. Even if a developer spent a couple of summer months
speeding up OpenSCAD by a factor of 2-3, users would appreciate it every
time they used the software. A theoretical ideal should not stand in the
way of practical improvements. In the case of xcsg the effort was far
less. It took no more than a couple of evening hours to implement
multi-threading in an almost simplistic way which provided a speedup of
similar magnitude.

The factor 40x is not mainly due to parallelization, but due to the
characteristics of CGAL vs. Carve and possibly other things. The xcsg
numbers also include STL file export and the general number of triangles
are higher in xcsg for each value of N, so 40x is somewhat conservative.

Carsten Arnholm

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Performance test - 'manyballs'

cacb
In reply to this post by wolf
On 15. mai 2017 02:26, wolf wrote:
> What I am missing in your test code is testing for robustness.
> The critical part of boolean operations is dealing with the intersection of
> near-parallel faces, i.e. when rounding errors become the major
> consideration for code stability. CGAL has been designed to cope well with
> this  situation

Robustness is important, but it isn't a numerical problem only. A
question is whether fractional numbers in OpenSCAD/CGAL is rewarding the
time and other resources paid by users. In my opinion the cost is
sometimes high in time and memory use compared to the reward you get
from using it. Things can fail in more than one way. I have had OpenSCAD
models fail because it exhausted the memory available, probably due to
the needs of the data structure intended to provide robustness.

xcsg is designed to be much less memory hungry.

I fully agree with the consideration for near parallel faces, it can
cause problems. Numerical issues are very real, but other issues can be
dominating/fatal too.

> <http://www.win.tue.nl/~phachenb/Nef/Nef3D-OptimizedImplementation.pdf>  .
> CGAL makes no rounding errors, and that is the reason why it is so slow.
> OpenSCAD, unfortunately and, because it does not consider its own rounding
> errors, does a rather poor job at converting floating point numbers into
> CGAL's gmpq number format, and back again into floating point. These poor
> number format conversions are fundamentally why OpenSCAD users experience
> issues with degenerated triangles, "non-manifoldness", and other CGAL
> assertion violations, and why users need to anticipate unwarranted OpenSCAD
> behavior to stay clear of assertion violations.

Yes, agreed. This can negate the goal of no round-off error.

> In section 8.2 of his  report
> <http://www.win.tue.nl/~phachenb/Nef/Nef3D-OptimizedImplementation.pdf>  ,

I notice the authors compare CGAL with ACIS R13. I used to work a lot
with ACIS R13 in Genie (I was the main designer) as mentioned in e.g.
https://www.dnvgl.com/Images/Sesam-Feature-Description_tcm8-58834.pdf

ACIS is very different from Carve and I presume it is also true for CGAL
(from the paper it looks CGAL is complex in a different way). ACIS is a
BREP library with a very complex data structure and it was designed to
efficiently represent much more complex objects than just "partitions of
three space into cells induced by planes". The speed comparison of ACIS
vs. CGAL in the report is thus comparing apples to oranges (they say so
in the report too). On precision ACIS claimed 1E-6, which they did not
always achieve. ACIS is more directly comparable to OpenCasCade.

 > Peter Hachenberger, one of the principal authors of CGAL, describes
 > a test (build a cylinder as an n-sided polyhedron, rotate it by half
 > the width of its side, and union it with the original cylinder).
> How many sides does xcsg
> permit the cylinder to have before the union operation fails? Hachenberger
> has tested CGAL to up to ten thousand sides and reports no failure.

I will give it a try. I suspect this will be a test in favour of CGAL
but we shall see :-)

> *Example*: try this little program:
> x=1/3;
> y=2/3;
> z=1/9;
> echo((y-2*x)/(y-6*z));
> echo(x,y,z);
> echo(( 0.666667-2* 0.333333)/(0.666667-6*0.111111));
> and see how far rounding errors can falsify your code. We had quite a few
> examples on these pages of users baffled by the output of their programs,
> due to rounding errors they presumed would not happen.

I see no surprise in this.  Formatting double precision values as text
with 6 significant digits will indeed cause major round-off

Carsten Arnholm


_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Performance test - 'manyballs'

cacb
On 15. mai 2017 20:02, Carsten Arnholm wrote:
> On 15. mai 2017 02:26, wolf wrote:
>>> How many sides does xcsg
>> permit the cylinder to have before the union operation fails?
>
> I will give it a try. I suspect this will be a test in favour of CGAL
> but we shall see :-)

I generated the tests and got some surprising results.

A "cylinder" with radius=50 and height=50 was used. It was generated
from a polygon with N sides, where N = 100, 1000, 2000. The polygon was
linear_extruded to create the cylinder C

For the values of N, the copy of C called C' was rotated for the angles
alpha = 1E-1, 1E-2, 1E-3, 1E-4, 1E-5, 1E-6 [degrees]

Then a union of C and C' was computed for each combination of N and alpha.

The surprising thing is that almost all cases worked, except N=100,
alpha=1E-6 which resulted in a non-manifold model. Due to automatic
merging of co-planar faces after booleans, some of the cases have fewer
resulting facets. But the models are OK.

The other surprise was that the computations were quite slow, with
numbers comparable to the old report with older hardware. By the nature
of the tests with only one union, there was no multi-threading going on.

Carsten Arnholm

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org

Rotcylinder_test.png (62K) Download Attachment
Loading...