Digging into search( )


runsun
Spent some time digging into the built-in search() :
    search( match_value , string_or_vector [, num_returns_per_match [, index_col_num ] ] )

Conclusion first: it is complicated, buggy, unpredictable, and emits unnecessary warnings.


Two points to note about its design:

1. match_value:

1.1. Can be a single value or vector of values.
1.2. Strings are treated as vectors-of-characters to iterate over;
1.3. If match_value is a vector of strings, search will look for exact string matches.

In practice, match_value can be: list, number, string (treated as a collection of characters)

2. The return value is either a list, when num_returns_per_match is unset or set to 1:
      search( "a","abcdabcd" )= [0]
      search( "abc","abcdabcd" )= [0, 1, 2]
or a list of lists, when num_returns_per_match is set to anything other than 1:

     search( "a","abcdabcd",0 )= [[0, 4]]
     search( "abc","abcdabcd",0 )= [[0, 4], [1, 5], [2, 6]] 

     data4= [["a", 1], ["b", 2], ["c", 3], ["a", 4], ["b", 5]]
     search( "abc",data4,2 )= [[0, 3], [1, 4], [2]]



Observations:

A. Since a string match_value is treated as a list of characters, the following two should give the same result:

    search( "abc","abcdabcd" )= [0, 1, 2]
    search( ["a","b","c"],"abcdabcd" ) want: [0, 1, 2] got: [[], [], []]
B. The following searches return the same result, so users have no way to know which characters matched:

    search( "bc","abcdabcd" )= [1, 2]
    search( "xbc","abcdabcd" )= [1, 2] 
    search( "xbzjck","abcdabcd" )= [1, 2]
C. Users can't possibly predict the following return:

    data9= [ ["cat", 1], ["b", 2], ["c", 3], ["dog", 4]
                , ["a", 5], ["b", 6], ["c", 7], ["d", 8]
                , ["e", 9], ["apple", 10], ["a", 11] ] 
    q= ["b", "zzz", "a", "c", "apple", "dog"]

    search( "cat",data9 ) want: [2,4] got: [0, 4]
This also emits a warning: WARNING: search term not found: "t"

D. This is unpredictable, too:

     a1= [["ab",1],["bc",2],["cd",3]] 

     search( "ab", a1) want: [ ] got: [0, 1]
E. This gives the correct answer, but shows two warnings:

     WARNING: search term not found: "p"
     WARNING: search term not found: "q"

     search( "pq",a1 )= [ ]
F. Inconsistent return when no match is found:
    search( "e","abcdabcd" )= [] 
    search( ["zzz"],data9 )= want: [] got: [[]] 
Since [[]] is treated as true, the following would be impossible:
    search(...) ? do_found : do_not_found
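
One way to make the found / not-found test work for both return shapes is to count the non-empty entries instead of testing the result directly. This is only a sketch; has_match is a made-up helper name, not an OpenSCAD built-in.

```openscad
// Hypothetical helper (not part of OpenSCAD): true if a search() result
// holds at least one index, whether the result is flat or nested.
// An entry counts as a hit unless it is the empty list [].
function has_match(r) = len([for (x = r) if (x != []) x]) > 0;

echo(has_match([]));        // false: flat "no match" result
echo(has_match([[]]));      // false: nested "no match" result from F
echo(has_match([0]));       // true
echo(has_match([[0, 4]]));  // true
```

With such a helper the conditional becomes has_match(search(...)) ? do_found : do_not_found.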
So what I think so far:
1) It is very difficult to understand how it works.
2) It is still buggy.
3) It takes users a lot of effort to understand it, and a lot of effort to debug it, if that is possible at all.

My conclusion is that search() is still buggy and it is probably not a good idea to have it in the release yet.

I believe it tries to accommodate too many use cases in a single function. For example, match_value could have been designed as just one value (a number, string or list), not a list of values. Since we have list comprehensions, the multi-value case would be extremely easy to achieve:
   [ for (m = match_values) search( m, ...) ]
This would remove a large chunk of complexity from the implementation of search().
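
As a rough illustration of that idea, a single-value exact-match lookup combined with a list comprehension might look like the following. find_eq and find_each are hypothetical names, and this is a sketch of the suggested design, not how the built-in search() works.

```openscad
// Hypothetical helper: indices i where list[i] equals v exactly.
// The explicit step in [0:1:n-1] keeps the range empty for an empty list.
function find_eq(v, list) =
    [for (i = [0:1:len(list) - 1]) if (list[i] == v) i];

// Handling several values is then a one-line list comprehension.
function find_each(vs, list) = [for (v = vs) find_eq(v, list)];

names = ["b", "zzz", "a", "b"];
echo(find_eq("b", names));            // [0, 3]
echo(find_each(["a", "q"], names));   // [[2], []]
```

Each entry of the result lines up with one entry of the input values, so it stays clear which value matched where.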
$ Runsun Pan, PhD
$ libs: scadx, doctest, faces(git), offline doc(git), runscad.py(2,git), editor of choice: CudaText ( OpenSCAD lexer); $ Tips; $ Snippets

Re: Digging into search( )

MichaelAtOz
Administrator
runsun wrote:
> 1.2. Strings are treated as vectors-of-characters to iterate over;
> 1.3. If match_value is a vector of strings, search will look for exact string matches.
>
> Observations:
>
> A. Since a string match_value is treated as a list of characters, the following two should give the same result:
>
>     search( "abc","abcdabcd" )= [0, 1, 2]
>     search( ["a","b","c"],"abcdabcd" ) want: [0, 1, 2] got: [[], [], []]

See 1.3 above. "a" <> "abcdabcd"

> B. The following searches return the same result, so users have no way to know which characters matched:
>
>     search( "bc","abcdabcd" )= [1, 2]
>     search( "xbc","abcdabcd" )= [1, 2]
>     search( "xbzjck","abcdabcd" )= [1, 2]

See 1.2 above: b is at pos 1, c is at pos 2.
Wiki: By default, search only looks for one match per element of match_value to return as a list of indices.

> C. Users can't possibly predict the following return:
>
>     data9= [ ["cat", 1], ["b", 2], ["c", 3], ["dog", 4]
>                 , ["a", 5], ["b", 6], ["c", 7], ["d", 8]
>                 , ["e", 9], ["apple", 10], ["a", 11] ]
>     q= ["b", "zzz", "a", "c", "apple", "dog"]
>
>     search( "cat",data9 ) want: [2,4] got: [0, 4]
>
> This also emits a warning: WARNING: search term not found: "t"

1.2 again: c is at 0, a is at 4, t is not found.

> D. This is unpredictable, too:
>
>      a1= [["ab",1],["bc",2],["cd",3]]
>
>      search( "ab", a1) want: [ ] got: [0, 1]

1.2 again: a is at 0, b is at 1.

Wiki:
If num_returns_per_match = 0, search returns a list of lists of all matching index values for each element of match_value.

index_col_num (default: 0): When string_or_vector is a vector-of-vectors, multidimensional table or more complex list-of-lists construct, the match_value may not be found in the first (index_col_num=0) column.

So as index_col_num=0 & num_returns_per_match=0, it looks for "a" and "b" in the index_col_num (0) element of the vector a1. Hence [0, 1] is where it found "a" & "b".

> E. This gives the correct answer, but shows two warnings:
>
>      WARNING: search term not found: "p"
>      WARNING: search term not found: "q"
>
>      search( "pq",a1 )= [ ]

So, that is what it does.??

> F. Inconsistent return when no match is found:
>
>     search( "e","abcdabcd" )= []
>     search( ["zzz"],data9 )= want: [] got: [[]]
>
> Since [[]] is treated as true, the following would be impossible:
>
>     search(...) ? do_found : do_not_found

Wiki:
If num_returns_per_match = 0, search returns a list of lists of all matching index values for each element of match_value.

It returns a list (the outer [ ... ]) of lists of all matching values; no match gives [], hence [ [] ].

> So what I think so far:
>
> 1) It is very difficult to understand how it works.

Yes.

> 2) It is still buggy.

No.

> 3) It takes users a lot of effort to understand it, and a lot of effort to debug it, if that is possible at all.

Yes & No (no need to debug)

> My conclusion is that search() is still buggy and it is probably not a good idea to have it in the release yet.

It has been released for a looong time. I suspect you may be reacting to the change to remove the Warnings??

> I believe it tries to accommodate too many use cases in a single function. For example, match_value could have been designed as just one value (a number, string or list), not a list of values. Since we have list comprehensions, the multi-value case would be extremely easy to achieve:
>
>    [ for (m = match_values) search( m, ...) ]
>
> This would remove a large chunk of complexity from the implementation of search().

Admin - email* me if you need anything,
or if I've done something stupid...
* click on my MichaelAtOz label, there is a link to email me.

Unless specifically shown otherwise above, my contribution is in the Public Domain; to the extent possible under law, I have waived all copyright and related or neighbouring rights to this work.
Obviously inclusion of works of previous authors is not included in the above.


The TPP is no simple “trade agreement.” Fight it! http://www.ourfairdeal.org/ time is running out!

Re: Digging into search( )

clothbot
In reply to this post by runsun
Hi Runsun,

Let me lead by saying thank you for taking the time to collect and share your thoughts and observations. It is very much appreciated!

A few comments/history behind my original writing of search():

When I wrote it (in 2012):

1. The ‘undef’ didn’t exist as a return option so I settled on returning empty lists which could be detected (list of length 0) as ‘no match’ conditions - it predated the Value rewrite of the code-base.

2. The ‘concat’ list construction operator didn’t exist; I needed a way to search for a string-of-characters (aka. an ordered list of character values) and get the results back in order, as a list.

3. The ‘let’ operator didn’t exist.

4. Lists were statically defined; [ for() … ] dynamically generated lists weren’t possible.

5. Function recursion was (and still is to some degree) dog-slow for more ‘elegant’ list construction approaches.

6. The text() module didn’t exist.
- See example023.scad combined with MCAD/fonts.scad for insight into how I was generating text, and the original motivation behind coding up search().

7. The no-match warnings are gone as of last week; you’ll have to build from source to see that.

It’s not buggy, just written within the constraints of the time. ;-)

All that said, I agree that now would be a good time to simplify+rewrite!


Rough outline of hypothetical simplified behaviour I’ll start looking at implementing:

1. search( substring, string):
- return list of substring match indices

Example 1:

string1="abcdabcabcdd";
search("abc",string1);
[0,4,7]
search("efg",string1);
undef

2. search( fullstring, vector_of_strings):
- return list of indices (set of ‘i’ values) where fullstring == vector_of_strings[i]
- do not attempt substring matches since [ for() …] list traversal and construction works

Example 2:

list2=["caterpillar",3,"cat",2,"dog",2,"cattle",5,"cod",42];
search("cat", list2);
[2]
search(2,list2);
[3,5]
search("bird",list2);
undef

3. search( match_value, vector_of_vectors [, index_col_num] ):
- return list of indices (set of ‘i’ values) where match_value == vector[i][index_col_num]
- this simplification should make it even more powerful+useful for hash-style table lookup operations

Example 3:

table3 =[ ["caterpillar",3],["cat",2],["dog",2],["cattle",5],["cod",42]];
search("cat", table3);
[1]
search(2,table3,1);
[1,2]
search("bird",table3);
undef

4. search( match_vector, string_or_vector [, index_col_num]):
- deprecate confusing legacy behaviour

Example 4:

search(["abc"],string1);
undef // Throw WARNING about deprecated usage; use new list comprehension capabilities.
search(["cat"],list2);
undef // Throw WARNING about deprecated usage; use new list comprehension capabilities.
search(["cat"],table3);
undef // Throw WARNING about deprecated usage; use new list comprehension capabilities.
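
For readers who want to experiment with the outlined behaviours 1 to 3 before any rewrite lands, they can be approximated in user space with list comprehensions. The function names below are made up for this sketch, and or_undef only mimics the proposed "undef on no match" convention; none of this is the proposed built-in itself.

```openscad
// Hypothetical user-space sketch of the outlined behaviours; names are made up.

// True if sub occurs in s starting at index i (compared character by character).
function matches_at(s, sub, i, k = 0) =
    k >= len(sub) ? true
  : s[i + k] != sub[k] ? false
  : matches_at(s, sub, i, k + 1);

// Behaviour 1: indices where the substring sub starts in the string s.
function find_substr(sub, s) =
    [for (i = [0:1:len(s) - len(sub)]) if (matches_at(s, sub, i)) i];

// Behaviour 2: indices where a flat list holds exactly the value v.
function find_value(v, list) =
    [for (i = [0:1:len(list) - 1]) if (list[i] == v) i];

// Behaviour 3: indices of rows whose column col equals v.
function find_in_col(v, table, col = 0) =
    [for (i = [0:1:len(table) - 1]) if (table[i][col] == v) i];

// Proposed "no match" convention: undef instead of an empty list.
function or_undef(r) = len(r) > 0 ? r : undef;

string1 = "abcdabcabcdd";
list2 = ["caterpillar", 3, "cat", 2, "dog", 2, "cattle", 5, "cod", 42];
table3 = [["caterpillar", 3], ["cat", 2], ["dog", 2], ["cattle", 5], ["cod", 42]];

echo(or_undef(find_substr("abc", string1)));   // [0, 4, 7]
echo(or_undef(find_substr("efg", string1)));   // undef
echo(or_undef(find_value(2, list2)));          // [3, 5]
echo(or_undef(find_in_col(2, table3, 1)));     // [1, 2]
```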


Just to re-iterate, thank you for taking the time to collect and share your thoughts and observations. Please *do* continue this!

This is very much in the spirit of keeping OpenSCAD compact, more synthesizable HDL-like than bloating into a poor substitute for a scripting language like Python.

Gotta pick your battles. :-)

Andrew.

On Apr 18, 2015, at 10:39 PM, runsun <[hidden email]> wrote:

--snip--

Re: Digging into search( )

runsun
In reply to this post by MichaelAtOz
Hi Michael, thanks for your quick reply. It shows me that, even after spending a lot of effort on relatively extensive tests to understand how to use search(), I am still missing something:
   
   (1) search( "abc","abcdabcd" )= [0, 1, 2]
   (2) search( ["a","b","c"],"abcdabcd" ) = [[], [], []]

   The 1st will match individual "a","b","c" to ANY CHARS in "abcdabcd"
   but the 2nd will match the entire "abcdabcd".

That is, a whole string given as match_value matches partial strings, but a list of partial strings must match the whole string. This is already complicated enough.

I am pretty sure this already exceeds my limited brain power. But that's not where the complication stops.

   (3) search( ["a","bc","abcdabcd"],"abcdabcd" ) = want: [0] got: [[], [], []]

Didn't we say " If match_value is a vector of strings, search will look for exact string matches" ???

Where is the exact string match of the 3rd item, "abcdabcd" ?

   (4) And another one:

      data9= [ ["cat", 1], ["b", 2], ["c", 3], ["dog", 4]
                , ["a", 5], ["b", 6], ["c", 7], ["d", 8]
                , ["e", 9], ["apple", 10], ["a", 11] ]

      search( "act", data9, 0 )
      |   want:
      |   [[0, 4, 9, 10], [0, 2, 6], [0]]
      |   got:
      |   [[4, 9, 10], [0, 2, 6], []]
   
As mentioned, "act" is treated as a vector of characters and iterated over. That means "a" and "t"
should have found a match in ["cat",1], but they don't.

   (5) Another example:

   search( "ab",[["ab",1],["bc",2],["cd",3]], 0 )
   |  want:
   |  [[0], [0, 1]]
   |  got:
   |  [[0], [1]]
   search( "bc",[["ab",1],["bc",2],["cd",3]], 0 )
   |  want:
   |  [[0, 1], [1, 2]]
   |  got:
   |  [[1], [2]]
 
It seems to me that, on top of the already-complicated rule "a whole string given as match_value matches partial strings, but a list of partial strings must match the whole string", in some cases it matches an item of a list of strings to the BEGINNING of a whole string.
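
The results of (4) and (5) are consistent with a simple model: each character of match_value is compared with the first character of the column-0 string in each row. The sketch below only reproduces the outputs reported above; first_char_rows is a hypothetical name, and this is a guess at the observable behaviour, not the actual implementation.

```openscad
// A model of the observed results only, not the real search() code:
// for each character of str, collect the rows whose string in column col
// starts with that character.
function first_char_rows(str, table, col = 0) =
    [for (k = [0:1:len(str) - 1])
        [for (i = [0:1:len(table) - 1]) if (table[i][col][0] == str[k]) i]];

data9 = [ ["cat", 1], ["b", 2], ["c", 3], ["dog", 4]
        , ["a", 5], ["b", 6], ["c", 7], ["d", 8]
        , ["e", 9], ["apple", 10], ["a", 11] ];
a1 = [["ab", 1], ["bc", 2], ["cd", 3]];

echo(first_char_rows("act", data9));  // [[4, 9, 10], [0, 2, 6], []]
echo(first_char_rows("ab", a1));      // [[0], [1]]
echo(first_char_rows("bc", a1));      // [[1], [2]]
```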

At this point, I am too tired to try to figure out yet another rule.

You mentioned : " I suspect you may be reacting to the change to remove the Warnings?? "

Well, in fact, I haven't even started covering it yet. In the 2nd argument, string_or_vectors, I only covered string and "list of lists", [ ["abc",1], ["def",2]...]. I haven't even covered my real concern, a flat list: [ "abc",1, "def",2... ], which is where my original "request to suppress the warning" came from.

Besides, I am using a nightly version in which the warning that bothered me in the first place has already been fixed.

Re: Digging into search( )

nophead
Yes, search is way too complicated for me to use as well. I just roll my own with recursion if I need to search.
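
A recursive search of the kind described here might look like the sketch below. index_of is a hypothetical name and this is not nophead's actual code; it just walks the list one element at a time and returns the first matching index.

```openscad
// Hypothetical helper: index of the first element equal to v,
// or undef if v does not occur in list.
function index_of(v, list, i = 0) =
    i >= len(list) ? undef
  : list[i] == v ? i
  : index_of(v, list, i + 1);

pets = ["cat", "dog", "cod", "dog"];
echo(index_of("dog", pets));   // 1
echo(index_of("bird", pets));  // undef
```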

On 19 April 2015 at 20:37, runsun <[hidden email]> wrote:

--snip--

Re: Digging into search( )

runsun
In reply to this post by clothbot
Hi Andrew, thanks for your effort in writing search() in the first place.

I forgot to mention that:

(1) my tests were done with the new nightly build, in which the unnecessary warning is supposed to be suppressed (but, as you can see in my examples above, there are at least two more warnings)

(2) Judging from all examples on the doc, I assume that the design of search() aims at searching matches against a list of lists/vectors:

     [ ["abc",1],["def",2] ...]   #1

but not from a flat list:

     [ "abc",1, "def",2 ...]    #2
 
Therefore, my tests haven't covered the #2 situation yet. But as you can see, it's already complicated enough.

I understand that it's not fair to judge its features by the current state of OpenSCAD, since OpenSCAD has been upgraded a lot. It's good to know that a change in the implementation is under consideration.

clothbot wrote:
> 1. search( substring, string):
>         - return list of substring match indices
>
> Example 1:
>
>         string1="abcdabcabcdd";
>         search("abc",string1);
>                 [0,4,7]
>         search("efg",string1);
>                 undef

What's the answer to search( "ae", string1 )? Will it be [0, undef], or just [0]?

If it's just [0] (meaning the unmatched characters simply don't show up), then users won't be able to tell what actually matched ([0] for "a" or for "e"?).

Note that this is, in fact, doing a list comprehension's job:

    abc = "abc";
    [ for( i=[0:len(abc)-1] ) search( abc[i], string1) ]

which means it is still redundant. I'd suggest getting rid of this feature completely. That is, when match_value="abc", just search for "abc", not for "a", "b", "c".
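
If per-character matching is wanted, a user-side comprehension can pair every character with its hits, which also answers the "which one matched?" question. This is a sketch with a made-up name, char_hits, and it does not call search() at all.

```openscad
// Hypothetical helper: for every character of chars, return a pair
// [character, list of all indices where it occurs in s].
function char_hits(chars, s) =
    [for (k = [0:1:len(chars) - 1])
        [chars[k], [for (i = [0:1:len(s) - 1]) if (s[i] == chars[k]) i]]];

string1 = "abcdabcabcdd";
echo(char_hits("ae", string1));  // [["a", [0, 4, 7]], ["e", []]]
```

Unmatched characters keep their slot with an empty list, so the result never shifts.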

clothbot wrote:
> 2. search( fullstring, vector_of_strings):
>         - return list of indices (set of 'i' values) where fullstring == vector_of_strings[i]
>         - do not attempt substring matches since [ for() …] list traversal and construction works
>
> Example 2:
>
>         list2=["caterpillar",3,"cat",2,"dog",2,"cattle",5,"cod",42];
>         search("cat", list2);
>                 [2]
>         search(2,list2);
>                 [3,5]
>         search("bird",list2);
>                 undef

I see that list2 is a flat list. This is new compared to the original design. Will it be treated the same way as a list of vectors in ALL situations?

clothbot wrote:
> 3. search( match_value, vector_of_vectors [, index_col_num] ):
>         - return list of indices (set of 'i' values) where match_value == vector[i][index_col_num]
>         - this simplification should make it even more powerful+useful for hash-style table lookup operations
>
> Example 3:
>
>         table3 =[ ["caterpillar",3],["cat",2],["dog",2],["cattle",5],["cod",42]];
>         search("cat", table3);
>                 [1]
>         search(2,table3,1);
>                 [1,2]
>         search("bird",table3);
>                 undef

I believe the 2nd case should have read:  search(2,table3,1,1)

Also note that this will create confusion with Example 1, since in Example 1 a string is treated as a collection of chars, but here it's treated as a whole. It'd be extremely hard for users to remember that.

Still, I'd strongly suggest not trying to treat a string as iterable. Iterating through a string match_value creates lots of problems: it's hard for users to understand in the first place, and it's hard to come up with a clear answer (see Example 1 above). Users might then as well roll their own, which IMHO defeats the purpose of making it a built-in.

SIDE NOTE: Judging from the existence of lookup and search as built-ins, I strongly believe that a hash-like data structure is needed:
    >>> h={ abc:1, def:2 }
    >>> h["abc"]
    1
    >>> g = update(h, {ghi:3} )
    >>> g
    { abc:1, def:2, ghi:3 }
    >>> keys( g )
    ["abc","def","ghi"]
    >>> values( g )
    [1,2,3]
    >>> kvs( g )
    [ ["abc",1], ["def",2], ["ghi",3] ]
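
Until something like that exists in the language, the same operations can be approximated on a list of [key, value] pairs. The helpers below are hypothetical (hget is a made-up name, and keys, values and update simply borrow the names from the wish list above); they are a sketch, not an existing OpenSCAD feature.

```openscad
// Hypothetical helpers treating a list of [key, value] pairs as a hash.
function keys(h)   = [for (kv = h) kv[0]];
function values(h) = [for (kv = h) kv[1]];

// Value stored under key k, or undef if the key is absent.
function hget(h, k) =
    let(m = [for (kv = h) if (kv[0] == k) kv[1]])
    len(m) > 0 ? m[0] : undef;

// Pairs of h whose keys are not overridden by g, followed by all pairs of g.
function update(h, g) =
    concat([for (kv = h) if (len([for (x = g) if (x[0] == kv[0]) x]) == 0) kv], g);

h = [["abc", 1], ["def", 2]];
g = update(h, [["ghi", 3]]);
echo(hget(h, "abc"));  // 1
echo(keys(g));         // ["abc", "def", "ghi"]
echo(values(g));       // [1, 2, 3]
```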
 




Re: Digging into search( )

runsun
In reply to this post by clothbot
clothbot wrote:
> Example 3:
>
>         table3 =[ ["caterpillar",3],["cat",2],["dog",2],["cattle",5],["cod",42]];
>         search("cat", table3);
>                 [1]
>         search(2,table3,1);
>                 [1,2]
>         search("bird",table3);
>                 undef

Also note that using a flat list like above is error-prone:

   table4 = [ "abc","def","def",3 ];
   search( "def",table4 )= want:[2] got:[1]



Re: Digging into search( )

MichaelAtOz
Administrator
In reply to this post by runsun
runsun wrote:
> Hi Michael, thanks for your quick reply. It shows me that, even after spending a lot of effort on relatively extensive tests to understand how to use search(), I am still missing something:
>
>    (1) search( "abc","abcdabcd" )= [0, 1, 2]
>    (2) search( ["a","b","c"],"abcdabcd" ) = [[], [], []]
>
>    The 1st will match individual "a","b","c" to ANY CHARS in "abcdabcd"
>    but the 2nd will match the entire "abcdabcd".
>
> That is, a whole string given as match_value matches partial strings, but a list of partial strings must match the whole string. This is already complicated enough.

(2) is covered by the Wiki note: If match_value is a vector of strings, search will look for exact string matches.

> I am pretty sure this already exceeds my limited brain power. But that's not where the complication stops.
>
>    (3) search( ["a","bc","abcdabcd"],"abcdabcd" ) = want: [0] got: [[], [], []]
>
> Didn't we say " If match_value is a vector of strings, search will look for exact string matches" ???
>
> Where is the exact string match of the 3rd item, "abcdabcd" ?

Why do you expect [0]?
The "abcdabcd" not being found is unexpected (a bug, or not designed for that).

>    (4) And another one:
>
>       data9= [ ["cat", 1], ["b", 2], ["c", 3], ["dog", 4]
>                 , ["a", 5], ["b", 6], ["c", 7], ["d", 8]
>                 , ["e", 9], ["apple", 10], ["a", 11] ]
>
>       search( "act", data9, 0 )
>       |   want:
>       |   [[0, 4, 9, 10], [0, 2, 6], [0]]
>       |   got:
>       |   [[4, 9, 10], [0, 2, 6], []]
>
> As mentioned, "act" is treated as a vector of characters and iterated over. That means "a" and "t"
> should have found a match in ["cat",1], but they don't.

Wiki:

match_value:
Can be a single value or vector of values.
Strings are treated as vectors-of-characters to iterate over; the search function does not search for substrings.

So "a" & "t" do not match "cat", but "c" does. Not saying it's sensible.
Change "cat" to "tac" and the "t" matches.

>    (5) Another example:
>
>    search( "ab",[["ab",1],["bc",2],["cd",3]], 0 )
>    |  want:
>    |  [[0], [0, 1]]
>    |  got:
>    |  [[0], [1]]
>    search( "bc",[["ab",1],["bc",2],["cd",3]], 0 )
>    |  want:
>    |  [[0, 1], [1, 2]]
>    |  got:
>    |  [[1], [2]]
>
> It seems to me that, on top of the already-complicated rule "a whole string given as match_value matches partial strings, but a list of partial strings must match the whole string", in some cases it matches an item of a list of strings to the BEGINNING of a whole string.

It seems the match is, as you say, against the beginning. Possibly an implementation error from when the match test was written.

> At this point, I am too tired to try to figure out yet another rule.
>
> You mentioned : " I suspect you may be reacting to the change to remove the Warnings?? "

That was in regard to you saying it wasn't ready for release. So I presumed you saw the change and assumed search() was not released yet.

> Well, in fact, I haven't even started covering it yet. In the 2nd argument, string_or_vectors, I only covered string and "list of lists", [ ["abc",1], ["def",2]...]. I haven't even covered my real concern, a flat list: [ "abc",1, "def",2... ], which is where my original "request to suppress the warning" came from.
>
> Besides, I am using a nightly version in which the warning that bothered me in the first place has already been fixed.

search() worked for what I wanted it to do. I wrote that bit at the bottom of the wiki, "Getting the right results", to figure it out myself some time ago.

Re: Digging into search( )

MichaelAtOz
Administrator
In reply to this post by runsun
runsun wrote:
> clothbot wrote:
> > Example 3:
> >
> >         table3 =[ ["caterpillar",3],["cat",2],["dog",2],["cattle",5],["cod",42]];
> >         search("cat", table3);
> >                 [1]
> >         search(2,table3,1);
> >                 [1,2]
> >         search("bird",table3);
> >                 undef
>
> Also note that using a flat list like above is error-prone:
>
>    table4 = [ "abc","def","def",3 ];
>    search( "def",table4 )= want:[2] got:[1]

Why do you expect [2]?
[1] is the "d" of match_value "def" matched to the first "def".
Wiki: num_returns_per_match (default: 1)
So it only returns 1 match.

Re: Digging into search( )

MichaelAtOz
Administrator
Actually I get

WARNING: Invalid entry in search vector at index 0, required number of values in the entry: 1. Invalid entry: "abc"

table4 = [ "abc","def","def",3 ];
echo(search( "def",table4 ));  // = want:[2] got:[1]

Re: Digging into search( )

clothbot
It's aliiiiive!

First attempt at search simplification (passes regressions) is here:

https://github.com/openscad/openscad/pull/1318

See the "Files Changed" report for how I've simplified the usage:

https://github.com/openscad/openscad/pull/1318/files


I updated the example023.scad to wrap the built-in search() in a user-defined search_vector_one() function to take advantage of simple [for(i=...)] list building:

function search_vector_one(vec,table,col=0) = [for(i=[0:len(vec)-1]) search(vec[i],table,col)[0]];

https://github.com/clothbot/openscad/blob/search_simplify/examples/Old/example023.scad
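
For readers who want to trace what the wrapper does, here is a rough Python model of it. This is not OpenSCAD and not the code in the pull request: the stand-in search() below is an assumption that only mimics the "all matches, or nothing" behaviour described in this post, and the sample table is made up.

```python
def search(match_value, table, col=0):
    """Stand-in for the simplified search(): all matching row indices, or None."""
    hits = [i for i, row in enumerate(table) if row[col] == match_value]
    return hits or None


def search_vector_one(vec, table, col=0):
    """First matching index for each value in vec; None where nothing matches."""
    result = []
    for value in vec:
        hits = search(value, table, col)
        result.append(hits[0] if hits else None)
    return result


# Made-up sample data: "a" appears twice, "x" not at all.
table = [["a", 1], ["b", 2], ["c", 3], ["a", 4]]
print(search_vector_one(["c", "x", "a"], table))  # [2, None, 0]
```

The None entries play the role that undef would play in OpenSCAD when a value has no match.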


I used the same search_vector_one() function in text-search-test.scad to make it "just work":

https://github.com/clothbot/openscad/blob/search_simplify/testdata/scad/2D/features/text-search-test.scad


The two "search-tests-unicode.scad" and "search-tests.scad" have been significantly modified to reflect the simplified search behaviour.

https://github.com/clothbot/openscad/blob/search_simplify/testdata/scad/misc/search-tests-unicode.scad

https://github.com/clothbot/openscad/blob/search_simplify/testdata/scad/misc/search-tests.scad


As outlined in the comment here:

   https://github.com/clothbot/openscad/blob/search_simplify/src/func.cc#L667

--snip--

 Pattern:
  "search" "(" match_value  "," string_or_vector_or_table
          ("," index_col_num )?
        ")";
  match_value : ( Value::NUMBER | Value::STRING );
  string_or_vector_or_table : ( Value::STRING | "[" Value ("," Value)* "]" |  "[" ("[" Value ("," Value)* "]")+ "]" );
  index_col_num : int;

--end-snip--

- A string 'match_value' searches for full-string matches.
  - It does *not* iterate over each character in the string and return a list of matches per character any more.

- All matches are returned every time
  - no more 'num_returns_per_match' parameter.
  - use user-defined functions like the above search_vector_one() example to massage search results to your liking.

- the no-matches condition returns 'undef' instead of an empty vector '[]'
  - conditional expressions based on no-search-results will work now.

- Assigning any vector to 'match_value' throws a WARNING and returns 'undef'
  - I started trying to get smart and 'collapse vectors of length=1' for backward compatibility but... no. Better to rip this bandaid off clean.
  - Perhaps a future enhancement could support vector-type match_value for things like searching for points... That could be handy for processing polygon() and polyhedron() point sets.
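
The rules listed above can be sketched as a small Python model. It is only an illustration under assumptions: it covers flat vectors and tables of rows (not string targets), None stands in for undef, and the real implementation in the pull request may differ in details such as type handling.

```python
import warnings


def search(match_value, table, index_col_num=0):
    """Model of the simplified search(): every matching index, or None."""
    # A vector match_value is rejected with a warning, as the post describes.
    if isinstance(match_value, (list, tuple)):
        warnings.warn("search(): vector match_value is not supported")
        return None
    hits = []
    for i, entry in enumerate(table):
        if isinstance(entry, (list, tuple)):
            # Table row: compare only the selected column.
            if index_col_num >= len(entry):
                continue
            candidate = entry[index_col_num]
        else:
            # Flat vector: compare the entry itself.
            candidate = entry
        # Whole-value comparison; strings are not iterated per character.
        if candidate == match_value:
            hits.append(i)
    return hits or None


table3 = [["caterpillar", 3], ["cat", 2], ["dog", 2], ["cattle", 5], ["cod", 42]]
print(search("cat", table3))    # [1]
print(search(2, table3, 1))     # [1, 2]
print(search("bird", table3))   # None
```

Because a failed search gives None rather than an empty list, a caller can test the result directly, which mirrors the point about conditional expressions.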

Thoughts? Comments?

Speak now or fix it yourself.?. ;-)

Andrew.

Re: Digging into search( )

runsun
Wow, Andrew, that was quick !!

Without going over the links in detail, here is my quick view:

It looks great. The removal of iteration over match_value and the num_returns_per_match is very significant.

One note:

match_value doesn't have to exclude lists. You just treat it as a value and don't iterate over it. This way, it can be used to search for points, as you wish.

In fact, since it is "a value", there's no need to enforce any type constraint on match_value. It could be anything, even a boolean or undef. Thus there is no need for the warning. It wouldn't be too hard for a user to check why a search doesn't return the expected indices.

Certainly, if match_value=vector is allowed,  we have to think about how to deal with this:

   search(  ["abc",1],   [ ["abc",1], [ ["abc",1],2 ], ["ghi",3] ...]   )

Will it give [0] ? [1]? [0,1] ?

This can be controlled by index_col_num, for example,

index_col_num = 0 ==> [1]  (match the column #0 )
index_col_num = -1 ==> [0] (means, no selection of column, so match the whole item, in this case, ["abc",1], and return [0] )
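
To make the two outcomes concrete, here is a small Python sketch of the idea. The function name search_any and the use of None for "no match" are assumptions for illustration only; nothing like this exists in OpenSCAD or in the pull request.

```python
def search_any(match_value, table, index_col_num=0):
    """Match any value, against one column or (index_col_num < 0) the whole item."""
    hits = []
    for i, entry in enumerate(table):
        if index_col_num < 0:
            # No column selected: compare the whole item.
            candidate = entry
        elif isinstance(entry, list) and index_col_num < len(entry):
            candidate = entry[index_col_num]
        else:
            continue
        if candidate == match_value:
            hits.append(i)
    return hits or None


table = [["abc", 1], [["abc", 1], 2], ["ghi", 3]]
print(search_any(["abc", 1], table, 0))    # [1]
print(search_any(["abc", 1], table, -1))   # [0]
```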

Lastly, a side note:

Since search( ) now seems to allow a flat list (which I believe it was not originally designed for), it simply returns the index:

   search( "def", ["abc",1,"def",2,"ghi",3] )

That is one step short of serving as a hash-like feature, because this will fail:

   search( "def", ["abc","def","def",2,"ghi",3] )

It returns 1, the index of the value of key "abc", but not 2, the index of key "def".

That is, unless a new argument, every, is introduced: every=1 (the default) checks every item, while every=2 checks every other item, which allows key search in a flat list of key-value pairs. Whether to add it depends on how important you feel these key-value pairs are, and on whether search() wants to play that role.
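
A hypothetical sketch of what such an every argument could do, written in Python rather than OpenSCAD. The name search_every and the None return for "no match" are assumptions; this is a model of the suggestion, not of any existing function.

```python
def search_every(match_value, items, every=1):
    """Compare only items at indices 0, every, 2*every, ...; None if no match."""
    hits = [i for i in range(0, len(items), every) if items[i] == match_value]
    return hits or None


flat = ["abc", "def", "def", 2, "ghi", 3]
print(search_every("def", flat))            # [1, 2]
print(search_every("def", flat, every=2))   # [2]
```

With every=2 the value slot at index 1 is never examined, so only the key "def" at index 2 is reported.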

BTW: I have a whole set of test cases for search(). Once it is merged into the nightly, I can try them out.


$ Runsun Pan, PhD
$ libs: scadx, doctest, faces(git), offline doc(git), runscad.py(2,git), editor of choice: CudaText ( OpenSCAD lexer); $ Tips; $ Snippets

Re: Digging into search( )

clothbot
Hi Runsun,

Very briefly, vector and string 'index' counting starts at 0, not 1.

list1= ["abc",1,"def",2,"ghi",3]
 search( "def", list1 )

...will return '[2]' because list1[2]=="def"; list1[0]=="abc"

list2=["abc","def","def",2,"ghi",3]
search( "def", list2 )

...will now return '[1,2]' because list2[1]=="def" and list2[2]=="def"; list2[0]=="abc"

In my simplified search, all matches are always returned.  It is now up to the user to decide how many/few to filter off and by what mechanism/algorithm.

I think that yes, I'll eventually add list/vector support to match_value, however it will be considerably more involved to implement than the simple 'atomic' data structures.

Supporting search for an N-dimensional vector match could be fun+useful:
  - add 'tol[erance]' parameter to allow for 'close enough' floating point 'distance' matches.
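
As a rough picture of what a tolerance-based point search might look like, here is a Python sketch. Everything in it is an assumption for illustration: the name search_point, the Euclidean distance metric, and the None return when nothing is close enough.

```python
import math


def search_point(point, points, tol=0.0):
    """Indices of all points within Euclidean distance tol of point, or None."""
    hits = [
        i
        for i, p in enumerate(points)
        # Only compare points of the same dimension.
        if len(p) == len(point) and math.dist(p, point) <= tol
    ]
    return hits or None


points = [[0, 0, 0], [1, 0, 0], [1.0000001, 0, 0], [0, 1, 0]]
print(search_point([1, 0, 0], points))             # [1]
print(search_point([1, 0, 0], points, tol=1e-6))   # [1, 2]
```

With the default tol of 0 only exact matches are returned; a small positive tol also picks up points that differ by floating point noise.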

Picking my battles. :-)

Andrew.

Re: Digging into search( )

runsun
Ok. I think I've bugged you enough. Whatever you decide, I think it's in a good direction :) :)

Re: Digging into search( )

MichaelAtOz
Administrator
I think you missed the point @runsun.

"Very briefly, vector and string 'index' counting starts at 0, not 1. "

This explains why you had so much difficulty understanding it.

search("a",[ "d", "c", "b", "a"]); // returns [3]
              0    1    2    3
                            __


Re: Digging into search( )

runsun
@Michael, I didn't explain the context in much detail, and I guess that's why you (and Andrew) misunderstood.

The reason I argue that

     search( "def", [ "abc","def","def",1 ] )

will cause confusion is that this whole discussion stems from a discussion in another thread about hash parameter mapping. That includes the example you gave using lookup.

Then search() was mentioned. Note that the way I prefer is having key-value on a flat list:

    [ key1,val1, key2, val2 ...]

Search() was not designed for this type of key-value mapping, but I tried to use it that way by applying search() to a flat list as above. Note that all the examples in the doc for search() are either against strings:

    search( ... "abcdef")

or against list of vectors:

    search( ... [["abc",1],["def",2] ...])

but not flat list.

So if search() is to be used for key-value mapping against a flat list, the way I would like, it has to check only every other item for a match, that is,

    search( "def", ["abc", "def", "def", 1] )

Here the search for "def" has to check item 0 ("abc"), skip item 1 (the value associated with item 0), and then check item 2 ("def").

In this case it should have returned [2]. Or, if returning all matches, [1,2].

But, like I said, search() can't do that: when set to return only one item, it returns [1] rather than [2], which is not what I want. So using search() for key-value mapping will fail.

This could be solved if search() could match only every other item (see my previous post).
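
For comparison, the key-value lookup on a flat list described here can be sketched in a few lines of Python. The helper name hash_lookup is made up for illustration; it is not an OpenSCAD function and does not use search() at all.

```python
def hash_lookup(pairs, key, default=None):
    """Look up key in a flat [key1, val1, key2, val2, ...] list."""
    # Step over key positions only (0, 2, 4, ...), so values are never
    # mistaken for keys.
    for i in range(0, len(pairs) - 1, 2):
        if pairs[i] == key:
            return pairs[i + 1]
    return default


pairs = ["abc", "def", "def", 1]
print(hash_lookup(pairs, "def"))   # 1
print(hash_lookup(pairs, "abc"))   # def
```

The value "def" stored under key "abc" is skipped when looking for the key "def", which is exactly the case that trips up a plain index search.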

But I understand that this is probably just my way of using it, so I leave the decision to Andrew. It would probably make things too complicated anyway.

So this is not a matter of mistaking the index base. I guess I was just too lazy to explain the entire context. :( :(
