Differences

This shows you the differences between two versions of the page.

Link to this comparison view

gnucap:user:istring [2018/09/24 07:42] (current)
felixs created
Line 1: Line 1:
 +== case sensitivity in maps ==
  
 +gnucap provides a flag for case sensitivity that affects the parsing and
 +interpretation of identifiers. It is user controlled. The main use case is the
 +support for mostly obsolete but still widely used input languages such as
 +SPICE. the case sensitivity flag affects declaration and lookup.
 +
 +In case insensitive mode, an identifier is cast to lower case when it is
 +entered, and when it is looked up in a dictionary. This can be seen when
 +switching to the SPICE language, which turns case sensitivity off.
 +
 +  gnucap> spice
 +  gnucap-spice>.param a=1.
 +  gnucap-spice>.param B=2.
 +  gnucap-spice>.param
 +   a= 1. b= 2.
 +
 +internally, these names are stored in lower case, not case preserving, similar
 +to names in FAT12, but lower case. back to verilog mode, things look sensitive
 +again.
 +
 +  gnucap-spice>.verilog
 +  gnucap-verilog>param C=2.
 +  gnucap-verilog>param c=3.
 +  gnucap-verilog>param b=4.
 +  gnucap-verilog>param
 +  .C( 2.), .a( 1.), .b( 4.), .c(3.)
 +
 +in particular, the actual case is lost, and the parameter B is gone.
 +
 +  gnucap-verilog>eval C
 +  C= 1.
 +  gnucap-verilog>eval B
 +  B=B
 +  gnucap-verilog>eval b
 +  b= 4.
 +
 +similarly, when a parameter "X" is declared in case sensitive mode, it is
 +invisible in case insensitive mode. This can have surprising consequences, when
 +running simulations (e.g. parameter sweeps) in spice mode.
 +
 +Newer case insensitive file systems (e.g. NTFS, VFAT, HFS+) are case
 +preserving, not without odd side effects. on NTFS, readme.txt and a Readme.txt
 +can coexist, according to wikipedia. Since file systems usually fix the
 +sensitivity mode, the side effects are limited. But the approaches do not work
 +very well in gnucap.
 +
 +gnucap should be case preserving internally, but must implement a lookup that
 +depends on the current mode.
 +
 +  gnucap> options noinsensitive
 +  gnucap> param A=1.
 +  gnucap> options insensitive
 +  gnucap> eval A
 +   A= 1.
 +  gnucap> eval a
 +   a= 1.
 +  gnucap> param A=2.
 +  gnucap> eval A
 +   A= 2.
 +  gnucap> eval a
 +   a= 2.
 +  gnucap> param A=3.
 +  gnucap> eval A
 +   A= 3.
 +  gnucap> eval a
 +   a= 3.
 +
 +With an approriate lookup, no mangling will be required in case insensitive
 +mode. But how to implement an appropriate lookup? Gnucap stores parameters in
 +an std::map<std::string, PARAMETER>. The problem with that is, if "AAAA" is a
 +key, a case insensitive lookup for "aaaa" will be tricky. The expectation is,
 +that "AAAA" will be found. however, Maps store keys in binary trees, and the
 +search is implemented as a bisection, for performance reasons.
 +In a linear search, we could simply cast the search string to upper, as well as
 +all the keys that we compare it to. In the map above, we'd have to do a lookup
 +for all strings that "aaaa" could possibly match. Sixteen in this case, but
 +much worse for longer strings.
 +
 +The issue with the lookup boils down to the fact that the insensitive matches
 +could be anywhere in the map, which is sorted by the order induced by the ASCII
 +encoding, 1 < 2 < ... < 9 < A < B ... < Z < a < b < ... < z. The following
 +example illustrates the pattern, which easily scales to any size.
 +
 +  AAAA *
 +  ABBB
 +  AAaa *
 +  AABB
 +  Aaaa *
 +  Bbbb
 +  aaaa *
 +
 +In contrast, if the keys were sorted insensitively first and blocks with
 +congruent keys sorted individually, case sensitive, the order would be
 +
 +  AAAA *
 +  AAaa *
 +  Aaaa *
 +  aaaa *
 +  AABB
 +  ABBB
 +  Bbbb.
 +
 +Putting all possible matches to our query next to each other. In an std::map
 +sorted like this, lookups and inserts with either case (in)sensitivity mode
 +will work, while the mode may change in between.
 +
 +- implement the string comparison operator
 +- encapsulate it in a custom string type
 +- provide case sensitive and insensitive lookup, by default these should use OPT::case_insensitive.
 +
 +(this is implemented in the paramap-* branch)
 +
 +== CARD_LIST ==
 +
 +With this kind of string in place, it will be possible to also implement an
 +efficient CARD_LIST::find, supporting both case sensitivities. CARD_LISTs
 +implement the lists of CARDs, which represent all user input, including the
 +order, for each individual scope. The CARDs could be models or components or
 +subcircuit declarations or just comments.
 +
 +Unlike in PARAM_LISTs there can be many cards with the same key (short_label)
 +in one CARD_LIST, which is implemented as a linked list. The lookup is
 +implemented as a linear search, and hence slow. And if there was no case
 +sensitivity flag, this could be easily fixed. an std::multimap<string,
 +list_iterator> or std::multiset<list_iterator> right next to the std::list
 +could be used to lookup the position.
 +
 +So the list needs to stay the same, and instead of the multiset, we need a
 +custom multiset that works transparently through case sensitivity changes.
 +One way of doing this, is to nest sets. One set, with elements sorted case
 +insensitively holds (at most) one multiset for each string class, a "bucket".
 +a bucket may then hold arbitrarily many CARDs, but in case sensitive ordering.
 +
 +For example, if there are cards A A a B b b c, they will end up in three
 +buckets.
 +
 +  A - A
 +    - A
 +    - a
 +  B - B
 +    - b
 +    - b
 +  c - c
 +
 +A case insensitive lookup for b will then try and find it in the B bucket, then
 +pick any of the cards there. similarly, a case sensitive lookup will pick one
 +of the lower case cards. since std::multiset preserves the insertion order, it
 +will be possible to fine tune the lookup.
 +
 +(this is implemented in the multimap-* branch)
gnucap/user/istring.txt · Last modified: 2018/09/24 07:42 by felixs
 
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Run by Debian Driven by DokuWiki