unicode - What characters are allowed in Perl identifiers? -


i'm working on regular expressions homework 1 question is:

using language reference manuals online determine regular expressions integer numeric constants , identifiers java, python, perl, , c.

i don't need on regular expression, have no idea identifiers in perl. found pages describing valid identifiers c, python , java, can't find perl.

edit: clarify, finding documentation meant easy (like doing google search python identifiers). i'm not taking class in "doing google searches".

perl integer constants

integer constants in perl can

  • in base 16 if start ^0x
  • in base 2 if start ^0b
  • in base 8 if start 0
  • otherwise in base 10.

following leader number of valid digits in base and optional underscores.

note digit not mean \p{posix_digit}; means \p{decimal_number}, quite different, know.

please note leading minus sign not part of integer constant, proven by:

$ perl -mo=concise,-exec -le '$x = -3**$y' 1  <0> enter  2  <;> nextstate(main 1 -e:1) v:{ 3  <$> const(iv 3) s 4  <$> gvsv(*y) s 5  <2> pow[t1] sk/2 6  <1> negate[t2] sk/1 7  <$> gvsv(*x) s 8  <2> sassign vks/2 9  <@> leave[1 ref] vkp/refc -e syntax ok 

see 3 const, , later on negate op-code? tells bunch, including curiosity of precedence.

perl identifiers

identifiers specified via symbolic dereferencing have absolutely no restriction whatsoever on names.

  • for example, 100->(200) calls function named 100 arugments (100, 200).
  • for another, ${"what’s up, doc?"} refers scalar package variable name in current package.
  • on other hand, ${"what's up, doc?"} refers scalar package variable name ${"s up, doc?"} , not in current package, rather in what package. well, unless current package what package, of course. similary $who's $s variable in who package.

one can have identifiers of form ${^identifier}; these not considered symbolic dereferences symbol table.

identifiers single character alone can punctuation character, include $$ or %!.

identifers can of form $^c, either control character or circumflex folllowed non-control character.

if none of things true, (non–fully qualified) identifier follows unicode rules related characters properties id_start followed property id_continue. however, overrules in allowing all-digit identifiers , identifiers start (and perhaps have nothing else beyond) underscore. can pretend (but it’s pretending) that saying \w+, \w described in annex c of uts#18. is, has of these:

  • the alphabetic property — includes far more letters; contains various combining characters , letter_number code points, plus circled letters
  • the decimal_number property, rather more merely [0-9]
  • any , characters mark property, not marks deemed other_alphabetic
  • any characters connector_puncutation property, of underscore 1 such.

so either ^\d+$ or else

^[\p{alphabetic}\p{decimal_number}\p{mark}\p{connector_punctuation}]+$ 

ought simple ones if don’t care explore intricacies of unicode id_start , id_continue properties. that’s how it’s done, bet instructor doesn’t know that. perhaps 1 shan’t tell him, eh?

but should cover nonsimple ones describe earlier.

and haven’t talked packages yet.

perl packages in identifiers

beyond simple rules, must consider identifiers may qualified package name, , package names follow rules of identifiers.

the package separator either :: or ' @ whim.

you not have specify package if first component in qualified identifier, in case means package main. means things $::foo , $'foo equivalent $main::foo, , isn't_it() equivalent isn::t_it(). (typo removed)

finally, special case, trailing double-colon (but not single-quote) @ end of hash permitted, , refers symbol table of name.

thus %main:: main symbol table, , because can omit main, %::.

meanwhile %foo:: foo symbol table, %main::foo:: , %::foo:: perversity’s sake.

summary

it’s nice see instructors giving people non-trivial assignments. question whether instructor realized non-trivial. not.

and it’s hardly perl, either. regarding java identifiers, did figure out yet textbooks lie? here’s demo:

$ perl -le 'print qq(public class escape { public static void main(string argv[]) { string var_\033 = "i escape: ^\033"; system.out.println(var_\033); }})' > escape.java $ javac escape.java $ java escape | cat -v escape: ^[ 

yes, it’s true. true many other code points, if use -encoding utf-8 on compile line. job find pattern describes these startlingly unforbidden java identifiers. hint: make sure include code point u+0000.

there, aren’t glad asked? hope helps. or something. ☺


Comments

Popular posts from this blog

java - SNMP4J General Variable Binding Error -

windows - Python Service Installation - "Could not find PythonClass entry" -

Determine if a XmlNode is empty or null in C#? -