What characters are allowed in Perl identifiers?
I'm working on a regular-expressions homework where one question is:

Using language reference manuals online, determine the regular expressions for integer numeric constants and identifiers in Java, Python, Perl, and C.

I don't need help with the regular expressions themselves; I just have no idea what identifiers look like in Perl. I found pages describing valid identifiers for C, Python, and Java, but I can't find anything for Perl.

EDIT: To clarify, finding the documentation was meant to be easy (like doing a Google search for Python identifiers). I'm not taking a class in "doing Google searches".
Perl Integer Constants
Integer constants in Perl can be:
- in base 16 if they start with `0x`
- in base 2 if they start with `0b`
- in base 8 if they start with `0`
- otherwise they are in base 10.

Following that leader is any number of valid digits in that base, plus optional underscores.
Note that "digit" does not mean `\p{POSIX_Digit}`; it means `\p{Decimal_Number}`, which is quite different, you know.
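The four integer forms above can be sketched as a single regular expression. This is a Python sketch, not Perl's own tokenizer. The name `is_perl_integer_constant` is hypothetical, and the sketch assumes only the lowercase leaders `0x` and `0b` listed above. It follows the note that a decimal digit means `\p{Decimal_Number}`: in a Python `str` pattern, `\d` matches any Unicode Nd digit.

```python
import re

# Sketch of the integer-constant rules described above, not Perl's tokenizer.
# Assumptions: only the lowercase leaders 0x and 0b are handled, and a leader
# must be followed by at least one digit or underscore.
PERL_INTEGER = re.compile(
    r"""
      0x[0-9A-Fa-f_]+   # base 16: leader 0x, then hex digits and underscores
    | 0b[01_]+          # base 2:  leader 0b, then binary digits and underscores
    | 0[0-7_]*          # base 8:  leader 0 (a bare 0 also lands here)
    | (?!0)\d[\d_]*     # base 10: in a str pattern, \d matches any Nd digit
    """,
    re.VERBOSE,
)


def is_perl_integer_constant(text: str) -> bool:
    """Return True if all of text matches one of the four forms."""
    return PERL_INTEGER.fullmatch(text) is not None


for sample in ["0x1F", "0b1010", "017", "1_000_000", "-3", "08"]:
    print(repr(sample), is_perl_integer_constant(sample))
```

Note that `-3` is rejected, which is consistent with the point below that the minus sign is not part of the constant.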
Please note that a leading minus sign is not part of the integer constant, as proven by:
```
$ perl -MO=Concise,-exec -le '$x = -3**$y'
1  <0> enter
2  <;> nextstate(main 1 -e:1) v:{
3  <$> const(IV 3) s
4  <$> gvsv(*y) s
5  <2> pow[t1] sK/2
6  <1> negate[t2] sK/1
7  <$> gvsv(*x) s
8  <2> sassign vKS/2
9  <@> leave[1 ref] vKP/REFC
-e syntax OK
```
See the `3` const, and only much later the `negate` op-code? That tells you a bunch, including a curiosity of precedence.
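Python happens to share this precedence rule, so it is a convenient place to see the same effect in a syntax tree. This is a Python analogue, not Perl. It parses the same expression with the standard `ast` module, and the output shows the negation wrapping the power. That mirrors the `const`-then-`negate` order in the Concise output above.

```python
import ast

# Python analogue, not Perl: ** binds tighter than a unary minus on its left,
# so 3 is parsed as a plain positive constant and the negation is applied last.
tree = ast.parse("x = -3**y")
value = tree.body[0].value

print(type(value).__name__)           # UnaryOp: the negation is outermost
print(type(value.op).__name__)        # USub
print(type(value.operand).__name__)   # BinOp: 3 ** y sits inside it
print(value.operand.left.value)       # 3, with no sign attached
print(-3 ** 2)                        # -9, i.e. -(3 ** 2)
```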
Perl Identifiers
Identifiers specified via symbolic dereferencing have absolutely no restriction whatsoever on their names.
- For example, `100->(200)` calls the function named `100` with the argument list `(200)`.
- For another, `${"what’s up, doc?"}` refers to the scalar package variable with that name in the current package.
- On the other hand, `${"what's up, doc?"}` refers to the scalar package variable named `${"s up, doc?"}`. That variable is not in the current package but in the `what` package. Well, unless the current package is the `what` package, of course. Similarly, `$who's` is the `$s` variable in the `who` package.
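To make the quote rule concrete, here is a small Python sketch of how such a name string could be split into a package and a variable name. The helper `split_symbolic_name` is hypothetical and simplified. It assumes that the last `::` or `'` is the split point and that an unqualified name lives in the current package.

```python
import re

# Hypothetical helper: splits a symbolic name into (package, variable name).
# Assumptions: the last "::" or "'" is the split point, a leading separator
# means package main, and symbol-table names (trailing "::") are not handled.
SEPARATOR = re.compile(r"::|'")


def split_symbolic_name(name, current_package="main"):
    parts = SEPARATOR.split(name)
    if len(parts) == 1:
        return current_package, name      # no separator: current package
    *packages, leaf = parts
    if packages[0] == "":
        packages[0] = "main"              # empty first component means main
    return "::".join(packages), leaf


print(split_symbolic_name("what\u2019s up, doc?"))  # curly quote: no split
print(split_symbolic_name("what's up, doc?"))       # ('what', 's up, doc?')
print(split_symbolic_name("who's"))                 # ('who', 's')
```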
One can also have identifiers of the form `${^identifier}`; these are not considered symbolic dereferences into the symbol table.
Identifiers consisting of a single character can be a punctuation character, which includes `$$` and `%!`.
Identifiers can also be of the form `$^C`, which is either a control character or a circumflex followed by a non-control character.
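The special forms in the last three paragraphs can be checked with a few small patterns. This Python sketch works on the name with its sigil removed, and `is_special_variable_name` is a hypothetical helper. It assumes that "punctuation" means a single ASCII punctuation character. For the caret form it uses the set that perlvar documents: `^` followed by `A` through `Z` or by one of `[ \ ] ^ _ ?`. It leaves out the literal control-character spelling.

```python
import re
import string

# Sketch of the "nonsimple" name forms, checked on the text after the sigil.
# Assumptions: one ASCII punctuation character counts as punctuation, and the
# caret form is ^ followed by A-Z or one of [ \ ] ^ _ ? (the perlvar set).
PUNCTUATION = re.compile("[" + re.escape(string.punctuation) + "]")
CARET_FORM = re.compile(r"\^[A-Z\[\\\]^_?]")
BRACED_CARET_FORM = re.compile(r"\{\^\w+\}")


def is_special_variable_name(name: str) -> bool:
    """Return True if name (sigil already stripped) is one of the special forms."""
    return any(
        pattern.fullmatch(name)
        for pattern in (PUNCTUATION, CARET_FORM, BRACED_CARET_FORM)
    )


for sample in ["$", "!", "^C", "{^WARNING_BITS}", "^c", "ab"]:
    print(repr(sample), is_special_variable_name(sample))
```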
If none of those things is true, a (non–fully qualified) identifier follows the Unicode rules: a character with the ID_Start property followed by characters with the ID_Continue property. However, Perl overrules this by allowing all-digit identifiers and identifiers that start with (and perhaps have nothing beyond) an underscore. You can pretend (but it’s only pretending) that this is the same as saying `\w+`, where `\w` is as described in Annex C of UTS#18. That is, anything that has one of these:
- the Alphabetic property, which includes far more than letters; it also contains various combining characters and the Letter_Number code points, plus the circled letters
- the Decimal_Number property, which is rather more than merely `[0-9]`
- any and all characters with the Mark property, not just those marks deemed Other_Alphabetic
- any characters with the Connector_Punctuation property, of which underscore is just one.
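Python's `re` module has no `\p{...}` support, so this sketch approximates the four properties above with general categories from `unicodedata`. Treat it as an approximation. Alphabetic is stood in for by the letter categories plus Nl, which misses Alphabetic characters in other categories, such as the circled letters. The name `is_uts18_word` is hypothetical.

```python
import unicodedata

# Approximates UTS #18 Annex C \w using general categories.
# Assumption: Alphabetic ~ letter categories + Nl, so a few Alphabetic
# characters in other categories (e.g. circled letters, category So) are missed.
WORD_CATEGORIES = {
    "Lu", "Ll", "Lt", "Lm", "Lo", "Nl",  # approximately Alphabetic
    "Nd",                                # Decimal_Number
    "Mn", "Mc", "Me",                    # Mark
    "Pc",                                # Connector_Punctuation
}


def is_uts18_word(text: str) -> bool:
    """Return True if text is non-empty and every character is a word character."""
    return bool(text) and all(
        unicodedata.category(ch) in WORD_CATEGORIES for ch in text
    )


for sample in ["foo_bar", "e\u0301", "\u2167", "\u203f", "a-b", "\u24b6"]:
    print(repr(sample), is_uts18_word(sample))
```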
So either `^\d+$` or else `^[\p{Alphabetic}\p{Decimal_Number}\p{Mark}\p{Connector_Punctuation}]+$` ought to do for the simple ones, if you don’t care to explore the intricacies of the Unicode ID_Start and ID_Continue properties. That’s how it’s really done, but I bet your instructor doesn’t know that. Perhaps one shan’t tell him, eh?
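For the ID_Start and ID_Continue route, Python offers a close stand-in. `str.isidentifier()` checks the related XID_Start and XID_Continue properties and already allows a leading underscore. The sketch pairs it with an all-digits test to mirror `^\d+$`. The name `is_simple_perl_identifier` is hypothetical, and the XID properties differ from ID_Start/ID_Continue in a few edge cases.

```python
# Sketch of the "simple" rule: all decimal digits (like ^\d+$), or an
# identifier built from XID_Start/XID_Continue characters.
# Assumption: Python's str.isidentifier() stands in for Perl's ID_Start and
# ID_Continue check; it accepts a leading underscore and ignores keywords.
def is_simple_perl_identifier(text: str) -> bool:
    return text.isdecimal() or text.isidentifier()


for sample in ["foo", "_", "123", "9lives", "a-b"]:
    print(repr(sample), is_simple_perl_identifier(sample))
```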
But you should also cover the nonsimple ones I described earlier.

And we haven’t even talked about packages yet.
Perl Packages in Identifiers
Beyond those simple rules, you must also consider that identifiers may be qualified with a package name, and package names themselves follow the rules for identifiers.
The package separator is either `::` or `'`, at your whim.
You do not have to name a package if it would be the first component of a qualified identifier; leaving it empty means the package `main`. That means things like `$::foo` and `$'foo` are equivalent to `$main::foo`, and `isn't_it()` is equivalent to `isn::t_it()`.
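The equivalences above amount to a simple rewrite. In this hypothetical Python helper, `canonical_qualified_name`, every `'` becomes `::` and an empty leading package becomes `main`. It takes names without their sigil and ignores the trailing-`::` symbol-table case.

```python
# Hypothetical helper: rewrites a package-qualified name (sigil removed) into
# one canonical spelling. Assumptions: every "'" acts as "::", and a name that
# starts with a separator belongs to package main.
def canonical_qualified_name(name: str) -> str:
    name = name.replace("'", "::")
    if name.startswith("::"):
        name = "main" + name
    return name


for sample in ["::foo", "'foo", "main::foo", "isn't_it"]:
    print(sample, "->", canonical_qualified_name(sample))
```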
Finally, as a special case, a trailing double colon (but not a single quote) is permitted at the end of a hash name, and it then refers to the symbol table of that name. Thus `%main::` is the `main` symbol table, and because you can omit `main`, so is `%::`.
Meanwhile `%foo::` is the `foo` symbol table, but so are `%main::foo::` and, for perversity’s sake, `%::foo::`.
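The symbol-table rule can be sketched the same way. The hypothetical helper `symbol_table_package` takes a hash name without its `%` sigil. It returns the package whose symbol table that name refers to, or `None` for an ordinary hash. It assumes that only a trailing `::` marks a symbol table and that leading `::` and `main::` prefixes are redundant.

```python
from typing import Optional


# Hypothetical helper: maps a hash name (sigil removed) to the package whose
# symbol table it names, or None for an ordinary hash. Assumptions: only a
# trailing "::" (never "'") marks a symbol table, and leading "::" or "main::"
# prefixes do not change which package is meant.
def symbol_table_package(hash_name: str) -> Optional[str]:
    if not hash_name.endswith("::"):
        return None
    package = hash_name[: -len("::")]
    if package.startswith("::"):
        package = package[len("::"):]
    while package.startswith("main::"):
        package = package[len("main::"):]
    return package or "main"


for sample in ["main::", "::", "foo::", "main::foo::", "::foo::", "foo"]:
    print(repr(sample), symbol_table_package(sample))
```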
Summary
It’s nice to see instructors giving people non-trivial assignments. The question is whether the instructor realized it was non-trivial. Probably not.

And it’s hardly just Perl, either. Regarding Java identifiers, have you figured out yet that the textbooks lie? Here’s a demo:
```
$ perl -le 'print qq(public class escape { public static void main(String argv[]) { String var_\033 = "i escape: ^\033"; System.out.println(var_\033); }})' > escape.java
$ javac escape.java
$ java escape | cat -v
escape: ^[
```
Yes, it’s true. It’s also true of many other code points, especially if you use `-encoding UTF-8` on the compile line. Your job is to find the pattern that describes these startlingly unforbidden Java identifiers. Hint: make sure to include code point U+0000.
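As a starting point for that exercise, here is a Python sketch of just one piece of the answer: the characters that `java.lang.Character.isIdentifierIgnorable` is documented to accept. Java allows these inside identifiers, though not as the first character. The sketch covers the non-whitespace ISO control ranges and the Format (Cf) category, so it includes both the ESC (U+001B) from the demo and U+0000. The helper name is hypothetical, and Python's Unicode tables may not match your JDK's version.

```python
import unicodedata

# Python sketch of the set documented for Java's
# Character.isIdentifierIgnorable: non-whitespace ISO control characters
# plus the Format (Cf) general category.
# Caveat: Python's Unicode database version may differ from the JDK's.
IGNORABLE_RANGES = [(0x0000, 0x0008), (0x000E, 0x001B), (0x007F, 0x009F)]


def is_java_identifier_ignorable(ch: str) -> bool:
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in IGNORABLE_RANGES):
        return True
    return unicodedata.category(ch) == "Cf"


for sample in ["\x00", "\x1b", "\u200b", "\t", "a"]:
    print(f"U+{ord(sample):04X}", is_java_identifier_ignorable(sample))
```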
There, aren’t you glad you asked? I hope this helps. Or something. ☺