Data Types and Objects
Perl has three data types: scalars,
arrays of scalars, and associative arrays of scalars. Normal
arrays are indexed by number, and associative arrays by string.
The interpretation of operations and values
in perl sometimes depends on the requirements of the context
around the operation or value. There are three major contexts:
string, numeric and array. Certain operations return array values
in contexts wanting an array, and scalar values otherwise. (If
this is true of an operation it will be mentioned in the
documentation for that operation.) Operations which return
scalars don't care whether the context is looking for a string or
a number, but scalar variables and values are interpreted as
strings or numbers as appropriate to the context. A scalar is
interpreted as TRUE in the boolean sense if it is not the null
string or 0. Booleans returned by operators are 1 for true and 0
or '' (the null string) for false.
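As a minimal sketch of the context rules above (the variable names are illustrative and not part of the original text), the same array yields its length where a scalar is wanted and its elements where an array is wanted, and a scalar is used as a string or a number according to the operator applied to it:

```perl
@items = ('a', 'b', 'c');

$count   = @items;       # scalar context: the length of the array
($first) = @items;       # array context: the first element

$n   = '10';
$sum = $n + 5;           # numeric context: '10' is used as the number 10
$str = $n . 5;           # string context: 5 is used as the string '5'

# The null string and 0 are false; other values are true.
@truth = ();
foreach $value ('', 0, 'abc', 1) {
    push(@truth, $value ? 'true' : 'false');
}

print "$count $first $sum $str\n";    # prints "3 a 15 105"
print "@truth\n";                     # prints "false false true true"
```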
There are actually two varieties of null
string: defined and undefined. Undefined null strings are
returned when there is no real value for something, such as when
there was an error, or at end of file, or when you refer to an
uninitialized variable or element of an array. An undefined null
string may become defined the first time you access it, but prior
to that you can use the defined() operator to
determine whether the value is defined or not.
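The difference between the two varieties can be seen with a short sketch; $unset is a hypothetical variable that is deliberately never assigned:

```perl
$empty = '';    # a defined null string

$empty_state = defined($empty) ? 'defined' : 'undefined';
$unset_state = defined($unset) ? 'defined' : 'undefined';    # $unset was never assigned

# Both values are false in the boolean sense, but only one is defined.
print "empty: $empty_state, unset: $unset_state\n";
# prints "empty: defined, unset: undefined"
```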
References to scalar variables always begin
with '$', even when referring to a scalar that is part of an
array. Thus:
$days # a simple scalar variable
$days[28] # 29th element of array @days
$days{'Feb'} # one value from an associative array
$#days # last index of array @days
but entire arrays or array slices are
denoted by '@':
@days # ($days[0], $days[1], ... $days[n])
@days[3,4,5] # same as @days[3..5]
@days{'a','c'} # same as ($days{'a'},$days{'c'})
and entire associative arrays are denoted
by '%':
%days # (key1, val1, key2, val2 ...)
Any of these eight constructs may serve as
an lvalue, that is, may be assigned to. (It also turns out that
an assignment is itself an lvalue in certain contexts--see
examples under s, tr and chop.) Assignment to a
scalar evaluates the righthand side in a scalar context, while
assignment to an array or array slice evaluates the righthand
side in an array context.
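The following sketch exercises several of these constructs, both as values and as lvalues. The contents of @days and %days are made up for illustration:

```perl
@days = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun');
%days = ('a', 1, 'b', 2, 'c', 3);    # a separate variable from @days

$third = $days[2];          # one element of @days: 'Wed'
$last  = $#days;            # last index of @days: 6
@mid   = @days[3,4,5];      # array slice: ('Thu', 'Fri', 'Sat')
@pair  = @days{'a','c'};    # associative array slice: (1, 3)

# Slices may be assigned to as well.
@days[0,1]     = ('Lundi', 'Mardi');
@days{'a','c'} = (10, 30);

print "$third $last @mid @pair\n";
# prints "Wed 6 Thu Fri Sat 1 3"
print "$days[0] $days[1] $days{'a'} $days{'b'} $days{'c'}\n";
# prints "Lundi Mardi 10 2 30"
```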
You may find the length of array @days by
evaluating "$#days", as in csh. (Actually, it's
not the length of the array, it's the subscript of the last
element, since there is (ordinarily) a 0th element.) Assigning to
$#days changes the length of the array. Shortening an array by
this method does not actually destroy any values. Lengthening an
array that was previously shortened recovers the values that were
in those elements. You can also gain some measure of efficiency
by preextending an array that is going to get big. (You can also
extend an array by assigning to an element that is off the end of
the array. This differs from assigning to $#whatever in that
intervening values are set to null rather than recovered.) You
can truncate an array down to nothing by assigning the null list
() to it. The following are exactly equivalent:
@whatever = ();
$#whatever = $[ - 1;
If you evaluate an array in a scalar
context, it returns the length of the array. The following is
always true:
scalar(@whatever) == $#whatever - $[ + 1;
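A small sketch of the relationship between $#array and the array length, assuming the default base subscript of 0 (that is, $[ has not been changed); @queue is an illustrative name:

```perl
@queue = ('a', 'b', 'c');

$last_index = $#queue;           # 2: subscript of the last element
$length     = scalar(@queue);    # 3: number of elements

# Assigning past the end extends the array; the skipped elements are undefined.
$queue[5]   = 'f';
$new_length = scalar(@queue);    # 6
$gap = defined($queue[4]) ? 'defined' : 'undefined';

# Assigning the null list truncates the array to nothing.
@queue = ();
$empty_last = $#queue;           # -1, one less than the base subscript

print "$last_index $length $new_length $gap $empty_last\n";
# prints "2 3 6 undefined -1"
```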
If you evaluate an associative array in a
scalar context, it returns a value which is true if and only if
the array contains any elements. (If there are any elements, the
value returned is a string consisting of the number of used
buckets and the number of allocated buckets, separated by a slash.)
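In practice this is mostly used as an emptiness test. The sketch below relies only on the truth of the value, not on the exact string returned; %seen is an illustrative name:

```perl
%seen = ();
$before = %seen ? 'has elements' : 'empty';

$seen{'perl'} = 1;
$after = %seen ? 'has elements' : 'empty';

print "$before, $after\n";    # prints "empty, has elements"
```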
Multi-dimensional arrays are not directly
supported, but see the discussion of the $; variable later for a means of emulating multiple
subscripts with an associative array. You could also write a
subroutine to turn multiple subscripts into a single subscript.
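As a rough sketch of the $; technique, a comma-separated subscript on an associative array is joined into a single key. The %grid array and its contents are made up for illustration:

```perl
# Store a 2 x 2 grid; each pair of subscripts becomes one key.
$grid{0,0} = 'nw';
$grid{0,1} = 'ne';
$grid{1,0} = 'sw';
$grid{1,1} = 'se';

($row, $col) = (1, 0);
$cell = $grid{$row,$col};                # 'sw'

# The same element, with the key built by hand from $;
$same = $grid{join($;, $row, $col)};     # 'sw'

# Only four real keys exist in the associative array.
$cells = scalar(@keys = keys(%grid));

print "$cell $same $cells\n";            # prints "sw sw 4"
```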
Every data type has its own namespace. You
can, without fear of conflict, use the same name for a scalar
variable, an array, an associative array, a filehandle, a
subroutine name, and/or a label. Since variable and array
references always start with '$', '@', or '%', the "reserved"
words aren't in fact reserved with respect to variable names. (They
ARE reserved with respect to labels and filehandles, however,
which don't have an initial special character. Hint: you could
say open(LOG,'logfile') rather than open(log,'logfile'). Using uppercase filehandles also
improves readability and protects you from conflict with future
reserved words.) Case IS significant--"FOO", "Foo"
and "foo" are all different names. Names which start
with a letter may also contain digits and underscores. Names
which do not start with a letter are limited to one character, e.g.
"$%" or "$$". (Most of the
one character names have a predefined significance to perl.
More later.)
Numeric literals are specified in any of
the usual floating point or integer formats:
12345
12345.67
.23E-10
0xffff # hex
0377 # octal
4_294_967_296
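A short sketch showing what some of these literals evaluate to (variable names are illustrative):

```perl
$hex   = 0xffff;           # hexadecimal
$octal = 0377;             # octal, because of the leading 0
$big   = 4_294_967_296;    # underscores only make the literal easier to read
$small = .23E-10;          # floating point with an exponent

$is_tiny = ($small > 0 && $small < 1e-10) ? 'yes' : 'no';

print "$hex $octal $big $is_tiny\n";    # prints "65535 255 4294967296 yes"
```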
String literals are delimited by either
single or double quotes. They work much like shell quotes: double-quoted
string literals are subject to backslash and variable
substitution; single-quoted strings are not (except for \' and \\).
The usual backslash rules apply for making characters such as
newline, tab, etc., as well as some more exotic forms:
\t tab
\n newline
\r return
\f form feed
\b backspace
\a alarm (bell)
\e escape
\033 octal char
\x1b hex char
\c[ control char
\l lowercase next char
\u uppercase next char
\L lowercase till \E
\U uppercase till \E
\E end case modification
You can also embed newlines directly in
your strings, i.e. they can end on a different line than they
begin. This is nice, but if you forget your trailing quote, the
error will not be reported until perl finds another line
containing the quote character, which may be much further on in
the script. Variable substitution inside strings is limited to
scalar variables, normal array values, and array slices. (In
other words, identifiers beginning with $ or @, followed by an
optional bracketed expression as a subscript.) The following code
segment prints out "The price is $100."
$Price = '$100'; # not interpreted
print "The price is $Price.\n"; # interpreted
Note that you can put curly brackets around
the identifier to delimit it from following alphanumerics. Also
note that a single quoted string must be separated from a
preceding word by a space, since single quote is a valid
character in an identifier (see Packages).
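The quoting rules can be compared side by side in a small sketch; $fruit and the other names are illustrative:

```perl
$fruit = 'apple';

$single = 'two $fruit\n';       # single quotes: no substitution at all
$double = "two $fruit\n";       # double quotes: $fruit and \n are both substituted
$plural = "two ${fruit}s\n";    # braces end the name before the following 's'
$quoted = 'it\'s a \\';         # the only two escapes single quotes honor

print $double, $plural;         # prints "two apple" and "two apples", one per line
print "$single\n$quoted\n";     # prints the single-quoted strings literally
```

Without the curly brackets, "two $fruits" would refer to a different variable, $fruits.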
Two special literals are __LINE__ and
__FILE__, which represent the current line number and filename at
that point in your program. They may only be used as separate
tokens; they will not be interpolated into strings. In addition,
the token __END__ may be used to indicate the logical end of the
script before the actual end of file. Any following text is
ignored, but may be read via the DATA filehandle. (The DATA
filehandle may read data only from the main script, but not from
any required file or evaluated string.) The two control
characters ^D and ^Z are synonyms for __END__.
A word that doesn't have any other
interpretation in the grammar will be treated as if it had single
quotes around it. For this purpose, a word consists only of
alphanumeric characters and underline, and must start with an
alphabetic character. As with filehandles and labels, a bare word
that consists entirely of lowercase letters risks conflict with
future reserved words, and if you use the -w switch, Perl
will warn you about any such words.
Array values are interpolated into double-quoted
strings by joining all the elements of the array with the
delimiter specified in the $" variable, space by default. (Since
in versions of perl prior to 3.0 the @ character was not a
metacharacter in double-quoted strings, the interpolation of
@array, $array[EXPR], @array[LIST], $array{EXPR}, or @array{LIST}
only happens if array is referenced elsewhere in the program or
is predefined.) The following are equivalent:
$temp = join($",@ARGV);
system "echo $temp";
system "echo @ARGV";
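A sketch of the same joining behavior without involving the shell; @parts is an illustrative array:

```perl
@parts = ('usr', 'local', 'bin');

$spaced = "@parts";        # joined with the default delimiter, a space

{
    local($") = '/';       # change the delimiter for this block only
    $path = "/@parts";     # '/usr/local/bin'
}

$slice = "@parts[0,1]";    # slices are interpolated the same way

print "$spaced\n$path\n$slice\n";
# prints "usr local bin", "/usr/local/bin" and "usr local", one per line
```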
Within search patterns (which also undergo
double-quotish substitution) there is a bad ambiguity: Is /$foo[bar]/
to be interpreted as /${foo}[bar]/ (where [bar] is a character
class for the regular expression) or as /${foo[bar]}/ (where [bar]
is the subscript to array @foo)? If @foo doesn't otherwise exist,
then it's obviously a character class. If @foo exists, perl takes
a good guess about [bar], and is almost always right. If it does
guess wrong, or if you're just plain paranoid, you can force the
correct interpretation with curly brackets as above.
A line-oriented form of quoting is based on
the shell here-is syntax. Following a << you specify a
string to terminate the quoted material, and all lines following
the current line down to the terminating string are the value of
the item. The terminating string may be either an identifier (a
word), or some quoted text. If quoted, the type of quotes you use
determines the treatment of the text, just as in regular quoting.
An unquoted identifier works like double quotes. There must be no
space between the << and the identifier. (If you put a
space it will be treated as a null identifier, which is valid,
and matches the first blank line--see Merry Christmas example
below.) The terminating string must appear by itself (unquoted
and with no surrounding whitespace) on the terminating line.
print <<EOF; # same as above
The price is $Price.
EOF
print <<"EOF"; # same as above
The price is $Price.
EOF
print << x 10; # null identifier is delimiter
Merry Christmas!

print <<`EOC`; # execute commands
echo hi there
echo lo there
EOC
print <<foo, <<bar; # you can stack them
I said foo.
foo
I said bar.
bar
Array literals are denoted by separating
individual values by commas, and enclosing the list in
parentheses:
(LIST)
In a context not requiring an array value,
the value of the array literal is the value of the final element,
as in the C comma operator. For example,
@foo = ('cc', '-E', $bar);
assigns the entire array value to array
foo, but
$foo = ('cc', '-E', $bar);
assigns the value of variable bar to
variable foo. Note that the value of an actual array in a scalar
context is the length of the array; the following assigns to $foo
the value 3:
@foo = ('cc', '-E', $bar);
$foo = @foo; # $foo gets 3
You may have an optional comma before the
closing parenthesis of an array literal, so that you can say:
@foo = (
1,
2,
3,
);
When a LIST is evaluated, each element of
the list is evaluated in an array context, and the resulting
array value is interpolated into LIST just as if each individual
element were a member of LIST. Thus arrays lose their identity in
a LIST--the list (@foo,@bar,&SomeSub) contains all the
elements of @foo followed by all the elements of @bar, followed
by all the elements returned by the subroutine named SomeSub.
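A minimal sketch of this flattening; the subroutine pair is hypothetical and simply returns two values:

```perl
sub pair { ('x', 'y'); }       # hypothetical subroutine returning two values

@foo = (1, 2);
@bar = (3);

@all = (@foo, @bar, &pair);    # one flat list of five elements, not three arrays
$n   = @all;

print "@all ($n elements)\n";  # prints "1 2 3 x y (5 elements)"
```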
A list value may also be subscripted like a
normal array. Examples:
$time = (stat($file))[8]; # stat returns array value
$digit = ('a','b','c','d','e','f')[$digit-10];
return (pop(@foo),pop(@foo))[0];
Array lists may be assigned to if and only
if each element of the list is an lvalue:
($a, $b, $c) = (1, 2, 3);
($map{'red'}, $map{'blue'}, $map{'green'}) = (0x00f, 0x0f0, 0xf00);
The final element may be an array or an
associative array:
($a, $b, @rest) = split;
local($a, $b, %rest) = @_;
You can actually put an array anywhere in
the list, but the first array in the list will soak up all the
values, and anything after it will get a null value. This may be
useful in a local().
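The soaking behavior can be sketched as follows (all names are illustrative):

```perl
($first, @rest) = ('a', 'b', 'c');            # @rest gets ('b', 'c')

($head, @middle, $tail) = ('a', 'b', 'c');    # @middle soaks up 'b' and 'c'
$tail_state = defined($tail) ? 'defined' : 'undefined';

($name, %opts) = ('job', 'mode', 'fast');     # remaining values become key/value pairs

print "$first (@rest) $head (@middle) $tail_state $name $opts{'mode'}\n";
# prints "a (b c) a (b c) undefined job fast"
```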
An associative array literal contains pairs
of values to be interpreted as a key and a value:
# same as map assignment above
%map = ('red',0x00f,'blue',0x0f0,'green',0xf00);
Array assignment in a scalar context
returns the number of elements produced by the expression on the
right side of the assignment:
$x = (($foo,$bar) = (3,2,1)); # set $x to 3, not 2
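One common use of this rule is counting the values an expression produces. A sketch, with a made-up input line:

```perl
$line = 'root:x:0:0';

# The inner assignment fills @fields; in a scalar context its value is the
# number of elements on the right side, here 4.
$count = (@fields = split(/:/, $line));

# The count is of the right side, even when the left side takes fewer values.
$n = (($uid, $gid) = (0, 0, 'extra'));

print "$count $n\n";    # prints "4 3"
```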
There are several other pseudo-literals
that you should know about. If a string is enclosed by backticks
(grave accents), it first undergoes variable substitution just
like a double quoted string. It is then interpreted as a command,
and the output of that command is the value of the pseudo-literal,
as in a shell. In a scalar context, a single string consisting
of all the output is returned. In an array context, an array of
values is returned, one for each line of output. (You can set $/ to use a different line terminator.) The command is
executed each time the pseudo-literal is evaluated. The status
value of the command is returned in $? (see Predefined Names for the interpretation of $?). Unlike in csh, no translation is done on
the return data--newlines remain newlines. Unlike in any of the
shells, single quotes do not hide variable names in the command
from interpretation. To pass a $ through to the shell you need to
hide it with a backslash.
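A sketch of these rules, assuming a Unix-like system where the shell provides echo; the variable names are illustrative:

```perl
$word = 'hello';

$all   = `echo $word`;            # $word is substituted before the shell runs
$state = $?;                      # status of the command just run; 0 means success
@lines = `echo one; echo two`;    # array context: one element per line of output
$shell = `echo \$HOME`;           # the backslash leaves $HOME for the shell to expand

print "status $state, ", scalar(@lines), " lines\n";
print $all, @lines;               # each value still ends with its newline
```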
Evaluating a filehandle in angle brackets
yields the next line from that file (newline included, so it's
never false until EOF, at which time an undefined value is
returned). Ordinarily you must assign that value to a variable,
but there is one situation where an automatic assignment happens.
If (and only if) the input symbol is the only thing inside the
conditional of a while loop, the value is automatically
assigned to the variable "$_". (This may
seem like an odd thing to you, but you'll use the construct in
almost every perl script you write.) Anyway, the following
lines are equivalent to each other:
while ($_ = <STDIN>) { print; }
while (<STDIN>) { print; }
for (;<STDIN>;) { print; }
print while $_ = <STDIN>;
print while <STDIN>;
The filehandles STDIN, STDOUT
and STDERR are predefined. (The filehandles stdin, stdout
and stderr will also work except in packages, where they
would be interpreted as local identifiers rather than global.)
Additional filehandles may be created with the open
function.
If a <FILEHANDLE> is used in a
context that is looking for an array, an array consisting of all
the input lines is returned, one line per array element. It's
easy to make a LARGE data space this way, so use with care.
The null filehandle <> is special and
can be used to emulate the behavior of sed and awk.
Input from <> comes either from standard input, or from
each file listed on the command line. Here's how it works: the
first time <> is evaluated, the ARGV array is checked, and
if it is null, $ARGV[0] is set to '-', which when opened gives
you standard input. The ARGV array is then processed as a list of
filenames. The loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like
pseudo code:
unshift(@ARGV, '-') if $#ARGV < $[;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
except that it isn't as cumbersome to say,
and will actually work. It really does shift array ARGV and put
the current filename into variable ARGV. It also uses filehandle
ARGV internally--<> is just a synonym for <ARGV>,
which is magical. (The pseudo code above doesn't work because it
treats <ARGV> as non-magical.)
You can modify @ARGV before the first <> as long as the array ends
up containing the list of filenames you really want. Line numbers
($.) continue as if the input was one big happy file. (But see
example under eof for how to reset
line numbers on each file.)
If you want to set @ARGV to your own list of files, go right ahead. If you
want to pass switches into your script, you can put a loop on the
front like this:
while ($_ = $ARGV[0], /^-/) {
shift;
last if /^--$/;
/^-D(.*)/ && ($debug = $1);
/^-v/ && $verbose++;
... # other switches
}
while (<>) {
... # code for each line
}
The <> symbol will return FALSE only
once. If you call it again after this it will assume you are
processing another @ARGV list, and if you
haven't set @ARGV, will input from
STDIN.
If the string inside the angle brackets is
a reference to a scalar variable (e.g. <$foo>), then that
variable contains the name of the filehandle to input from.
If the string inside angle brackets is not
a filehandle, it is interpreted as a filename pattern to be
globbed, and either an array of filenames or the next filename in
the list is returned, depending on context. One level of $
interpretation is done first, but you can't say <$foo>
because that's an indirect filehandle as explained in the
previous paragraph. You could insert curly brackets to force
interpretation as a filename glob: <${foo}>.
Example:
while (<*.c>) {
chmod 0644, $_;
}
is equivalent to
open(foo, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<foo>) {
chop;
chmod 0644, $_;
}
In fact, it's currently implemented that
way. (Which means it will not work on filenames with spaces in
them unless you have /bin/csh on your machine.) Of course, the
shortest way to do the above is:
chmod 0644, <*.c>;