Macro Assembler AS V1.41r7
User's Manual
Edition August 1998
IBM, PPC403GA, OS/2, and PowerPC are registered trademarks of IBM Corporation.
Intel, MCS-48, MCS-51, MCS-251, MCS-96, MCS-196, and MCS-296 are registered trademarks of Intel Corp.
Motorola and ColdFire are registered trademarks of Motorola Inc.
UNIX is a registered trademark of X/Open Company.
Microsoft, Windows, and MS-DOS are registered trademarks of Microsoft Corporation.
All other trademarks not explicitly mentioned in this section and used in this manual are properties of their respective owners.
This document has been processed with the LaTeX typesetting system using Digital Unix, Linux, and OS/2 operating systems running on AMD K6 and DEC Alpha processors.
This manual is meant for people who are already very familiar with assembly language and who would like to know how to work with AS. It is a reference rather than a user's manual, so it tries to explain neither the ''assembler language'' nor the processors. I have listed further literature in the bibliography which was substantial for the implementation of the different code generators. I know of no book from which you can learn assembly language from scratch, so I generally learned it by ''trial and error''.
Before we can go ''in medias res'', first of all the inevitable
prologue:
I publish AS, in the present version, as ''Public Domain''. This
means that the program and overlay files, as well as the accompanying
utility and tool programs, may be copied and used free of charge.
There are no plans to convert AS into a commercial or shareware program.
This permission however is valid only under the following premises:
On request, the source code of this program can also be made
available. Programs or derivatives based on it must be passed on
under the same conditions as this program.
I explicitly encourage you to spread this program by disc, mailbox,
BBS, or the Internet!
You may have received this program as an enclosure to a commercial
program. The license agreement for the commercial program in no case
applies to AS.
If you took so much pleasure in this assembler that you would like to
send me some money, I would kindly ask you to give the amount to
Greenpeace.
I have tried to make the programs as bug-free as possible. But
since there is in principle no bug-free software (the only people
making no mistakes are lying in the cemetery!), I do not give any
warranty for the function of the assembler in a particular
environment (hardware or software) and accept no liability for
damages. Naturally, I will always be thankful for bug reports or
improvements and will work on fixing them.
To accelerate error diagnosis and correction, please add the
following details to the bug report:
Please don't call me by phone. First, complex matters are extremely
hard to discuss over the phone. Secondly, the telephone companies are
already rich enough...
The latest German version of AS (DOS, DPMI, OS/2) is available from
the following FTP server:
Anyone without access to an FTP server can ask me to send the
assembler by mail. Only requests containing floppies (two 1.44-Mbyte
disks; four or three disks for the 720-Kbyte or 1.2-Mbyte format,
respectively) and a self-addressed, (correctly) stamped envelope will
be answered. Don't send any money!
Now, after this inevitable introduction we can turn to the actual
documentation:
In contrast to ordinary assemblers, AS offers the possibility to
generate code for totally different processors. At the moment, the
following processor families have been implemented:
The reason for this flexibility is that AS has a history, which may
also be recognized by looking at the version number. AS was created
as an extension of a macro assembler for the 68000 family. On special
request, I extended the original assembler so that it was able to
translate 8051 mnemonics. On this way (decline?!) from the 68000 to
the 8051, support for some other processors was created as a
by-product. All others were added over time due to user requests. So
at least for the processor-independent core of AS, one may assume
that it is well-tested and free of obvious bugs. However, I often do
not have
the chance to test a new code generator in practice (due to lack of
appropriate hardware), so surprises are not impossible when working
with new features. You see, the things stated in section 1.1 have a reason...
This flexibility implies a somewhat exotic code format, therefore I
added some tools to work with it. Their description can be found in
chapter 6.
AS is a macro assembler, which means that the programmer has the
possibility to define new ''commands'' by means of macros.
Additionally, it supports conditional assembly. Labels inside macros
are automatically processed as being local.
For the assembler, symbols may have either integer, string or
floating point values. These will be stored - like interim values in
formulas - with a width of 32 bits for integer values, 80 or 64 bits
for floating point values, and 255 characters for strings. For a
couple of microcontrollers, it is possible to classify symbols by
segment. The assembler therefore has a (limited) ability to
recognize accesses to wrong address spaces.
The assembler has no explicit limits on the nesting depth of
include files or macros; a limit is only given by the program stack
restricting the recursion depth. Nor is there a limit on the symbol
length, which is only restricted by the maximum line length.
From version 1.38 on, AS is a multipass-assembler. This pompous term
means no more than the fact that the number of passes through the
source code need not be exactly two. If the source code does not
contain any forward references, AS needs only one pass. In case AS
recognizes in the second pass that it must use a shorter or longer
instruction coding, it needs a third (fourth, fifth...) pass to
process all symbol references correctly. There is nothing more behind
the term ''multipass'', so it will not be used any further in this
documentation.
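The pass logic described above can be illustrated with a small Python sketch. It is not taken from the AS sources: the toy instruction set (a 1-byte nop, a jmp that takes 2 bytes for near targets and 3 bytes otherwise), the 128-byte limit, and the function name assemble are all assumptions made for this illustration. The loop simply repeats passes until no forward reference is unknown and no label has moved.

```python
def assemble(program, max_passes=10):
    """Return (number_of_passes, symbol_table) for a toy instruction list.

    program is a list of (op, arg) tuples with op in
    {"label", "jmp", "nop"}; sizes and ranges are invented for the demo.
    """
    symbols = {}
    for pass_no in range(1, max_passes + 1):
        address = 0
        need_another_pass = False
        for op, arg in program:
            if op == "label":
                old = symbols.get(arg)
                # A label that moved invalidates code generated from its old value.
                if old is not None and old != address:
                    need_another_pass = True
                symbols[arg] = address
            elif op == "jmp":
                target = symbols.get(arg)
                if target is None:
                    # Forward reference: value unknown, assume the long form.
                    need_another_pass = True
                    address += 3
                elif abs(target - address) < 128:
                    address += 2  # short coding
                else:
                    address += 3  # long coding
            else:  # "nop"
                address += 1
        if not need_another_pass:
            return pass_no, symbols
    raise RuntimeError("symbol values did not stabilize")
```

In this toy model, a program with only backward references finishes in one pass, a far forward jump needs two, and a near forward jump needs three because the shorter coding found in the second pass shifts the label.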
After so much praise a bitter pill: AS cannot generate linkable code.
An extension with a linker needs considerable effort and is not
planned at the moment.
As regards ''release of sources'': the sources of AS are not
presented in a form which allows easy understanding (== no comments).
So I will hand out sources only in case somebody really wants to work
on them (e.g. to port AS to another computer system) and the
derivatives become Public Domain again. In particular, I want to
prevent someone from changing 5 lines (most popularly the copyright
entry) and selling the result commercially as ''his own'' program.
Though AS started as a pure DOS program, there are a couple of
versions available that are able to exploit a bit more than the Real
Mode of an Intel CPU. Their usage is kept as compatible to the DOS
version as possible, but there are of course differences concerning
installation and embedding into the operating system in question.
Sections in this manual that are only valid for a specific version of
AS are marked with a corresponding sidemark (at this paragraph for
the DOS version) placed in front of the paragraph. In detail, the
following further versions exist (distributed as separate packages):
In case you run into memory problems when assembling large and complex
programs, there is a DOS version that runs in protected mode via a
DOS extender and can therefore make use of the whole extended memory
of an AT. The assembly becomes significantly slower by the extender,
but at least it works...
There is a native OS/2 version of AS for friends of IBM's OS/2
operating system. This is currently only a 16-bit version, but at
least this way saves the roundtrips via DOS boxes and one does not
have any problems any more with longer file names.
You can leave the area of PCs-only with the C version of AS that was
designed to be compilable on a large number of UNIX systems (this
includes OS/2 with the emx compiler) without too much tweaking. In
contrast to the previously mentioned versions, the C version is
delivered in source code, i.e. one has to create the binaries
oneself using a C compiler. This is by far the simpler way (for me)
compared to providing a dozen precompiled binaries for machines I
sometimes only have limited access to...
People who have read this enumeration up to this point will notice
that the world's best-selling operating system coming from Redmond is
missing in this enumeration. People who know me personally will know
that I do not regard Windows to be a pat solution (regardless of
whether it is 3.X, 95, or NT). Frankly, I am a 'windows hater'. A
large number
of people will now regard this to be somewhere between obsolete and
ridiculous, and they will tell me that I withhold AS from a large
part of potential users, but they will have to live with it: I
primarily continue to improve AS because I have fun doing it; AS is a
non-commercial project and I therefore take the freedom not to look
at potential market shares. I select platforms for me where I have
fun programming, and I definitely do not have any fun when
programming for Windows! By the way, there was a time when I had to
write Windows programs so I do not simply jabber without having an
idea what I am talking about. If someone wants to port AS into this
direction, I will not stand in his way, but (s)he should not expect
anything more from me than providing sources (which is why (s)he will
have to deal with questions like 'why does AS not work any more after
I changed the JUNK-CAD 18.53 registry entry from upper to lower
case?').
The hardware requirements of AS vary substantially from version to
version:
The DOS version will in principle run on any IBM-compatible PC,
ranging from a PC/XT with 4-dot-little megahertz up to a Pentium.
However, similar to other programs, the fun using AS increases the
better your hardware is. An XT user without a hard drive will
probably have significant trouble placing the overlay file on a
floppy because it is larger than 500 Kbytes...the PC should therefore
have at least a hard drive, allowing acceptable loading times. AS is
not very demanding in its main memory needs: the program itself
allocates less than 300 Kbytes of main memory, so AS should work
on machines with at least 512 Kbytes of memory.
The version of AS compiled for the DOS Protected Mode Interface
(DPMI) requires at least 1 Mbyte of free extended memory. A total
memory capacity of at least 2 Mbytes is therefore the absolute
minimum given one does not have other tools in the XMS (like disk
caches, RAM disks, or a hi-loaded DOS); the needs will rise then
appropriately. If one uses the DPMI version in a DOS box of OS/2, one
has to assure that DPMI has been enabled via the box's DOS settings
(set to on or auto) and that a sufficient amount of
XMS memory has been assigned to the box. The virtual memory
management of OS/2 will free you from thinking about the amount of
free real memory.
The hardware requirements of the OS/2 version mainly result from the
needs of the underlying operating system, i.e. at minimum an 80386SX
processor, 8 Mbytes of RAM (resp. 4 Mbytes without the graphical user
interface) and 100..150 Mbytes of hard disk space. AS2 is only a
16-bit application and therefore it should also work on older OS/2
versions (thereby reducing the processor needs to at least an 80286
processor); I had however no chance to test this.
The C version of AS is delivered as source code and therefore
requires a UNIX or OS/2 system equipped with a C compiler. The
compiler has to fulfill the ANSI standard (GNU-C for example is
ANSI-compliant). You can look up in the README file whether
your UNIX system has already been tested so that the necessary
definitions have been made. You should reserve about 15 Mbytes of
free hard disk space for compilation; this value (and the amount
needed after compilation to store the compiled programs) strongly
differs from system to system, so you should take this value only as
a rough approximation.
Depending on the platform, the distributions contain a different
number of files. One reason for this is that some packages use files
from other packages; on the other hand, certain packages have to
contain additional files which are e.g. necessary for the operation
of DOS extenders. If one of the files listed in the following tables
is missing, someone (in case of doubt me) took a nap while copying
the files...
The DOS package released by me contains the files listed in tables 2.1 and 2.2, which can roughly be divided in
program files, documentation, includes, and test programs:
The package of the DPMI version is significantly smaller as it
neither contains include files nor utility programs nor test
programs. You may (you even must...) take them from the DOS version's
package. The utility programs were not compiled as extra DPMI
versions as they would not benefit from the larger memory but still
would experience the slowdown introduced by DPMI. The package can
therefore reduce itself to the executable of AS and the needed DOS
extender (table 2.3).
The OS/2 version of AS is similar to the DPMI version in the sense
that include files and test programs weren't added to the package
(again, take them from the DOS version), but of course the utility
programs were compiled as ''native'' versions (table 2.4). AS2MSG is missing due to the
missing base (==no Borland Pascal for OS/2):
As the C version is delivered in source code (in contrast to all
previous versions), its package is substantially larger (at least to
an extent that listing all files at this place would primarily result
in a waste of paper...). For example, it additionally includes a test
suite that is quite complete in contrast to the few test programs
delivered with the DOS version. The test suite makes it possible to
check the correct operation of a freshly compiled version of AS.
There is no need for a special installation prior to usage of AS. It
is sufficient to copy all EXE and OVR files to a
directory listed in the PATH environment variable. It is
irrelevant whether you use an existing directory or create a new one
for this task. The documentation, demo programs, and include files
may be placed where you like. Following is an example for an
installation a UNIX guru would choose:
Create the following directories (I will assume in the following that
you are going to install AS on drive C):
As the DPMI version is primarily an add-on to the DOS version designed
for special situations, it is best to first install the DOS
version according to the scheme outlined above. Afterwards, one can
copy AS2.EXE and the DPMI server's files to the bin
directory (DPMIUSER.DOC can e.g. be placed in the
DOC directory). If you are working on an 80286-based system, it
is possible that you get the following message the first time you try
to start ASX:
The installation of the OS/2 version can generally be done just like
for the DOS version, with the difference that all EXE
and OVR files can right away be deleted and replaced by the
OS/2 counterparts. In contrast to DOS, the setting of the
ASCMD variable has to be done from CONFIG.SYS (the
position of the statement is however again arbitrary).
As the C version comes in source code, the installation is naturally
a bit more complicated. Roughly speaking, the necessary steps consist
of adapting the makefiles, starting the compilation, doing a
test run, and installing the executables, includes, and documentation.
The details can be found in README. OS/2 users should also
read README.OS2 to avoid frustrating failures!
AS is a command line driven program, i.e. all parameters and file
options are to be given in the command line.
In order to fulfill AS's memory requirements under DOS, the various
code generator modules were moved for the DOS version to an overlay
file. Its existence is checked by the assembler immediately after
program startup. If the file is not found, the program run will find
an abrupt end already at this position... The file AS.OVR
should be always in the same directory as the EXE-file.
Using overlays naturally results in slight overhead. AS tries to
reduce this by using possibly existing EMS or XMS memory. In case
this results in trouble, you may suppress usage of EMS or XMS by
setting the environment variable USEXMS or USEEMS
to n. E.g., it is possible to suppress the use of XMS with
the command:
The DOS extender of the DPMI version can be influenced in its memory
allocation strategies by a couple of environment variables; if you
need to know their settings, you may look them up in the file
DPMIUSER.DOC. ASX is additionally able to extend the available
memory by a swap file. To do this, set up an environment variable
ASXSWAP in the following way:
The longer loading time can be slightly reduced by using the
program RTMRES. It loads the control programs residently and
starts a new shell, resulting in a reduced loading overhead for
subsequent calls. A simple EXIT from the shell removes the
control programs again.
In contrast to all other versions, the language of the C version is
not compiled into the program. The fitting set of messages is instead
loaded from a set of message files at runtime. AS searches the
following directories for these files:
The command line parameters can roughly be divided into two
categories: switches and file specifications. Parameters of these two
categories may be arbitrarily mixed in the command line. The
assembler evaluates at first all parameters and then assembles the
specified files. From this follow two things:
At the moment, the following switches are defined:
This example shows that the assembler assumes ASM as the
default extension for source files.
A bit of caution should be applied when using switches that have
optional arguments: if a file specification immediately follows such
a switch without the optional argument, AS will try to interpret the
file specification as the argument, which of course fails:
Apart from specifying options in the command line, permanently
needed options may be placed in the environment variable
ASCMD. For example, if someone always wants to have assembly
listings and has a fixed directory for include files, he can save a
lot of typing with the following command:
In the case of very long path names, space in the ASCMD
variable may become a problem. For such cases a key file may be the
alternative, in which the options can be written in the same way as
in the command line or the ASCMD variable. This file, however, may
contain several lines, each with a maximum length of 255 characters.
In a key file it is important that, for options which require an
argument, switch and argument are written on the same
line. AS is informed of the name of the key file by a @
prefixed to it in the ASCMD variable, e.g.
If you would like to start AS from another program or a shell and
this shell hands over only lower-case or only capital letters in the
command line, the following workaround exists: if a tilde (~) is put
in front of an option letter, the following letter is always
interpreted as a lower-case letter. Similarly, a # demands
the interpretation as a capital letter. For example, the following
transformations result for:
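The tilde and # rule just described can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the code AS uses; the function name apply_case_markers and the sample switches in the usage note are assumptions.

```python
def apply_case_markers(arg):
    """Resolve '~' (force lower case) and '#' (force upper case) markers.

    Each marker is removed and applied to the single character after it;
    all other characters are passed through unchanged.
    """
    out = []
    chars = iter(arg)
    for ch in chars:
        if ch in "~#":
            following = next(chars, "")  # empty if the marker is the last char
            out.append(following.lower() if ch == "~" else following.upper())
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, a switch written as /~I is read as /i and -#u is read as -U, regardless of what the calling shell did to the letter case.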
As there is no portable way in C across different operating systems
to find out the amount of available memory or stack, both lines
are missing completely from the statistics the C version prints.
Like most assemblers, AS expects exactly one instruction per line
(blank lines are naturally allowed as well). The lines must not be
longer than 255 characters; additional characters are discarded.
A single line has the following format:
Some signal processor families from Texas Instruments optionally use
a double line (||) in place of the label to signify parallel
execution with the previous instruction(s). If these two assembler
instructions become a single instruction word at machine level (C3x),
an additional label in front of the second instruction of course does
not make sense and is not allowed. The situation is different for the
C6x with its instruction packets of variable length: If someone wants
to jump into the middle of an instruction packet (bad style, if you
ask me...), he has to place the necessary label beforehand on
a separate line. The same is valid for conditions, which however may
be combined with the double line in a single source line.
The attribute is used by a couple of processors to specify variations
or different codings of a certain instruction. The most prominent
usage of the attribute is the specification of the operand size,
for example in the case of the 680x0 family (table 2.5).
Since this manual is not also meant as a user's manual for the
processor families supported by AS, this is unfortunately not the
place to enumerate all possible attributes for all families. It
should however be mentioned that in general, not all instructions of
a given instruction set allow all attributes and that the omission of
an attribute generally leads to the usage of the ''natural'' operand
size of a processor family. For more thorough studies, consult a
reasonable programmer's manual, e.g. [1]
for the 68K's.
In the case of TLCS-9000, H8/500, and M16(C), the attribute serves
both as an operand size specifier (if it is not obvious from the
operands) and as a description of the instruction format to be used.
A colon has to be used to separate the format from the operand size,
e.g. like this:
The number of instruction parameters depends on the mnemonic and
in principle lies between 0 and 20. Parameters are separated from
each other only by commas (exception: DSP56xxx, whose parallel data
transfers are separated with blanks). Commas that are enclosed in
brackets or quotes are, of course, not treated as separators.
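The comma rule above can be sketched as follows in Python. This is an illustration only and not AS's actual parser: the function name split_params is invented, treating both round and square brackets as ''brackets'' is an assumption, and escape sequences inside quoted text are deliberately ignored.

```python
def split_params(text):
    """Split an operand field at commas outside brackets and quotes."""
    if not text.strip():
        return []
    params, current = [], []
    depth = 0      # nesting depth of ( ) and [ ]
    quote = None   # currently open quote character, if any
    for ch in text:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch in "([":
            depth += 1
            current.append(ch)
        elif ch in ")]":
            depth -= 1
            current.append(ch)
        elif ch == "," and depth == 0:
            # Only a top-level comma ends the current parameter.
            params.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    params.append("".join(current).strip())
    return params
```

For an operand field such as d0,(a0,d1.w),#',' the sketch yields three parameters; the commas inside the parentheses and inside the quotes stay part of their parameter.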
Instead of a comment at the end, the whole line can consist of
comment if it starts in the first column with a semicolon.
To separate the individual components you may also use tabulators
instead of spaces.
The listing produced by AS using the command line options i or I can
roughly be divided into the following parts:
In the first part, AS lists the complete contents of all source files
including the produced code. A line of this listing has the following
form:
In the field line, the source line number of the referenced
file is issued. The first line of a file has the number 1. The
address at which the code generated from this line is written follows
after the slash in the field address.
The code produced is written behind address in the field
code, in hexadecimal notation. Depending on the processor type
and actual segment the values are formatted either as bytes or
16/32-bit-words. If more code is generated than the field can take,
additional lines will be generated, in which case only this field is
used.
Finally, in the field source, the line of the source file is
issued in its original form.
The symbol table was designed in a way that it can be displayed on an
80-column display whenever possible. For symbols of ''normal
length'', a double column output is used. If symbols exceed (with
their name and value) the limit of 40 columns (characters), they will
be issued in a separate line. The output is done in alphabetical
order. Symbols that have been defined but were never used are marked
with a star (*) as prefix.
The parts mentioned so far as well as the list of all
macros/functions defined can be selectively masked out from the
listing. This can be done by the already mentioned command line
switch -t. There is an internal byte inside AS whose bits
represent which parts are to be written. The assignment of bits to
parts of the listing is listed in table 2.6.
All bits are set to 1 by default, when using the switch
The cross reference list issues any defined symbol in alphabetical
order and has the following form:
CAUTION! AS can only print the listing correctly if it was
previously informed about the output media's page length and width!
This has to be done with the PAGE instruction (see there).
The preset default is a length of 60 lines and an unlimited line
width.
Symbols are allowed to be up to 255 characters long (as hinted
already in the introduction) and are distinguished over their whole
length, but the symbol names have to meet some conventions: Symbol
names are allowed to consist of an arbitrary combination of letters,
digits, underlines and dots, whereby the first character must not be
a digit. The dot is only allowed in order to support the MCS-51
notation of register bits and should - as far as possible - not be
used in your own symbol names. To structure symbol names, the
underline (_) and not the dot (.) should be used in any case.
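The naming conventions above translate directly into a small check. The following Python sketch is only an illustration; the function name is_valid_symbol is invented, and restricting ''letters'' to the ASCII range is an assumption that the text does not settle.

```python
import re

# Letters, digits, underlines and dots; the first character is not a digit.
_SYMBOL_RE = re.compile(r"[A-Za-z_.][A-Za-z0-9_.]*")
MAX_SYMBOL_LENGTH = 255


def is_valid_symbol(name):
    """Check a name against the conventions described in the text."""
    return (len(name) <= MAX_SYMBOL_LENGTH
            and _SYMBOL_RE.fullmatch(name) is not None)
```

A name like loop_1 or the MCS-51 style P1.3 passes, while 1loop is rejected because it starts with a digit.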
AS is by default not case-sensitive, i.e. it does not matter whether
one uses upper or lower case characters. The command line switch
U however allows switching AS into a mode where upper and lower
case makes a difference. The predefined symbol CASESENSITIVE
signifies whether AS has been switched to this mode: TRUE means
case-sensitivity, and FALSE its absence.
Table 2.7 shows the most important
symbols which are predefined by AS.
CAUTION! While it does not matter in case-insensitive mode
which combination of upper and lower case is used to reference
predefined symbols, one has to use exactly the version given above
(only upper case) when AS is in case-sensitive mode!
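One way to picture this behaviour is a symbol table that folds names to upper case unless case-sensitive mode is active. The Python sketch below is an assumption about how such a table could be organized, not a description of AS's internals; the class name SymbolTable and its methods are invented, and only CASESENSITIVE is taken from the text.

```python
class SymbolTable:
    """Toy symbol table with optional case-sensitive lookup."""

    def __init__(self, case_sensitive=False):
        self.case_sensitive = case_sensitive
        self._symbols = {}
        # Predefined symbols are stored in upper case only.
        self.define("CASESENSITIVE", case_sensitive)

    def _key(self, name):
        # Fold to upper case unless upper and lower case must differ.
        return name if self.case_sensitive else name.upper()

    def define(self, name, value):
        self._symbols[self._key(name)] = value

    def lookup(self, name):
        return self._symbols[self._key(name)]
```

In the default mode, lookup("casesensitive") finds the predefined symbol; in case-sensitive mode the same call raises KeyError, and only the all-upper-case spelling works.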
Additionally, some pseudo instructions define symbols that reflect the
value that has been set with these instructions. They are described
together with the individual instructions they belong to.
A hidden feature (that has to be used with care) is that symbol names
may be assembled from the contents of string symbols. This can be
achieved by framing the string symbol's name with braces and
inserting it into the new symbol's name. This allows for example to
define a symbol's name based on the value of another symbol:
A complete list of all symbols predefined by AS can be found in
appendix E.
Apart from its value, every symbol also owns a marker which signifies
to which segment it belongs. Such a distinction is mainly
needed for processors that have more than one address space. The
additional information allows AS to issue a warning when a wrong
instruction is used to access a symbol from a certain address space.
A segment attribute is automatically added to a symbol when it gets
defined via a label or a special instruction like BIT; a
symbol defined via the ''allround instructions'' SET
or EQU is however ''typeless'', i.e. its usage will never
trigger warnings. A symbol's segment attribute may be queried via the
built-in function SYMTYPE, e.g.:
In most places where the assembler expects numeric inputs, it is
possible to specify not only simple symbols or constants, but also
complete formula expressions. The components of these formula
expressions can be either single symbols or constants. Constants may
be either integer, floating point, or string constants.
Integer constants describe non-fractional numbers. They may either
be written as a sequence of digits or as a sequence of characters
enclosed in single quotation marks. In case they are written
as a sequence of digits, this may be done in different numbering
systems (table 2.9).
In case the numbering system has not been explicitly stated by adding
the special control characters listed in the table, AS assumes the
base given with the RADIX statement (which itself has 10 as
default). This statement allows setting up 'unusual' numbering
systems, i.e. others than 2, 8, 10, or 16.
Valid digits are numbers from 0 to 9 and letters from A to Z (value
10 to 35) up to the numbering system's base minus one. The usage of
letters in integer constants however brings along some ambiguities
since symbol names also are sequences of numbers and letters: a
symbol name must not start with a character from 0 to 9. This means
that an integer constant which is not clearly marked as such with a
special prefix character may never begin with a letter. One has to
add an additional, otherwise superfluous zero in front in such cases.
The most prominent case is the writing of hexadecimal constants in
Intel mode: If the leftmost digit is between A and F, the trailing H
doesn't help at all; an additional 0 has to be prefixed (e.g. 0F0H
instead of F0H). The Motorola and C syntaxes, which both mark the
numbering system at the front of a constant, do not have this problem
(hehehe..).
Quite tricky is furthermore that the higher the default numbering
system set via RADIX becomes, the more letters used to
denote numbering systems in Intel and C syntax become 'eaten'. For
example, you cannot write binary constants anymore after a RADIX
16, and starting at RADIX 18, the Intel syntax doesn't even
allow writing hexadecimal constants any more. Therefore
CAUTION!
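Why the marker letters get 'eaten' follows from the digit values given above (A to Z count as 10 to 35). The Python sketch below merely checks whether a letter is a valid digit in a given base; the function names are invented, and the use of a trailing B for binary constants in Intel syntax is an assumption inferred from the text, since table 2.9 is not reproduced here.

```python
import string

# Digit characters in ascending value: 0-9, then A-Z for 10-35.
DIGITS = string.digits + string.ascii_uppercase


def digit_value(ch):
    """Return the numeric value of a digit character (case-insensitive)."""
    return DIGITS.index(ch.upper())


def is_digit_in_radix(ch, radix):
    """True if ch is a valid digit for the given base (2..36)."""
    return ch.upper() in DIGITS[:radix]
```

With radix 16, B (value 11) is an ordinary digit, so a trailing B can no longer mark a binary constant. H has the value 17, so it only becomes a digit from radix 18 onwards, which is exactly where the hexadecimal marker stops working.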
With the help of the RELAXED instruction (see section 3.8.6), the strict assignment of a syntax
to a certain target processor can be removed. The result is that an
arbitrary syntax may be used (losing compatibility with standard
assemblers). This option is however turned off by default.
Integer constants may also be written as ASCII values, like in the
following examples:
Floating point constants are to be written in the usual scientific
notation, which is known in the most general form:
String constants have to be enclosed in double quotation
marks (to distinguish them from the abovementioned ASCII integers).
In order to make it possible to write quotation marks and special
characters without trouble in string constants, an ''escape
mechanism'' has been implemented, which should sound familiar to C
programmers:
The assembler understands a backslash (\) with a following decimal
number of three digits maximum in the string as a character with the
corresponding decimal ASCII value. The numerical value may
alternatively be written in hexadecimal or octal notation if it is
prefixed with an x or a 0, respectively. In case of hexadecimal
notation, the maximum number of digits is limited to 2. For example,
it is possible to include an ETX character by writing \3. But be
careful with the definition of NUL characters! The C version
currently uses C strings to store strings internally. As C strings
use a NUL character for termination,
the usage of NUL characters in strings is currently not portable!
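The numeric escapes described above can be modelled with a short Python sketch. It is an approximation under stated assumptions, not AS's implementation: the function name decode_numeric_escapes is invented, the octal form is assumed to take the leading 0 plus at most three further digits, and a backslash that is not followed by digits is simply left untouched (the abbreviations for control characters are not handled).

```python
import string


def decode_numeric_escapes(text):
    """Replace \\ddd, \\xhh and \\0ooo escapes by the characters they name."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(text[i])
            i += 1
            continue
        j = i + 1  # position right after the backslash
        if text[j:j + 1] in ("x", "X"):
            base, allowed, limit = 16, string.hexdigits, 2
            j += 1  # skip the x prefix
        elif text[j:j + 1] == "0":
            # Assumption: the leading 0 counts as part of the octal number.
            base, allowed, limit = 8, string.octdigits, 4
        else:
            base, allowed, limit = 10, string.digits, 3
        k = j
        while k < len(text) and k - j < limit and text[k] in allowed:
            k += 1
        if k == j:
            # No digits followed: keep the backslash as ordinary text.
            out.append("\\")
            i += 1
            continue
        out.append(chr(int(text[j:k], base)))
        i = k
    return "".join(out)
```

Under these assumptions, \3 yields the ETX character, while \65, \x41 and \0101 are three spellings of the letter A.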
Some frequently used control characters can also be reached with the
following abbreviations:
By means of this escape character, you can even work formula
expressions into a string, if they are enclosed by braces: e.g.
Except for the insertion of formula expressions, you can use this
''escape-mechanism'' as well in ASCII defined integer constants, like
this:
The calculation of intermediary results within formula expressions is
always done with the highest available resolution, i.e. 32 bits for
integer numbers, 80 bits for floating point numbers and 255 characters
for strings. A possible test for value range overflows is done only
on the final result.
The portable C version only supports floating point values up to 64
bits (resulting in a maximum value of roughly 10^308), but
in turn features integer lengths of 64 bits on some platforms.
The assembler provides the operators listed in table 2.10 for combination.
''Rank'' is the priority of an operator when separating
expressions into subexpressions. The operator with the highest rank
will be evaluated at the very end. The order of evaluation can be
changed by adding brackets.
The compare operators deliver TRUE in case the condition fits, and
FALSE in case it doesn't. For the logical operators an expression is
TRUE in case it is not 0, otherwise it is FALSE.
The mirroring of bits probably needs a little bit of explanation: the
operator mirrors the lowest bits in the first operand and leaves the
higher-order bits unchanged. The number of bits to be
mirrored is given by the right operand and may be between 1 and 32.
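The effect of the mirror operator can be reproduced with a few lines of Python. This is a sketch of the behaviour as described in the text, for non-negative values; the function name mirror_bits is invented.

```python
def mirror_bits(value, count):
    """Reverse the lowest `count` bits of value; keep the higher bits."""
    if not 1 <= count <= 32:
        raise ValueError("count must be between 1 and 32")
    mask = (1 << count) - 1
    low = value & mask
    mirrored = 0
    for _ in range(count):
        # Shift the result left and append the next lowest source bit.
        mirrored = (mirrored << 1) | (low & 1)
        low >>= 1
    return (value & ~mask) | mirrored
```

Mirroring the lowest 4 bits of 0xA1 swaps 0001 into 1000 and leaves the upper nibble alone, giving 0xA8.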
A small pitfall is hidden in the binary complement: As the
computation is always done with 32 resp. 64 bits, its application on
e.g. 8-bit masks usually results in values that do not fit into 8-bit
numbers any more due to the leading ones. A binary AND with a fitting
mask is therefore unavoidable!
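A small Python model of this pitfall, assuming a 32-bit evaluation
width (the helper name complement and the sample mask are
illustrative, not part of AS):

```python
WIDTH = 32  # assumed evaluation width; the text mentions 64 bits on some platforms


def complement(value: int, width: int = WIDTH) -> int:
    """Binary complement computed at the full evaluation width."""
    return ~value & ((1 << width) - 1)


mask = 0x0F
raw = complement(mask)            # leading ones: no longer an 8-bit value
fitted = complement(mask) & 0xFF  # AND with a fitting mask restores 8 bits

print(hex(raw), hex(fitted))
```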
In addition to the operators, the assembler defines a further set of
primarily transcendental functions with floating point arguments
which are listed in tables 2.11 and 2.12.
The functions FIRSTBIT, LASTBIT, and
BITPOS return -1 as result if no resp. not exactly one bit is
set. BITPOS additionally issues an error message in such a
case.
The string function SUBSTR expects the source string as
first parameter, the start position as second and the number of
characters to be extracted as third parameter (a 0 means to extract
all characters up to the end). STRSTR returns the position of the
first occurrence of the second string within the first one resp. -1
if the search pattern was not found. Both functions number characters
in a string starting at 0!
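The conventions described for SUBSTR and STRSTR (zero-based positions,
a length of 0 meaning ''up to the end'', -1 for ''not found'') can be
mirrored in Python as follows. This is only a sketch of the described
semantics; the lower-case function names are made up for this example.

```python
def substr(source: str, start: int, count: int) -> str:
    """Model of SUBSTR: zero-based start, count == 0 means 'to the end'."""
    if count == 0:
        return source[start:]
    return source[start:start + count]


def strstr(haystack: str, needle: str) -> int:
    """Model of STRSTR: zero-based position of the first match, -1 if none."""
    return haystack.find(needle)


print(substr("Hello World", 6, 0))
print(strstr("Hello World", "World"))
```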
If a function expects floating point arguments, this does not mean it
is impossible to write e.g.
When AS is switched to case-sensitive mode, predefined functions may
be accessed with an arbitrary combination of upper and lower case (in
contrast to predefined symbols). However, in the case of user-defined
functions (see section 3.4.7), a
distinction between upper and lower case is made. This has e.g. the
result that if one defines a function Sin, one can
afterwards access this function via Sin, but all other
combinations of upper and lower case will lead to the predefined
function.
For a correct conversion of lower case letters into capital letters a
DOS version >= 3.30 is required.
This section is the result of a significant amount of hate on the
(legal) way some people program. This way can lead to trouble in
conjunction with AS in some cases. The section will deal with
so-called 'forward references'. What makes a forward reference
different from a usual reference? To understand the difference, take
a look at the following programming example (please excuse my bias
for the 68000 family that is also present in the rest of this
manual):
Unfortunately, things are not that simple in the case of assembler,
because one sometimes has to jump forward in the code or there are
reasons why one has to move variable definitions behind the code. For
our example, this is the case for the conditional branch that is used
to skip over another instruction. When the assembler hits the branch
instruction in the first pass, it is confronted with the situation of
either leaving blank all instruction fields related to the target
address or offering a value that ''hurts no one'' via the formula
parser (which has to evaluate the address argument). In case of a
''simple'' assembler that supports only one target architecture with
a relatively small number of instructions to treat, one will surely
prefer the first solution, but the effort for AS with its dozens of
target architectures would have become extremely high. Only the
second way was possible: If an unknown symbol is detected in the
first pass, the formula parser delivers the program counter's current
value as result! This is the only value suitable to offer an address
to a branch instruction with unknown distance length that will not
lead to errors. This answers also a frequently asked question why a
first-pass listing (it will not be erased e.g. when AS does not start
a second pass due to additional errors) partially shows wrong
addresses in the generated binary code - they are the result of
unresolved forward references.
The example listed above however uncovers an additional difficulty of
forward references: Depending on the distance of branch instruction
and target in the source code, the branch may be either long or
short. The decision however about the code length - and therefore
about the addresses of following labels - cannot be made in the first
pass due to missing knowledge about the target address. In case the
programmer did not explicitly mark whether a long or short branch
shall be used, genuine 2-pass assemblers like older versions of MASM
from Microsoft ''solve'' the problem by reserving space for the
longest version in the first pass (all label addresses have to be
fixed after the first pass) and filling the remaining space with
NOPs in the second pass. AS versions up to 1.37 did the same
before I switched to the multipass principle that removes the strict
separation into two passes and allows an arbitrary number of passes.
In detail, the optimal code for the assumed values is generated
in the first pass. In case AS detects that values of symbols changed
in the second pass due to changes in code lengths, simply a third
pass is done, and as the second pass's new symbol values might again
shorten or lengthen the code, a further pass is not impossible. I
have seen 8086 programs that needed 12 passes to get everything
correct and optimal. Unfortunately, this mechanism does not allow
specifying a maximum number of passes; I can only advise that the
number of passes goes down when one makes more use of explicit length
specifications.
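The multipass principle can be illustrated with a toy model in Python.
It is a sketch under stated assumptions and not the algorithm AS
actually implements: the program representation, the 2-byte/4-byte
branch encoding and the function name assemble are all hypothetical.
What it takes from the text is that an unknown symbol evaluates to the
current program counter and that passes are repeated as long as symbol
values still change.

```python
def assemble(program):
    """Repeat passes until no symbol value changes any more.

    program: list of ("label", name), ("bytes", size) or ("branch", name).
    A branch takes 2 bytes if the distance fits into a signed byte,
    otherwise 4 bytes (hypothetical encoding, for illustration only).
    """
    symbols = {}
    passes = 0
    while True:
        passes += 1
        repass = False
        pc = 0
        for kind, arg in program:
            if kind == "label":
                if symbols.get(arg, pc) != pc:
                    repass = True          # label moved since the last pass
                symbols[arg] = pc
            elif kind == "bytes":
                pc += arg
            else:  # "branch"
                if arg not in symbols:
                    repass = True          # forward reference, value is a guess
                target = symbols.get(arg, pc)   # unknown symbol -> current PC
                distance = target - (pc + 2)
                pc += 2 if -128 <= distance <= 127 else 4
        if not repass:
            return symbols, passes


program = [
    ("branch", "skip"),   # forward reference, turns out to be long
    ("bytes", 200),
    ("label", "skip"),
    ("branch", "end"),    # forward reference, stays short
    ("bytes", 100),
    ("label", "end"),
]
symbols, passes = assemble(program)
print(symbols, passes)
```

In this model the first branch is assumed short in pass one, grows to
the long form in pass two (which moves both labels), and a third pass
confirms that nothing changes any more.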
Especially for large programs, another situation might arise: the
position of a forward directed branch has moved so much in the second
pass relative to the first pass that the old label value still valid
is out of the allowed branch distance. AS knows of such situations
and suppresses all error messages about too long branches when it is
clear that another pass is needed. This works for 99% of all cases,
but there are also constructs where the first critical instruction
appears so early that AS had no chance up to now to recognize that
another pass is needed. The following example constructs such a
situation with the help of a forward reference (and was the reason
for this section's heading...):
Admittedly, this was quite a lengthy excursion, but I thought it was
necessary. Which is the essence you should learn from this section?
Sometimes it is desirable not only to assign symbolic names to memory
addresses or constants, but also to a register, to emphasize its
function in a certain program section. This is no problem for
processors that treat registers simply as another address space (e.g.
the MCS-96 or TMS7000), as this allows the use of numeric expressions
and one can use simple EQUs to define such symbols.
However, for most processors, register identifiers are fixed literals
which are separately treated by AS for speed reasons. A special
mechanism is therefore necessary to define symbolic register names. A
register symbol is usually defined via the REG instruction,
which has otherwise the same syntax as an EQU definition.
This however has a couple of restrictions: A register symbol is a
pure character string stored 'as is' which may exclusively be used
this way. For example, no arithmetic is allowed to calculate a
register's successor, like in the following example:
Analogous to ordinary symbols, register symbols are local to sections
and it is possible to access a register symbol from a specific
section by appending the section's name enclosed in brackets. Due to
the missing ability to do forward references, there is nothing like
a FORWARD directive, and an export by something comparable
to PUBLIC or GLOBAL is also not possible since
register symbols generally have their meaning in a small context.
If there is both an ordinary and a register symbol of the same name
present in a context, the register symbol will be preferred. This is
however not the case when the name is embedded into a complex
expression (parentheses are sufficient!); the normal symbol will be
used then.
This function is a by-product from the old pure-68000 predecessors of
AS; I have kept it in case someone really needs it. The basic
problem is to access certain symbols produced during assembly,
because possibly someone would like to access the memory of the
target system via this address information. The assembler allows
exporting symbol values by means of SHARED pseudo commands (see
there). For this purpose, the assembler produces a text file with the
required symbols and their values in the second pass. This file may be
included into a higher-level language or another assembler program.
The format of the text file (Pascal, C or Assembler) can be set by
the command line switches p, c, or a.
CAUTION! If none of the switches is given, no file will be
generated and it makes no difference if SHARED-commands are
in the source text or not!
When creating a Sharefile, AS does not check if a file with the same
name already exists; such a file will simply be overwritten. In my
opinion a confirmation prompt does not make sense, because AS would
ask at each run if it should overwrite the old version of the
Sharefile, and that would be really annoying...
Common microcontroller families are like rabbits: They multiply
faster than you can provide support for them. Especially the
development of processor cores as building blocks for ASICs and of
microcontroller families with user-definable peripherals has led to a
steeply rising number of controllers that only deviate from a
well-known type by a slightly modified peripheral set. But the
distinction among them is still important, e.g. for the design of
include files that only define the appropriate subset of peripherals.
I have struggled up to now to integrate the most important
representatives of a processor family into AS (and I will continue
to do this), but sometimes I just cannot keep pace with the
development... There was an urgent need for a mechanism that lets
the user extend the list of processors.
The result is processor aliases: the alias command line option
allows the definition of a new processor type whose instruction set
is equal to that of another processor built into AS. After switching
to this processor via the CPU instruction, AS behaves exactly as if
the original processor had been used, with a single difference: the
variables MOMCPU resp. MOMCPUNAME are set to the
alias name, which allows the new name to be used for differentiation,
e.g. in include files.
There were two reasons to realize the definition of aliases by the
command line and not by pseudo instructions: first, it would anyway
be difficult to put the alias definitions together with register
definitions into a single include file, because a program that wants
to use such a file would have to include it before and after the CPU
instruction - an idea that lies somewhere between inelegant
and impossible. Second, the definition in the command line allows to
put the definitions in a key file that is executed automatically at
startup via the ASCMD variable, without a need for the
program to take any further care about this.
Not all pseudo instructions are defined for all processors. A note
that shows the range of validity is therefore prepended to every
individual description.
valid for: all processors
SET and EQU allow the definition of typeless
constants, i.e. they will not be assigned to a segment and their
usage will not generate warnings because of segment mixing.
EQU defines constants which can not be modified (by
EQU) again, but SET permits the definition of
variables, which can be modified during the assembly. This is useful
e.g. for the allocation of resources like interrupt vectors, as shown
in the following example:
EQU/SET allow to define constants of all possible types,
e.g.
A simple equation sign may be used instead of EQU.
Similarly, one may simply write := instead of SET
resp. EVAL.
Symbols defined with SET or EQU are typeless by
default, but optionally a segment name (CODE, DATA, IDATA, XDATA,
YDATA, BITDATA, IO, or REG) or MOMSEGMENT for
the currently active segment may be given as a second parameter,
allowing to assign the symbol to a specific address space. AS does
not check at this point if the used address space exists on the
currently active target processor!
valid for: various, SFRB only MCS-51
These instructions act like EQU, but symbols defined with
them are assigned to the directly addressable data segment, i.e. they
serve primarily for the definition of RAM cells and (as the name
suggests) hardware registers mapped into the data area. The allowed
range of values is equal to the range allowed for ORG in the
data segment (see section 3.2.1). The
difference between SFR and SFRB is that
SFRB marks the register as bit addressable, which is why AS
generates 8 additional symbols which will be assigned to the bit
segment and carry the names xx.0 to xx.7, e.g.
Whenever a bit-addressable register is defined via SFRB, AS
checks if the memory address is bit addressable (range 20h..3fh resp.
80h, 88h, 90h, 98h...0f8h). If it is not bit-addressable, a warning
is issued and the generated bit symbols are undefined.
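The address check and the naming scheme for the bit symbols can be
summarized in a short Python model. It simply restates the ranges given
above; the function names are assumptions for this sketch, and the
values AS assigns to the bit symbols are deliberately not modelled.

```python
def is_bit_addressable(address: int) -> bool:
    """Check the ranges named in the text: 20h..3fh or 80h, 88h, ... 0f8h."""
    in_ram_range = 0x20 <= address <= 0x3F
    on_sfr_boundary = 0x80 <= address <= 0xF8 and address % 8 == 0
    return in_ram_range or on_sfr_boundary


def bit_symbol_names(register_name: str) -> list:
    """Names of the 8 additional symbols: xx.0 to xx.7."""
    return [f"{register_name}.{bit}" for bit in range(8)]


print(is_bit_addressable(0x90), is_bit_addressable(0x81))
print(bit_symbol_names("P1"))
```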
valid for: DSP56xxx
The DSP56000 also has a few peripheral registers memory-mapped into
the RAM, but the affair becomes complicated because there are two data
areas, the X- and Y-area. This architecture allows on the one hand a
higher parallelism, but on the other hand forces the normal SFR
instruction to be divided into the two above mentioned
variations. They work identically to SFR, except that
XSFR defines a symbol in the X-addressing space and YSFR a
corresponding one in the Y-addressing space. The allowed value range
is 0..$ffff.
valid for: all processors
The function of the LABEL instruction is identical to
EQU, but the symbol does not become typeless, it gets the
attribute ''code''. LABEL is needed exactly for one purpose:
Labels are normally local in macros, that means they are not
accessible outside of a macro. With an EQU instruction you
could get out of it nicely, but the phrasing
valid for: MCS/(2)51, XA, 80C166, 75K0, ST9
BIT serves to equate a single bit of a memory cell with a
symbolic name. This instruction varies from target platform to target
platform due to the different ways in which processors handle bit
manipulation and addressing:
The MCS/51 family has an own address space for bit operands. The
function of BIT is therefore quite similar to SFR,
i.e. a simple integer symbol with the specified value is generated
and assigned to the BDATA segment. For all other processors,
bit addressing is done in a two-dimensional fashion with address and
bit position. In these cases, AS packs both parts into an integer
symbol in a way that depends on the currently active target processor
and separates both parts again when the symbol is used. The latter
is also valid for the 80C251: While an instruction like
The BIT instruction of the 75K0 family even goes further: As
bit expressions may not only use absolute base addresses, even
expressions like
The ST9 in turn allows bits to be inverted, which is also allowed in
the BIT instruction:
valid for: TMS 370xxx
Though the TMS370 series does not have an explicit bit segment,
single bit symbols may be simulated with this instruction.
DBIT requires two operands, the address of the memory cell that
contains the bit and the exact position of the bit in the byte. For
example,
valid for: 8080/8085/8086, XA, Z80, 320xx, TLCS-47, AVR
PORT works similarly to EQU, except that the symbol is
assigned to the I/O-address range. Allowed values are 0..7 at the
3201x, 0..15 at the 320C2x, 0..65535 at the 8086, 0..63 at the AVR,
and 0..255 at the rest.
Example : an 8255 PIO is located at address 20H:
valid for: AVR, M*Core, ST9, 80C16x
Though it always has the same syntax, this instruction has a slightly
different meaning from processor to processor: If the processor uses
a separate addressing space for registers, REG has the same
effect as a simple EQU for this address space (e.g. for the
ST9). REG defines register symbols for all other processors
whose function is described in section 2.10.
valid for: 8X30x
LIV and RIV allow to
define so-called ''IV bus objects''. These are groups of bits located
in a peripheral memory cell with a length of 1 up to 8 bits, which
can afterwards be referenced symbolically. The result is that one
does not anymore have to specify address, position, and length
separately for instructions that can refer to peripheral bit groups.
As the 8X30x processors feature two peripheral address spaces (a
''left'' and a ''right'' one), there are two separate pseudo
instructions. The parameters of these instructions are however equal:
three parameters have to be given that specify address, start
position and length. Further hints for the usage of bus objects can
be found in section 4.16.
valid for: all processors
Single board systems, especially when driving LCDs, frequently use
character sets different to ASCII. So it is probably purely
coincidental that the umlaut coding corresponds with the one used by
the PC. To avoid error-prone manual encoding, the assembler contains
a translation table for characters which assigns a target character
to each source character code. To modify this table (which initially
translates 1:1), one has to use the CHARSET instruction.
CHARSET may be used with different numbers and types of
parameters. If there is only a single parameter, it has to be a
string expression which is interpreted as a file name by AS. AS reads
the first 256 bytes from this file and copies them into the
translation table. This allows complex, externally generated tables
to be activated with a single statement. For all other variants, the
first parameter has to be an integer in the range of 0 to 255 which
designates the start index of the entries to be modified in the
translation table. One or two parameters follow, giving the type of
modification:
A single additional integer modifies exactly one entry. For example,
For the last variant, a string follows the start index and contains
the characters to be placed in the table. The last example therefore
may also be written as
CHARSET may also be called without any parameters, which
however has a drastic effect: the translation table is
reinitialized to its initial state, i.e. all character translations
are removed.
CAUTION! CHARSET not only affects string constants
stored in memory, but also integer constants written as ''ASCII''.
This means that an already modified translation table can lead to
other results in the above mentioned examples!
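The behaviour of the translation table can be modelled in Python. The
class below is only a sketch of the single-entry variant, the string
variant and the parameterless reset as described above; the class and
method names are assumptions for this example, and loading a table
from a file is left out.

```python
class TranslationTable:
    """Model of the character translation table, initially 1:1."""

    def __init__(self):
        self.reset()

    def reset(self):
        # CHARSET without parameters: back to the initial 1:1 state
        self.table = list(range(256))

    def set_entry(self, index: int, target: int):
        # start index plus one integer: exactly one entry is modified
        self.table[index] = target

    def set_string(self, start: int, chars: str):
        # start index plus a string: consecutive entries are modified
        for offset, ch in enumerate(chars):
            self.table[start + offset] = ord(ch)

    def translate(self, text: str) -> bytes:
        # string constants stored in memory pass through the table
        return bytes(self.table[ord(ch)] for ch in text)

    def char_constant(self, ch: str) -> int:
        # integer constants written as ''ASCII'' are affected as well
        return self.table[ord(ch)]


table = TranslationTable()
table.set_string(ord("x"), "XYZ")      # x, y, z -> X, Y, Z
print(table.translate("xyz"), table.char_constant("x"))
table.reset()
print(table.translate("xyz"))
```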
valid for: all processors
Though the CHARSET statement gives unlimited freedom in the
character assignment between host and target platform, switching
among different character sets can become quite tedious if
several character sets have to be supported on the target platform.
The CODEPAGE instruction however allows to define and keep
different character sets and to switch with a single statement among
them. CODEPAGE expects one or two arguments: the name of the
set to be used hereafter and optionally the name of another table
that defines its initial contents (the second parameter therefore
only has a meaning for the first switch to the table when AS
automatically creates it). If the second parameter is missing, the
initial contents of the new table are copied from the previously
active set. All subsequent CHARSET statements only
modify the new set.
At the beginning of a pass, AS automatically creates a single
character set with the name STANDARD with a one-to-one
translation. If no CODEPAGE instructions are used, all
settings made via CHARSET refer to this table.
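The rules for creating and switching character sets can be sketched in
Python as follows. This is a model of the description above, not AS
code; the class name CodePages and its methods are hypothetical, and
CHARSET is reduced to its single-entry form.

```python
class CodePages:
    """Model of named character sets; STANDARD exists from the start."""

    def __init__(self):
        self.pages = {"STANDARD": list(range(256))}
        self.active = "STANDARD"

    def codepage(self, name: str, source: str = None):
        if name not in self.pages:
            # first switch to this set: copy from `source` if given,
            # otherwise from the previously active set
            origin = source if source is not None else self.active
            self.pages[name] = list(self.pages[origin])
        self.active = name

    def charset(self, index: int, target: int):
        # CHARSET only modifies the currently active set
        self.pages[self.active][index] = target


cp = CodePages()
cp.codepage("LCD")                 # created as a copy of STANDARD
cp.charset(0x41, 0x61)             # affects LCD only
cp.codepage("OTHER", "STANDARD")   # created from an explicitly named set
print(cp.pages["STANDARD"][0x41], cp.pages["LCD"][0x41], cp.active)
```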
valid for: all processors
Similar to the same-named instruction known from C, ENUM is
used to define enumeration types, i.e. a sequence of integer
constants that are assigned sequential values starting at 0. The
parameters are the names of the symbols, like in the following
example:
ENUM instructions are always single-line instructions, i.e.
the enumeration will again start at zero when a new ENUM
instruction is found. Multi-line enumerations may however be achieved
with a small trick that exploits the fact that the internal counter
can be set to a new value with an explicit assignment, like in the
following case:
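The counting rule behind ENUM can be modelled in Python. This is a
sketch of the described behaviour, not AS syntax: an explicit
assignment is written as a (name, value) tuple here, which is purely a
convention of this example.

```python
def enum(*items, symbols=None):
    """Assign sequential values starting at 0; a (name, value) tuple
    sets the internal counter explicitly."""
    symbols = {} if symbols is None else symbols
    counter = 0                      # every ENUM starts at zero again
    for item in items:
        if isinstance(item, tuple):
            name, counter = item     # explicit assignment
        else:
            name = item
        symbols[name] = counter
        counter += 1
    return symbols


symbols = enum("SymA", "SymB", "SymC")
# continuing the enumeration on a second "line" via an explicit start value
enum(("SymD", 3), "SymE", symbols=symbols)
print(symbols)
```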
valid for: all processors
Even for assembler programs, there is from time to time the need to
define complex data structures similar to high-level languages. AS
supports this via the instructions STRUCT and
ENDSTRUCT that begin resp. finish the definition of such a
structure. The operation is simple: Upon occurrence of a
STRUCT, the current value of the program counter is saved and
the PC is reset to zero. By doing so, all labels placed obtain the
offset values of the structure's members. The reservation of space
for the individual fields is done with the instructions used on the
currently active processor to reserve memory space, e.g.
DS.x for Motorolas and DB & co. for Intels. The
label prepended to STRUCT (not optional) is the record's
name and may optionally be repeated for the ENDSTRUCT
statement. ENDSTRUCT furthermore places the record's total
length in the symbol <Name_len> (one may force the
usage of another symbol by giving its name as an argument to
ENDSTRUCT). For example, in the definition
STRUCT definitions may be nested; after the inner
STRUCT definition has been ended, the address counter of the
outer structure will be automatically incremented by the inner
structure's size (the counting inside the inner structure of course
starts at zero).
To avoid ambiguities when fields in different structures have the same
names, AS by default prepends the structure's name to the field names,
separated by an underscore. For the example listed above, the
symbols Rec_Ident, Rec_Pad, and Rec_Pointer would
be created. This behaviour can be suppressed by giving
NOEXTNAMES as a parameter to the STRUCT statement. This
works in the same sense for nested structure definitions, i.e. field
names are extended by the names of all structures that did not obtain
a NOEXTNAMES directive.
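How the symbols of a non-nested structure come about can be modelled
in Python. This sketch only reproduces the naming and offset rules
described above; the function name define_struct and the field sizes
used for Rec are assumptions made for illustration.

```python
def define_struct(name, fields, extnames=True):
    """fields: list of (field name, size in bytes) in definition order."""
    symbols = {}
    offset = 0                        # the PC is reset to zero at STRUCT
    for field, size in fields:
        label = f"{name}_{field}" if extnames else field
        symbols[label] = offset       # label = offset of the member
        offset += size                # space reserved by DS.x, DB & co.
    symbols[f"{name}_len"] = offset   # ENDSTRUCT stores the total length
    return symbols


# field sizes are hypothetical
rec = define_struct("Rec", [("Ident", 4), ("Pad", 12), ("Pointer", 4)])
print(rec)
print(define_struct("Rec", [("Ident", 4), ("Pad", 12)], extnames=False))
```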
valid for: all processors
PUSHV and POPV allow to temporarily save the value
of a symbol (that is not macro-local) and to restore it at a later
point of time. The storage is done on stacks, i.e. Last-In-First-Out
memory structures. A stack has a name that has to fulfill the general
rules for symbol names and it exists as long as it contains at least
one element: a stack that did not exist before is automatically
created upon PUSHV, and a stack becoming empty upon a
POPV is deleted automatically. The name of the stack that shall
be used to save or restore symbols is the first parameter of
PUSHV resp. POPV, followed by a list of symbols as
further parameters. All symbols referenced in the list already have
to exist; it is therefore not possible to implicitly define
symbols with a POPV instruction.
Stacks are a global resource, i.e. their names are not local to
sections.
It is important to note that symbol lists are always processed
from left to right. Someone who wants to pop several variables from a
stack with a POPV therefore has to use the exact reverse
order used in the corresponding PUSHV!
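The consequence of the left-to-right processing becomes obvious in a
small Python model. The class SymbolStacks is hypothetical and only
mirrors the rules given above (stacks are created on demand, deleted
when empty, and symbols must already exist).

```python
class SymbolStacks:
    """Model of PUSHV/POPV on named Last-In-First-Out stacks."""

    def __init__(self, symbols):
        self.symbols = symbols
        self.stacks = {}

    def pushv(self, stack, *names):
        for name in names:                     # left to right
            self.stacks.setdefault(stack, []).append(self.symbols[name])

    def popv(self, stack, *names):
        for name in names:                     # left to right as well
            if name not in self.symbols:
                raise KeyError(f"symbol {name} does not exist")
            self.symbols[name] = self.stacks[stack].pop()
        if not self.stacks[stack]:
            del self.stacks[stack]             # empty stacks vanish


env = SymbolStacks({"A": 1, "B": 2})
env.pushv("save", "A", "B")
env.symbols.update(A=10, B=20)
env.popv("save", "B", "A")                     # reverse order restores both
print(env.symbols, env.stacks)
```

Popping with the same order as in the push ("A", "B") would instead
exchange the two saved values.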
The name of the stack may be left blank, like this:
AS checks at the end of a pass if there are stacks that are not empty
and issues their names together with their ''filling level''. This
allows to find out if there are any unpaired PUSHVs or
POPVs. However, it is in no case possible to save values in a
stack beyond the end of a pass: all stacks are cleared at the
beginning of a pass!
valid for: all processors
ORG allows to load the internal address counter (of the
assembler) with a new value. The value range depends on the currently
selected segment and on the processor type (tables 3.1 to 3.4). The
lower bound is always zero, and the upper bound is the given value
minus 1:
In case different variants in a processor family have address
spaces of different size, the maximum range is listed for each.
ORG is mostly needed to give the code a new starting
address or to put different, non-contiguous code parts into one
source file. In case there is no explicit other value listed in a
table entry, the initial address for this segment (i.e. the start
address used without ORG) is 0.
valid for: all processors
This command determines the processor for which the following code
shall be generated. Instructions of other processor families are not
accessible afterwards and will produce error messages!
The processors can roughly be divided into families; within the
families, different types additionally serve for a detailed
distinction:
1.1. License Agreement
You can contact me as follows:
If someone likes to meet me personally to ask questions and lives
near Aachen (= Aix-la-Chapelle), you will be able to meet me there.
You can do this most probably on thursdays from 7pm to 9pm at the
computerclub inside the RWTH Aachen (Eilfschornsteinstrasse 16,
cellar of philosophers' building, backdoor entry).
ftp.uni-stuttgart.de, directory
pub/systems/msdos/programming/as
The sources of the C version can be fetched from the following
server:
sunsite.unc.edu, directory
pub/Linux/devel/lang/assemblers/asl-<version>.tar.gz
..and of course thereby from every Sunsite mirror of the world!
1.2. General Capabilities of the Assembler
under work / planned / in consideration :
I'm currently searching for documentation about the following
families:
unloved, but now, however, present :
The switch to a different code generator is allowed even within one
file, and as often as one wants!
Kirk: Analysis, Mr. Spock?
Spock: Captain, it doesn't appear in the symbol table.
Kirk: Then it's of external origin?
Spock: Affirmative.
Kirk: Mr. Sulu, go to pass two.
Sulu: Aye aye, sir, going to pass two.
File           Function
AS.EXE         assembler
AS.OVR         overlay for assembler
AS.DOC         this file containing the documentation
PLIST.EXE      lists contents of code files
BIND.EXE       merges code files
P2HEX.EXE      converts code files into hex files
P2BIN.EXE      converts code files into binary files
AS2MSG.EXE     error message filter AS --> Borland-Pascal
80C50X.INC     register addresses SAB C50x
80C552.INC     register addresses 80C552
H8_3048.INC    register addresses H8/3048
STDDEF04.INC   register addresses 6804
STDDEF16.INC   command macros and register addresses PIC16C5x
STDDEF17.INC   register addresses PIC17C4x
STDDEF18.INC   register addresses PIC16C8x
STDDEF2X.INC   register addresses TMS 3202x
STDDEF37.INC   register & bit addresses TMS370xxx
STDDEF3X.INC   peripheral addresses TMS 320C3x
STDDEF47.INC   command macros TLCS-47
STDDEF51.INC   definition of SFRs and bits for 8051/8052/80515
STDDEF56.INC   register addresses DSP56000
STDDEF5X.INC   peripheral addresses TMS 320C5x
STDDEF60.INC   instruction macros & register addresses PowerPC
STDDEF62.INC   register addresses & macros ST6
STDDEF75.INC   register addresses 75K0
STDDEF87.INC   register & memory addresses TLCS-870
STDDEF90.INC   register & memory addresses TLCS-90
STDDEF96.INC   register & memory addresses TLCS-900
STDDEFXA.INC   SFR & bit addresses Philips XA
STDDEFZ8.INC   register addresses Z8-family
REG166.INC     addresses & command macros 80C166/167
REG251.INC     addresses & bits 80C251
REG29K.INC     peripheral addresses AMD 2924x
REG53X.INC     register addresses H8/53x
REG683XX.INC   register addresses 68332/68340
REG7000.INC    register addresses TMS70Cxx
REG78K0.INC    register and memory addresses 78K0
REG96.INC      register addresses 8096
REGAVR.INC     register addresses Atmel AVR
REGCOP8.INC    register addresses COP8
REGHC12.INC    register addresses Motorola 68HC12...
REGM16C.INC    register addresses Mitsubishi M16C
REGMSP.INC     instruction macros & register addresses MSP430
REGST9.INC     register addresses ST9
REGZ380.INC    on-chip register Z380
CTYPE.INC      standard functions to analyze characters
BITFUNCS.INC   standard functions for bit manipulation
DEMOCODE.ASM   sample programs for this assembler
DEMOMAC.ASM
DEMOPHAS.ASM
DEMOLIST.ASM
File           Function
ASX.EXE        protected mode assembler
DPMI16BI.OVL   DPMI server for the assembler
DPMILOAD.EXE   loader for the DPMI server
RTM.EXE        runtime module of the assembler
RTMRES.EXE     make DPMI server resident
DPMINST.EXE    install DPMI server for the computer
DPMIUSER.DOC   information about the DPMI server's usage
File           Function
AS2.EXE        assembler, OS/2 version
PLIST2.EXE     utility programs for AS2; function is analogous
BIND2.EXE      to the MSDOS versions
P2HEX2.EXE
P2BIN2.EXE
c:\as
c:\as\bin
c:\as\include
c:\as\lib
c:\as\doc
c:\as\demos
First, copy all EXE and OVR files from the archive
to the bin directory; extend your PATH statement
in AUTOEXEC.BAT by this directory. Move all INC
files to the include subdirectory. Create a file
AS.RC in the lib subdirectory that contains the following line:
-i c:\as\include
This so-called key file tells AS where to look for include files. You
have to extend your AUTOEXEC.BAT by the statement
set ASCMD=@c:\as\lib\as.rc
to tell AS where to find the key file on invocation. The following
section will describe what additional options can be set in the key
file. Finally, move all DOC files to the subdirectory with
the same name and all demo assembler files to the demos
subdirectory. That's all folks!
machine not in database (run DPMIINST)
ASX continually has to switch between real and protected mode; it is
therefore important to find the most efficient way to do this. For
this purpose (and only for this), you have to run the program
DPMIINST once. As DPMIINST wants to be the sole master
of the protected mode, you probably have to remove an installed HIMEM
driver before you can do the installation (you may insert HIMEM again
after the run). Simply follow the instructions DPMIINST
prints. You do not need this program after the installation any more,
only the following ones are still required:
2.4. Start-Up Command, Parameters
SET USEXMS=n
Since AS performs all in- and output via the operating system (and
therefore it should run also on not 100% compatible DOS-PC's) and
needs some basic display control, it emits ANSI control sequences
during the assembly. In case you should see strange characters in the
messages displayed by AS, your CONFIG.SYS is obviously
lacking a line like this:
device=ansi.sys
but the further functions of AS will not be influenced hereby.
Alternatively you are able to suppress the output of ANSI sequences
completely by setting the environment variable USEANSI
to n.
SET ASXSWAP=<size>[,file name]
The size has to be specified in megabytes and is mandatory.
The file name in contrast is optional; if it is missing, the
file is named ASX.TMP and placed in the current directory.
In any case, the swap file is deleted after program end.
If no file is found or the found message file is incompatible with
the version of AS, program execution is terminated.
Parameter switches are recognized by AS by starting with a slash (/)
or hyphen (-). There are switches that are only one character long
and additionally switches composed of a whole word. Whenever AS
cannot interpret a switch as a whole word, it tries to interpret
every letter as an individual switch. For example, if you write
-queit
instead of
-quiet
AS will take the letters q, u, e, i, and t as
individual switches. Multiple-letter switches additionally differ
from single-letter switches in that AS will accept an arbitrary
mixture of upper and lower casing for them, whereas single-letter
switches may have a different meaning depending on whether upper or
lower case is used.
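The fallback from word switches to single letters can be modelled in
Python. The sketch below is not the parser used by AS; the set of
known word switches is a deliberately tiny, hypothetical subset built
from two names that appear in this manual (quiet and alias), and
switch arguments are ignored.

```python
KNOWN_WORD_SWITCHES = {"quiet", "alias"}   # hypothetical subset


def split_switch(argument: str) -> list:
    """Return the switches contained in one command line argument."""
    if argument[0] not in "/-":
        raise ValueError("switches start with a slash or a hyphen")
    body = argument[1:]
    if body.lower() in KNOWN_WORD_SWITCHES:
        return [body.lower()]              # word switches ignore case
    return list(body)                      # letters keep their case


print(split_switch("-quiet"))
print(split_switch("-queit"))              # misspelled: five single letters
print(split_switch("/cl"))
```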
Concerning effect and function of the SHARED-symbols please see
chapters 2.11 resp. 3.8.1.
As long as switches require no arguments and their concatenation does
not result in a multi-letter switch, it is possible to specify
several switches at one time, as in the following example :
as test*.asm firstprog -cl /i c:\as\8051\include
All files TEST*.ASM as well as the file
FIRSTPROG.ASM will be assembled, whereby listings of all files
are displayed on the console terminal. Additional sharefiles will be
generated in the C- format. The assembler should search for
additional include files in the directory C:\AS\8051\INCLUDE.
as -g test.asm
The solution in this case would either be to move the -g option to
the end or to specify an explicit MAP argument.
set ascmd=-L -i c:\as\8051\include
The environment options are processed before the command line, so
options in the command line can override contradicting ones in the
environment variable.
set ASCMD=@c:\as\as.key
In order to neutralize options in the ASCMD variable (or in
the key file), prefix the option with a plus sign. For example, if
you do not want to generate an assembly listing in an individual
case, the option can be retracted in this way:
as +L <file>
Naturally, it is not entirely logical to negate an option with a plus
sign... UNIX soit qui mal y pense.
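The processing order described above (ASCMD first, then the command line, with a plus sign retracting a switch) can be sketched as follows. This Python model is an illustration only, not the way AS is implemented; the function name collect_options is made up, and only single-letter switches without arguments are handled.

```python
def collect_options(ascmd, command_line):
    """Return the set of single-letter switches that end up active.

    Both parameters are lists of arguments such as ["-L"] or ["+L"].
    The ASCMD arguments are applied first, so the command line can
    override them. Anything that is not a switch (file names) is skipped.
    """
    active = set()
    for arg in list(ascmd) + list(command_line):
        if arg.startswith(("-", "/")):
            active.update(arg[1:])             # switch on
        elif arg.startswith("+"):
            active.difference_update(arg[1:])  # retract
    return active


print(collect_options(["-L"], ["test.asm"]))
print(collect_options(["-L"], ["+L", "test.asm"]))
```

With -L set in ASCMD, the second call models as +L <file>: the listing switch is no longer in the resulting set.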
/~I ---> /i
-#u ---> -U
Depending on the outcome of the assembly, the assembler terminates
with one of the following return codes:
Similar to UNIX, OS/2 extends an application's data segment on demand,
when the application really needs the memory. Therefore, an output
like
511 KByte available memory
does not indicate an imminent system crash due to lack of memory; it
simply shows the distance to the limit at which OS/2 will enlarge the
data segment again...
2.5. Format of the Input Files
[label[:]] <mnemonic>[.attr] [param[,param..]] [;comment]
The colon for the label is optional, in case the label starts in the
first column (the consequence is that a mnemonic must not start in
column 1). It is necessary to set the colon in case the label does
not start in the first column so that AS is able to distinguish it
from a mnemonic. In the latter case, there must be at least one space
between colon and mnemonic if the processor belongs to a family that
supports an attribute that denotes an instruction format and is
separated from the mnemonic by a colon. This restriction is necessary
to avoid ambiguities: a distinction between a mnemonic with format
and a label with mnemonic would otherwise be impossible.
attribute  arithmetic-logic instruction          jump instruction
B          byte (8 bits)                         ---------
W          word (16 bits)                        ---------
L          long word (32 bits)                   16-bit-displacement
Q          quad word (64 bits)                   ---------
S          single precision (32 bits)            8-bit-displacement
D          double precision (64 bits)            ---------
X          extended precision (80/96 bits)       32-bit-displacement
P          decimal floating point (80/96 bits)   ---------
add.w:g rw10,rw8
This example does not show that there may be a format specification
without an operand size. In contrast, if an operand size is used
without a format specification, AS will automatically use the
shortest possible format. The allowed formats and operand sizes again
depend on the machine instruction and may be looked up e.g. in [85], [14], [30], and [31].
The two last ones are only generated if they have been demanded by
additional command line options.
[<n>] <line>/<address> <code> <source>
In the field n, AS displays the include nesting level. The
main file (the file where assembly was started) has the depth 0, a
file included from there has depth 1, etc. Depth 0 is not displayed.
bit   part
0     source file(s) + produced code
1     symbol table
2     macro list
3     function list
4     line numbering
5     register symbol list
7     character set table
-t <mask>
Bits set in <mask> are cleared, so that the respective
listing parts are suppressed. Accordingly it is possible to switch on
single parts again with a plus sign, in case you had switched off too
much with the ASCMD variable... If someone wants to have,
for example, only the symbol table, it is enough to write:
-t 2
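The bit numbers of the listing parts translate into mask values as powers of two. The following Python sketch (the helper names are hypothetical and not part of AS) only shows which listing parts a given mask value names and how a mask is built from bit numbers; how the switch then treats the named parts is as described in the text.

```python
# Bit number -> listing part, as given in the table of listing parts.
LISTING_PARTS = {
    0: "source file(s) + produced code",
    1: "symbol table",
    2: "macro list",
    3: "function list",
    4: "line numbering",
    5: "register symbol list",
    7: "character set table",
}


def parts_named_by(mask):
    """Return the listing parts whose bits are set in mask."""
    return [name for bit, name in sorted(LISTING_PARTS.items())
            if mask & (1 << bit)]


def mask_for(bits):
    """Build a mask value from a collection of bit numbers."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


print(parts_named_by(2))
print(mask_for([2, 3]))
```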
The usage list shows the occupied areas in hexadecimal notation for
every single segment. If an area consists of only one address, only
this address is written, otherwise the first and the last address.
symbol <symbol name> (=<value>,<file>/<line>):
file <file 1>:
<n1>[(m1)] ..... <nk>[(mk)]
.
.
file <file l>:
<n1>[(m1)] ..... <nk>[(mk)]
For every symbol, the cross reference list shows in which files and
lines it has been used. If a symbol was used several times in the
same line, this is indicated by a number in brackets behind the line
number. If a symbol was never used, it does not appear in the list;
the same is true for a file that does not contain any references to
the symbol in question.
2.7. Symbol Conventions
name             meaning
TRUE             logically ''true''
FALSE            logically ''false''
CONSTPI          Pi (3.1415.....)
VERSION          version of AS in BCD-coding, e.g. 1331 hex for version 1.33p1
ARCHITECTURE     target platform AS was compiled for, in the style
                 processor-manufacturer-operating system
DATE             date and
TIME             time of the assembly (start)
MOMCPU           current target CPU (see the CPU instruction)
MOMFILE          current source file
MOMLINE          line number in source file
MOMPASS          number of the currently running pass
MOMSECTION       name of the current section or an empty string
*, $ resp. PC    current value of program counter
cnt set cnt+1
temp equ "\{CNT}"
jnz skip{temp}
.
.
skip{temp}: nop
CAUTION: The programmer has to assure that only valid symbol
names are generated!
Label:
.
.
Attr equ symtype(Label) ; results in 1
The individual segment types have the assigned numbers listed in
table 2.8. Register symbols which do
not really fit into the order of normal symbols are explained in
section 2.10. The SYMTYPE
function delivers -1 as result when called with an undefined symbol
as argument.
segment              return value
<none>               0
CODE                 1
DATA                 2
IDATA                3
XDATA                4
YDATA                5
BITDATA              6
IO                   7
REG                  8
ROMDATA              9
<register symbol>    128
2.8.1. Integer Constants
              Intel mode          Motorola mode          C mode
              (Intel, Zilog,      (Rockwell, Motorola,   (PowerPC,
              Thomson Texas,      Microchip, Thomson,    AMD 29K,
              Toshiba, NEC,       Hitachi, Atmel)        National,
              Siemens, Philips)                          Symbios)
decimal       direct              direct                 direct
hexadecimal   followed by H       prefixed with $        prefixed with 0x
binary        followed by B       prefixed with %        prefixed with 0b
octal         followed by O       prefixed with @        prefixed with 0
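The three notations can be compared with a small parser. The following Python sketch is a simplified model of the table of notations, not AS's actual parser; the function name parse_int and the mode names are assumptions made for this example, and signs and error handling are left out.

```python
def parse_int(text, mode):
    """Parse an integer constant in 'intel', 'motorola' or 'c' notation."""
    t = text.strip()
    if mode == "intel":
        # Radix is given by a trailing letter; no letter means decimal.
        base = {"H": 16, "B": 2, "O": 8}.get(t[-1].upper())
        return int(t, 10) if base is None else int(t[:-1], base)
    if mode == "motorola":
        # Radix is given by a leading character; none means decimal.
        base = {"$": 16, "%": 2, "@": 8}.get(t[0])
        return int(t, 10) if base is None else int(t[1:], base)
    if mode == "c":
        low = t.lower()
        if low.startswith("0x"):
            return int(t[2:], 16)
        if low.startswith("0b"):
            return int(t[2:], 2)
        if len(t) > 1 and t.startswith("0"):
            return int(t[1:], 8)
        return int(t, 10)
    raise ValueError("unknown mode: " + mode)


print(parse_int("0D0H", "intel"), parse_int("$D0", "motorola"),
      parse_int("0xD0", "c"))
```

All three calls denote the same value, written once per notation.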
'A' ==$41
'AB' ==$4142
'ABCD' ==$41424344
It is important to write the characters in single quotes, to
distinguish them from string constants (discussed somewhat later).
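The values shown for 'A', 'AB', and 'ABCD' follow from placing the characters into successive bytes, the first character ending up in the most significant byte. A minimal Python sketch of this arithmetic, assuming plain ASCII codes and no character translation (the function name is made up for this example):

```python
def char_constant(chars):
    """Combine the characters into one integer, first character highest."""
    value = 0
    for ch in chars:
        value = (value << 8) | ord(ch)
    return value


print(hex(char_constant("AB")))
```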
2.8.2. Floating Point Constants
[-]<integer digits>[.post decimal positions][E[-]exponent]
CAUTION! The assembler first tries to interpret a constant as an
integer constant and attempts the floating point format only if that
fails. To enforce evaluation as a floating point number, add dummy
post decimal positions, e.g. 2.0 instead of 2.
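The order of the two attempts can be sketched as follows. This Python model only illustrates the "integer first, floating point second" rule; it relies on Python's own number syntax, which is not identical to the syntax AS accepts, and the function name is an assumption.

```python
def classify_constant(text):
    """Try the integer interpretation first, then floating point."""
    try:
        return ("integer", int(text, 10))
    except ValueError:
        return ("float", float(text))


print(classify_constant("2"))
print(classify_constant("2.0"))
```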
\b : Backspace \a : Bell \e : Escape
\t : Tabulator \n : Linefeed \r : Carriage Return
\\ : Backslash \' or \H : Apostrophe
\" or \I : Quotation marks
Both upper and lower case characters may be used for the
identification letters.
message "root of 81 : \{sqrt(81)}"
results in
root of 81 : 9
AS selects the correct output format based on the type of the formula
result. Further string constants, however, are to be avoided in the
expression; otherwise the assembler gets confused when converting
upper-case letters into lower-case letters.
move.b #'\n',d0
However, everything has its limits, because the parser with higher
priority, which disassembles a line into op-code and parameters, does
not know what it is actually working with, e.g. here:
move.l #'\'abc',d0
After the third apostrophe, it will not find the comma any more,
because it presumes that it is the start of a further character
constant. An error message about a wrong number of parameters is the
result. A workaround would be to write e.g. \h instead of \'.
operator  function            #operands  integer  float  string  rank
<>        inequality          2          yes      yes    yes     14
>=        greater or equal    2          yes      yes    yes     14
<=        less or equal       2          yes      yes    yes     14
<         truly smaller       2          yes      yes    yes     14
>         truly greater       2          yes      yes    yes     14
=         equality            2          yes      yes    yes     14
==        alias for =
!!        log. XOR            2          yes      no     no      13
||        log. OR             2          yes      no     no      12
&&        log. AND            2          yes      no     no      11
~~        log. NOT            1          yes      no     no      2
-         difference          2          yes      yes    no      10
+         sum                 2          yes      yes    yes     10
#         modulo division     2          yes      no     no      9
/         quotient            2          yes*)    yes    no      9
*         product             2          yes      yes    no      9
^         power               2          yes      yes    no      8
!         binary XOR          2          yes      no     no      7
|         binary OR           2          yes      no     no      6
&         binary AND          2          yes      no     no      5
><        mirror of bits      2          yes      no     no      4
>>        log. shift right    2          yes      no     no      3
<<        log. shift left     2          yes      no     no      3
~         binary NOT          1          yes      no     no      1
*) remainder will be discarded
name       meaning                                 argument                   result
SQRT       square root                             arg >= 0                   floating point
SIN        sine                                    arg in R                   floating point
COS        cosine                                  arg in R                   floating point
TAN        tangent                                 arg <> (2n+1)*(Pi)/(2)     floating point
COT        cotangent                               arg <> n*Pi                floating point
ASIN       inverse sine                            | arg | <= 1               floating point
ACOS       inverse cosine                          | arg | <= 1               floating point
ATAN       inverse tangent                         arg in R                   floating point
ACOT       inverse cotangent                       arg in R                   floating point
EXP        exponential function                    arg in R                   floating point
ALOG       10 power of argument                    arg in R                   floating point
ALD        2 power of argument                     arg in R                   floating point
SINH       hyp. sine                               arg in R                   floating point
COSH       hyp. cosine                             arg in R                   floating point
TANH       hyp. tangent                            arg in R                   floating point
COTH       hyp. cotangent                          arg <> 0                   floating point
LN         nat. logarithm                          arg > 0                    floating point
LOG        dec. logarithm                          arg > 0                    floating point
LD         bin. logarithm                          arg > 0                    floating point
ASINH      inv. hyp. sine                          arg in R                   floating point
ACOSH      inv. hyp. cosine                        arg >= 1                   floating point
ATANH      inv. hyp. tangent                       arg < 1                    floating point
ACOTH      inv. hyp. cotangent                     arg > 1                    floating point
INT        integer part                            arg in R                   floating point
BITCNT     number of ones                          integer                    integer
FIRSTBIT   lowest 1-bit                            integer                    integer
LASTBIT    highest 1-bit                           integer                    integer
BITPOS     unique 1-bit                            integer                    integer
SGN        sign (0/1/-1)                           floating point or integer  integer
ABS        absolute value                          integer or floating point  integer or floating point
TOUPPER    matching capital                        integer                    integer
TOLOWER    matching lower case                     integer                    integer
UPSTRING   changes all characters into capitals    string                     string
LOWSTRING  changes all characters into lower case  string                     string
STRLEN     returns the length of a string          string                     integer
SUBSTR     extracts parts of a string              string, integer, integer   string
STRSTR     searches a substring in a string        string, string             integer
VAL        evaluates contents as expression        string                     depends on argument
sqr2 equ sqrt(2)
In such cases an automatic type conversion is engaged. In the reverse
case the INT-function has to be applied to convert a
floating point number to an integer. When using this function, you
have to pay attention that the result produced always is a signed
integer and therefore has a value range of approximately +/-2.0E9.
2.9. Forward References and Other Disasters
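        move.l  #10,d0      ; loop counter
loop:   move.l  (a1),d1     ; fetch value
        beq     skip        ; forward reference
        neg.l   d1
skip:   move.l  d1,(a1)+    ; store value, advance pointer
        dbra    d0,loop     ; backward reference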
If one ignores the branch statement in the loop body, a program
remains that is extremely simple to assemble: the only reference is
the branch back to the beginning of the body, and as an assembler
processes a program from beginning to end, the symbol's value is
already known before it is needed for the first time. A program that
contains only such backward references has the nice property that a
single pass through the source code suffices to generate correct and
optimal machine code. Some high-level languages like Pascal, with
their strict rule that everything has to be defined before it is
used, exploit exactly this property to speed up compilation.
cpu 6811
org $8000
beq skip
rept 60
ldd Var
endm
skip: nop
Var equ $10
Due to the address position, AS assumes long addresses for the LDD
instructions in the first pass, which results in a code length of 180
bytes. In the second pass, an out of branch error message is the
result: at the point of the BEQ instruction, the old value of skip is
still valid, i.e. AS does not know at this point that the code is
only 120 bytes long in reality. The error can be avoided in three
different ways:
Another tip regarding the EQU instruction: AS cannot know in
which context a symbol defined with EQU will be used, so
an EQU containing forward references will not be done at all
in the first pass. Thus, if the symbol defined with EQU gets
forward-referenced in the second pass:
move.l #sym2,d0
sym2 equ sym1+5
sym1 equ 0
one gets an error message due to an undefined symbol in the second
pass...but why on earth do people do such things?
myreg reg r17 ; definition of register symbol
addi myreg+1,3 ; does not work!
Additionally, a register symbol has to be defined prior to its first
usage; if a register symbol is not found, AS suspects a forward
reference to a memory location. Since the usage of memory operands is
far more limited on most processors, a bunch of errors would be the
result...
2.12. Processor Aliases
VecCnt set 0 ; somewhere at the beginning
.
.
.
DefVec macro Name ; allocate a new vector
Name equ VecCnt
VecCnt set VecCnt+4
endm
.
.
.
DefVec Vec1 ; results in Vec1=0
DefVec Vec2 ; results in Vec2=4
Constants and variables are internally stored in the same way; the
only difference is that they can be modified by SET and not by EQU.
It is therefore possible to define a symbol with EQU and to change it
with SET (even if this is not its real business). There is also
another reason to avoid this explicitly: in contrast to SET, EQU
checks whether the newly assigned value differs from a value that
might have existed previously. As the value should not change for
constants defined with EQU, AS assumes a phase error and starts
another pass. For example, if one used EQU instead of SET to
initialize the counter in the previous example, AS would not
complain, as a single reassignment per pass is valid (this line is
executed only n times in n passes), but endless repassing would be
the result, as the counter's initial value always differs from the
final value in the previous pass.
IntTwo equ 2
FloatTwo equ 2.0
Unfortunately, some processors already have a SET instruction. For
these targets, EVAL must be used instead of SET.
PSW sfr 0d0h ; results in PSW = D0H (data segment)
PSW sfrb 0d0h ; results in extra PSW.0 = D0H (bit)
; to PSW.7 = D7H (bit)
The SFRB instruction is no longer defined for the 80C251, as this
processor allows direct bit access to all SFRs without special bit
symbols; bits like PSW.0 to PSW.7 are automatically present.
<name> label $
generates a symbol with correct attributes.
My_Carry bit PSW.7
would assign the value 0d7h to My_Carry on an 8051; on an 80C251, a
value of 070000d0h would be generated, i.e. the address is located in
bits 0..7 and the bit position in bits 24..26. This procedure is the
same as the way the DBIT instruction handles things on a TMS370 and
is also used on the 80C166, with the only difference that bit
positions may range from 0..15:
MSB BIT r5.15
On a Philips XA, the bit's address is located in bits 0..9 just with
the same coding as used in machine instructions, and the 64K bank of
bits in RAM memory is placed in bits 16..23.
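The two values quoted above for My_Carry can be reproduced with a little arithmetic. The following Python sketch is an illustration with made-up function names; the 8051 case assumes a bit-addressable SFR, whose bits occupy the eight bit addresses starting at the SFR's own address, and the 80C251 case uses the layout stated in the text (address in bits 0..7, bit position in bits 24..26).

```python
def bit_symbol_8051(sfr_address, bit):
    """Bit address of a bit in a bit-addressable SFR on the 8051."""
    return sfr_address + bit


def bit_symbol_80c251(address, bit):
    """Packed value: address in bits 0..7, bit position in bits 24..26."""
    return (bit << 24) | address


PSW = 0xD0
print(hex(bit_symbol_8051(PSW, 7)))
print(hex(bit_symbol_80c251(PSW, 7)))
```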
bit1 BIT @h+5.2
are allowed.
invbit BIT r6.!3
More about the ST9's BIT instruction can be found in the
processor specific hints.
INT3 EQU P019
INT3_ENABLE DBIT 0,INT3
defines the bit that enables interrupts via the INT3 pin. Bits
defined this way may be used in the instructions SBIT0, SBIT1,
CMPBIT, JBIT0, and JBIT.
PIO_port_A port 20h
PIO_port_B port PIO_port_A+1
PIO_port_C port PIO_port_A+2
PIO_ctrl port PIO_port_A+3
CHARSET 'ä',128
means that the target system codes the 'ä' into the number 128
(80H). If however two more integers are given, the first one
describes the last entry to be modified, and the second the new value
of the first table entry. All entries up to the index end are loaded
sequentially. For example, in case that the target system does not
support lower-case characters, a simple
CHARSET 'a','z','A'
translates all lower-case characters automatically into the matching
capital letters.
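The two forms of CHARSET described so far can be modeled with a translation table. The following Python sketch is an illustration only; the function names are invented for this example, and the table is kept as a dictionary that leaves all unlisted characters unchanged.

```python
def charset_single(table, char, code):
    """Model of CHARSET 'x',<code>: change a single table entry."""
    table[ord(char)] = code


def charset_range(table, first, last, new_first):
    """Model of CHARSET 'a','z','A': load a range sequentially."""
    for offset in range(ord(last) - ord(first) + 1):
        table[ord(first) + offset] = ord(new_first) + offset


def translate(table, text):
    """Apply the table; characters without an entry stay unchanged."""
    return [table.get(ord(ch), ord(ch)) for ch in text]


table = {}
charset_range(table, "a", "z", "A")
print(bytes(translate(table, "abc Z")))
```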
CHARSET 'a',"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
ENUM SymA,SymB,SymC
This instruction will assign the values 0, 1, and 2 to the
symbols SymA, SymB, and SymC.
ENUM January=1,February,March,April,May,June
The numeric values 1..6 are assigned to month names. One can continue
the enumeration in the following way:
ENUM July=June+1,August,September,October
ENUM November=October+1,December
A definition of a symbol with ENUM is equal to a definition
with EQU, i.e. it is not possible to assign a new value to a
symbol that already exists.
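The numbering rules shown in the ENUM examples can be summarized in a short model. This Python sketch is an illustration with a made-up function name; an entry is either a plain name, which receives the next value, or a (name, value) pair, which restarts the count at the given value.

```python
def enum(symbols, entries):
    """Assign consecutive values; redefining an existing symbol fails."""
    next_value = 0
    for entry in entries:
        if isinstance(entry, tuple):
            name, next_value = entry
        else:
            name = entry
        if name in symbols:
            raise ValueError("symbol already defined: " + name)
        symbols[name] = next_value
        next_value += 1
    return symbols


months = enum({}, [("January", 1), "February", "March"])
print(months)
```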
Rec STRUCT
Ident db ?
Pad db ?
Pointer dd ?
Rec ENDSTRUCT
the symbol Rec_len would obtain the value 6. CAUTION!
Inside of a structure definition, no instructions may be used that
generate code, as this is a pure placement of elements in the address
space!
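The value 6 for Rec_len is simply the sum of the element sizes. The following Python sketch reproduces this placement; it assumes that db reserves one byte and dd four bytes, and the element symbol names of the form Rec_Ident are an assumption made for this illustration (only Rec_len is named in the text above).

```python
# Assumed element sizes in bytes for the directives used in the example.
SIZES = {"db": 1, "dw": 2, "dd": 4}


def layout(name, fields):
    """Place the fields one after another and return offsets and length."""
    symbols = {}
    offset = 0
    for field, directive in fields:
        symbols[name + "_" + field] = offset
        offset += SIZES[directive]
    symbols[name + "_len"] = offset
    return symbols


rec = layout("Rec", [("Ident", "db"), ("Pad", "db"), ("Pointer", "dd")])
print(rec["Rec_len"])
```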
pushv ,var1,var2,var3
.
.
popv ,var3,var2,var1
AS will then use a predefined internal default stack.
processor          CODE     DATA     IDATA    XDATA    YDATA    BITDATA  IO       REG      ROMDATA
68xxx              4G       ---      ---      ---      ---      ---      ---      ---      ---
DSP56000/DSP56300  64K/16M  ---      ---      64K/16M  64K/16M  ---      ---      ---      ---
PowerPC            4G       ---      ---      ---      ---      ---      ---      ---      ---
M*Core             4G       ---      ---      ---      ---      ---      ---      ---      ---
6800,6301,6811     64K      ---      ---      ---      ---      ---      ---      ---      ---
6805/HC08          8K       ---      ---      ---      ---      ---      ---      ---      ---
6809,6309          64K      ---      ---      ---      ---      ---      ---      ---      ---
68HC12             64K      ---      ---      ---      ---      ---      ---      ---      ---
68HC16             1M       ---      ---      ---      ---      ---      ---      ---      ---
H8/300             64K      ---      ---      ---      ---      ---      ---      ---      ---
H8/300H            16M      ---      ---      ---      ---      ---      ---      ---      ---
H8/500 (Min)       64K      ---      ---      ---      ---      ---      ---      ---      ---
H8/500 (Max)       16M      ---      ---      ---      ---      ---      ---      ---      ---
SH7000/7600/7700   4G       ---      ---      ---      ---      ---      ---      ---      ---
6502, MELPS740     64K      ---      ---      ---      ---      ---      ---      ---      ---
65816, MELPS 7700  16M      ---      ---      ---      ---      ---      ---      ---      ---
MELPS 4500         8K       416      ---      ---      ---      ---      ---      ---      ---
M16                4G       ---      ---      ---      ---      ---      ---      ---      ---
M16C               1M       ---      ---      ---      ---      ---      ---      ---      ---
4004               4K       256      ---      ---      ---      ---      ---      ---      ---
MCS-48, MCS-41     4K       ---      256      256      ---      ---      ---      ---      ---
MCS-51             64K      256      256 *    64K      ---      256      ---      ---      ---
                                     In. 80H
MCS-251            16M      ---      ---      ---      ---      ---      512      ---      ---
MCS-(1)96          64K      ---      ---      ---      ---      ---      ---      ---      ---
196N/296           16M      ---      ---      ---      ---      ---      ---      ---      ---
8080,              64K      ---      ---      ---      ---      ---      256      ---      ---
80x86,             64K      64K      ---      64K      ---      ---      64K      ---      ---
68xx0              4G       ---      ---      ---      ---      ---      ---      ---      ---
8X30x              8K       ---      ---      ---      ---      ---      ---      ---      ---
XA                 16M      16M      ---      ---      ---      ---      2K       ---      ---
                                                                         In. 1K
AVR                8K       64K      ---      ---      ---      ---      64       ---      ---
29XXX              4G       ---      ---      ---      ---      ---      ---      ---      ---
80C166             256K     ---      ---      ---      ---      ---      ---      ---      ---
80C167             16M      ---      ---      ---      ---      ---      ---      ---      ---
Z80                64K      ---      ---      ---      ---      ---      256      ---      ---
Z180               512K +   ---      ---      ---      ---      ---      256      ---      ---
Z380               4G       ---      ---      ---      ---      ---      4G       ---      ---
Z8                 64K      256      ---      64K      ---      ---      ---      ---      ---
TLCS-900(L)        16M      ---      ---      ---      ---      ---      ---      ---      ---
TLCS-90            64K      ---      ---      ---      ---      ---      ---      ---      ---
TLCS-870           64K      ---      ---      ---      ---      ---      ---      ---      ---
TLCS-47            64K      1K       ---      ---      ---      ---      16       ---      ---
TLCS-9000          16M      ---      ---      ---      ---      ---      ---      ---      ---
PIC 16C5x          2K       32       ---      ---      ---      ---      ---      ---      ---
PIC 16C64, 16C86   8K       512      ---      ---      ---      ---      ---      ---      ---
PIC 17C42          64K      256      ---      ---      ---      ---      ---      ---      ---
ST6                4K       256      ---      ---      ---      ---      ---      ---      ---
ST7                64K      ---      ---      ---      ---      ---      ---      ---      ---
ST9                64K      64K      ---      ---      ---      ---      ---      256      ---
6804               4K       256      ---      ---      ---      ---      ---      ---      ---
32010              4K       144      ---      ---      ---      ---      8        ---      ---
32015              4K       256      ---      ---      ---      ---      8        ---      ---
320C2x             64K      64K      ---      ---      ---      ---      16       ---      ---
320C3x             16M      ---      ---      ---      ---      ---      ---      ---      ---
320C5x             64K      64K      ---      ---      ---      ---      64K      ---      ---
TMS 9900           64K      ---      ---      ---      ---      ---      ---      ---      ---
TMS 70Cxx          64K      ---      ---      ---      ---      ---      ---      ---      ---
370xxx             64K      ---      ---      ---      ---      ---      ---      ---      ---
MSP430             64K      ---      ---      ---      ---      ---      ---      ---      ---
SC/MP              64K      ---      ---      ---      ---      ---      ---      ---      ---
COP8               8K       256      ---      ---      ---      ---      ---      ---      ---
µPD 78(C)10        64K      ---      ---      ---      ---      ---      ---      ---      ---
75K0               16K      4K       ---      ---      ---      ---      ---      ---      ---
78K0               64K      ---      ---      ---      ---      ---      ---      ---      ---
7720               512      128      ---      ---      ---      ---      ---      ---      512
7725               2K       256      ---      ---      ---      ---      ---      ---      1024
77230              8K       ---      ---      512      512      ---      ---      ---      1K
53C8XX             4G       ---      ---      ---      ---      ---      ---      ---      ---
*) As the 8051 does not have any RAM beyond 80h, this value has to be
adapted with ORG for the 8051 as target processor!!
+) As the Z180 still can address only 64K logically, the whole
address space can only be reached via PHASE instructions!
a) 68008 -> 68000 -> 68010 -> 68012 -> MCF5200 -> 68332 -> 68340 -> 68360 -> 68020 -> 68030 -> 68040
The differences in this family lie in additional instructions and
addressing modes (starting from the 68020). A small exception is the
step to the 68030, which lacks two instructions: CALLM and RTM. The
three representatives of the 683xx family have the same processor
core (a slightly reduced 68020 CPU), but completely different
peripherals. MCF5200 represents the ColdFire family from Motorola,
RISC processors downwardly binary compatible to the 680x0. For the
68040, additional control registers (reachable via MOVEC) and
instructions for control of the on-chip MMU and caches were added.
b) 56000 --> 56002 --> 56300
While the 56002 only adds instructions for incrementing and
decrementing the accumulators, the 56300 core is almost a new
processor: all address spaces are enlarged from 64K words to 16M and
the number of instructions has almost been doubled.
c) PPC403 -> MPC505 -> MPC601 -> RS6000
The PPC403 is a reduced version of the PowerPC line without a
floating point unit, which is why all floating point instructions are
disabled for it; in turn, some microcontroller-specific instructions
have been added which are unique in this family. The MPC505 (a
microcontroller variant without an FPU) differs from the 601 only in
its peripheral registers, as far as I know - [44] is a bit reluctant in this respect...
The RS6000 line knows a few more instructions (which are emulated on
many 601-based systems); IBM additionally uses different mnemonics
for their pure workstation processors, as a reminiscence of 370
mainframes...
d) MCORE
e) 6800 -> 6301 -> 6811
While the 6301 only offers a few additional instructions, the 6811
delivers a second index register and many more instructions.
f) 6809/6309 and 6805/68HC08
These processors are partially source-code compatible to the other
68xx processors, but they have a different binary code format and a
significantly reduced (6805) resp. enhanced (6809) instruction set.
The 6309 is a CMOS version of the 6809 which is officially only
compatible to the 6809, but unofficially offers more registers and a
lot of new instructions (see [24]).
g) 68HC12
h) 68HC16
i) HD6413308 -> HD6413309
These two names represent the 300 and 300H variants of the H8
family; the H version has a larger address space (16 Mbytes instead
of 64 Kbytes), double-width registers (32 bits), and knows a few more
instructions and addressing modes. It is still binary upward
compatible.
j) HD6475328 -> HD6475348 -> HD6475368 -> HD6475388
These processors all share the same CPU core; the different types are
only needed to include the correct subset of registers in the
file REG53X.INC.
k) SH7000 -> SH7600 --> SH7700
The processor core of the 7600 offers a few more instructions that
close gaps in the 7000's instruction set (delayed conditional and
relative and indirect jumps, multiplications with 32-bit operands and
multiply/add instructions). The 7700 series (also known as SH3)
furthermore offers a second register bank, better shift instructions,
and instructions to control the cache.
l) 6502 -> 65(S)C02 / MELPS740 / 6502UNDOC
The CMOS version defines some additional instructions; in addition, a
number of instruction/addressing mode combinations were added that
were not possible on the 6502. The Mitsubishi microcontrollers, in
contrast, expand the 6502 instruction set primarily by bit operations
and multiplication/division instructions. Except for the
unconditional jump and instructions to increment/decrement the
accumulator, the instruction extensions are orthogonal. The 65SC02
lacks the bit manipulation instructions of the 65C02. The 6502UNDOC
processor type enables access to the "undocumented" 6502
instructions, i.e. the operations that result from the usage of bit
combinations in the opcode that are not defined as instructions. The
variants supported by AS are listed in the appendix containing
processor-specific hints.
m) MELPS7700, 65816
Apart from a '16-bit-version' of the 6502's instruction set, these
processors both offer some instruction set extensions. These are
however orthogonal as they are oriented along their 8-bit
predecessors (65C02 resp. MELPS-740). Partially, different mnemonics
are used for the same operations.
n) MELPS4500
o) M16
p) M16C
q) 4004
r) 8021, 8022, 8039, 80C39, 8048, 80C48, 8041, 8042
For the ROM-less versions 8039 and 80C39, the commands that use the
BUS (port 0) are forbidden. The 8021 and 8022 are special versions
with a strongly reduced instruction set; the 8022 additionally has
two A/D converters and the necessary control commands. The CMOS
versions can be transferred into a stop mode with lower current
consumption via the IDL command. The 8041 and 8042 have some
additional instructions for controlling the bus interface, but in
turn a few other commands were omitted. Moreover, the code address
space of these processors is not externally extendable, and so AS
limits the code segment of these processors to 1 resp. 2 Kbytes.
s) 87C750 -> 8051, 8052, 80C320, 80C501, 80C502, 80C504, 80515, and 80517 -> 80C251
The 87C750 can only access a maximum of 2 Kbytes of program memory,
which is why it lacks the LCALL and LJMP instructions. AS does not
make any distinction among the processors in the middle; instead, it
only stores the different names in the MOMCPU variable (see below),
which allows querying the setting with IF instructions. An exception
is the 80C504, which has a mask flaw in its current versions. This
flaw shows up when an AJMP or ACALL instruction starts at the second
last address of a 2K page. AS will automatically use long
instructions or issue an error message in such situations. The
80C251, in contrast, represents drastic progress in the direction of
16/32 bits, larger address spaces, and a more orthogonal instruction
set.
t) 8096 -> 80196 -> 80196N -> 80296
Apart from a different set of SFRs (which however strongly vary from
version to version), the 80196 knows several new instructions and
supports a 'windowing' mechanism to access the larger internal RAM.
The 80196N family extends the address space to 16 Mbytes and
introduces a set of instructions to access addresses beyond 64Kbytes.
The 80296 extends the CPU core by instructions for signal processing
and a second windowing register, but removes the Peripheral
Transaction Server (PTS) and therefore loses two machine instructions
again.
u) 8080 and 8085
The 8085 knows the additional commands RIM and SIM
for controlling the interrupt mask and the two I/O-pins.
v) 8086 -> 80186 -> V30 -> V35
Only new instructions are added in this family. The corresponding
8-bit versions are not mentioned due to their instruction
compatibility, so one e.g. has to choose 8086 for an 8088-based
system.
w) 80960
x) 8X300 -> 8X305
The 8X305 features a couple of additional registers that are missing
on the 8X300. Additionally, it can do new operations with these
registers (like direct writing of 8 bit values to peripheral
addresses).
y) XAG1, XAG2, XAG3
These processors only differ in the size of their internal ROM which
is defined in STDDEFXA.INC.
z) AT90S1200 -> AT90S2313 -> AT90S4414 -> AT90S8515
The first member of the AVR series represents a minimum configuration
without RAM memory and therefore lacks load/store instructions. The
other two processors only differ in their memory equipment and
on-chip peripherals, which is accounted for in REGAVR.INC.
aa) AM29245 -> AM29243 -> AM29240 -> AM29000
The further one moves to the right in this list, the fewer
instructions have to be emulated in software. While e.g. the 29245
does not even have a hardware multiplier, the two representatives in
the middle only lack the floating point instructions. The 29000
serves as a 'generic' type that understands all instructions in
hardware.
ab) 80C166 -> 80C167,80C165,80C163
80C167 and 80C165/163 have an address space of 16 Mbytes instead of
256 Kbytes, and furthermore they know some additional instructions
for extended addressing modes and atomic instruction sequences. They
are 'second generation' processors and differ from each other only in
the amount of on-chip peripherals.
ac) Z80 -> Z80UNDOC -> Z180 -> Z380
While there are only a few additional instructions for the Z180, the
Z380 owns 32-bit registers, a linear address space of 4 Gbytes, a
couple of instruction set extensions that make the overall
instruction set considerably more orthogonal, and new addressing
modes (referring to index register halves, stack relative). These
extensions partially already exist on the Z80 as undocumented
extensions and may be switched on via the Z80UNDOC variant. A list
with the additional instructions can be found in the chapter with
processor specific hints.
ad) Z8601, Z8604, Z8608, Z8630, Z8631
These processors again only differ in internal memory size and
on-chip peripherals, i.e. the choice does not have an effect on the
supported instruction set.
ae) 96C141, 93C141
These two processors represent the two variations of the processor
family: TLCS-900 and TLCS-900L. The differences of these two
variations will be discussed in detail in section 4.21.
af) 90C141
ag) 87C00, 87C20, 87C40, 87C70
The processors of the TLCS-870 series have an identical CPU core, but different peripherals depending on the type. In part, registers with the same name are located at different addresses. The file STDDEF87.INC uses, similar to the MCS-51 family, the distinction possible by different types to provide the correct symbol set automatically.
ah) 47C00 -> 470C00 -> 470AC00
These three variations of the TLCS-47 family have on-chip RAM and ROM of different size, which leads to several bank switching instructions being added or suppressed.
ai) 97C241
aj) 16C54 -> 16C55 -> 16C56 -> 16C57
These processors differ by the available code area, i.e. by the address limit after which AS reports overruns.
ak) 16C84, 16C64
Analogous to the MCS-51 family, no distinction is made in the code generator; the different numbers only serve to include the correct SFRs in STDDEF18.INC.
al) 17C42
am) ST6210/ST6215 -> ST6220/ST6225
The only distinction AS makes between the two pairs is the smaller addressing space (2K instead of 4K) of the first ones. The detailed distinction serves to provide an automatic distinction in the source file as to which hardware is available (analogous to the 8051/52/515).
an) ST7
ao) ST9020, ST9030, ST9040, ST9050
These 4 names represent the four ''sub-families'' of the ST9 family, which only differ in their on-chip peripherals. Their processor cores are identical, which is why this distinction is again only used in the include file containing the peripheral addresses.
ap) 6804
aq) 32010 -> 32015
The TMS32010 has just 144 bytes of internal RAM, and so AS limits addresses in the data segment to this amount. This restriction does not apply to the 32015; the full range from 0..255 can be used.
ar) 320C25 -> 320C26 -> 320C28
These processors only differ slightly in their on-chip peripherals and in their configuration instructions.
as) 320C30, 320C31
The 320C31 is a reduced version with the same instruction set, but fewer peripherals. The distinction is exploited in STDDEF3X.INC.
at) 320C50, 320C51, 320C53
The distinction between these processors is currently not used by AS.
au) TMS9900
All members of this family share the same CPU core, they therefore do not differ in their instruction set. The differences manifest only in the file REG7000.INC where address ranges and peripheral addresses are defined. Types listed in the same row have the same amount of internal RAM and the same on-chip peripherals, they differ only in the amount of integrated ROM.
av) TMS70C00, TMS70C20, TMS70C40, TMS70CT20, TMS70CT40, TMS70C02, TMS70C42, TMS70C82, TMS70C08, TMS70C48
aw) 370C010, 370C020, 370C030, 370C040 and 370C050Similar to the MCS-51 family, the different types are only used to differentiate the peripheral equipment in STDDEF37.INC; the instruction set is always the same.
ax) MSP430
ay) SC/MP
az) COP87L84This is the only member of National Semiconductor's COP8 family that is currently supported. I know that the family is substantially larger and that there are representors with differently large instruction sets which will be added when a need occurs. It is a beginning, and National's documentation is quite extensive...
ba) 7810->78C10The NMOS version has no stop-mode; the respective command and the ZCM register are omitted. CAUTION! NMOS and CMOS version partially differ in the reset values of some registers!
bb) 75402, 75004, 75006, 75008, 75268, 75304, 75306, 75308, 75312, 75316, 75328, 75104, 75106, 75108, 75112, 75116, 75206, 75208, 75212, 75216, 75512, 75516
This 'cornucopia' of processors differs only by the RAM size within one group; the groups in turn differ by their on-chip peripherals on the one hand and by the power of their instruction set on the other.
bc) 78070
This is currently the only member of NEC's 78K0 family I am familiar with. Similar remarks as for the COP8 family apply!
bd) 7720 -> 7725
The µPD7725 offers larger address spaces and some more instructions compared to its predecessor. CAUTION! The processors are not binary compatible with each other!
be) 77230
bf) SYM53C810, SYM53C860, SYM53C815, SYM53C825, SYM53C875, SYM53C895
The simpler members of this family of SCSI processors lack some instruction variants; furthermore, they differ in their set of internal registers.
The CPU instruction needs the processor type as a simple constant, a calculation like:
        CPU     68010+10
is not allowed. Valid calls are e.g.
        CPU     8051
or
        CPU     6800
Regardless of the processor type currently set, the integer variable MOMCPU contains the current status as a hexadecimal number. For example, MOMCPU=$68010 for the 68010 or MOMCPU=80C48H for the 80C48. As one cannot express all letters as hexadecimal digits (only A..F are possible), all other letters must be omitted in the hex notation; for example, MOMCPU=80H for the Z80.
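The naming rule for MOMCPU can be modelled in a few lines. The following Python sketch is only an illustration of the rule as stated above, not code taken from AS; the function name momcpu_value is made up for this example, and names that contain no hexadecimal digit at all are not handled.

```python
def momcpu_value(name: str) -> int:
    """Model of the MOMCPU rule: keep only hex digits, read them as hex.

    Hypothetical helper for illustration; AS itself may treat special
    cases differently.
    """
    hex_digits = "0123456789ABCDEF"
    kept = "".join(ch for ch in name.upper() if ch in hex_digits)
    return int(kept, 16)


# The three examples given in the text
print(hex(momcpu_value("68010")))
print(hex(momcpu_value("80C48")))
print(hex(momcpu_value("Z80")))
```

The letter Z in Z80 is dropped because it is not a hexadecimal digit, whereas the C in 80C48 is kept.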
You can take advantage of this feature to generate different code depending on the processor type. For example, the 68000 does not have a machine instruction for a subroutine return with stack correction. With the variable MOMCPU you can define a macro that uses the machine instruction or emulates it depending on the processor type:
myrtd   macro   disp
        if      MOMCPU<$68010   ; emulate for 68008 & 68000
         move.l (sp),disp(sp)
         lea    disp(sp),sp
         rts
        elseif
         rtd    #disp           ; direct use on >=68010
        endif
        endm

        cpu     68010
        myrtd   12              ; results in RTD #12

        cpu     68000
        myrtd   12              ; results in MOVE../LEA../RTS
As not all processor names are built only out of numbers and letters from A..F, the full name is additionally stored in the string variable named MOMCPUNAME.
The assembler implicitly switches back to the CODE segment when a CPU instruction is executed. This is done because CODE is the only segment all processors support.
The default processor type is 68008.
valid for: 680x0, FPU also for 80x86, i960, SUPMODE also for TLCS-900, SH7000, i960, 29K, XA, PowerPC, M*Core, and TMS9900
These three switches define which parts of the instruction set shall be disabled because the necessary preconditions are not met for the following piece of code. The parameter for these instructions may be either ON or OFF; the current status can be read from a variable which is either TRUE or FALSE.
The commands have the following meanings in detail:
valid for: 680x0
Motorola integrated the MMU into the processor starting with the 68030, but the built-in MMU is equipped only with a relatively small subset of the 68851 instruction set. AS will therefore disable all extended MMU instructions when the target processor is 68030 or higher. It is however possible that the internal MMU has been disabled in a 68030-based system and the processor operates with an external 68851. One can then use a FULLPMMU ON to tell AS that the complete MMU instruction set is allowed. Vice versa, one may use a FULLPMMU OFF to disable all additional instructions in spite of a 68020 target platform to assure that portable code is written. The switch between full and reduced instruction set may be done as often as needed, and the current setting may be read from a symbol with the same name. CAUTION! The CPU instruction implicitly sets or resets this switch when its argument is a 68xxx processor! FULLPMMU therefore has to be written after the CPU instruction!
valid for: 680x0, M*Core, XA, H8, SH7000, MSP430, TMS9900, ST7
Processors of the 680x0 family are quite critical regarding odd addresses: instructions must not start on an odd address, and data accesses to odd addresses are only allowed bytewise up to the 68010. The H8/300 family simply resets the lowest address bit to zero when accessing odd addresses, the 500 in contrast 'thanks' with an exception... AS therefore tries to round up data structures built with DC or DS to an even number of bytes. This however means for DC.B and DS.B that a padding byte may have to be added. This behaviour can be turned on and off via the PADDING instruction. Similar to the previous instructions, the argument may be either ON or OFF, and the current setting may be read from a symbol with the same name. PADDING is by default only enabled for the 680x0 family, it has to be turned on explicitly for all other families!
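The effect of PADDING on the size of a DC.B or DS.B block can be written down as a small Python model. This is a sketch of the rule described above under the assumption that exactly one padding byte is appended to an odd-sized block; padded_length is a name invented for this illustration.

```python
def padded_length(byte_count: int, padding: bool = True) -> int:
    """Size of a byte block after optional rounding up to an even size."""
    if padding and byte_count % 2 == 1:
        return byte_count + 1      # one padding byte keeps word alignment
    return byte_count


print(padded_length(13))           # odd size, PADDING ON
print(padded_length(12))           # even size, nothing to add
print(padded_length(13, False))    # PADDING OFF, size stays odd
```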
valid for: TLCS-900, H8
The processors of the TLCS-900-family are able to work in 2 modes, the minimum and maximum mode. Depending on the actual mode, the execution environment and the assembler are a little bit different. Along with this instruction and the parameter ON or OFF, AS is informed that the following code will run in maximum resp. minimum mode. The actual setting can be read from the variable INMAXMODE. Presetting is OFF, i.e. minimum mode.
Similarly, one uses this instruction to tell AS in H8 mode whether the address space is 64K or 16 Mbytes. This setting is always OFF for the 'small' 300 version and cannot be changed.
valid for: Z380
The Z380 may operate in altogether 4 modes, which are the result of setting two flags: The XM flag rules whether the processor shall operate with an address space of 64 Kbytes or 4 Gbytes and it may only be set to 1 (after a reset, it is set to 0 for compatibility with the Z80). The LW flag in turn rules whether word operations shall work with a word size of 16 or 32 bits. The setting of these two flags influences range checks of constants and addresses, which is why one has to tell AS the setting of these two flags via these instructions. The default assumption is that both flags are 0; the current setting (ON or OFF) may be read from the predefined symbols INEXTMODE resp. INLWORDMODE.
valid for: MCS-251
Intel substantially extended the 8051 instruction set with the 80C251, but unfortunately there was only a single free opcode for all these new instructions. To avoid a processor that will be eternally crippled by a prefix, Intel provided two operating modes: the binary and the source mode. The new processor is fully binary compatible to the 8051 in binary mode, all new instructions require the free opcode as prefix. In source mode, the new instructions exchange their places in the code tables with the corresponding 8051 instructions, which in turn then need a prefix. One has to inform AS whether the processor operates in source mode (ON) or binary mode (OFF) to enable AS to add prefixes when required. The current setting may be read from the variable INSRCMODE. The default is OFF.
valid for: MCS-51/251, PowerPC
Intel broke with its own principles when the 8051 series was designed: in contrast to all traditions, the processor uses big-endian ordering for all multi-byte values! While this was not a big deal for MCS-51 processors (the processor could access memory only in 8-bit portions, so everyone was free to use whichever endianness one wanted), it may be a problem for the 251 as it can fetch whole (long-)words from memory and expects the MSB to be first. As this is not how earlier versions of AS stored constants, one can use this instruction to toggle between big and little endian mode for the instructions DB, DW, DD, DQ, and DT. BIGENDIAN OFF (the default) puts the LSB first into memory as it used to be on earlier versions of AS; BIGENDIAN ON engages the big-endian mode compatible to the MCS-251. One may of course change this setting as often as one wants; the current setting can be read from the symbol with the same name.
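To make the difference between the two settings concrete, the following Python sketch shows the byte sequence a 16-bit and a 32-bit integer constant would occupy in each mode. It only models the byte order described above; the helper names dw_bytes and dd_bytes are invented for this example, and floating point arguments are not covered.

```python
def dw_bytes(value: int, bigendian: bool = False) -> bytes:
    """Bytes of a 16-bit constant, LSB first unless bigendian is set."""
    return (value & 0xFFFF).to_bytes(2, "big" if bigendian else "little")


def dd_bytes(value: int, bigendian: bool = False) -> bytes:
    """Bytes of a 32-bit constant, LSB first unless bigendian is set."""
    return (value & 0xFFFFFFFF).to_bytes(4, "big" if bigendian else "little")


print(dw_bytes(0x1234).hex())                  # BIGENDIAN OFF
print(dw_bytes(0x1234, bigendian=True).hex())  # BIGENDIAN ON
print(dd_bytes(0x12345678, bigendian=True).hex())
```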
valid for: all processors
Some microcontrollers and signal processors know various address ranges which do not overlap with each other and which also require different instructions and addressing modes for access. To manage these as well, the assembler provides several program counters; you can switch between them with the SEGMENT instruction. For subroutines included with INCLUDE, this allows e.g. to define data used by the main program or subroutines near the place they are used. In detail, the following segments with the following names are supported:
The bit segment is managed as if it would be a byte segment, i.e. the addresses will be incremented by 1 per bit.
Labels get the type of the segment that was active when the label was defined as an attribute. The assembler therefore has a limited ability to check whether you access symbols of a certain segment with wrong instructions. In such cases the assembler issues a warning.
Example:
        CPU     8051    ; MCS-51-code

        segment code    ; test code

        setb    flag    ; no warning
        setb    var     ; warning : wrong segment

        segment data

var     db      ?

        segment bitdata

flag    db      ?
valid for: all processors
For some applications (especially on Z80 systems), the code must be moved to another address range before execution. If the assembler didn't know about this, it would align all labels to the load address (not the start address). The programmer is then forced to write jumps within this area either independent of location or has to add the offset at each symbol manually. The first one is not possible for some processors, the last one is extremely error-prone. With the commands PHASE and DEPHASE, it is possible to inform the assembler at which address the code will really be executed on the target system:
        phase   <address>
informs the assembler that the following code shall be executed at the specified address. The assembler thereupon calculates the difference to the real program counter and adds this difference for the following operations:
        dephase
The assembler manages phase values for all defined segments, although this instruction pair only makes real sense in the code segment.
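The offset calculation behind PHASE can be sketched as follows. This Python fragment is an illustration of the idea only, not the way AS implements it; the function name phased_value and its parameter names are assumptions made for this example.

```python
def phased_value(load_pc: int, pc_at_phase: int, phase_address: int) -> int:
    """Value a label receives inside a PHASE block.

    load_pc        real program counter (load address) at the label
    pc_at_phase    real program counter where PHASE was written
    phase_address  argument given to PHASE (execution address)
    """
    difference = phase_address - pc_at_phase   # computed once at PHASE
    return load_pc + difference                # added to every label


# Code stored from 0100h onwards but executed at 8000h:
# a label 20h bytes into the block gets the value 8020h.
print(hex(phased_value(0x0120, 0x0100, 0x8000)))
```

Without PHASE the same label would have received the load address 0120h, and every absolute jump to it would have to be corrected by hand.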
valid for: all processors
The command SAVE forces the assembler to push the contents of following variables onto an internal stack:
        SAVE                    ; save old status
        LISTING OFF             ; save paper
        .                       ; the actual code
        .
        RESTORE                 ; restore
In contrast to a simple LISTING OFF .. ON pair, the correct status is restored even if listing generation was already switched off before.
The assembler checks whether the numbers of SAVE and RESTORE commands match and issues error messages in the following cases:
valid for: various
This instruction allows to tell AS the current setting of certain registers whose contents cannot be described with a simple ON or OFF. These are typically registers that influence addressing modes and whose contents are important to know for AS in order to generate correct addressing. It is important to note that ASSUME only informs AS about these, no machine code is generated that actually loads these values into the appropriate registers!
In contrast to its 'predecessors' like the 6800 and 6502, the position of the direct page, i.e. the page of memory that can be reached with single-byte addresses, can be set freely. This is done via the 'direct page register' that sets the page number. One has to assign a corresponding value to this register via ASSUME if the contents are different from the default of 0, otherwise wrong addresses will be generated!
The 68HC16 employs a set of bank registers to address a space of 1 Mbyte with its registers that are only 16 bits wide. These registers supply the upper 4 bits. Of these, the EK register is responsible for absolute data accesses (not jumps!). AS checks for each absolute address whether the upper 4 bits of the address are equal to the value of EK specified via ASSUME. AS issues a warning if they differ. The default for EK is 0.
In maximum mode, the extended address space of these processors is addressed via a couple of bank registers. They carry the names DP (registers from 0..3, absolute addresses), EP (registers 4 and 5), and TP (stack). AS needs the current value of DP to check if absolute addresses are within the currently addressable bank; the other two registers are only used for indirect addressing and can therefore not be monitored; it is a question of personal taste whether one specifies their values or not. The BR register is in contrast important because it rules which 256-byte page may be accessed with short addresses. It is common for all registers that AS does not assume any default value for them as they are undefined after a CPU reset. Everyone who wants to use absolute addresses must therefore assign values to at least BR and DP!
Microcontrollers of this series know a ''special page'' addressing mode for the JSR instruction that allows a shorter coding for jumps into the last page of on-chip ROM. The size of this ROM depends of course on the exact processor type, and there are more derivatives than it would be meaningful to offer via the CPU instruction...we therefore have to rely on ASSUME to define the address of this page, e.g.
        ASSUME  SP:$1f
in case the internal ROM is 8K.
These processors contain a lot of registers whose contents AS has to know in order to generate correct machine code. These are the registers in question:
name | function | value range | default |
---|---|---|---|
DT | data bank | 0-$ff | 0 |
PG | code bank | 0-$ff | 0 |
DPR | directly addr. page | 0-$ffff | 0 |
X | index register width | 0 or 1 | 0 |
M | accumulator width | 0 or 1 | 0 |
To avoid endless repetitions, see section 4.9 for instructions how to use these registers. The handling is otherwise similar to the 8086, i.e. multiple values may be set with one instruction and no code is generated that actually loads the registers with the given values. This is again up to the programmer!
Starting with the 80196, all processors of the MCS-96 family have a register 'WSR' that allows to map memory areas from the extended internal RAM or the SFR range into areas of the register file which may then be accessed with short addresses. If one informs AS about the value of the WSR register, it can automatically find out whether an absolute address can be addressed with a single-byte address via windowing; consequently, long addresses will be automatically generated for registers covered by windowing. The 80296 contains an additional register WSR1 to allow simultaneous mapping of two memory areas into the register file. In case it is possible to address a memory cell via both areas, AS will always choose the way via WSR!
The 8086 is able to address data from all segments in all instructions, but it needs so-called ''segment prefixes'' if a segment register other than DS shall be used. In addition, it is possible that the DS register is adjusted to another segment, e.g. to address data in the code segment for longer parts of the program. As AS cannot analyze the code's meaning, it has to be informed via this instruction to which segments the segment registers point at the moment, e.g.:
        ASSUME  CS:CODE, DS:DATA
It is possible to assign assumptions to all four segment registers in this way. This instruction produces no code, so the program itself has to do the actual load of the registers with the values.
Using this instruction has two effects: on the one hand, AS is able to automatically insert prefixes for occasional accesses into the code segment; on the other hand, one can inform AS that the DS register was modified and thereby save explicit CS: prefixes.
Valid arguments behind the colon are CODE, DATA and NOTHING. The latter value informs AS that a segment register contains no usable value (for AS). The following values are preinitialized:
CS:CODE, DS:DATA, ES:NOTHING, SS:NOTHING
The XA family has a data address space of 16 Mbytes; a process, however, can only address within a 64K segment that is given by the DS register. One has to inform AS about the current value of this register in order to enable it to check accesses to absolute addresses.
The processors of the 29K family feature a register RBP that allows to protect banks of 16 registers against access from user mode. The corresponding bit has to be set to achieve the protection. ASSUME allows to tell AS which value RBP currently contains. This way, AS can warn when an attempt is made to access protected registers from user mode.
Though none of the 80C166/167's registers is longer than sixteen bits, this processor has 18/24 address lines and can therefore address up to 256Kbytes/16Mbytes. To resolve this contradiction, it neither uses the well-known (and ill-famed) Intel method of segmentation nor does it have inflexible bank registers...no, it uses paging! To accomplish this, the logical address space of 64 Kbytes is split into 4 pages of 16 Kbytes, and for each page there is a page register (named DPP0..DPP3) that rules which of the 16/1024 physical pages shall be mapped to this logical page. AS always tries to present the address space with a size of 256Kbytes/16MBytes in the sight of the programmer, i.e. the physical page is taken for absolute accesses and the setting of bits 14/15 of the logical address is deduced. If no page register fits, a warning is issued. AS assumes by default that the four registers linearly map the first 64 Kbytes of memory, in the following style:
        ASSUME  DPP0:0,DPP1:1,DPP2:2,DPP3:3
The 80C167 knows some additional instructions that can override the page registers' function. The chapter with processor-specific hints describes how these instructions influence the address generation.
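The deduction of bits 14/15 from the assumed page registers can be illustrated with a short Python model. It follows the description above (16-Kbyte pages, four DPP registers) but is only a sketch: the function name logical_address is invented here, and picking the first matching register when several fit is an assumption, since the text does not say which one AS prefers.

```python
PAGE_BITS = 14                     # 16-Kbyte pages
PAGE_MASK = (1 << PAGE_BITS) - 1


def logical_address(physical: int, dpp: tuple):
    """Map a physical address to a 16-bit logical one via DPP0..DPP3.

    Returns None when no page register fits (AS issues a warning then).
    """
    page = physical >> PAGE_BITS
    offset = physical & PAGE_MASK
    for index, mapped_page in enumerate(dpp):
        if mapped_page == page:    # assumption: first match wins
            return (index << PAGE_BITS) | offset
    return None


default = (0, 1, 2, 3)             # ASSUME DPP0:0,DPP1:1,DPP2:2,DPP3:3
print(hex(logical_address(0x8123, default)))
print(logical_address(0x50000, default))           # page 20 is not mapped
print(hex(logical_address(0x40010, (0, 1, 2, 16))))
```

In the last call, physical page 16 is reachable through DPP3, so bits 14/15 of the logical address become 11 and the result is C010h.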
The direct data address space of these processors (it makes no difference whether you address directly or via the HL register) has a size of only 256 nibbles. Because the ''better'' family members have up to 1024 nibbles of RAM on chip, Toshiba was forced to introduce a banking mechanism via the DMB register. AS manages the data segment as a continuous addressing space and checks at any direct addressing if the address is in the currently active bank. The bank AS currently expects can be set by means of
        ASSUME  DMB:<0..3>
The default value is 0.
The microcontrollers of the ST62 family are able to map a part (64
bytes) of the code area into the data area, e.g. to load constants
from the ROM. This means also that at one moment only one part of the
ROM can be addressed. A special register rules which part it is. AS
cannot check the contents of this register directly, but it can be
informed by this instruction that a new value has been assigned to
the register. AS then can test and warn if necessary, in case
addresses of the code segment are accessed, which are not located in
the ''announced'' window. If, for example, the variable VARI
has the value 456h, so
It is possible to assign a simple NOTHING instead of a
value, e.g. if the bank register is used temporarily as a memory
cell. This value is also the default.
The ST9 family uses exactly the same instructions to address code and
data area. It depends on the setting of the flag register's DP flag
which address space is referenced. To enable AS to check if one works
with symbols from the correct address space (this of course
only works with absolute accesses!), one has to inform AS whether
the DP flag is currently 0 (code) or 1 (data). The initial value of
this assumption is 0.
As all instruction words of this processor family are only 32 bits
long (of which only 16 bits were reserved for absolute addresses),
the missing upper 8 bits have to be added from the DP register. It is
however still possible to specify a full 24-bit address when
addressing, AS will check then whether the upper 8 bits are equal to
the DP register's assumed values. ASSUME is different to
the LDP instruction in the sense that one cannot specify an
arbitrary address out of the bank in question, one has to extract the
upper bits by hand:
These processors have a register (V) that allows to move the ''zero
page'', i.e. the page of memory that is addressable with just one byte,
freely in the address space, within page limits. For reasons of
convenience, one does not want to work with expressions such as
As the whole address space of 12 bits could not be addressed even by
the help of register pairs (8 bits), NEC had to introduce banking
(like many others too...): the upper 4 address bits are fetched from
the MBS register (which can be assigned values from 0 to 15 by
the ASSUME instruction), which however will only be regarded
if the MBE flag has been set to 1. If it is 0 (default), the lowest
and highest 128 nibbles of the address space can be reached without
banking. The ASSUME instruction is undefined for the 75402
as it contains neither a MBE flag nor an MBS register; the initial
values cannot be changed therefore.
valid for: 29K
AMD defined the 29000's series exception handling for undefined
instructions in a way that there is a separate exception vector for
each instruction. This allows to extend the instruction set of a
smaller member of this family by a software emulation. To keep AS
from rejecting these instructions as undefined, the
EMULATED instruction allows to tell AS that certain instructions
are allowed in this case. The check whether the currently set processor
knows the instruction is then skipped. For example, if one has
written a module that supports 32-bit IEEE numbers and the processor
does not have a FPU, one writes
valid for: XA
BRANCHEXT with either ON or OFF as
argument tells AS whether short branches that are only available with
an 8-bit displacement shall automatically be 'extended', for example
by replacing a single instruction like
The instructions described in this section partially overlap in their
functionality, but each processor family defines other names for the
same function. To stay compatible with the standard assemblers, this
way of implementation was chosen.
If not explicitly mentioned otherwise, all instructions for data
deposition (not those for reservation of memory!) allow an arbitrary
number of parameters which are being processed from left to right.
valid for: 680x0, M*Core, 68xx, H8, SH7x00, DSP56xxx, XA,
ST7
This instruction places one or several constants of the type
specified by the attribute into memory. The attributes are the same
ones as defined in section 2.5, and
there is additionally the possibility for byte constants to place
string constants in memory, like
The assembler can automatically add another byte of data in case the
byte sum should become odd, to keep the word alignment. This
behaviour may be turned on and off via the PADDING
instruction.
Decimal floating point numbers stored with this instruction
(DC.P...) can cover the whole range of extended precision,
one however has to pay attention to the detail that the coprocessors
currently available from Motorola (68881/68882) ignore the thousands
digit of the exponent at the read of such constants!
The default attribute is W, that means 16-bit-integer
numbers.
For the DSP56xxx, the data type is fixed to integer numbers (an
attribute is therefore neither necessary nor allowed), which may be
in the range of -8M up to 16M-1. String constants are also allowed,
whereby three characters are packed into each word.
valid for: 680x0, M*Core, 68xx, H8, SH7x00, DSP56xxx, XA,
ST7
On the one hand, this instruction reserves memory space for
the specified count of numbers of the type given by the attribute.
Therefore,
The other purpose is the alignment of the program counter which is
achieved by a count specification of 0. In this way, with a
The default for the operand length is - as usual - W, i.e.
16 bits.
For the 56xxx, the operand length is fixed to words (of 24 bit),
attributes therefore do not exist just as in the case of DC.
These commands are - one could say - the Intel counterpart to
DS and DC, and as expected, their logic is a little bit
different: First, the specification of the operand length is moved
into the mnemonic:
In order to be compatible to the M80, DEFB/DEFW may be used
instead of DB/DW in Z80-mode.
Similarly, BYTE/ADDR resp. WORD/ADDRW in COP8 mode
are an alias for DB resp. DW, with the pairs
differing in byte order: instructions defined by National for address
storage use big endian, BYTE resp. WORD in contrast
use little endian.
The NEC 77230 is special with its DW instruction: it works
more like the DATA statement of its smaller brothers, but
apart from string and integer arguments, it also accepts floating
point values (and stores them in the processor's proprietary 32-bit
format). There is no DUP operator!
With this instruction, you can reserve a memory area:
valid for: 6502, 68xx
By this instruction, byte constants or ASCII strings are placed in
65xx/68xx-mode, it therefore corresponds to DC.B on the
68000 or DB on Intel. Similarly to DC, a repetition
factor enclosed in brackets ([..]) may be prepended to every single
parameter.
valid for: ST6, 320C2x, 320C5x, MSP, TMS9900
Ditto. Note that when in 320C2x/5x mode, the assembler assumes that a
label on the left side of this instruction has no type, i.e. it
belongs to no address space. This behaviour is explained in the
processor-specific hints.
The PADDING instruction allows to set whether odd counts of
bytes shall be padded with a zero byte in MSP/TMS9900 mode.
valid for: 6502, 68xx
ADR resp. FDB stores word constants when in
65xx/68xx mode. It is therefore the equivalent to DC.W on
the 68000 or DW on Intel platforms. Similarly to
DC, a repetition factor enclosed in brackets ([..]) may be
prepended to every single parameter.
valid for: ST6, i960, 320C2x, 320C3x, 320C5x, MSP
If assembling for the 320C3x or i960, this command stores 32-bit
words, 16-bit words for the other families. Note that when in
320C2x/5x mode, the assembler assumes that a label on the left side
of this instruction has no type, i.e. it belongs to no address space.
This behaviour is explained at the discussion on processor-specific
hints.
valid for: 320C2x, 320C5x
LONG stores a 32-bit integer to memory with the order LoWord-HiWord.
Note that when in 320C2x/5x mode, the assembler assumes that a label
on the left side of this instruction has no type, i.e. it belongs to
no address space. This behaviour is explained in the
processor-specific hints.
valid for: 320C3x
Both commands store floating-point constants to memory. They are
not in IEEE-format. Instead the processor-specific formats with
32 and 40 bit are used. In case of EXTENDED the resulting
constant occupies two memory words. The most significant 8 bits (the
exponent) are written to the first word while the other ones (the
mantissa) are copied into the second word.
valid for: 320C2x, 320C5x
These two commands store floating-point constants in memory using the
standard IEEE 32-bit and 64-bit formats. The least significant
byte is copied to the first allocated memory location. Note that when
in 320C2x/5x mode the assembler assumes that all labels on the left
side of an instruction have no type, i.e. they belong to no address
space. This behaviour is explained in the processor-specific hints.
valid for: 320C2x, 320C5x
Another three floating point commands. All of them support non-IEEE
formats, which should be easily applicable on signal processors:
valid for: 320C2x, 320C5x
Qxx and LQxx can be used to generate constants in
a fixed point format. xx denotes a 2-digit number. The
operand is first multiplied by 2^xx before converting it
to binary notation. Thus xx can be viewed as the number of
bits which should be reserved for the fractional part of the constant
in fixed point format. Qxx stores only one word (16 bit)
while LQxx stores two words (low word first):
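The scaling rule can be tried out with the following Python sketch. It is an illustration only and not the conversion routine used by AS: rounding to the nearest integer and two's complement for negative values are assumptions, and the names q_word and lq_words are invented for this example.

```python
def q_word(value: float, frac_bits: int) -> int:
    """One 16-bit word: value scaled by 2**frac_bits (model of Qxx)."""
    scaled = int(round(value * (1 << frac_bits)))   # rounding is assumed
    return scaled & 0xFFFF


def lq_words(value: float, frac_bits: int) -> list:
    """Two 16-bit words, low word first (model of LQxx)."""
    scaled = int(round(value * (1 << frac_bits))) & 0xFFFFFFFF
    return [scaled & 0xFFFF, scaled >> 16]


print(hex(q_word(0.5, 15)))                      # 15 fractional bits
print([hex(w) for w in lq_words(1.5, 16)])       # low word, then high word
```

With 15 fractional bits, 0.5 becomes 4000h; with 16 fractional bits, 1.5 becomes 00018000h, stored as 8000h followed by 0001h.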
valid for: PIC, 320xx, AVR, MELPS-4500, 4004, µPD772x
This command stores data in the current segment. Both integer values
as well as character strings are supported. On 16C5x/16C8x, 17C4x in
data segment and on the 4500, characters occupy one word. On AVR,
17C4x in code segment, µPD772x in the data segments, and on
3201x/3202x, in general two characters fit into one word (LSB first).
The µPD77C25 can hold three bytes per word in the code
segment. When in 320C3x mode, the assembler puts four characters into
one word (MSB first). In contrast to this, characters occupy two
memory locations in the data segment of the 4500, similar in the
4004. The range of integer values corresponds to the word width of
each processor in a specific segment. This means that DATA
has the same result as WORD on a 320C3x (and that of
SINGLE if AS recognizes the operand as a floating-point
constant).
valid for: PIC
Generates a continuous string of zero words in memory. The length is
given by the argument and must not exceed 512.
valid for: COP8
These instructions allow to fill memory blocks with a byte or word
constant. The first operand specifies the size of the memory block
while the second one sets the filling constant itself. The maximum
supported block size is 1024 elements for FB and 512
elements for FW.
valid for: ST6
Both commands store string constants to memory. While ASCII
writes the character information only, ASCIZ additionally
appends a zero to the end of the string.
valid for: 320C2x, 320C5x
These commands are functionally equivalent to DATA, but
integer values are limited to the range of byte values. This enables
two characters or numbers to be packed together into one word. Both
commands only differ in the order they use to write bytes:
STRING stores the upper one first then the lower one,
RSTRING does this vice versa. Note that when in 320C2x/5x mode
the assembler assumes that a label on the left side of this
instruction has no type, i.e. it belongs to no address space. This
behaviour is explained in the processor-specific hints.
valid for: 6502, 68xx
When in 65xx/68xx mode, string constants are generated using this
instruction. In contrast to the original assembler AS11 from Motorola
(this is the main reason why AS understands this command, the
functionality is contained within the BYT instruction) you
must enclose the string argument by double quotation marks instead of
single quotation marks or slashes. Similarly to DC, a
repetition factor enclosed in brackets ([..]) may be prepended to
every single parameter.
valid for: 6502, 68xx
Reserves a memory block when in 6502/68xx mode. It is therefore the
equivalent to DS.B on the 68000 or DB ? on Intel
platforms.
valid for: ST6
Ditto.
valid for: i960
Ditto.
valid for: PIC, MELPS-4500, 3201x, 320C2x, 320C5x, AVR,
µPD772x
This command allocates memory. When used in code segments the
argument counts words (10/12/14/16 bit). In data segments it counts
bytes for PICs, nibbles for 4500's and words for the TI devices.
valid for: 320C2x, 320C3x, 320C5x, MSP
BSS works like RES, but when in 320C2x/5x mode,
the assembler assumes that a label on the left side of this
instruction has no type, i.e. it belongs to no address space. This
behaviour is explained in the processor-specific hints.
valid for: COP8
Both instructions allocate memory and ensure compatibility to ASMCOP
from National. While DSB takes the argument as byte
count, DSW uses it as word count (thus it allocates twice as
much memory as DSB).
valid for: all processors
Takes the argument to align the program counter to a certain address
boundary. AS increments the program counter to the next multiple of
the argument. So, ALIGN corresponds to DS.x on
68000, but is much more flexible at the same time.
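The rounding performed by ALIGN can be modelled in Python as follows. This is a sketch of the rule stated above; the function name align is chosen for this example, and leaving an already aligned program counter unchanged is an assumption.

```python
def align(pc: int, boundary: int) -> int:
    """Round pc up to the next multiple of boundary."""
    return (pc + boundary - 1) // boundary * boundary


print(hex(align(0x101, 4)))    # rounded up to the next multiple of 4
print(hex(align(0x100, 4)))    # already aligned, assumed to stay as it is
print(align(5, 3))             # the boundary need not be a power of two
```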
Example:
valid for: SH7x00
Although the SH7000 processor can do an immediate register load with
8 bit only, AS imposes no such restriction. Larger values are
instead loaded from constants in memory. Storing them in the
code segment (not far away from the register load instruction) would
require an additional jump. AS therefore gathers the constants and
stores them at an address specified by LTORG. Details are
explained in the processor-specific section somewhat later.
valid for: all processors
Now we finally reach the things that make a macro assembler different
from an ordinary assembler: the ability to define macros (guessed it
!?).
When speaking about 'macros', I generally mean a sequence of (machine
or pseudo) instructions which are united to a block by special
statements and can then be treated in certain ways. The assembler
knows the following statements to work with such blocks:
MACRO is probably the most important instruction for macro programming. The
instruction sequence
ASSUME ROMBASE:VARI>>6
sets the AS-internal variable to 11h, and an access to VARI
generates an access to address 56h in the data segment.
ST9
320C3x
ldp @addr
assume dp:addr>>16
.
.
ldi @addr,r2
µPD78(C)10
inrw Lo(counter)
so AS takes over this job, but only under the premise that it is
informed via the ASSUME-command about the contents of the V
register. If an instruction with short addressing is used, it will be
checked if the upper half of the address expression corresponds to
the expected content. A warning will be issued if both do not match.
75K0
EMULATED FADD,FSUB,FMUL,FDIV
EMULATED FEQ,FGE,FGT,SQRT,CLASS
bne target
with a longer sequence of the same functionality, in case the branch
target is out of reach for the instruction's displacement. For
example, the replacement sequence for bne would be
beq skip
jmp target
skip:
In case there is no fitting 'opposite' for an instruction, the
sequence may become even longer, e.g. for jbc:
jbc dobr
bra skip
dobr: jmp target
skip:
This feature however has the side effect that there is no unambiguous
assignment between machine and assembly code any more. Furthermore,
additional passes may be the result if there are forward branches.
One should therefore use this feature with caution!
String dc.B "Hello world!\0"
The parameter count may be between 1 and 20. A repeat count enclosed
in brackets may additionally be prefixed to each parameter; for
example, one can fill the area up to the next page
boundary with zeroes with a statement like
dc.b [(*+255)&$ffffff00-*]0
CAUTION! This feature makes it easy to reach the limit of 1
Kbyte of generated code per line!
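The arithmetic behind the repeat count in the dc.b example can be checked with a small Python sketch. It is not part of AS; the function name is made up for illustration, and it assumes that the AND in (*+255)&$ffffff00-* is applied before the subtraction and that a page is 256 bytes.

```python
def fill_to_page_boundary(pc: int) -> int:
    """Return how many filler bytes lie between pc and the next 256-byte boundary."""
    # Mirrors (*+255)&$ffffff00-* ; assumption: the AND binds tighter than the minus.
    return ((pc + 255) & 0xFFFFFF00) - pc


# Try a few program counter values.
for pc in (0x1000, 0x1001, 0x10FF):
    print(hex(pc), fill_to_page_boundary(pc))
```

Note that an already aligned program counter yields a repeat count of 0 in this model, while an address just behind a boundary yields 255, which is still far below the 1 Kbyte per line limit.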
DS.B 20
for example reserves 20 bytes of memory, but
DS.X 20
reserves 240 bytes!
DS.W 0 ,
the program counter will be rounded up to the next even address, with
a
DS.D 0
in contrast to the next double word boundary. Memory cells possibly
staying unused thereby are neither zeroed nor filled with NOPs, they
simply stay undefined.
valid for: Intel, Zilog, Toshiba, NEC, TMS370, Siemens, AMD, MELPS7700/65816, M16(C), National, ST9, TMS70Cxx, µPD77230
Second, the distinction between constant definition and memory
reservation is done by the operand. A reservation of memory is marked
by a ? :
db ? ; reserves a byte
dw ?,? ; reserves memory for 2 words (= 4 bytes)
dd -1 ; places the constant -1 (FFFFFFFFH) !
Reserved memory and constant definition must not be mixed
within one instruction:
db "hello",? ; --> error message
Additionally, the DUP operator permits the repeated placing
of constant sequences or the reservation of whole memory blocks:
db 3 dup (1,2) ; --> 1 2 1 2 1 2
dw 20 dup (?) ; reserves 40 bytes of memory
As you can see, the DUP-argument must be enclosed in
parentheses, which is also why it may consist of several components,
that may themselves be DUPs...the stuff therefore works
recursively. DUP is however also a place where one can get
in touch with another limit of the assembler: a maximum of 1024 bytes
of code or data may be generated in one line. This is not valid for
the reservation of memory, only for the definition of constant
arrays!
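The recursive nature of DUP can be pictured with the following Python sketch. It only models the expansion of byte constants as described above; the class Dup and the functions expand and db are hypothetical names for this illustration and have nothing to do with how AS implements the operator internally.

```python
MAX_BYTES_PER_LINE = 1024  # limit for generated code or data per line, as stated above


class Dup:
    """Models '<count> dup (<items>)'; items may contain further Dup objects."""

    def __init__(self, count, items):
        self.count = count
        self.items = items


def expand(items):
    """Flatten a list of byte values and Dup objects into a plain list of bytes."""
    result = []
    for item in items:
        if isinstance(item, Dup):
            # A DUP repeats its (recursively expanded) argument list count times.
            result.extend(expand(item.items) * item.count)
        else:
            result.append(item)
    return result


def db(items):
    """Model of a db line that defines constants (not one that reserves memory)."""
    data = expand(items)
    if len(data) > MAX_BYTES_PER_LINE:
        raise ValueError("more than 1024 bytes generated in one line")
    return data


print(db([Dup(3, [1, 2])]))               # db 3 dup (1,2)
print(db([Dup(2, [0, Dup(2, [9])])]))     # db 2 dup (0, 2 dup (9))
```

The first call reproduces the sequence 1 2 1 2 1 2 from the example; the second one shows a DUP nested inside another DUP.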
valid for: Intel, Zilog, Toshiba, NEC, TMS370, Siemens, AMD, M16(C), National, ST9, TMS7000
DS <count>
It is an abbreviation of
DB <count> DUP (?)
Although this could easily be done with a macro, some people who grew up
with Motorola CPUs (Hi Michael!) suggested making DS a built-in
instruction...I hope they are satisfied now ;-)
3.3.12. EFLOAT, BFLOAT, and TFLOAT
The three commands share a common storage strategy. In all cases the
mantissa precedes the exponent in memory, both are stored as 2's
complement with the least significant byte first. Note that when in
320C2x/5x mode the assembler assumes that all labels on the left side
of an instruction have no type, i.e. they belong to no address space.
This behaviour is explained in the processor-specific hints.
q05 2.5 ; --> 0050h
lq20 ConstPI ; --> 43F7h 0032h
Please do not flame me in case I calculated something wrong on my
HP28...
align 2
aligns to an even address (PC mod 2 = 0). The contents of the skipped
addresses are left undefined.
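The rounding done by ALIGN can be written down as a short Python sketch. The function name is hypothetical, and the sketch assumes that a program counter which already is a multiple of the argument stays unchanged.

```python
def align(pc: int, boundary: int) -> int:
    """Return the program counter after an 'align <boundary>' statement."""
    if boundary <= 0:
        raise ValueError("boundary must be positive")
    remainder = pc % boundary
    if remainder == 0:
        return pc  # assumption: an aligned address is left alone
    # Otherwise advance to the next multiple of the boundary.
    return pc + (boundary - remainder)


print(hex(align(0x1001, 2)))  # odd address is rounded up to the next even one
print(hex(align(0x1002, 2)))  # even address stays as it is
```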
<name> MACRO [parameter list]
<instructions>
ENDM
defines the macro <name> to be the enclosed
instruction sequence. This definition by itself does not generate any
code! In turn, from now on the instruction sequence can simply be
called by the name, the whole construct therefore shortens and
simplifies programs. A parameter list may be added to the macro
definition to make things even more useful. The parameters' names
have to be separated by commas (as usual) and have to conform to the
conventions for symbol names (see section 2.7) - like the macro name itself.
A switch to case-sensitive mode influences both macro names and parameters.
Similar to symbols, macros are local, i.e. they are only known in a section and its subsections when the definition is done from within a section. This behaviour however can be controlled in wide limits via the options PUBLIC and GLOBAL described below.
Apart from the macro parameters themselves, the parameter list may contain control parameters which influence the processing of the macro. These parameters are distinguished from normal parameters by being enclosed in braces. The following control parameters are defined:
When a macro is called, the parameters given for the call are textually inserted into the instruction block and the resulting assembler code is assembled as usual. Zero-length parameters are inserted in case too few parameters are specified. It is important to note that string constants are not protected from macro expansions. The old IBM rule:
It's not a bug, it's a feature!
applies to this detail. The gap was left to allow checking of parameters via string comparisons. For example, one can analyze a macro parameter in the following way:
mul     MACRO   para,parb
        IF      UpString("PARA")<>"A"
         MOV    a,para
        ENDIF
        IF      UpString("PARB")<>"B"
         MOV    b,parb
        ENDIF
        mul     ab
        ENDM
It is important for the example above that the assembler converts all parameter names to upper case when operating in case-insensitive mode, but this conversion never takes place inside of string constants. Macro parameter names therefore have to be written in upper case when they appear in string constants.
The same naming rules as for usual symbols also apply to macro parameters, with the exception that only letters and numbers are allowed, i.e. dots and underscores are forbidden. The reason for this constraint is a hidden feature: the underscore allows concatenating macro parameter names to a symbol, like in the following example:
concat  macro   part1,part2
        call    part1_part2
        endm
The call
        concat  module,function
will therefore result in
        call    module_function
A small example to remove all clarities ;-)
A programmer braindamaged by years of programming Intel processors wants to have the instructions PUSH/POP also for the 68000. He solves the 'problem' in the following way:
push    macro   op
        move    op,-(sp)
        endm
pop     macro   op
        move    (sp)+,op
        endm
If one writes
        push    d0
        pop     a2
this results in
        move.w  d0,-(sp)
        move.w  (sp)+,a2
A macro definition must not cross include file boundaries.
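The textual parameter insertion and the underscore trick described above can be imitated with a few lines of Python. This is only a model under simplifying assumptions: the function expand_macro is a made-up name, parameter names are matched in exact case (as in case-sensitive mode), and nothing else of the AS macro processor is reproduced.

```python
import re


def expand_macro(body_lines, param_names, args):
    """Textually insert the call arguments into the lines of a macro body."""
    # Too few arguments: the missing ones become zero-length strings.
    padded = list(args) + [""] * (len(param_names) - len(args))
    mapping = dict(zip(param_names, padded))

    def replace(match):
        word = match.group(0)
        return mapping.get(word, word)

    # Parameter names consist of letters and digits only, so an underscore
    # ends a name. String constants are deliberately not protected.
    return [re.sub(r"[A-Za-z0-9]+", replace, line) for line in body_lines]


print(expand_macro(["call part1_part2"], ["part1", "part2"], ["module", "function"]))
print(expand_macro(["move op,-(sp)"], ["op"], ["d0"]))
print(expand_macro(['IF UpString("PARA")<>"A"'], ["PARA"], ["b"]))
```

The last call shows the 'feature' mentioned above: the parameter name is replaced even though it stands inside a string constant.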
Labels defined in macros are always regarded as being local; an explicit LOCAL instruction is therefore not necessary (it does not even exist). In case there is a reason to make a label global, one may define it with LABEL which always creates global symbols (similar to BIT, SFR...):
<Name>  label   $
When parsing a line, the assembler first checks the macro list and afterwards looks for processor instructions, which is why macros allow redefining processor instructions. However, the definition should appear before the first invocation of the instruction to avoid phase errors like in the following example:
        bsr     target

bsr     macro   targ
        jsr     targ
        endm

        bsr     target
In the first pass, the macro is not known when the first BSR instruction is assembled; an instruction with 4 bytes of length is generated. In the second pass however, the macro definition is immediately available (from the first pass), a JSR of 6 bytes length is therefore generated. As a result, all labels following are too low by 2 and phase errors occur for them. An additional pass is necessary to resolve this.
Because a machine or pseudo instruction becomes hidden when a macro of same name is defined, there is a backdoor to reach the original meaning: the search for macros is suppressed if the name is prefixed with an exclamation mark (!). This may come in handy if one wants to extend existing instructions in their functionality, e.g. the TLCS-90's shift instructions:
srl     macro   op,n            ; shift by n places
        rept    n               ; n simple instructions
         !srl   op
        endm
        endm
From now on, the SRL instruction has an additional parameter...
IRP is a simplified macro definition for the case that an instruction sequence shall be applied to a couple of operands and the code is not needed any more afterwards. IRP needs a symbol for the operand as its first parameter, and an (almost) arbitrary number of parameters that are sequentially inserted into the block of code. For example, one can write
        irp     op, acc,b,dpl,dph
        push    op
        endm
to push a couple of registers to the stack, which results in
        push    acc
        push    b
        push    dpl
        push    dph
Again, labels used are automatically local for every pass.
IRPC is a variant of IRP where the first argument's occurrences in the lines up to ENDM are successively replaced by the characters of a string instead of further parameters. For example, an especially complicated way of placing a string into memory would be:
        irpc    char,"Hello World"
        db      'CHAR'
        endm
CAUTION! As the example already shows, IRPC only inserts the pure character; it is the programmer's task to ensure that valid code results (in this example by inserting quotes, including the detail that no automatic conversion to uppercase characters is done).
REPT is the simplest way to employ macro constructs. The code between REPT and ENDM is assembled as often as the integer argument of REPT specifies. This statement is commonly used in small loops to replace a programmed loop to save the loop overhead.
An example for the sake of completeness:
        rept    3
        rr      a
        endm
rotates the accumulator to the right by three positions.
In case REPT's argument is equal to or smaller than 0, no expansion at all is done. This is different to older versions of AS which used to be a bit 'sloppy' in this respect and always made a single expansion.
WHILE operates similarly to REPT, but the fixed number of repetitions given as an argument is replaced by a boolean expression. The code framed by WHILE and ENDM is assembled until the expression becomes logically false. In the extreme case, this may mean that the enclosed code is not assembled at all because the expression was already false when the construct was found. On the other hand, it may happen that the expression stays true forever and AS will run infinitely...one should therefore apply a bit of care when using this construct, i.e. the code must contain a statement that influences the condition, e.g. like this:
cnt     set     1
sq      set     cnt*cnt
        while   sq<=1000
         dc.l   sq
cnt      set    cnt+1
sq       set    cnt*cnt
        endm
This example stores all square numbers up to 1000 to memory.
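For readers who want to check what the WHILE example produces, the same loop can be transcribed to Python. The function name is an invention for this sketch; the list it returns stands for the values that the dc.l statements would place in memory.

```python
def squares_up_to(limit: int):
    """Transcription of the WHILE example: collect all squares <= limit."""
    emitted = []
    cnt = 1
    sq = cnt * cnt
    while sq <= limit:       # while sq<=1000
        emitted.append(sq)   # dc.l sq
        cnt = cnt + 1        # cnt set cnt+1
        sq = cnt * cnt       # sq set cnt*cnt
    return emitted


values = squares_up_to(1000)
print(len(values), values[0], values[-1])
```

With a limit of 0 the condition is false right from the start and nothing is emitted, which corresponds to the case where the enclosed code is not assembled at all.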
Currently there exists a little ugly detail for WHILE: an additional empty line that was not present in the code itself is added after the last expansion. This is a 'side effect' based on a weakness of the macro processor and it is unfortunately not that easy to fix. I hope no one minds...
EXITM offers a way to terminate a macro expansion or one of the instructions REPT, IRP, or WHILE prematurely. Such an option helps for example to replace encapsulations with IF-ENDIF-ladders in macros by something more readable. Of course, an EXITM itself always has to be conditional, which leads us to an important detail: When an EXITM is executed, the stack of open IF and SWITCH constructs is reset to the state it had just before the macro expansion started. This is imperative for conditional EXITMs as the ENDIF resp. ENDCASE that frames the EXITM statement will not be reached any more; AS would print an error message without this trick. Please also keep in mind that EXITM always only terminates the innermost construct if macro constructs are nested! If one wants to completely break out of a nested construct, one has to use additional EXITMs on the higher levels!
Though FUNCTION is not a macro statement in the strict sense, I will describe this instruction at this place because it uses principles similar to macro replacements.
This instruction is used to define new functions that may then be used in formula expressions like predefined functions. The definition must have the following form:
<name>  FUNCTION <arg>,..,<arg>,<expression>
The arguments are the values that are 'fed into' the function. The definition uses symbolic names for the arguments, so that the assembler knows where to insert the actual values when the function is called. This can be seen from the following example:
isdigit FUNCTION ch,(ch>='0')&&(ch<='9')
This function checks whether the argument (interpreted as a character) is a digit in the currently valid character set (the character set can be modified via CHARSET, therefore the careful wording).
The arguments' names (CH in this case) must conform to the stricter rules for macro parameter names, i.e. the special characters . and _ are not allowed.
User-defined functions can be used in the same way as builtin functions, i.e. with a list of parameters, separated by commas, enclosed in parentheses:
        IF isdigit(char)
         message "\{char} is a number"
        ELSEIF
         message "\{char} is not a number"
        ENDIF
When the function is called, all parameters are calculated once and are then inserted into the function's formula. This is done to reduce calculation overhead and to avoid side effects. The individual arguments have to be separated by commas when a function has more than one parameter.
CAUTION! Similar to macros, one can use user-defined functions to override builtin functions. This is a possible source for phase errors. Such definitions therefore should be done before the first call!
The result's type may depend on the type of the input arguments as the arguments are textually inserted into the function's formula. For example, the function
double  function x,x+x
may have an integer, a float, or even a string as result, depending on the argument's type!
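The effect has a close analogy in Python, whose + operator also works on integers, floats, and strings. The following lines are only that analogy, not a description of how AS evaluates expressions.

```python
def double(x):
    """Python counterpart of 'double function x,x+x'."""
    return x + x


# The result type follows the argument type.
for argument in (2, 1.5, "ab"):
    result = double(argument)
    print(type(result).__name__, result)
```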
When AS operates in case-sensitive mode, the case matters when defining or referencing user-defined functions, in contrast to builtin functions!
valid for: all processors
The assembler supports conditional assembly with the help of statements like IF... resp. SWITCH... . These statements work at assembly time allowing or disallowing the assembly of program parts based on conditions. They are therefore not to be compared with IF statements of high-level languages (though it would be tempting to extend assembly language with structurization statements of higher level languages...).
The following constructs may be nested arbitrarily (until a memory overflow occurs).
IF is the most common and most versatile construct. The general style of an IF statement is as follows:
        IF      <expression 1>
         .
         .
         <block 1>
         .
         .
        ELSEIF  <expression 2>
         .
         .
         <block 2>
         .
         .
        (possibly more ELSEIFs)
         .
         .
        ELSEIF
         .
         .
         <block n>
         .
         .
        ENDIF
IF serves as an entry, evaluates the first expression, and assembles block 1 if the expression is true (i.e. not 0). All further ELSEIF-blocks will then be skipped. However, if the expression is false, block 1 will be skipped and expression 2 is evaluated. If this expression turns out to be true, block 2 is assembled. The number of ELSEIF parts is variable and results in an IF-THEN-ELSE ladder of an arbitrary length. The block assigned to the last ELSEIF (without argument) only gets assembled if all previous expressions evaluated to false; it therefore forms a 'default' branch. It is important to note that only one of the blocks will be assembled: the first one whose IF/ELSEIF had a true expression as argument.
The ELSEIF parts are optional, i.e. IF may directly be followed by an ENDIF. An ELSEIF without parameters must be the last branch.
ELSEIF always refers to the innermost, unfinished IF construct in case IF's are nested.
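The selection rule of the IF ladder (only the first branch with a true expression is assembled) can be modelled in Python as follows. All names are hypothetical; the sketch additionally assumes that expressions behind the first true one are not evaluated at all, which the description above suggests but does not state explicitly.

```python
def select_if_block(branches, default_block=None):
    """Return the single block an IF/ELSEIF ladder would assemble.

    branches: list of (condition, block) pairs; a condition is a callable
    returning an integer, where every non-zero value counts as true.
    default_block: block of the final ELSEIF without argument, if there is one.
    """
    for condition, block in branches:
        if condition() != 0:
            return block  # assumption: later conditions are never evaluated
    return default_block


evaluated = []


def cond(name, value):
    """Build a condition that records when it gets evaluated."""
    def run():
        evaluated.append(name)
        return value
    return run


chosen = select_if_block(
    [(cond("expr1", 0), "block 1"),
     (cond("expr2", 7), "block 2"),
     (cond("expr3", 1), "block 3")],
    default_block="block n",
)
print(chosen, evaluated)
```

Although the third expression would also be true, only block 2 is selected, because it belongs to the first true expression.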
In addition to IF, the following further conditional statements are defined:
It is valid to write ELSE instead of ELSEIF since everybody seems to be used to it...
3.5.2. SWITCH / CASE / ELSECASE / ENDCASE
CASE is a special case of IF and is designed for situations when an expression has to be compared with a couple of values. This could of course also be done with a series of ELSEIFs, but the following form
        SWITCH  <expression>
         .
         .
        CASE    <value 1>
         .
         <block 1>
         .
        CASE    <value 2>
         .
         <block 2>
         .
        (further CASE blocks)
         .
        CASE    <value n-1>
         .
         <block n-1>
         .
        ELSECASE
         .
         <block n>
         .
        ENDCASE
has the advantage that the expression is only written once and also only gets evaluated once. It is therefore less error-prone and slightly faster than an IF chain, but obviously not as flexible.
It is possible to specify multiple values separated by commas to a CASE statement in order to assemble the following block in multiple cases. The ELSECASE branch again serves as a 'trap' for the case that none of the CASE conditions was met. AS will issue a warning in case it is missing and all comparisons fail.
Even when value lists of CASE branches overlap, only one branch is executed, which is the first one in case of ambiguities.
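The rules for SWITCH given above (value lists, first match wins, ELSECASE as a trap, warning if it is missing) can be summarized in a small Python model. The function name and the wording of the warning are invented for this sketch.

```python
import warnings


def select_case_block(value, cases, else_block=None):
    """Return the block a SWITCH construct would assemble.

    cases: list of (values, block) pairs in source order, where values is
    the list of values given to one CASE statement.
    else_block: block of the ELSECASE branch, or None if there is none.
    """
    for values, block in cases:
        if value in values:
            return block  # overlapping lists: the first matching CASE wins
    if else_block is None:
        # Hypothetical message text; AS issues a warning in this situation.
        warnings.warn("no CASE matched and ELSECASE is missing")
    return else_block


cases = [((1, 2), "block A"), ((2, 3), "block B")]
print(select_case_block(2, cases, else_block="block E"))
print(select_case_block(9, cases, else_block="block E"))
```

The value 2 appears in both value lists, but only the first branch is selected.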
SWITCH only serves to open the whole construct; an arbitrary number of statements may be between SWITCH and the first CASE (but don't leave other IFs open!), for the sake of better readability this should however not be done.
valid for: all processors
PAGE is used to tell AS the dimensions of the paper that is used to print the assembly listing. The first parameter is the number of lines after which AS shall automatically output a form feed. One should however take into account that this value does not include heading lines, including a possible line specified with TITLE. The minimum number of lines is 5, and the maximum value is 255. A specification of 0 has the result that AS will not do any form feeds except those triggered by a NEWPAGE instruction or those implicitly engaged at the end of the assembly listing (e.g. prior to the symbol table).
The specification of the listing's width in characters is an optional second parameter and serves two purposes: on the one hand, the internal line counter of AS will continue to run correctly when a source line has to be split into several listing lines, and on the other hand there are printers (like some laser printers) that do not automatically wrap into a new line at line end but instead simply discard the rest. For this reason, AS does line breaks by itself, i.e. lines that are too long are split into chunks whose lengths are equal to or smaller than the specified width. This may lead to double line feeds on printers that can do line wraps on their own if one specifies the exact line width as listing width. The solution for such a case is to reduce the assembly listing's width by 1. The specified line width may lie between 5 and 255 characters; similarly to the page length, a line width of 0 means that AS shall not do any splitting of listing lines; in that case, lines that are too long can of course no longer be taken into account for the form feed.
The default setting for the page length is 60 lines, the default for the line width is 0; the latter value is also assumed when PAGE is called with only one parameter.
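The splitting of long listing lines can be pictured with the following Python sketch. It only models the rule 'chunks not longer than the width, width 0 disables splitting'; the function name is hypothetical, and details such as how AS formats continuation lines are not covered.

```python
def split_listing_line(line: str, width: int):
    """Split one listing line into chunks of at most width characters."""
    if width == 0:
        return [line]  # width 0: no splitting at all
    if not 5 <= width <= 255:
        raise ValueError("line width must be 0 or between 5 and 255")
    chunks = [line[i:i + width] for i in range(0, len(line), width)]
    return chunks or [""]  # an empty line still occupies one listing line


print(split_listing_line("abcdefghijkl", 5))
print(split_listing_line("abcdefghijkl", 0))
```

The number of chunks is what the internal line counter would have to add for this source line, which is why the width matters for correct form feeds.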
CAUTION! There is no way for AS to check whether the specified listing length and width correspond to the reality!
NEWPAGE can be used to force a form feed although the current page is not yet full. This might be useful to separate program parts in the listing that are logically different. The internal line counter is reset and the page counter is incremented by one. The optional parameter is related to a hierarchical page numbering: AS supports up to a chapter depth of 4. 0 always refers to the lowest depth, and the maximum value may vary during the assembly run. This may look a bit puzzling, as the following example shows:
page 1, instruction NEWPAGE 0 -> page 2
page 2, instruction NEWPAGE 1 -> page 2.1
page 2.1, instruction NEWPAGE 1 -> page 3.1
page 3.1, instruction NEWPAGE 0 -> page 3.2
page 3.2, instruction NEWPAGE 2 -> page 4.1.1
NEWPAGE <number> may therefore result in changes in different digits, depending on the current chapter depth. An automatic form feed due to a line counter overflow or a NEWPAGE without parameter is equal to NEWPAGE 0. Previous to the output of the symbol table, an implicit NEWPAGE <maximum up to now> is done to start a new 'main chapter'.
The statement
        macexp  off
has the effect that only the macro call and not the expanded text is listed for macro expansions. This is sensible for macro-intensive code to keep the listing from growing beyond all bounds. The full listing can be turned on again with
        macexp  on
This is also the default.
There is a subtle difference between the meaning of MACEXP for macros and for all other macro-like constructs (e.g. REPT): while a macro contains an internal flag that controls whether expansions of this macro shall be listed or not, MACEXP directly influences all other constructs that are resolved 'in place'. The reason for this differentiation is that there may be macros that are tested and whose expansion is therefore unnecessary, while all other macros still shall be expanded. MACEXP serves as a default for the macro's internal flag when it is defined, and it may be overridden by the NOEXPAND resp. EXPAND directives.
The current setting may be read from the symbol MACEXP.
LISTING works like MACEXP and accepts the same parameters, but is much more radical: After a
        listing off
nothing at all will be written to the listing. This directive makes sense for tested code parts or include files to avoid a paper consumption going beyond all bounds. CAUTION! If one forgets to issue the counterpart somewhere later, even the symbol table will not be written any more! In addition to ON and OFF, LISTING also accepts NOSKIPPED and PURECODE as arguments. Program parts that were not assembled due to conditional assembly will not be written to the listing when NOSKIPPED is set, while PURECODE - as the name indicates - even suppresses the IF directives themselves in the listing. These options are useful if one uses macros that act differently depending on parameters and one only wants to see the used parts in the listing.
The current setting may be read from the symbol LISTING (0=OFF, 1=ON, 2=NOSKIPPED, 3=PURECODE).
Quite often it makes sense to switch to another printing mode (like compressed printing) when the listing is sent to a printer and to deactivate this mode again at the end of the listing. The output of the needed control sequences can be automated with these instructions if one specifies the sequence that shall be sent to the output device prior to the listing with PRTINIT <string> and similarly the deinitialization string with PRTEXIT <string>. <string> has to be a string expression in both cases. The syntax rules for string constants allow inserting control characters into the string without too much tweaking.
When writing the listing, the assembler does not differentiate where the listing actually goes, i.e. printer control characters are sent to the screen without mercy!
Example:
For Epson printers, it makes sense to switch them to compressed printing because listings are so wide. The lines
        prtinit "\15"
        prtexit "\18"
ensure that the compressed mode is turned on at the beginning of the listing and turned off afterwards.
The assembler normally adds a header line to each page of the listing that contains the source file's name, date, and time. TITLE allows extending the page header by an arbitrary additional line. The string that has to be specified is an arbitrary string expression.
Example:
For the Epson printer already mentioned above, a title line shall be written in wide mode, which makes it necessary to turn off the compressed mode before:
        title   "\18\14Wide Title\15"
(Epson printers automatically turn off the wide mode at the end of a line.)
RADIX with a numerical argument between 2 and 36 sets the default numbering system for integer constants, i.e. the numbering system used if nothing else has been stated explicitly. The default is 10, and there are some possible pitfalls to keep in mind which are described in section 2.8.1.
Independent of the current setting, the argument of RADIX is always decimal; furthermore, no symbolic or formula expressions may be used as argument. Only use simple constant numbers!
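How the default radix changes the value of one and the same digit string can be tried out with Python's built-in int(), which happens to accept the same range of bases from 2 to 36. The function below is a hypothetical helper for illustration; the prefixes and suffixes of AS integer constants described in section 2.8.1 are not modelled.

```python
def parse_constant(digits: str, radix: int = 10) -> int:
    """Interpret a plain digit string in the given default radix."""
    if not 2 <= radix <= 36:
        raise ValueError("radix must lie between 2 and 36")
    return int(digits, radix)


# The same three characters under different RADIX settings.
for radix in (2, 10, 16):
    print(radix, parse_constant("100", radix))
```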
valid for: all processors
Local symbols and the section concept introduced with them are a
completely new function that was introduced with version 1.39. One
could say that this part is version ''1.0'' and therefore probably
not the optimum. Ideas and (constructive) criticism are therefore
especially wanted. I admittedly described the usage of sections the way I
imagined it. It is therefore possible that the reality is not
entirely equal to the model in my head. I promise that in case of
discrepancies, changes will be made so that the reality gets adapted to
the documentation and not vice versa (I was told that the latter
sometimes takes place in larger companies...).
AS does not generate linkable code (and this will probably not change
in the near future :-(). This fact forces one to always
assemble a program as a whole. In contrast to this technique, a
separation into linkable modules would have several advantages:
A section represents a part of the assembler program enclosed by
special statements and has a unique name chosen by the programmer:
AS will by default not differentiate between upper and lower case in
section names; if one however switches to case-sensitive mode, the
case will be regarded just like for symbols.
The organization described up to now roughly corresponds to what is
possible in the C language that places all functions on the same
level. However, as my ''high-level'' ideal was Pascal and not C, I
went one step further:
It is valid to define further sections within a section. This is
analogous to the option given in Pascal to define procedures inside a
procedure or function. The following example shows this:
3.7. Local Symbols
Especially the last item was something that always nagged me: once
a label's name was defined at the beginning of a 2000-line
program, there was no way to reuse it somehow - not even at the
file's other end where routines with a completely different context
were placed. I was forced to use concatenated names in the style of
<subprogram name>_<symbol name>
that had lengths ranging from 15 to 25 characters and made the
program difficult to read. The concept of sections described in
detail in the following text was designed to cure at least the second
and third item of the list above. It is completely optional: if you
do not want to use sections, simply forget them and continue to work
like you did with previous versions of AS.
3.7.1. Basic Definition (SECTION/ENDSECTION)
.
.
<other code>
.
.
SECTION <section's name>
.
.
<code inside of the section>
.
.
ENDSECTION [section's name]
.
.
<other code>
.
.
The name of a section must conform to the conventions for a symbol
name; AS stores section and symbol names in separate tables which is
the reason why a name may be used for a symbol and a section at the
same time. Section names must be unique in a sense that there must
not be more than one section on the same level with the same name (I
will explain in the next part what ''levels'' mean). The argument
of ENDSECTION is optional, it may also be omitted; if it is
omitted, AS will show the section's name that has been closed with
this ENDSECTION. Code inside a section will be processed by
AS exactly as if it were outside, except for three decisive
differences:
This mechanism e.g. allows splitting the code into modules as one
might have done it with linkable code. A more fine-grained approach
would be to pack every routine into a separate section. Depending on
the individual routines' lengths, the symbols for internal use may
obtain very short names.
3.7.2. Nesting and Scope Rules
sym EQU 0
SECTION ModuleA
SECTION ProcA1
sym EQU 5
ENDSECTION ProcA1
SECTION ProcA2
sym EQU 10
ENDSECTION ProcA2
ENDSECTION ModuleA
SECTION ModuleB
sym EQU 15
SECTION ProcB
ENDSECTION ProcB
ENDSECTION ModuleB
When looking up a symbol, AS first searches for a symbol assigned to
the current section, and afterwards traverses the list of parent
sections until the global symbols are reached. In our example, the
individual sections see the values given in table 3.5 for the symbol sym:
section | value | from section... |
---|---|---|
Global | 0 | Global |
ModuleA | 0 | Global |
ProcA1 | 5 | ProcA1 |
ProcA2 | 10 | ProcA2 |
ModuleB | 15 | ModuleB |
ProcB | 15 | ModuleB |
This rule can be overridden by explicitly appending a section's name to the symbol's name. The section's name has to be enclosed in brackets:
        move.l  #sym[ModuleB],d0
Only sections that are in the parent section path of the current section may be used. The special values PARENT0..PARENT9 are allowed to reference the n-th ''parent'' of the current section; PARENT0 is therefore equivalent to the current section itself, PARENT1 the direct parent and so on. PARENT1 may be abbreviated as PARENT. If no name is given between the brackets, like in this example:
        move.l  #sym[],d0
one reaches the global symbol. CAUTION! If one explicitly references a symbol from a certain section, AS will only look for symbols in this section, i.e. the traversal of the parent section path is omitted!
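The lookup rules can be replayed with a small Python model of the section tree from the example. The class and function names are invented for this sketch, the root node named Global stands for the global level, and the PARENTn values are not modelled.

```python
class Section:
    """One node of the section tree; the root stands for the global level."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.symbols = {}


def lookup(section, symbol):
    """Unqualified lookup: current section first, then each parent in turn."""
    current = section
    while current is not None:
        if symbol in current.symbols:
            return current.symbols[symbol], current.name
        current = current.parent
    raise KeyError(symbol)


def lookup_qualified(section, symbol, qualifier):
    """sym[qualifier]: search only the named section, without any traversal.

    An empty qualifier, as in sym[], selects the global level.
    """
    current = section
    while current is not None:
        is_root = current.parent is None
        if (qualifier == "" and is_root) or (qualifier != "" and current.name == qualifier):
            return current.symbols[symbol]  # KeyError if not defined right here
        current = current.parent
    raise KeyError(qualifier)  # section is not in the parent path


# Rebuild the example: sym is defined on four different levels.
glob = Section("Global")
glob.symbols["sym"] = 0
module_a = Section("ModuleA", glob)
proc_a1 = Section("ProcA1", module_a)
proc_a1.symbols["sym"] = 5
proc_a2 = Section("ProcA2", module_a)
proc_a2.symbols["sym"] = 10
module_b = Section("ModuleB", glob)
module_b.symbols["sym"] = 15
proc_b = Section("ProcB", module_b)

for sec in (glob, module_a, proc_a1, proc_a2, module_b, proc_b):
    print(sec.name, lookup(sec, "sym"))
```

In this model, lookup_qualified(proc_a1, "sym", "ModuleA") fails although the global sym exists, because ModuleA itself defines no sym and the parent path is not traversed for qualified references.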
Similar to Pascal, it is allowed that different sections have subsections of the same name; the principle of locality avoids irritations. One should IMHO still use this feature as seldom as possible: Symbols listed in the symbol resp. cross reference list are only marked with the section they are assigned to, not with the ''section hierarchy'' lying above them (this really would have busted the available space); a differentiation is made very difficult this way.
As a SECTION instruction does not define a label by itself, the section concept has an important difference to Pascal's concept of nested procedures: a Pascal procedure can automatically ''see'' its subprocedures (functions), whereas AS requires an explicit definition of an entry point. This can be done e.g. with the following macro pair:
proc    MACRO   name
        SECTION name
name    LABEL   $
        ENDM

endp    MACRO   name
        ENDSECTION name
        ENDM
This example also shows that the locality of labels inside macros is not influenced by sections. It makes the trick with the LABEL instruction necessary.
This does of course not solve the problem completely. The label is still local and not referencable from the outside. Those who think that it would suffice to place the label in front of the SECTION statement should be quiet because they would spoil the bridge to the next theme:
The PUBLIC statement allows changing the assignment of a symbol to a certain section. It is possible to treat multiple symbols with one statement, but I will use an example with only one symbol in the following (without loss of generality). In the simplest case, one declares a symbol to be global, i.e. it can be referenced from anywhere in the program:
        PUBLIC  <name>
As a symbol cannot be moved in the symbol table once it has been sorted in, this statement has to appear before the symbol itself is defined. AS stores all PUBLICs in a list and removes an entry from this list when the corresponding symbol is defined. AS prints errors at the end of a section in case not all PUBLICs have been resolved.
Regarding the hierarchical section concept, the method of defining a symbol as purely global looks extremely brute. There is fortunately a way to do this in a bit more differentiated way: by appending a section name:
        PUBLIC  <name>:<section>
The symbol will be assigned to the referenced section and therefore also becomes accessible for all its subsections (unless they define a symbol of the same name that hides the ''more global'' symbol). AS will naturally protest if several subsections try to export a symbol of the same name to the same level. The special PARENTn values mentioned in the previous section are also valid for <section> to export a symbol exactly n levels up in the section hierarchy. Otherwise only sections that are parent sections of the current section are valid for <section>. Sections that are in another part of the section tree are not allowed. If several sections in the parent section path should have the same name (this is possible), the lowest level will be taken.
This tool lets the abovementioned macro become useful:
        proc    MACRO   name
                SECTION name
                PUBLIC  name:PARENT
        name    LABEL   $
                ENDM
This setting is equal to the Pascal model that also only allows the ''father'' to see its children, but not the ''grandpa''.
AS will quarrel about double-defined symbols if more than one section attempts to export a symbol of a certain name to the same upper section. This is by itself a correct reaction, and one needs to ''qualify'' symbols somehow to make them distinguishable if these exports were deliberate. A GLOBAL statement does just this. The syntax of GLOBAL is identical to PUBLIC, but the symbol stays local instead of being assigned to a higher section. Instead, an additional symbol of the same value but with the subsection's name appended to the symbol's name is created, and only this symbol is made public according to the section specification. If for example two sections A and B both define a symbol named SYM and export it with a GLOBAL statement to their parent section, the symbols are sorted in under the names A_SYM resp. B_SYM .
In case that source and target section are separated by more than one level, the complete name path is prepended to the symbol name.
The model described so far may look beautiful, but there is an additional detail not present in Pascal that may spoil the happiness: Assembler allows forward references. Forward references may lead to situations where AS accesses a symbol from a higher section in the first pass. This is not a disaster by itself as long as the correct symbol is used in the second pass, but accidents of the following type may happen:
        loop:   .
                <code>
                .
                .
                SECTION sub
                .               ; ***
                .
                bra.s   loop
                .
                .
        loop:   .
                .
                ENDSECTION
                .
                .
                jmp     loop    ; main loop
AS will take the global label loop in the first pass and will complain that the branch target is out of range if the program part at <code> is long enough. The second pass will not be started at all. One way to avoid the ambiguity would be to explicitly specify the symbol's section:
        bra.s   loop[sub]
If a local symbol is referenced several times, the brackets can be saved by using a FORWARD statement. The symbol is thereby explicitly announced to be local, and AS will only look in the local symbol table part when this symbol is referenced. For our example, the statement
        FORWARD loop
should be placed at the position marked with ***.
FORWARD must not only be stated prior to a symbol's definition, but also prior to its first usage in a section to make sense. It does not make sense to define a symbol private and public; this will be regarded as an error by AS.
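The effect of FORWARD on the lookup can be modelled with a small Python sketch. It is only an illustration of the behaviour described above, not AS's actual implementation: the table layout, the function name `lookup` and the addresses are assumptions made for this example.

```python
def lookup(name, section_path, table, forward=frozenset()):
    """Resolve `name` as seen from the section given by `section_path`.

    `table` maps (section path, name) to a value; the empty path () stands
    for the global level. Names listed in `forward` are searched in the
    current section only. All names here are hypothetical, not AS internals.
    """
    if name in forward:
        return table.get((section_path, name))
    path = section_path
    while True:
        if (path, name) in table:
            return table[(path, name)]
        if not path:
            return None  # unknown so far: a genuine forward reference
        path = path[:-1]  # retry one level further up


# Pass 1, at "bra.s loop": only the global label has been defined so far.
pass1 = {((), "loop"): 0x1000}
# Pass 2: the local label inside section "sub" is known as well.
pass2 = {((), "loop"): 0x1000, (("sub",), "loop"): 0x1F00}

wrong_target = lookup("loop", ("sub",), pass1)                # global label
deferred = lookup("loop", ("sub",), pass1, forward={"loop"})  # still unknown
right_target = lookup("loop", ("sub",), pass2)                # local label
```

Without the FORWARD announcement, the first pass silently binds to the global label; with it, the reference stays open until the local label has been defined.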
The multi-stage lookup in the symbol table and the decision to which section a symbol shall be assigned of course cost a bit of time to compute. An 8086 program of 1800 lines length for example took 34.5 instead of 33 seconds after a modification to use sections (80386 SX, 16MHz, 3 passes). The overhead is therefore limited. As has already been stated at the beginning, it is up to the programmer whether (s)he wants to accept it. One can still use AS without sections.
3.8.1. SHARED
valid for: all processors
This statement instructs AS to write the symbols given in the parameter list (regardless if they are integer, float or string symbols) together with their values into the share file. It depends upon the command line parameters described in section 2.4 whether such a file is generated at all and in which format it is written. If AS detects this instruction and no share file is generated, a warning is the result.
CAUTION! A comment possibly appended to the statement itself will be copied to the first line outputted to the share file (if SHARED's argument list is empty, only the comment will be written). In case a share file is written in C or Pascal format, one has to assure that the comment itself does not contain character sequences that close the comment (''*/'' resp. ''*)''). AS does not check for this!
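Since AS does not perform this check, one might run it beforehand over the comments of SHARED statements. The following Python sketch shows the idea; the function name and the format labels "c" and "pascal" are hypothetical and not part of AS.

```python
def comment_is_safe(comment, share_format):
    """Return True if `comment` cannot close the enclosing comment early.

    `share_format` is "c" or "pascal"; these labels are assumptions made
    for this sketch and do not correspond to AS option names.
    """
    closer = {"c": "*/", "pascal": "*)"}[share_format]
    return closer not in comment
```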
3.8.2. INCLUDE
valid for: all processors
This instruction inserts the file given as a parameter into the source just as if it had been inserted with an editor (the file name may optionally be enclosed in " characters). This instruction is useful to split source files that would otherwise not fit into the editor or to create ''tool boxes''.
In case that the file name does not have an extension, it will automatically be extended with INC.
Via the -i <path list> option, one can specify a list of directories that will automatically be searched for the file. If the file is not found, a fatal error occurs, i.e. assembly terminates immediately.
For compatibility reasons, it is valid to enclose the file name in " characters, i.e.
        include stddef51
and
        include "stddef51.inc"
are equivalent. CAUTION! This freedom of choice is the reason why only a string constant but no string expression is allowed!
The search list is ignored if the file name itself contains a path specification.
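The rules for locating an include file can be summarized in a short Python sketch. It is a model under assumptions, not AS's code: the function name is hypothetical, the lower-case extension follows the example above, and the order "current directory first, then the -i directories" is an assumption that the text does not state.

```python
import os


def resolve_include(name, search_dirs, exists=os.path.isfile):
    """Return the path an INCLUDE argument might resolve to, or None."""
    name = name.strip('"')                 # quotes are optional
    if not os.path.splitext(name)[1]:
        name += ".inc"                     # default extension (case assumed)
    if os.path.dirname(name):
        candidates = [name]                # explicit path: search list ignored
    else:
        # ASSUMPTION: the current directory is tried before the -i list.
        candidates = [name] + [os.path.join(d, name) for d in search_dirs]
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None                            # AS would stop with a fatal error
```

The `exists` parameter only serves to make the sketch testable without touching the file system.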
3.8.3. BINCLUDE
valid for: all processors
BINCLUDE can be used to embed binary data generated by other programs into the code generated by AS (this might theoretically even be code created by AS itself...). BINCLUDE has three forms:
        BINCLUDE <file>
This way, the file is completely included.
        BINCLUDE <file>,<offset>
This way, the file's contents are included starting at <offset> up to the file's end.
        BINCLUDE <file>,<offset>,<length>
This way, <length> bytes are included starting at <offset>.
The same rules regarding search paths apply as for INCLUDE.
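Expressed in Python, the three forms select the following parts of a file's contents. This is merely a sketch of the selection rule; the function name is hypothetical, and what AS does when <offset> or <length> exceed the file size is not modelled here.

```python
def binclude(data, offset=0, length=None):
    """Return the bytes the three BINCLUDE forms would select from `data`."""
    if length is None:
        return data[offset:]               # forms 1 and 2: up to the file's end
    return data[offset:offset + length]    # form 3: exactly `length` bytes


data = bytes(range(10))
whole = binclude(data)
tail = binclude(data, 4)
part = binclude(data, 4, 3)
```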
3.8.4. MESSAGE, WARNING, ERROR, and FATAL
Though the assembler checks source files as strictly as possible and delivers differentiated error messages, it might be necessary from time to time to issue additional error messages that allow an automatic check for logical errors. The assembler distinguishes among three different types of error messages that are accessible to the programmer via the instructions WARNING, ERROR, and FATAL.
These instructions generally only make sense in conjunction with conditional assembly. For example, if there is only a limited address space for a program, one can test for overflow in the following way:
        ROMSize   equ     8000h   ; 27256 EPROM

        ProgStart: .
                   .
                   <the program itself>
                   .
                   .
        ProgEnd:
                  if      ProgEnd-ProgStart>ROMSize
                   error   "\athe program is too long!"
                  endif
Apart from the instructions generating errors, there is also an instruction MESSAGE that simply prints a message to the console resp. to the assembly listing. Its usage is equal to the other three instructions.
3.8.5. READ
One could say that READ is the counterpart to the previous instruction group: it allows to read values from the keyboard during assembly. You might ask what this is good for. I will break with the previous principles and put an example before the exact description to outline the usefulness of this instruction:
A program needs for data transfers a buffer of a size that should be set at assembly time. One could store this size in a symbol defined with EQU, but it can also be done interactively with READ:
        IF      MomPass=1
         READ    "buffer size",BufferSize
        ENDIF
Programs can this way configure themselves dynamically during assembly and one could hand over the source to someone who can assemble it without having to dive into the source code. The IF conditional shown in the example should always be used to avoid bothering the user multiple times with questions.
READ is quite similar to SET with the difference that the value is read from the keyboard instead of the instruction's arguments. This for example also implies that AS will automatically set the symbol's type (integer, float or string) or that it is valid to enter formula expressions instead of a simple constant.
READ may either have one or two parameters because the prompting message is optional. AS will print a message constructed from the symbol's name if it is omitted.
3.8.6. RELAXED
valid for: all processors
By default, AS assigns a distinct syntax for integer constants to a processor family (which is in general equal to the manufacturer's specifications, as long as the syntax is not too bizarre...). Everyone however has his own preferences for another syntax and may well live with the fact that his programs cannot be translated any more with the standard assembler. If one places the instruction
        RELAXED ON
right at the program's beginning, one may from then on use any syntax for integer constants, even mixed within one program. AS tries to guess automatically for every expression the syntax that was used. This automatism does not always deliver the result one might have in mind, and this is also the reason why this option has to be enabled explicitly: if there are no prefixes or postfixes that unambiguously identify either Intel or Motorola syntax, the C mode will be used. Leading zeroes that are superfluous in other modes have a meaning in this mode:
        move.b  #08,d0
This constant will be understood as an octal constant and will result in an error message as octal numbers may only contain digits from 0 to 7. One might call this a lucky case; a number like 077 would result in trouble without getting a message about this. Without the relaxed mode, both expressions would unambiguously have been identified as decimal constants.
The current setting may be read from a symbol with the same name.
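The pitfall with leading zeroes can be reproduced with a few lines of Python that evaluate a constant by C rules. This is a sketch of the C convention only; it does not model how AS detects Intel or Motorola prefixes and postfixes, and the function name is hypothetical.

```python
def parse_c_style(text):
    """Evaluate an integer constant the way C-like syntax does."""
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)    # raises ValueError for the digits 8 and 9
    return int(text, 10)


silent_trouble = parse_c_style("077")   # octal, not the decimal 77
plain_decimal = parse_c_style("77")
```

A constant like 08 raises an error here, just as it does in AS, whereas 077 is quietly accepted with a value different from the decimal one.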
3.8.7. END
END marks the end of an assembler program. Lines that may follow in the source file will be ignored. IMPORTANT: END may be called from within a macro, but the IF-stack for conditional assembly is not cleared automatically. The following construct therefore results in an error:
        IF      DontWantAnymore
         END
        ELSEIF
END may optionally have an integer expression as argument that marks the program's entry point. AS stores this in the code file with a special record and it may be post-processed e.g. with P2HEX.
END has always been a valid instruction for AS, but the only reason for this in earlier releases of AS was compatibility; END had no effect.
When writing the individual code generators, I strived for a maximum amount of compatibility to the original assemblers. However, I only did this as long as it did not mean an unacceptable additional amount of work. I listed important differences, details and pitfalls in the following chapter.
''Where can I buy such a beast, a HC11 in NMOS?'', some of you might ask. Well, of course it does not exist, but an H cannot be represented in a hexadecimal number (older versions of AS would not have accepted such a name because of this), and so I decided to omit all the letters...
''Someone stating that something is impossible should be at least as cooperative as not to hinder the one who currently does it.''
From time to time, one is forced to revise one's opinions. Some versions earlier, I stated at this place that I couldn't use AS's parser in a way that makes it possible to separate the arguments of BSET/BCLR resp. BRSET/BRCLR with spaces. However, it seems that it can do more than I wanted to believe...after the n+1th request, I sat down once again to work on it and things seem to work now. You may use either spaces or commas, but not in all variants, to avoid ambiguities: for every variant of an instruction, it is possible to use only commas or a mixture of spaces and commas as Motorola seems to have defined it (their data books do not always have the quality of the corresponding hardware...):
        Bxxx  abs8 #mask           is equal to    Bxxx  abs8,#mask
        Bxxx  disp8,X #mask        is equal to    Bxxx  disp8,X,#mask
        BRxxx abs8 #mask addr      is equal to    BRxxx abs8,#mask,addr
        BRxxx disp8,X #mask addr   is equal to    BRxxx disp8,X,#mask,addr
In this list, xxx is a synonym either for SET or CLR; #mask is the bit mask to be applied (the # sign is optional). Of course, the same statements are also valid for Y-indexed expressions (not listed here).
Of course, it is a somewhat crazy idea to add support in AS for a processor that was mostly designed for usage in work stations. Remember that AS mainly is targeted at programmers of single board computers. But things that today represent the absolute high end in computing will be average tomorrow and maybe obsolete the next day, and in the meantime, the Z80 as well as the 8088 have been retired as CPUs for personal computers and have moved to the embedded market; modified versions are marketed as microcontrollers. With the appearance of the MPC505 and PPC403, my suspicion has proven to be true that IBM and Motorola try to promote this architecture in as many fields as possible.
However, the current support is a bit incomplete: For the time being, the Intel-style mnemonics are used for the storage of data, and the more uncommon RS/6000 machine instructions mentioned in [43] are missing (hopefully no one misses them!). I will finish this as soon as information about them is available!
Motorola, which devil rode you! Which person in your company had the ''brilliant'' idea to separate the parallel data transfers with spaces! As a result, everyone who wants to make his code a bit more readable, e.g. like this:
        move    x:var9 ,r0
        move    y:var10,r3
is p****ed because the space gets recognized as a separator for parallel data transfers!
Well...Motorola defined it that way, and I cannot change it. Using tabs instead of spaces to separate the parallel operations is also allowed, and the individual operations' parts are again separated with commas, as one would expect it.
[38] states that instead of using MOVEC, MOVEM, ANDI or ORI, it is also valid to use the more general Mnemonics MODE, AND or OR. AS (currently) does not support this.
Regarding the assembler syntax of these processors, Hitachi generously copied from Motorola (that wasn't by far the worst choice...), unfortunately the company wanted to introduce its own format for hexadecimal numbers. To make it even worse, it is a format that uses unbalanced single quotes, just like Microchip does. This is something I could not (I even did not want to) reproduce with AS, as AS uses single quotes to surround ASCII character sequences. Instead, one has to write hexadecimal numbers in the well-known Motorola syntax: with a leading dollar sign.
Unfortunately, Hitachi once again used their own format for hexadecimal numbers, and once again I was not able to reproduce this with AS...please use Motorola syntax!
When using literals and the LTORG instruction, a few things have to be kept in mind if you do not want to suddenly get confronted with strange error messages:
Literals exist due to the fact that the processor is unable to load constants out of a range of -128 to 127 with immediate addressing. AS (and the Hitachi assembler) hide this inability by the automatic placement of constants in memory which are then referenced via PC-relative addressing. The question that now arises is where to locate these constants in memory. AS does not automatically place a constant in memory when it is needed; instead, they are collected until an LTORG instruction occurs. The collected constants are then dumped en bloc, and their addresses are stored in ordinary labels which are also visible in the symbol table. Such a label's name is of the form
        LITERAL_s_xxxx_n
In this name, s represents the literal's type. Possible values are W for 16-bit constants, L for 32-bit constants and F for forward references where AS cannot decide in anticipation which size is needed. In case of s=W or L, xxxx denotes the constant's value in a hexadecimal notation, whereas xxxx is a simple running number for forward references (in a forward reference, one does not know the value of a constant when it is referenced, so one obviously cannot incorporate its value into the name). n is a counter that signifies how often a literal of this value previously occurred in the current section. Literals follow the standard rules for localization by sections. It is therefore absolutely necessary to place literals that were generated in a certain section before the section is terminated!
The numbering with n is necessary because a literal may occur multiple times in a section. One reason for this situation is that PC-relative addressing only allows positive offsets; Literals that have once been placed with an LTORG can therefore not be referenced in the code that follows. The other reason is that the displacement is generally limited in length (512 resp. 1024 bytes).
An automatic LTORG at the end of a program or previously to switching to a different target CPU does not occur; if AS detects unplaced literals in such a situation, an error message is printed.
As the PC-relative addressing mode uses the address of the current instruction plus 4, it is not possible to access a literal that is stored directly after the instruction, like in the following example:
        mov     #$1234,r6
        ltorg
This is a minor item since the CPU anyway would try to execute the following data as code. Such a situation should not occur in a real program...another pitfall is far more real: if PC-relative addressing occurs just behind a delayed branch, the program counter is already set to the destination address, and the displacement is computed relative to the branch target plus 2. Following is an example where this detail leads to a literal that cannot be addressed:
        bra     Target
        mov     #$12345678,r4   ; is executed
        .
        .
        ltorg                   ; here is the literal
        .
        .
Target: mov     r4,r7           ; execution continues here
As Target+2 is on an address behind the literal, a negative displacement would result. Things become especially hairy when one of the branch instructions JMP, JSR, BRAF, or BSRF is used: as AS cannot calculate the target address (it is generated at runtime from a register's contents), a PC value is assumed that should never fit, effectively disabling any PC-relative addressing at this point.
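Both pitfalls follow from the same arithmetic, which the following Python sketch tries to capture. It is a simplified model built only from the rules stated above (base address, positive displacements, limits of 512 resp. 1024 bytes); alignment of the base address is deliberately left out, and the function name and the addresses are made up for this example.

```python
def literal_reachable(instr_addr, literal_addr, size, delay_slot_target=None):
    """Check whether a literal of `size` bytes (2 or 4) can be addressed.

    If the instruction sits behind a delayed branch, pass the branch
    target as `delay_slot_target`.
    """
    limit = {2: 512, 4: 1024}[size]
    if delay_slot_target is None:
        base = instr_addr + 4            # address of the instruction plus 4
    else:
        base = delay_slot_target + 2     # branch target plus 2
    displacement = literal_addr - base
    return 0 <= displacement < limit     # only positive offsets are possible


# Literal stored directly after the instruction: displacement would be -2.
directly_behind = literal_reachable(0x100, 0x102, 2)
# MOV in the delay slot of a BRA to 0x220, literal at 0x210.
behind_branch = literal_reachable(0x202, 0x210, 4, delay_slot_target=0x220)
# Ordinary case: literal a few instructions further down.
ordinary = literal_reachable(0x100, 0x110, 4)
```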
It is not possible to deduce the memory usage from the count and size of literals. AS might need to insert a padding word to align a long word to an address that is evenly divisible by 4; on the other hand, AS might reuse parts of a 32-bit literal for other 16-bit literals. Of course multiple use of a literal with a certain value will create only one entry. However, such optimizations are completely suppressed for forward references as AS does not know anything about their value.
As literals use the PC-relative addressing which is only allowed for the MOV instruction, the usage of literals is also limited to MOV instructions. The way AS uses the operand size is a bit tricky: A specification of a byte or word move means to generate the shortest possible instruction that results in the desired value placed in the register's lowest 8 resp. 16 bits. The upper 24 resp. 16 bits are treated as ''don't care''. However, if one specifies a longword move or omits the size specification completely, this means that the complete 32-bit register should contain the desired value. For example, in the following sequence
        mov.b   #$c0,r0
        mov.w   #$c0,r0
        mov.l   #$c0,r0
the first instruction will result in true immediate addressing, the second and third instruction will use a word literal: As bit 7 in the number is set, the byte instruction will effectively create the value $FFFFFFC0 in the register. According to the convention, this wouldn't be the desired value in the second and third example. However, a word literal is also sufficient for the third case because the processor will copy a cleared bit 15 of the operand to bits 16..31.
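The selection rule described in this paragraph can be written down as a small Python model. It is a sketch derived from the convention above and not taken from AS's sources; the function names and the result strings are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the lowest `bits` bits of `value` as a signed number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def choose_encoding(value, size):
    """Pick the shortest form for MOV #value with operand size 1, 2 or 4.

    Only the lowest 8*size bits of the register have to match; the
    remaining upper bits are treated as "don't care".
    """
    mask = (1 << (8 * size)) - 1
    wanted = value & mask
    if (sign_extend(value, 8) & mask) == wanted:
        return "immediate"       # 8-bit operand, sign-extended by the CPU
    if (sign_extend(value, 16) & mask) == wanted:
        return "word literal"    # 16-bit literal, sign-extended on load
    return "long literal"


results = [choose_encoding(0xC0, size) for size in (1, 2, 4)]
```

For the three instructions of the example, `results` contains the forms named in the text, in the same order.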
As one can see, the whole literal stuff is rather complex; I'm sorry but there was no chance of making things simpler. It is unfortunately a part of its nature that one sometimes gets error messages about literals that were not found, which logically should not occur because AS does the literal processing completely on its own. However, if other errors occur in the second pass, all following labels will move because AS does not generate any code any more for statements that have been identified as erroneous. As literal names are partially built from other symbols' values, other errors might follow because literal names searched in the second pass differ from the names stored in the first pass and AS complains about undefined symbols...if such errors should occur, please correct all other errors first before you start cursing at me and literals...
People who come out of the Motorola scene and want to use PC-relative addressing explicitly (e.g. to address variables in a position-independent way) should know that if this addressing mode is written like in the programmer's manual:
        mov.l   @(Var,PC),r8
no implicit conversion of the address to a displacement will occur, i.e. the operand is inserted as-is into the machine code (this will probably generate a value range error...). If you want to use PC-relative addressing on the SH7x00, simply use ''absolute'' addressing (which does not exist on machine level):
        mov.l   Var,r8
In this example, the displacement will be calculated correctly (of course, the same limitations apply for the displacement as it was the case for literals).
The program memory of these microcontrollers is organized in pages of 128 words. Honestly said, this organization only exists because there are on the one hand branch instructions with a target that must lie within the same page, and on the other hand ''long'' branches that can reach the whole address space. The standard syntax defined by Mitsubishi demands that page number and offset have to be written as two distinct arguments for the latter instructions. As this is quite inconvenient (except for indirect jumps, a programmer has no other reason to deal with pages), AS also allows to write the target address in a ''linear'' style, for example
        bl      $1234
instead of
        bl      $24,$34
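The relation between both notations is a plain division by the page size of 128 words, as the following Python lines illustrate. The function names are chosen for this example only.

```python
PAGE_WORDS = 128  # page size stated above


def split_address(addr):
    """Split a linear address into (page number, offset within the page)."""
    return divmod(addr, PAGE_WORDS)


def join_address(page, offset):
    """Build the linear address from page number and offset."""
    return page * PAGE_WORDS + offset


page, offset = split_address(0x1234)
```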
Since the 6502's undocumented instructions naturally aren't listed in any data book, they shall be listed briefly at this place. Of course, you are using them at your own risk. There is no guarantee that all mask revisions will support all variants! In any case, they do not work on the CMOS successors of the 6502, since those allocated the corresponding bit combinations to "official" instructions...
The following symbols are used:
        &        binary AND
        |        binary OR
        ^        binary XOR
        <<       logical shift left
        >>       logical shift right
        <<<      rotate left
        >>>      rotate right
        <-       assignment
        (..)     contents of ..
        ..       bits ..
        A        accumulator
        X,Y      index registers X,Y
        S        stack pointer
        An       accumulator bit n
        M        operand
        C        carry
        PCH      upper half of program counter
Instruction      : JAM or KIL or CRS
Function         : none, processor is halted
Addressing Modes : implicit

Instruction      : SLO
Function         : M<-((M)<<1)|(A)
Addressing Modes : absolute long/short, X-indexed long/short,
                   Y-indexed long, X/Y-indirect

Instruction      : ANC
Function         : A<-(A)&(M), C<- A7
Addressing Modes : immediate

Instruction      : RLA
Function         : M<-((M)<<1)&(A)
Addressing Modes : absolute long/short, X-indexed long/short,
                   Y-indexed long, X/Y-indirect

Instruction      : SRE
Function         : M<-((M)>>1)^(A)
Addressing Modes : absolute long/short, X-indexed long/short,
                   Y-indexed long, X/Y-indirect

Instruction      : ASR
Function         : A<-((A)&(M))>>1
Addressing Modes : immediate

Instruction      : RRA
Function         : M<-((M)>>>1)+(A)+(C)
Addressing Modes : absolute long/short, X-indexed long/short,
                   Y-indexed long, X/Y-indirect

Instruction      : ARR
Function         : A<-((A)&(M))>>>1
Addressing Modes : immediate

Instruction      : SAX
Function         : M<-(A)&(X)
Addressing Modes : absolute long/short, Y-indexed short,
                   Y-indirect

Instruction      : ANE
Function         : M<-((A)&$ee)|((X)&(M))
Addressing Modes : immediate

Instruction      : SHA
Function         : M<-(A)&(X)&(PCH+1)
Addressing Modes : X/Y-indexed long

Instruction      : SHS
Function         : X<-(A)&(X), S<-(X), M<-(X)&(PCH+1)
Addressing Modes : Y-indexed long

Instruction      : SHY
Function         : M<-(Y)&(PCH+1)
Addressing Modes : Y-indexed long

Instruction      : SHX
Function         : M<-(X)&(PCH+1)
Addressing Modes : X-indexed long

Instruction      : LAX
Function         : A,X<-(M)
Addressing Modes : absolute long/short, Y-indexed long/short,
                   X/Y-indirect

Instruction      : LXA
Function         : X04<-(X)04&(M)04,
                   A04<-(A)04&(M)04
Addressing Modes : immediate

Instruction      : LAE
Function         : X,S,A<-((S)&(M))
Addressing Modes : Y-indexed long

Instruction      : DCP
Function         : M<-(M)-1, Flags<-((A)-(M))
Addressing Modes : absolute long/short, X-indexed long/short,
                   Y-indexed long, X/Y-indirect

Instruction      : SBX
Function         : X<-((X)&(A))-(M)
Addressing Modes : immediate

Instruction      : ISB
Function         : M<-(M)+1, A<-(A)-(M)-(C)
Addressing Modes : absolute long/short, X-indexed long/short,
                   Y-indexed long, X/Y-indirect
Microcontrollers of this family have a quite nice, however well-hidden feature: If one sets bit 5 of the status register with the SET instruction, the accumulator will be replaced with the memory cell addressed by the X register for all load/store and arithmetic instructions. An attempt to integrate this feature cleanly into the assembly syntax has not been made so far, so the only way to use it is currently the ''hard'' way (SET...instructions with accumulator addressing...CLT).
Not all MELPS-740 processors implement all instructions. This is a place where the programmer has to watch out for himself that no instructions are used that are unavailable for the targeted processor; AS does not differentiate among the individual processors of this family. For a description of the details regarding special page addressing, see the discussion of the ASSUME instruction.
As it seems, these two processor families took separate development
paths, starting from the 6502 via their 8-bit predecessors. Briefly
listed, the following differences are present:
Especially tricky are the instructions PHB, PLB and
TSB: these instructions have a totally different encoding and
meaning on both processors!
Unfortunately, these processors address their memory in a way that is
IMHO even one level higher on the open-ended chart of perversity than
the Intel-like segmentation: They do banking! Well, this seems to be
the price for the 6502 upward-compatibility; before one can use AS to
write code for these processors, one has to inform AS about the
contents of several registers (using the ASSUME
instruction):
The M flag rules whether the accumulators A and B should be used with
8 bits (1) or 16 bits (0) width. Analogously, the X flag decides the
width of the X and Y index registers. AS needs this information for
the decision about the argument's width when immediate addressing
(#<constant>) occurs.
The memory is organized in 256 banks of 64 KBytes. As all registers
in the CPU core have a maximum width of 16 bits, the upper 8 bits
have to be fetched from 2 special bank registers: DT delivers the
upper 8 bits for data accesses, and PG extends the 16-bit program
counter to 24 bits. A 16 bits wide register DPR allows to move the
zero page known from the 6502 to an arbitrary location in the first
bank. If AS encounters an address (it is irrelevant if this address
is part of an absolute, indexed, or indirect expression), the
following addressing modes will be tested:
The automatic determination of the address length described above may
be overridden by the usage of prefixes. If one prefixes the address
by a <, >, or >> without a separating space, an address
with 1, 2, or 3 bytes of length will be used, regardless if this is
the optimal length. If one uses an address length that is either not
allowed for the current instruction or too short for the address, an
error message is the result.
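How such a prefix might be separated from the address expression is sketched below in Python. The sketch covers only the recognition of the prefix; the check whether the chosen length is allowed or sufficient depends on the register contents announced via ASSUME and is not modelled. All names are hypothetical.

```python
# The longer prefix has to be tested first, since ">>" also starts with ">".
PREFIXES = ((">>", 3), (">", 2), ("<", 1))


def split_length_prefix(operand):
    """Return (forced address length in bytes or None, remaining expression)."""
    for prefix, length in PREFIXES:
        if operand.startswith(prefix):
            return length, operand[len(prefix):]
    return None, operand     # no prefix: the length is determined automatically
```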
To simplify porting of 6502 programs, AS uses the Motorola syntax for
hexadecimal constants instead of the Intel/IEEE syntax that is the
format preferred by Mitsubishi for their 740xxx series. I still think
that this is the better format, and it looks as if the designers of
the 65816 were of the same opinion (as the RELAXED
instruction allows the alternative use of Intel notation, this
decision should not hurt anything). Another important detail for the
porting of programs is that it is valid to omit the accumulator A as
target for operations. For example, it is possible to simply write
LDA #0 instead of LDA A,#0.
A real goodie in the instruction set are the instructions
MVN resp. MVP to do block transfers. However, their
address specification rules are a bit strange: bits 0--15 are stored
in index registers, bits 16--23 are part of the instruction. When one
uses AS, one simply specifies the full destination and source
addresses. AS will then automatically grab the correct bits. This is
a subtle yet important difference to Mitsubishi's assembler, where you have
to extract the upper 8 bits on your own. Things become really
convenient when a macro like the following is used:
The PSH and PUL instructions are also very handy
because they allow a user-defined set of registers to be saved to the
stack resp. to be restored from the stack. According to the
Mitsubishi data book [27], the bit mask has
to be specified as an immediate operand, so the programmer either has
to keep all bit<->register assignments in mind or he has to define
some appropriate symbols. To make things simpler, I decided to extend
the syntax at this point: It is valid to use a list as argument which
may contain an arbitrary sequence of register names or immediate
expressions. Therefore, the following instructions
One thing I did not fully understand while studying the Mitsubishi
assembler is the treatment of the PER instruction: this
instruction allows to push a 16-bit variable onto the stack whose
address is specified relative to the program counter. Therefore, it
is an absolute addressing mode from the programmer's point of view.
Nevertheless, the Mitsubishi assembler requests immediate addressing,
and the instruction's argument is placed into the code just as-is. One
has to calculate the address on one's own, which is something symbolic
assemblers were designed to avoid...as I wanted to stay
compatible, AS contains a compromise: If one chooses immediate
addressing (with a leading # sign), AS will behave like the original
from Mitsubishi. But if the # sign is omitted, AS will calculate the
difference between the argument's value and the current program
counter and insert this difference instead.
A similar situation exists for the PEI instruction that
pushes the contents of a 16-bit variable located in the zero page:
Though the operand represents an address, once again immediate
addressing is required. In this case, AS will simply allow both
variants (i.e. with or without a # sign).
The M16 family is a family of highly complex CISC processors with an
equally complicated instruction set. One of the instruction set's
properties is the detail that in an instruction with two operands,
both operands may be of different sizes. The method of appending the
operand size as an attribute of the instruction (known from Motorola
and adopted by Mitsubishi) therefore had to be extended: it is
valid to append attributes to the operands themselves. For example,
the following instruction
The chained addressing modes are also rather complex; the ability of
AS to automatically assign address components to parts of the chain
keeps things at least halfway manageable. The only means of influence
AS offers (the original assembler from Mitsubishi/Green Hills allows
a bit more in this respect) is the explicit setting of displacement
lengths by appending :4, :16 and :32.
Another part of history... Unfortunately, I have not yet been able to
get my hands on official documentation for the world's first
microprocessor, and some details are lacking: I'm not
absolutely sure about the syntax for register pairs (for 8-bit
operations). The current syntax is RRn with n being
an even integer in the range from 0 to 14.
The maximum address space of these processors is 4 Kbytes. This
address space is not organized in a linear way (how could this be on
an Intel CPU...). Instead, it is split into 2 banks of 2 Kbytes. The
only way to move the program counter from one bank to the other is
via the instructions CALL and JMP, after setting the most
significant bit of the address with the instructions SEL MB0
resp. SEL MB1.
To simplify jumps between these two banks, the instructions
JMP and CALL contain an automatism that inserts one of
these two instructions if the current program counter and the target
address are in different banks. Explicit usage of these SEL
MBx instructions should therefore not be necessary (though it is
possible), and it can confuse the automatism, like in the following
example:
Furthermore, one should keep in mind that a jump instruction might
become longer (3 instead of 2 bytes).
The assembler is accompanied by the files STDDEF51.INC
resp. 80C50X.INC that define all bits and SFRs of the
processors 8051, 8052, and 80515 resp. 80C501, 502, and 504.
Depending on the target processor setting (made with the CPU
statement), the correct subset will be included. Therefore, the
correct order for the instructions at the beginning of a program is
As the 8051 does not have instructions to push the registers 0..7
onto the stack, one has to work with absolute addresses. However,
these addresses depend on which register bank is currently active. To
make this situation a little bit better, the include files define the
macro USING that accepts the symbols Bank0...Bank3
as arguments. In response, the macro will assign the registers'
correct absolute addresses to the symbols AR0..AR7. This
macro should be used after every change of the register banks. The
macro itself does not generate any code to switch to the bank!
The macro also keeps track of which banks have been used.
The result is stored in the integer variable RegUsage: bit 0
corresponds to bank 0, bit 1 corresponds to bank 1, and so on. To
output its contents after the source has been assembled, use
something like the following piece of code:
When designing the 80C251, Intel really tried to make the move to the
new family as smooth as possible for programmers. This culminated in
the fact that old applications can run on the new processor without
having to recompile them. However, as soon as one wants to use the
new features, some details have to be regarded which may turn into
hidden pitfalls.
The most important thing is the absence of a distinct address space
for bits on the 80C251. All SFRs can now be addressed bitwise,
regardless of their address. Furthermore, the first 128 bytes of the
internal RAM are also bit addressable. This has become possible
because bits are not any more handled by a separate address space
that overlaps other address spaces. Instead, similar to other
processors, bits are addressed with a two-dimensional address that
consists of the memory location containing the bit and the bit's
location in the byte. One result is that in an expression like
PSW.7, AS will do the separation of address and bit position
itself. Unlike on the 8051, it is no longer necessary to
explicitly generate 8 bit symbols. Another result is that
the SFRB instruction no longer exists. If it is used
in a program that shall be ported, it may be replaced with a
simple SFR instruction.
Furthermore, Intel cleaned up the cornucopia of different address
spaces on the 8051: the internal RAM (DATA resp.
IDATA), the XDATA space and the former CODE
space were unified to a single CODE space that is now 16
Mbytes large. The internal RAM starts at address 0, the internal ROM
starts at address ff0000h, which is the address code has to be
relocated to. In contrast, the SFRs were moved to a separate address
space (which AS refers to as the IO segment). However, they
have the same addresses in this new address space as they used to
have on the 8051. The SFR instruction knows of this
difference and automatically assigns symbols to either the
DATA or IO segment, depending on the target processor.
As there is no BIT segment any more, the BIT
instruction operates completely differently: Instead of a linear
address ranging from 0..255, a bit symbol now contains the byte's
address in bits 0..7, and the bit position in bits 24..26.
Unfortunately, creating arrays of flags with a symbolic address is
not that simple any more: On an 8051, one simply wrote:
Intel would like everyone to write absolute addresses in a syntax
of XX:YYYY, where XX is a 64K bank in the address
space resp. signifies addresses in the I/O space with an S.
As one might guess, I am not amused about this, which is why it is
legal to alternatively use linear addresses in all places. Only
the S for I/O addresses cannot be avoided, like in this
case:
Like for the 8051, the generic branch instructions CALL
and JMP exist that automatically choose the shortest machine
code depending on the address layout. However, while JMP
also may use the variant with a 24-bit address, CALL will
not do this for a good reason: In contrast to ACALL and
LCALL, ECALL places an additional byte onto the stack.
A generic CALL could therefore leave you not knowing
how many bytes end up on the stack. This problem does not exist for the JMP
instructions.
There is one thing I did not understand: The 80251 is also able to
push immediate operands onto the stack, and it may push either single
bytes or complete words. However, the same mnemonic (PUSH)
is assigned to both variants - how on earth should an assembler know
if an instruction like
Another well-meant piece of advice: If you use the extended instruction set,
be sure to operate the processor in source mode; otherwise, all
instructions will become one byte longer! The old 8051 instructions
that will in turn become one byte longer are not a big matter: AS
will either replace them automatically with new, more general
instructions or they deal with obsolete addressing modes (indirect
addressing via 8 bit registers).
Actually, I had sworn to myself to keep the segment disease of Intel's
8086 out of the assembler. However, as there was a request and as
students are more flexible than the developers of this processor
obviously were, there is now a rudimentary support of these
processors in AS. By 'rudimentary', I do not mean that
the instruction set is not fully covered. It means that the whole
pseudo instruction stuff that is available when using MASM, TASM, or
something equivalent does not exist. To put it in clear words, AS was
not primarily designed to write assembler programs for PC's (heaven
forbid, this really would have meant reinventing the wheel!);
instead, the development of programs for single-board computers was
the main goal (which may also be equipped with an 8086 CPU).
For die-hards who still want to write DOS programs with AS, here is a
small list of things to keep in mind:
Another big problem of these processors is their assembler syntax,
which is sometimes ambiguous and whose exact meaning can then only be
deduced by looking at the current context. In the following example,
either absolute or immediate addressing may be meant, depending on
the symbol's type:
4.9. MELPS-7700/65816
The following instructions have identical function, yet different
names:
65816 | MELPS-7700 | 65816 | MELPS-7700 |
---|---|---|---|
REP | CLP | PHK | PHG |
TCS | TAS | TSC | TSA |
TCD | TAD | TDC | TDA |
PHB | PHT | PLB | PLT |
WAI | WIT | | |
As one can see from this enumeration, the knowledge about the current
values of DT, PG and DPR is essential for a correct operation of AS;
if the specifications are incorrect, the program will probably do
wrong addressing at runtime. This enumeration also implies that all
three address lengths are available; if this is not the case, the
decision chain will become shorter.
mvpos macro src,dest,len
if MomCPU=$7700
lda #len
elseif
lda #(len-1)
endif
ldx #(src&$ffff)
ldy #(dest&$ffff)
mvp dest,src
endm
Caution, possible pitfall: if the accumulator contains the value n,
the Mitsubishi chip will transfer n bytes, but the 65816 will
transfer n+1 bytes!
psh #$0f
psh a,b,#$0c
psh a,b,x,y
are equivalent. As immediate expressions are still valid, AS stays
upward compatible to the Mitsubishi assemblers.
mov r0.b,r6.w
reads the lowest 8 bits of register 0, sign-extends them to 32 bits
and stores the result into register 6. However, as one does not need
this feature in 9 out of 10 cases, it is still valid to append the
operand size to the instruction itself, e.g.
mov.w r0,r6
Both variants may be mixed; in such a case, an operand size appended
to an operand overrules the ''default''. An exception are
instructions with two operands. For these instructions, the default
for the source operand is the destination operand's size. For
example, in the following example
mov.h r0,r6.w
register 0 is accessed with 32 bits, the size specification appended
to the instruction is not used at all. If an instruction does not
contain any size specifications, word size (w) will be used.
Remember: in contrast to the 68000 family, this means 32 bits instead
of 16 bits!
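The size rules above can be modelled in a few lines of Python. This is only a sketch of the rules as stated in the text, not AS source code; the function names are made up for illustration, and the 16-bit width for the h attribute is an assumption (the text only gives 8 bits for b and 32 bits for w).

```python
# Sketch of the operand-size rules described above; not AS source code.
SIZE_BITS = {"b": 8, "h": 16, "w": 32}  # 16 bits for "h" is an assumption


def split_attr(text):
    """Split 'r0.b' into ('r0', 'b'); return (text, None) without attribute."""
    name, dot, attr = text.partition(".")
    return (name, attr.lower()) if dot else (name, None)


def resolve_sizes(instruction):
    """Return (source_bits, dest_bits) for a two-operand instruction."""
    mnemonic, operands = instruction.split(None, 1)
    _, instr_attr = split_attr(mnemonic)
    src_text, dest_text = [op.strip() for op in operands.split(",")]
    _, src_attr = split_attr(src_text)
    _, dest_attr = split_attr(dest_text)
    # Destination: its own attribute, else the instruction's, else word.
    dest = dest_attr or instr_attr or "w"
    # Source: its own attribute, else the destination operand's size.
    src = src_attr or dest
    return SIZE_BITS[src], SIZE_BITS[dest]
```

With these rules, mov.h r0,r6.w resolves to 32 bits for both operands, which matches the statement that the size appended to the instruction is not used at all in that case.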
000: SEL MB1
JMP 200h
AS assumes that the MB flag is 0 and therefore does not insert a
SEL MB0 instruction, with the result that the CPU jumps to
address A00h.
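The arithmetic behind this pitfall can be sketched in Python. This is a simplified model under the assumptions stated in the text (two banks of 2 Kbytes, the MB flag supplying the most significant address bit); the function names are hypothetical and nothing here is taken from the AS sources.

```python
# Simplified model of the MCS-48 bank scheme described in the text.
BANK_SIZE = 0x800  # 2 Kbytes per bank


def bank_of(address):
    """Bank number (0 or 1) of an address in the 4 Kbyte space."""
    return address // BANK_SIZE


def needs_bank_select(pc, target):
    """True if a SEL MBx would be needed because the banks differ."""
    return bank_of(pc) != bank_of(target)


def effective_target(mb_flag, target):
    """Address actually reached: MB gives the top bit, the jump the rest."""
    return mb_flag * BANK_SIZE + (target % BANK_SIZE)
```

For the example above, needs_bank_select(0x002, 0x200) is false, so no SEL is inserted, while effective_target(1, 0x200) gives A00h because the explicit SEL MB1 is still in effect.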
CPU <processor type>
INCLUDE stddef51.inc .
Otherwise, the MCS-51 pseudo instructions will lead to error
messages.
irp BANK,Bank0,Bank1,Bank2,Bank3
if (RegUsage&(2^BANK))<>0
message "bank \{BANK} has been used"
endif
endm
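For readers who prefer to see the bit test spelled out, here is the same check as a small Python sketch. It assumes only what the text states (bit n of RegUsage stands for bank n); the function name is made up for illustration.

```python
def used_banks(reg_usage):
    """Return the list of register banks whose bit is set in reg_usage."""
    return [bank for bank in range(4) if reg_usage & (1 << bank)]
```

A value of 5 (binary 0101), for example, means that banks 0 and 2 have been used.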
The multipass feature introduced with version 1.38 made it possible to
introduce the additional instructions JMP and CALL.
If branches are coded using these instructions, AS will automatically
use the variant that is optimal for the given target address. The
options are SJMP, AJMP, or LJMP for JMP
resp. ACALL or LCALL for CALL. Of course
it is still possible to use these variants directly, in case one
wants to force a certain coding.
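How such a choice could be made is sketched below in Python. The reach of each variant follows the usual 8051 rules (SJMP: signed 8-bit offset from the following instruction; AJMP: target in the same 2 Kbyte page as the following instruction; LJMP: anywhere in 64 Kbytes). The order of preference between the two 2-byte variants is an assumption, as is the function name; the manual does not say how AS decides internally.

```python
def choose_jump(pc, target):
    """Pick the shortest 8051 jump from address pc to address target."""
    next_pc = pc + 2  # SJMP and AJMP are both two bytes long
    # SJMP: signed 8-bit displacement relative to the next instruction.
    if -128 <= target - next_pc <= 127:
        return "SJMP"
    # AJMP: target must lie in the same 2 Kbyte page as the next instruction.
    if (next_pc >> 11) == (target >> 11):
        return "AJMP"
    # LJMP: full 16-bit address, three bytes long.
    return "LJMP"
```

The same idea applies to CALL, with ACALL and LCALL as the only candidates.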
segment bitdata
bit1 db ?
bit2 db ?
or
defbit macro name
name bit cnt
cnt set cnt+1
endm
On a 251, only the second way still works, like this:
adr set 20h ; start address of flags
bpos set 0 ; in the internal RAM
defbit macro name
name bit adr.bpos
bpos set bpos+1
if bpos=8
bpos set 0
adr set adr+1
endif
endm
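The bookkeeping done by this macro, together with the bit symbol layout described in the text (byte address in bits 0..7, bit position in bits 24..26), can be modelled in Python as follows. This is an illustrative sketch only; the class and function names are hypothetical and the layout is taken from the text, not from the AS sources.

```python
def pack_bit(address, position):
    """Build a bit symbol: byte address in bits 0..7, position in bits 24..26."""
    return (address & 0xFF) | ((position & 0x7) << 24)


def unpack_bit(symbol):
    """Return (byte address, bit position) of a bit symbol."""
    return symbol & 0xFF, (symbol >> 24) & 0x7


class FlagAllocator:
    """Mimics the defbit macro: hands out consecutive bits from adr upwards."""

    def __init__(self, start=0x20):
        self.adr, self.bpos = start, 0

    def defbit(self):
        symbol = pack_bit(self.adr, self.bpos)
        self.bpos += 1
        if self.bpos == 8:  # byte is full, continue in the next one
            self.bpos = 0
            self.adr += 1
        return symbol
```

The ninth flag allocated this way ends up as bit 0 of address 21h, just as the if bpos=8 branch of the macro arranges it.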
Another small detail: Intel now prefers CY instead of
C as a symbolic name for the carry, so you might have to rename
an already existing variable of the same name in your program.
However, AS will continue to understand also the old variant when
using the instructions CLR, CPL, SETB, MOV, ANL, or
ORL. The same is conceptually true for the additional
registers R8..R15, WR0..WR30, DR0..DR28, DR56, DR60, DPX,
and SPX.
Carry bit s:0d0h.7
Without the prefix, AS would assume an address in the CODE
segment, and only the first 128 bits in this space are
bit-addressable...
push #10
shall push a byte or a word containing the value 10? So the current
rule is that PUSH always pushes a byte; if one wants to push
a word, simply use PUSHW instead of PUSH.
For these processors, AS only supports a small programming model,
i.e. there is one code segment with a maximum of 64 Kbytes and
a data segment of equal size for data (which cannot be set to initial
values for COM files). The SEGMENT instruction
switches between these two segments. As a consequence,
branches are always intrasegment branches if they refer to
targets in this single code segment. If far jumps should be
necessary, they are possible via CALLF or JMPF with
a memory address or a Segment:Offset value as argument.
mov ax,value
When using AS, an expression without brackets always is interpreted
as immediate addressing. For example, when either a variable's
address or its contents shall be loaded, the differences listed in
table 4.1 are present between MASM and AS:
assembler | address | contents |
---|---|---|
MASM | mov ax,offset vari | mov ax,vari |
 | lea ax,vari | mov ax,[vari] |
 | lea ax,[vari] | |
AS | mov ax,vari | mov ax,[vari] |
 | lea ax,[vari] | |
When addressing via a symbol, the assembler checks whether the symbol is assigned to the data segment and, if it is not, tries to automatically insert an appropriate segment prefix. This happens for example when symbols from the code segment are accessed without specifying a CS segment prefix. However, this mechanism can only work if the ASSUME instruction (see there) has previously been applied correctly.
The Intel syntax also requires the assembler to remember whether bytes or words were stored at a symbol's address. AS will do this only when the DB resp. DW instruction is in the same source line as the label. For any other case, the operand size has to be specified explicitly with the BYTE PTR, WORD PTR,... operators. As long as a register is the other operand, this may be omitted, as the operand size is then clearly given by the register's name.
In an 8086-based system, the coprocessor is usually synchronized via the processor's TEST input line which is connected to the coprocessor's BUSY output line. AS supports this type of handshaking by automatically inserting a WAIT instruction prior to every 8087 instruction. If this is undesired for any reason, an N has to be inserted after the F in the mnemonic; for example,
FINIT
FSTSW [vari]
becomes
FNINIT
FNSTSW [vari]
This variant is valid for all coprocessor instructions.
The processors of this family have been optimized for an easy
manipulation of bit groups at peripheral addresses. The
instructions LIV and RIV were introduced to deal
with such objects in a symbolic fashion. They work similar to
EQU, however they need three parameters:
Regarding the machine code, length and position are expressed via a 3
bit field in the instruction word and a proper register number
(LIVx resp. RIVx). If one uses a symbolic object,
AS will automatically assign correct values to this field, but it is
also allowed to specify the length explicitly as a third operand if
one does not work with symbolic objects. If AS finds such a length
specification in spite of a symbolic operand, it will compare both
lengths and issue an error if they do not match (the same will happen
for the MOVE instruction if two symbolic operands with different
lengths are used - the instruction simply only has a single length
field...).
Apart from the real machine instructions, AS defines similarly to its
''idol'' MCCAP some pseudo instructions that are implemented as
builtin macros:
Similar to its predecessor MCS/51, but in contrast to its
'competitor' MCS/251, the Philips XA has a separate address space for
bits, i.e. all bits that are accessible via bit instructions have a
certain, one-dimensional address which is stored as-is in the machine
code. However, I could not take the obvious opportunity to offer this
third address space (code and data are the other two) as a separate
segment. The reason is that - in contrast to the MCS/51 - some bit
addresses are ambiguous: bits with an address from 256 to 511 refer
to the bits of memory cells 20h..3fh in the current data segment.
This means that these addresses may correspond to different physical
bits, depending on the current state. Defining bits with the help
of DC instructions - something that would be possible with a
separate segment - would not make too much sense. However, the
BIT instruction still exists to define individual bits
(regardless if they are located in a register, the RAM or SFR space)
that can then be referenced symbolically. If the bit is located in
RAM, the address of the 64K-bank is also stored. This way, AS can
check whether the DS register has previously been assigned a correct
value with an ASSUME instruction.
In contrast, nothing can stop AS's efforts to align potential branch
targets to even addresses. Like other XA assemblers, AS does this by
inserting NOPs right before the instruction in question.
As one might guess, Zilog did not make any syntax definitions for the
undocumented instructions; furthermore, not everyone might know the
full set. It might therefore make sense to list all instructions at
this place:
Similar to a Z380, it is possible to access the byte halves of IX and
IY separately. In detail, these are the instructions that allow this:
The coding of shift instructions leaves an undefined bit combination
which is now accessible as the SLIA instruction.
SLIA works like SLA with the difference of entering a 1
into bit position 0. Like all other shift instructions, SLIA
also allows another undocumented variant:
In contrast to the AVR assembler, AS by default uses the Intel format
to write hexadecimal constants instead of the C syntax. All right, I
did not look into the (free) AVR assembler before, but when I started
with the AVR part, there was hardly more information about the AVR
than a preliminary manual describing processor types that were never
sold... This problem can be solved with a simple RELAXED ON.
Optionally, AS can generate so-called "object files" for the AVRs (it
also works for other CPUs, but it does not make any sense for
them...). These are files containing code and source line info, which
e.g. allows a step-by-step execution on source level with the WAVRSIM
simulator delivered by Atmel. Unfortunately, the simulator seems to
have trouble with source file names longer than approx. 20
characters: Names are truncated and/or extended by strange special
characters when the maximum length is exceeded. AS therefore stores
file name specifications in object files without a path
specification. Therefore, problems may arise when files like includes
are not in the current directory.
As this processor was designed as a grandchild of the still most
popular 8-bit microprocessor, it was a sine-qua-non design target to
execute existing Z80 programs without modification (of course, they
execute a bit faster, roughly by a factor of 10...). Therefore, all
extended features can be enabled after a reset by setting two bits
which are named XM (eXtended Mode, i.e. a 32-bit instead of a 16-bit
address space) respectively LW (long word mode, i.e. 32-bit instead
of 16-bit operands). One has to inform AS about their current setting
with the instructions EXTMODE resp. LWORDMODE, to
enable AS to check addresses and constants against the correct upper
limits. The toggle between 32- and 16-bit instruction of course only
influences instructions that are available in a 32-bit variant.
Unfortunately, the Z380 currently offers such variants only for load
and store instructions; arithmetic can only be done in 16 bits. Zilog
really should do something about this, otherwise the most positive
description for the Z380 would be ''16-bit processor with 32-bit
extensions''...
The whole thing becomes complicated by the ability to override the
operand size set by LW with the instruction prefixes DDIR W
resp. DDIR LW. AS will note the occurrence of such
instructions and will toggle the setting for the instruction following
directly. By the way, one should never explicitly use other
DDIR variants than W resp. LW, as AS will
introduce them automatically when an operand is discovered that is
too long. Explicit usage might puzzle AS. The automatism is so
powerful that in a case like this:
These processors may run in two operating modes: on the one hand, in
minimum mode, which offers almost complete source code compatibility
to the Z80 and TLCS-90, and on the other hand in maximum mode, which
is necessary to make full use of the processor's capabilities. The
main differences between these two modes are:
From this follows that, depending on the operating mode, the 16-bit
resp. 32-bit versions of the bank registers have to be used for
addressing, i.e. WA, BC, DE and HL for the minimum mode resp. XWA,
XBC, XDE and XHL for the maximum mode. The registers XIX..XIZ and XSP
are always 32 bits wide and therefore always have to be
used in this form for addressing; in this detail, existing Z80 code
definitely has to be adapted (not including that there is no I/O
space and all I/O registers are memory-mapped...).
The syntax chosen by Toshiba is a bit unfortunate in the respect of
choosing a single quote (') to reference the previous register bank.
The processor independent parts of AS already use this character to
mark character constants. In an instruction like
For many instructions, the syntax checking of AS is less strict than
the checking of TAS900. In some (rare) cases, the syntax is slightly
different. These extensions and changes are on the one hand for the
sake of a better portability of existing Z80 codes, on the other hand
they provide a simplification and better orthogonality of the
assembly syntax:
The macro processor of TAS900 is an external program that operates
like a preprocessor. It consists of two components: The first one is
a C-like preprocessor, and the second one is a special macro language
(MPL) that is reminiscent of high level languages. The macro processor of AS
instead is oriented towards ''classic'' macro assemblers like MASM or
M80 (both programs from Microsoft). It is a fixed component of AS.
TAS900 generates relocatable code that allows to link separately
compiled programs to a single application. AS instead generates
absolute machine code that is not linkable. There are currently no
plans to extend AS in this respect.
Due to the missing linker, AS lacks a couple of pseudo instructions
that TAS900 implements for relocatable code. The following
instructions are available with equal meaning:
Toshiba manufactures two versions of the processor core, with the L
version being an ''economy version''. AS will make the following
differences between TLCS-900 and TLCS-900L:
Maybe some people might ask themselves if I mixed up the order a
little bit, as Toshiba first released the TLCS-90 as an extended Z80
and afterwards the 16-bit version TLCS-900. Well, I discovered the
'90 via the '900 (thank you Oliver!). The two families are quite
similar, not only regarding their syntax but also in their
architecture. The hints for the '90 are therefore a subset of the
chapter for the '900: As the '90 only allows shifts, increments, and
decrements by one, the count need not and must not be written as the
first argument. Once again, Toshiba wants to omit parentheses for
memory operands of LDA, JP, and CALL, and once again AS
requires them for the sake of orthogonality (the real reason is of
course that this way, I saved myself a special case in the address
parser, but one does not say such a thing aloud).
Principally, the TLCS-90 series already has an address space of 1
Mbyte which is however only accessible as data space via the index
registers. AS therefore does not regard the bank registers and limits
the address space to 64 Kbytes. This should not limit too much as
this area above is anyway only reachable via indirect addressing.
Once again Toshiba...a company quite productive at the moment!
Especially this branch of the family (all Toshiba microcontrollers
are quite similar in their binary coding and programming model) seems
to be targeted towards the 8051 market: the method of separating the
bit position from the address expression with a dot had its root in
the 8051. However, it creates now exactly the sort of problems I
anticipated when working on the 8051 part: On the one hand, the dot
is a legal part of symbol names, but on the other hand, it is part of
the address syntax. This means that AS has to separate address and
bit position and must process them independently. Currently, I solved
this conflict by seeking the dot starting at the end of the
expression. This way, the last dot is regarded as the separator, and
further dots stay parts of the address. I continue to urge everyone
to omit dots in symbol names, they will lead to ambiguities:
This family of 4-bit microcontrollers should mark the low end of what
is supportable by AS. Apart from the ASSUME instruction for
the data bank register (see there), there is only one thing that is
worth mentioning: In the data and I/O segment, nibbles are reserved
instead of bytes (it's a 4-bitter...). The situation is similar to the
bit data segment of the 8051, where a DB reserves a single
bit, with the difference that we are dealing with nibbles.
Toshiba defined an ''extended instruction set'' for this processor
family to facilitate the work with their limited instruction set. In
the case of AS, it is defined in the include file
STDDEF47.INC. However, some instructions that could not be
realized as macros are ''builtins'' and are therefore also available
without the include file:
This is the first time that I implemented a processor for AS which
was not available at that point of time. Unfortunately, Toshiba
decided to put this processor ''on ice'', so we won't see any silicon
in the near future. This has of course the result that this part
4.16. 8X30x
CAUTION! The 8X30x does not support bit groups that span over
more than one memory address. Therefore, the valid value range for
the length can be stricter limited, depending on the start position.
AS does not perform any checks at this point, you simply get
strange results at runtime!
The CALL and RTN instructions MCCAP also implements
are currently missing due to insufficient documentation. The same is
true for a set of pseudo instructions to store constants to memory.
Time may change this...
INC Rx LD R,Rx LD Rx,n
DEC Rx LD Rx,R LD Rx,Ry
ADD/ADC/SUB/SBC/AND/XOR/OR/CP A,Rx
Rx and Ry are synonyms for IXL, IXU, IYL
or IYU. Keep however in mind that in the case of LD
Rx,Ry, both registers must be part of the same index register.
SLIA R,(XY+d)
In this case, R is an arbitrary 8-bit register (excluding
index register halves...), and (XY+d) is a normal indexed
address. This operation has the additional effect of copying the
result into the register. This also works for the RES
and SET instructions:
SET/RES R,n,(XY+d)
Furthermore, two hidden I/O instructions exist:
IN (C) resp. TSTI
OUT (C),0
Their operation should be clear. CAUTION! No one can guarantee
that all mask revisions of the Z80 execute these instructions, and
the Z80's successors will react with traps if they find one of these
instructions. Use them at your own risk...
DDIR LW
LD BC,12345678h ,
the necessary IW prefix will automatically be merged into
the previous instruction, resulting in
DDIR LW,IW
LD BC,12345678h .
The machine code that was first created for DDIR LW is
retracted and replaced, which is signified with an R in the
listing.
4.21. TLCS-900(L)
To allow AS to check against the correct limits, one has to inform
it about the current execution mode via the MAXMODE
instruction (see there). The default is the minimum mode.
ld wa',wa ,
AS will not recognize the comma for parameter separation. This
problem can be circumvented by usage of an inverse single quote (`),
for example
ld wa`,wa
Toshiba supplies its own assembler for the TLCS-900 series (TAS900),
which is different from AS in the following points:
Symbol Conventions
Syntax
Macro Processor
Output Format
Pseudo Instructions
EQU, DB, DW, ORG, ALIGN, END, TITLE, SAVE, RESTORE
The latter two have an extended functionality for AS. Some TAS900
pseudo instructions can be replaced with equivalent AS instructions
(see table 4.2).
TAS900 | AS | meaning/function |
---|---|---|
DL <Data> | DD <Data> | define longword constants |
DSB <number> | DB <number> DUP (?) | reserve bytes of memory |
DSW <number> | DW <number> DUP (?) | reserve words of memory |
DSD <number> | DD <number> DUP (?) | reserve longwords of memory |
$MIN[IMUM] | MAXMODE OFF | following code runs in minimum mode |
$MAX[IMUM] | MAXMODE ON | following code runs in maximum mode |
$SYS[TEM] | SUPMODE ON | following code runs in system mode |
$NOR[MAL] | SUPMODE OFF | following code runs in user mode |
$NOLIST | LISTING OFF | turn off assembly listing |
$LIST | LISTING ON | turn on assembly listing |
$EJECT | NEWPAGE | start new page in listing |
The instructions SUPMODE and MAXMODE are not
influenced, just as their initial setting OFF. The
programmer has to take care of the fact that the L version starts in
maximum mode and does not have a normal mode. However, AS shows a bit
of mercy against the L variant by suppressing warnings for privileged
instructions.
LD CF,A.7 ; accumulator bit 7 to carry
LD C,A.7 ; constant 'A.7' to accumulator
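The rule of taking the last dot as the separator between address and bit position can be illustrated with a short Python sketch. It only models the splitting step described in the text; the function name is hypothetical, and the decision whether an operand is a bit operand at all is not modelled.

```python
def split_bit_operand(expression):
    """Split at the LAST dot into (address part, bit position part)."""
    address, dot, position = expression.rpartition(".")
    if not dot:
        return expression, None  # no dot: plain address, no bit position
    return address, position
```

A symbol name that itself contains dots therefore loses only its final component to the bit position, which is exactly why such names invite ambiguities.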
Therefore, errors in this code generator are quite possible (and will
of course be fixed if it should ever become possible!). At least the
few examples listed in [88] are
assembled correctly.
As it was already described in the discussion of the ASSUME instruction, AS can use the information about the current setting of the RBP register to detect accesses to privileged registers in user mode. This ability is of course limited to direct accesses (i.e. without using the registers IPA...IPC), and there is one more pitfall: as local registers (registers with a number >127) are addressed relative to the stack pointer, but the bits in RBP always refer to absolute numbers, the check is NOT done for local registers. An extension would require AS to know always the absolute value of SP, which would at least fail for recursive subroutines...
As it was already explained in the discussion of the ASSUME instruction, AS tries to hide the fact that the processor has more physical than logical RAM as far as possible. Please keep in mind that the DPP registers are valid only for data accesses and only have an influence on absolute addressing, neither on indirect nor on indexed addresses. AS cannot know which value the computed address may take at runtime... The paging unit unfortunately does not operate for code accesses so one has to work with explicit long or short CALLs, JMPs, or RETs. At least for the ''universal'' instructions CALL and JMP, AS will automatically use the shortest variant, but at least for the RET one should know where the call came from. JMPS and CALLS principally require to write segment and address separately, but AS is written in a way that it can split an address on its own, e.g. one can write
jmps 12345h
instead of
jmps 1,2345h
Unfortunately, not all details of the chip's internal instruction pipeline are hidden: if CP (register bank address), SP (stack), or one of the paging registers are modified, their value is not available for the instruction immediately following. AS tries to detect such situations and will issue a warning in such cases. Once again, this mechanism only works for direct accesses.
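The splitting of a linear address into segment and offset for JMPS and CALLS, as mentioned above, amounts to the following (a Python sketch assuming 64 Kbyte segments; the function name is made up for illustration):

```python
def split_segment_address(address):
    """Split a linear address into (segment, 16-bit offset)."""
    return address >> 16, address & 0xFFFF
```

For 12345h this yields segment 1 and offset 2345h, the two arguments of the explicit form.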
Bits defined with the BIT instruction are internally stored as a 12-bit word, containing the address in bits 4..11 and the bit position in the four LSBs. This order allows to refer the next resp. previous bit by incrementing or decrementing the address. This will however not work for explicit bit specifications when a word boundary is crossed. For example, the following expression will result in a range check error:
bclr r5.15+1
We need a BIT in this situation:
msb bit r5.15
.
.
bclr msb+1
The SFR area was doubled for the 80C167/165/163: bit 12 flags that a bit lies in the second part. Siemens unfortunately did not foresee that 256 SFRs (128 of them bit addressable) would not suffice for successors of the 80C166. As a result, it would be impossible to reach the second SFR area from F000H..F1DFH with short addresses or bit instructions if the developers had not included a toggle instruction:
EXTR #n
This instruction has the effect that for the next n instructions (0<n<5), it is possible to address the alternate SFR space instead of the normal one. AS does not only generate the appropriate machine code when it encounters this instruction. It also sets an internal flag that will only allow accesses to the alternate SFR space for the next n instructions. Of course, they may not contain jumps... Of course, it is always possible to define bits from either area at any place, and it is always possible to reach all registers with absolute addresses. In contrast, short and bit addressing only works for one area at a time, attempts contradicting to this will result in an error message.
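The 12-bit layout of BIT symbols described further above (address in bits 4..11, bit position in the four LSBs) explains why incrementing a symbol works while an explicit r5.15+1 does not. A Python sketch, with hypothetical function names and with the value 5 used purely as a placeholder address (not the real bit address of r5):

```python
def pack_bit166(address, position):
    """Build a bit symbol: address in bits 4..11, position in bits 0..3."""
    if not 0 <= position <= 15:
        raise ValueError("bit position out of range")
    return ((address & 0xFF) << 4) | position


def unpack_bit166(symbol):
    """Return (address, bit position) of a bit symbol."""
    return (symbol >> 4) & 0xFF, symbol & 0xF
```

Adding 1 to a symbol for bit 15 carries into the address field and yields bit 0 of the next address, whereas writing 15+1 as an explicit bit position asks for position 16 and fails the range check.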
The situation is similar for prefix instructions and absolute or indirect addressing: as the prefix argument and the address expression cannot always be evaluated at assembly time, chances for checking are limited and AS will limit itself to warnings. In detail, the situation is as follows:
    extp    #7,#1           ; range from 112K..128K
    mov     r0,1cdefh       ; results in address 0defh in code
    mov     r0,1cdefh       ; -->warning
    exts    #1,#1           ; range from 64K..128K
    mov     r0,1cdefh       ; results in address 0cdefh in code
    mov     r0,1cdefh       ; -->warning
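The addresses named in the comments of this example can be reproduced with a little arithmetic. The following Python sketch assumes 16K pages and 64K segments, which is what the ranges "112K..128K" for page 7 and "64K..128K" for segment 1 imply; the function names are made up for illustration.

```python
PAGE_BITS = 14      # assumption: 16 Kbyte pages (page 7 starts at 112K)
SEGMENT_BITS = 16   # assumption: 64 Kbyte segments (segment 1 starts at 64K)

def page_offset(addr: int) -> int:
    """Part of the address that remains in the code after a page prefix."""
    return addr & ((1 << PAGE_BITS) - 1)

def segment_offset(addr: int) -> int:
    """Part of the address that remains in the code after a segment prefix."""
    return addr & ((1 << SEGMENT_BITS) - 1)

def page_range(page: int) -> tuple:
    """First and last address covered by the given page."""
    return page << PAGE_BITS, ((page + 1) << PAGE_BITS) - 1

def segment_range(segment: int) -> tuple:
    """First and last address covered by the given segment."""
    return segment << SEGMENT_BITS, ((segment + 1) << SEGMENT_BITS) - 1

target = 0x1CDEF
in_page = page_range(7)[0] <= target <= page_range(7)[1]
in_segment = segment_range(1)[0] <= target <= segment_range(1)[1]
```

With these assumptions, `page_offset(0x1CDEF)` yields 0DEFh and `segment_offset(0x1CDEF)` yields 0CDEFh, matching the comments in the listing.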
Similar to the MCS-48 family, the PICs split their program memory into several banks because the opcode does not offer enough space for a complete address. AS uses the same automatism for the instructions CALL and GOTO, i.e. the PA bits in the status word are set according to the start and target address. However, this procedure is far more problematic compared to the 48's:
    COMF, DECF, DECFSZ, INCF, INCFSZ, RLF, RRF, and SWAPF

The other operations by default regard W as an accumulator:

    ADDWF, ANDWF, IORWF, MOVF, SUBWF, and XORWF

The syntax defined by Microchip to write literals is quite obscure and is reminiscent of the syntax used on IBM 360/370 systems (greetings from the stone-age...). To avoid introducing another branch into the parser, with AS one has to write constants in the Motorola syntax (optionally Intel or C in RELAXED mode).
With two exceptions, the same hints are valid as for its two smaller brothers: the corresponding include file only contains register definitions, and the problems concerning jump instructions are much smaller. The only exception is the LCALL instruction, which allows a jump with a 16-bit address. It is translated with the following ''macro'':
    MOVLW   <addr15..8>
    MOVWF   3
    LCALL   <addr0..7>
These processors have the ability to map their code ROM pagewise into the data area. I am not keen on repeating the whole discussion of the ASSUME instruction at this place, so I refer to the corresponding section (3.2.13) for an explanation of how to read constants out of the code ROM without too much headache.
Some builtin ''macros'' show up when one analyzes the instruction set a bit more in detail. The instructions I found are listed in table 4.3 (there are probably even more...):
instruction | in reality |
---|---|
CLR A | SUB A,A |
SLA A | ADD A,A |
CLR addr | LDI addr,0 |
NOP | JRZ PC+1 |
Especially the last case is a bit astonishing... Unfortunately, some instructions are really missing. For example, there is an AND instruction but no OR, not to speak of an XOR. For this reason, the include file STDDEF62.INC also contains some helper macros (in addition to register definitions).
The original assembler AST6 delivered by SGS-Thomson partially uses different pseudo instructions than AS. Apart from the fact that AS does not mark pseudo instructions with a leading dot, the following instructions are identical:
    ASCII, ASCIZ, BLOCK, BYTE, END, ENDM, EQU, ERROR, MACRO, ORG, TITLE, WARNING

Table 4.4 shows the instructions which have AS counterparts with similar function.
AST6 | AS | meaning/function |
---|---|---|
.DISPLAY | MESSAGE | output message |
.EJECT | NEWPAGE | new page in assembly listing |
.ELSE | ELSEIF | conditional assembly |
.ENDC | ENDIF | conditional assembly |
.IFC | IF... | conditional assembly |
.INPUT | INCLUDE | insert include file |
.LIST | LISTING, MACEXP | settings for listing |
.PL | PAGE | page length of listing |
.ROMSIZE | CPU | set target processor |
.VERS | VERSION (symbol) | query version |
.SET | EVAL | redefine variables |
In [63], the .w postfix to signify 16-bit addresses is only defined for memory indirect operands. It is used to mark that a 16-bit address is stored at a zero page address. AS additionally allows this postfix for absolute addresses or displacements of indirect address expressions to force 16-bit displacements in spite of an 8-bit value (0..255).
The ST9's bit addressing capabilities are quite limited: except for the BTSET instruction, only bits within the current set of working registers are accessible. A bit address is therefore of the following style:
    rn.[!]b

where ! means an optional complement of a source operand. If a bit is defined symbolically, the bit's register number is stored in bits 7..4, the bit's position in bits 3..1, and the optional complement in bit 0. AS distinguishes explicit and symbolic bit addresses by the missing dot. A bit's symbolic name therefore must not contain a dot, though it would be legal with respect to the general symbol name conventions. It is also valid to invert a symbolically referred bit:

    bit2    bit     r5.3
            .
            .
            bld     r0.0,!bit2

This also makes it possible to undo an inversion that was done at the definition of the symbol.
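The symbolic bit format described above (register number in bits 7..4, position in bits 3..1, complement flag in bit 0) can be sketched in Python as follows. The function names are assumptions chosen for illustration and do not mirror AS's internal routines.

```python
# Sketch of the ST9 bit-symbol layout. Names are illustrative assumptions.

def pack_st9_bit(reg: int, pos: int, inverted: bool = False) -> int:
    """Register in bits 7..4, position in bits 3..1, complement in bit 0."""
    if not 0 <= reg <= 15:
        raise ValueError("working register number must be 0..15")
    if not 0 <= pos <= 7:
        raise ValueError("bit position must be 0..7")
    return (reg << 4) | (pos << 1) | int(inverted)

def invert(sym: int) -> int:
    """Applying '!' to a symbol toggles the complement flag in bit 0."""
    return sym ^ 1

def unpack_st9_bit(sym: int) -> tuple:
    """Return (register, position, inverted) for a symbol value."""
    return (sym >> 4) & 0x0F, (sym >> 1) & 0x07, bool(sym & 1)

bit2 = pack_st9_bit(5, 3)       # corresponds to "bit2 bit r5.3"
negated = invert(bit2)          # corresponds to "!bit2"
restored = invert(negated)      # a second inversion undoes the first
```

Since the complement is a single flag, inverting an already inverted symbol restores the plain bit, which is the "undo" mentioned in the text.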
The include file REGST9.INC defines the symbolic names of all on-chip registers and their associated bits. Keep however in mind that the bit definitions only work after previously setting the working register bank to the address of these peripheral registers!
In contrast to the definition file delivered with the AST9 assembler from SGS-Thomson, the names of peripheral registers are only defined as general registers (R...), not also as working registers (r...). The reason for this is that AS does not support register aliases; a tribute to assembly speed.
To be honest: I only implemented this processor in AS to quarrel about SGS-Thomson's peculiar behaviour. When I first read the 6804's data book, the ''incomplete'' instruction set and the built-in macros immediately reminded me of the ST62 series manufactured by the same company. A more thorough comparison of the opcodes gave surprising insights: A 6804 opcode can be generated by taking the equivalent ST62 opcode and mirroring all the bits! So Thomson obviously did a bit of processor core recycling...which would be all right if they did not try to hide this: different peripherals, Motorola instead of Zilog-style syntax, and the awful detail of not mirroring operand fields in the opcode (e.g. bit fields containing displacements). The last item is also the reason that finally convinced me to support the 6804 in AS. I personally can only guess which department at Thomson did the copy...
In contrast to its ST62 counterpart, the include file for the 6804 does not contain instruction macros that help a bit to deal with the limited machine instruction set. This is left as an exercise to the reader!
It seems that every semiconductor manufacturer's ambition is to invent its own notation for hexadecimal numbers. Texas Instruments took an especially eccentric approach for these processors: a > sign as prefix! Supporting such a format in AS would have led to extreme conflicts with AS's compare and shift operators. I therefore decided to use the Intel notation, which is what TI also uses for the 340x0 series and the 3201x's successors...
The instruction word of these processors unfortunately does not have enough bits to store all 8 bits for direct addressing. This is why the data address space is split into two banks of 128 words. AS principally regards the data address space as a linear segment of 256 words and automatically clears bit 7 on direct accesses (an exception is the SST instruction that can only write to the upper bank). The programmer has to take care that the bank flag always has the correct value!
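As a rough illustration of the banking scheme just described, the following Python sketch splits a linear data address into the bank number and the 7-bit field that fits into the instruction word. The function name is an assumption; the sketch only mirrors the "clear bit 7" rule stated above.

```python
def split_data_address(addr: int) -> tuple:
    """Return (bank, 7-bit field) for a linear data address 0..255."""
    if not 0 <= addr <= 0xFF:
        raise ValueError("data address must be in the range 0..255")
    bank = addr >> 7          # 0 = lower bank, 1 = upper bank
    field = addr & 0x7F       # bit 7 cleared, as done for direct accesses
    return bank, field
```

The bank number is what the programmer has to keep consistent with the processor's bank flag; only the 7-bit field ends up in the opcode.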
Another hint that is well hidden in the data book: The SUBC instruction internally needs more than one clock for completion, but the control unit already continues to execute the next instruction. An instruction following SUBC therefore may not access the accumulator. AS does not check for such conditions!
4.35. TMS320C2x
    BSS, STRING, RSTRING, BYTE, WORD, LONG,
    FLOAT, DOUBLE, EFLOAT, BFLOAT and TFLOAT

If one needs a typed label in front of one of these instructions, one can work around this by placing the label in a separate line just before the pseudo instruction itself. On the other hand, it is possible to place an untyped label in front of another pseudo instruction by defining the label with EQU, e.g.

    <name> EQU $
The syntax detail that caused me the most headaches while implementing this processor family is the splitting of parallel instructions into two separate source code lines. Fortunately, both instructions of such a construct are also valid single instructions. AS therefore first generates the code for the first instruction and replaces it by the parallel machine code when a parallel construct is encountered in the second line. This operation can be noticed in the assembly listing by the machine code address that does not advance and the colon replaced with an R.
Compared to the TI assembler, AS is not as flexible regarding the position of the double lines that signify a parallel operation (||): One either has to place them like a label (starting in the first column) or to prepend them to the second mnemonic. The line parser of AS will run into trouble if you do something else...
Similar to most older TI microprocessor families, TI used its own format for hexadecimal and binary constants. AS instead favours the Intel syntax which is also common for newer processor designs from TI.
The TI syntax for registers allows the use of a simple integer number between 0 and 15 instead of a real name (Rx or WRx). This has two consequences:
This processor family belongs to the older families developed by TI, and TI's assemblers therefore use their proprietary syntax for hexadecimal and binary constants (a prefixed < or ? character, respectively). As this format could not be realized for AS, the Intel syntax is used by default. This is the format to which TI also switched when introducing the successors of this family, the 370 series of microcontrollers. Upon closer inspection of both machine instruction sets, one discovers that about 80% of all instructions are binary upward compatible, and that the assembly syntax is also almost identical - but unfortunately only almost. TI also took the chance to make the syntax more orthogonal and simple. I tried to introduce the majority of these changes also into the 7000's instruction set:
Though these processors do not have specialized instructions for bit manipulation, the assembler creates (with the help of the DBIT instruction - see there) the illusion that single bits are addressable. To achieve this, the DBIT instruction stores an address along with a bit position into an integer symbol which may then be used as an argument to the pseudo instructions SBIT0, SBIT1, CMPBIT, JBIT0, and JBIT1. These are translated into the instructions OR, AND, XOR, BTJZ, and BTJO with an appropriate bit mask.
There is nothing magic about these bit symbols, they are simple integer values that contain the address in their lower and the bit position in their upper half. One could construct bit symbols without the DBIT instruction, like this:
    defbit  macro   name,bit,addr
    name    equ     addr+(bit<<16)
            endm

but this technique would not lead to the EQU-style syntax defined by TI (the symbol to be defined replaces the label field in a line). CAUTION! Though DBIT allows an arbitrary address, the pseudo instructions can only operate with addresses either in the range from 0..255 or 1000h..10ffh. The processor does not have an absolute addressing mode for other memory ranges...
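To make the arithmetic behind these bit symbols concrete, here is a small Python sketch of the same packing (address in the lower half, bit position in the upper half), together with the bit mask a pseudo instruction would need and the address restriction mentioned in the caution. All function names are assumptions for illustration; how AS actually derives the mask for each pseudo instruction is not shown here.

```python
# Sketch of the DBIT-style symbol layout: addr + (bit << 16).
# Names are illustrative assumptions, not AS internals.

def defbit(bit: int, addr: int) -> int:
    """Pack a bit position and an address like the defbit macro does."""
    if not 0 <= bit <= 7:
        raise ValueError("bit position must be 0..7")
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address must fit into 16 bits")
    return addr + (bit << 16)

def split_bit_symbol(sym: int) -> tuple:
    """Return (address, bit position) stored in a bit symbol."""
    return sym & 0xFFFF, sym >> 16

def bit_mask(sym: int) -> int:
    """Mask with only the addressed bit set."""
    return 1 << (sym >> 16)

def usable_with_bit_pseudo_ops(sym: int) -> bool:
    """True if the address lies in 0..255 or 1000h..10ffh."""
    addr = sym & 0xFFFF
    return 0 <= addr <= 0xFF or 0x1000 <= addr <= 0x10FF

flag = defbit(3, 0x1005)    # hypothetical bit 3 at address 1005h
```

A symbol built this way can still point to an address outside the two permitted ranges; the range check only matters once the symbol is used with one of the bit pseudo instructions.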
The MSP was designed to be a RISC processor with a minimal power
consumption. The set of machine instructions was therefore reduced to
the absolute minimum (RISC processors do not have a microcode ROM so
every additional instruction has to be implemented with additional
silicon that increases power consumption). A number of instructions
that are hardwired for other processors are therefore emulated with
other instructions. For AS, these instructions are defined in the
include file REGMSP.INC. You will get error messages for
more than half of the instructions defined by TI if you forget to
include this file!
National unfortunately also decided to use the syntax well known from
IBM mainframes (and much hated by me..) to write non-decimal integer
constants. Just like with other processors, this does not work with
AS's parser. ASMCOP however fortunately also seems to allow the C
syntax, which is why this became the default for the COP series and
the SC/MP...
Similar to other processors, the assembly language of the 75 series
also knows pseudo bit operands, i.e. it is possible to assign a
combination of address and bit number to a symbol that can then be
used as an argument for bit oriented instructions just like explicit
expressions. The following three instructions for example generate
the same code:
The storage format of bit symbols mostly accepts the binary coding in
the machine instructions themselves: 16 bits are used, and there is a
''long'' and a ''short'' format. The short format can store the
following variants:
NEC uses different ways to mark absolute addressing in its data
books:
Both the 7720 and 7725 are provided by the same code generator and
are extremely similar in their instruction set. One should however
not believe that they are binary compatible: To get space for the
longer address fields and additional instructions, the bit positions
of some fields in the instruction word have changed, and the
instruction length has changed from 23 to 24 bits. The code format
therefore uses different header ids for both CPUs.
They both have in common that in addition to the code and data
segment, there is also a ROM for storage of constants. In the case of
AS, it is mapped onto the ROMDATA segment!
This chapter explains those file formats generated by AS that are
not self-explanatory.
The format for code files generated by the assembler must be able to
separate code parts that were generated for different target
processors; therefore, it is a bit different from most other formats.
Though the assembler package contains tools to deal with code files,
I think it is a question of good style to describe the format in short:
If a code file contains multibyte values, they are stored in little
endian order. This rule is already valid for the 16-bit magic word
$1489, i.e. every code file starts with the byte sequence $89/$14.
This magic word is followed by an arbitrary number of ''records''. A
record may either contain a continuous piece of the code or certain
additional information. Even without switching to different processor
types, a file may contain several code-containing records, in case
that code or constant data areas are interrupted by reserved memory
areas that should not be initialized. This way, the assembler tries
to keep the file as short as possible.
Common to all records is a header byte which defines the record's
type and its contents. Written in a PASCALish way, the record
structure can be described in the following way:
A record with a header byte of $81 is a record that may contain code
or data from arbitrary segments. The first byte (Header)
describes the processor family the following code resp. data was
generated for (see table 5.1).
The Segment field signifies the address space the following
code belongs to. The assignment defined in table 5.2 applies.
The Gran field describes the code's ''granularity'', i.e.
the size of the smallest addressable unit in the following set of
data. This value is a function of processor type and segment and is
an important parameter for the interpretation of the following two
fields that describe the block's start address and its length: While
the start address refers to the granularity, the Length
value is always expressed in bytes! For example, if the start address
is $300 and the length is 12, the resulting end address would be $30b
for a granularity of 1, however $302 for a granularity of 4 (12 bytes
are only three units of 4 bytes each)!
Granularities that differ from 1 are rare and mostly appear in DSP
CPUs that are not designed for byte processing. For example, a
DSP56K's address space is organized in 64 Kwords of 16 bits. The
resulting storage capacity is 128 Kbytes, however it is organized as
2^16 words that are addressed with addresses
0,1,2,...65535!
The start address is always 32 bits in size, independent of the
processor family. In contrast, the length specification has only 16
bits, i.e. a record may have a maximum length of 4+4+2+(64K-1) =
65545 bytes.
Data records with a Header ranging from $01 to $7f present a shortcut
and preserve backward compatibility to earlier definitions of the
file format: in their case, the Header directly defines the processor
type, the target segment is fixed to CODE and the
granularity is implicitly given by the processor type, rounded up to
the next power of two. AS prefers to use these records whenever data
or code should go into the CODE segment.
A record with a Header of $80 defines an entry point, i.e. the
address where execution of the program should start. Such a record is
the result of an END statement with a corresponding address
as argument.
The last record in a file bears the Header $00 and has only a string
as data field. This string does not have an explicit length
specification; its end is equal to the file's end. The string
contains only the name of the program that created the file and has
no further meaning.
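The record types described in this section can be walked with a short reader. The following Python sketch is only an illustration under several assumptions that are not confirmed by the text above: the function name `read_records` is made up, the entry point of a $80 record is taken to be a 32-bit address, the short records ($01..$7f) are assumed to carry the same 32-bit start address and 16-bit length as $81 records, and the creator string is decoded as Latin-1.

```python
import struct
from typing import Iterator, Tuple

MAGIC = 0x1489  # stored little endian, i.e. as the bytes $89 $14

def read_records(blob: bytes) -> Iterator[Tuple]:
    """Yield one tuple per record found in the image of a code file."""
    (magic,) = struct.unpack_from("<H", blob, 0)
    if magic != MAGIC:
        raise ValueError("magic word $1489 not found")
    pos = 2
    while pos < len(blob):
        header = blob[pos]
        pos += 1
        if header == 0x00:
            # Creator string: no length field, it runs to the end of the file.
            yield ("creator", blob[pos:].decode("latin-1"))
            return
        if header == 0x80:
            # Entry point; a 32-bit value is an assumption of this sketch.
            (entry,) = struct.unpack_from("<L", blob, pos)
            pos += 4
            yield ("entry", entry)
            continue
        if header == 0x81:
            # Explicit processor family, segment and granularity.
            family, segment, gran = struct.unpack_from("<BBB", blob, pos)
            pos += 3
        elif 0x01 <= header <= 0x7F:
            # Short form: header is the processor type, segment is CODE,
            # granularity follows from the processor and is not stored.
            family, segment, gran = header, None, None
        else:
            raise ValueError("unknown record header $%02x" % header)
        # 32-bit start address and 16-bit length (in bytes), little endian.
        start, length = struct.unpack_from("<LH", blob, pos)
        pos += 6
        data = blob[pos:pos + length]
        pos += length
        yield ("data", family, segment, gran, start, data)
```

A caller would read the whole file in binary mode and iterate, e.g. `for rec in read_records(open(name, "rb").read()): print(rec)`. The sketch makes no attempt to interpret the family or segment codes; those are defined by tables 5.1 and 5.2.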
Debug files may optionally be generated by AS. They deliver important
information for tools used after assembly, like disassemblers or
debuggers. AS can generate debug files in one of three formats: the
object format used by the AVR tools from Atmel, a NoICE-compatible
command file, or its own format. The first two are described in
detail in [4] and in the NoICE documentation respectively, which is
why the following description limits itself to the AS-specific MAP
format:
The information in a MAP file is split into three groups:
The first field is the symbol's name, possibly extended by a section
number enclosed in brackets. Such a section number limits the range
of validity for a symbol. The second field designates the symbol's
type: Int stands for integer values, Float for
floating point numbers, and String for character arrays. The
third field finally contains the symbol's value. If the symbol
contains a string, it is necessary to use a special encoding for
control characters and spaces. Without such a coding, spaces in a
string could be misinterpreted as delimiters to the next field. AS
uses the same syntax that is also valid for assembly source files:
Instead of the character, its ASCII value with a leading backslash
(\) is inserted. For example, the string
The fourth field specifies - if available - the size of the data
structure placed at the address given by the symbol. A debugger may
use this information to automatically display variables in their
correct length when they are referred symbolically. In case AS does
not have any information about the symbol size, this field simply
contains the value -1.
Finally, the fifth field states via the values 0 or 1 if the symbol
has been used during assembly. A program that reads the symbol table
can use this field to skip unused symbols as they are probably unused
during the following debugging/disassembly session.
The third section in a debug file describes the program's sections in
detail. The need for such a detailed description arises from the
sections' ability to limit the validity range of symbols. A symbolic
debugger for example cannot use certain symbols for a reverse
translation, depending on the current PC value. It may also have to
regard priorities for symbol usage when a value is represented by
more than one symbol. The definition of a section starts with a line
of the following form:
Program parts that lie out of any section are not listed separately.
This implicit ''root section'' carries the number -1 and is also used
as parent section for sections that do not have a real parent
section.
It is possible that the file contains empty lines or comments
(semicolon at line start). A program reading the file has to ignore
such lines.
To simplify the work with the assembler's code format a bit, I added
some tools to aid processing of code files. These programs are
released under the same license terms as stated in section 1.1!
Common to all programs are the possible return codes they may deliver
upon completion (see table 6.1).
Just like AS, all programs take their input from STDIN and write
messages to STDOUT (resp. error messages to STDERR). Therefore, input
and output redirections should not be a problem.
In case that numeric or address specifications have to be given in
the command line, they may also be written in hexadecimal notation
when they are prefixed with a dollar sign (e.g. $10 instead of 16).
Unix shells however assign a special meaning to the dollar sign,
which makes it necessary to escape a dollar sign with a backslash.
Otherwise, calling conventions and variations are equivalent to those
of AS (except for PLIST and AS2MSG); i.e. it is possible to store
frequently used parameters in an environment variable (whose name is
constructed by appending CMD to the program's name, i.e.
BINDCMD for BIND), to negate options, and to use all upper-
resp. lower-case writing (for details on this, see section 2.4).
Address specifications always relate to the granularity of the
processor currently in question; for example, on a PIC, an address
difference of 1 means a word and not a byte.
PLIST is the simplest one of the five programs supplied: its purpose
is simply to list all records that are stored in a code file. As the
program does not do very much, calling is quite simple:
CAUTION! At this place, no wildcards are allowed! If there is
a necessity to list several files with one command, use the following
''mini batch'':
Finally, PLIST will print a copyright remark (if there is one in the
file), together with the total code length.
Simply said, PLIST is a sort of DIR for code files. One can use it to
examine a file's contents before one continues to process it.
BIND is a program that concatenates the records of several
code files into a single file. A filter function is available that
can be used to copy only records of certain types. Used in this way,
BIND can also be used to split a code file into several files.
The general syntax of BIND is
Currently, BIND defines only one command line option:
P2HEX is an extension of BIND. It has all command line options of
BIND and uses the same conventions for file names. In contrast to
BIND, the target file is written as a Hex file, i.e. as a sequence of
lines which represent the code as ASCII hex numbers.
P2HEX knows 8 different target formats, which can be selected via the
command line parameter F:
For the PIC microcontrollers, the switch
Apart from this filter function, P2HEX also supports an address
filter, which is useful to split the code into several parts (e.g.
for a set of EPROMs):
CAUTION! This type of splitting does not change the absolute
addresses that will be written into the files! If the addresses in
the individual hex files should rather start at 0, one can force this
with the additional switch
By using an offset, it is possible to move a file's contents to an
arbitrary position. This offset is simply appended to a file's name,
surrounded with parentheses. For example, if the code in a file
starts at address 0 and you want to move it to address 1000 hex in
the hex file, append ($1000) to the file's name (without
spaces!).
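The convention of appending an offset in parentheses can be pictured with a small Python sketch that splits such an argument into file name and offset. This is not P2HEX's actual parsing code: the function name and the example file name `prog.p` are hypothetical, and only the two number forms mentioned in this chapter (decimal, or hexadecimal with a leading dollar sign) are handled.

```python
import re

def split_offset(arg: str) -> tuple:
    """Split 'name(offset)' into (name, offset); the offset defaults to 0."""
    match = re.fullmatch(r"(.+)\((\$[0-9A-Fa-f]+|[0-9]+)\)", arg)
    if match is None:
        return arg, 0
    name, number = match.groups()
    if number.startswith("$"):
        return name, int(number[1:], 16)   # '$1000' is hexadecimal
    return name, int(number, 10)           # plain digits are decimal
```

For instance, `split_offset("prog.p($1000)")` gives the name `prog.p` and the offset 4096. On a Unix shell, both the dollar sign and the parentheses have a special meaning, so such an argument would have to be quoted or escaped.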
As the TI DSK format has the ability to distinguish between data and
code, there is a switch
Unfortunately, one finds different statements about the last line of
an Intel-Hex file in literature. Therefore, P2HEX knows three
different variants that may be selected via the command-line
parameter i and an additional number:
If the target file name does not have an extension, an extension
of HEX is assumed.
By default, P2HEX will print a maximum of 16 data bytes per line,
just as most other tools that output Hex files. If you want to change
this, you may use the switch
In most cases, the temporary code files generated by AS are not of
any further need after P2HEX has been run. The command line option
In contrast to BIND, P2HEX will not produce an empty target file if
only one file name (i.e. the target name) has been given. Instead,
P2HEX will use the corresponding code file. Therefore, a minimal call
in the style of
P2BIN works similar to P2HEX and offers the same options (except for
the a and i options that do not make sense for binary files),
however, the result is stored as a simple binary file instead of a
hex file. Such a file is for example suitable for programming an
EPROM.
P2BIN knows three additional options to influence the resulting
binary file:
In case the code file does not contain an entry address, one may set
it via the -e command line option just like with P2HEX. Upon
request, P2BIN prepends the resulting image with this address. The
command line option
AS2MSG is not a tool in the real sense, it is a filter that was
designed to simplify the work with the assembler for (fortunate)
users of Borland Pascal 7.0. The DOS IDEs feature a 'tools' menu that
can be extended with own programs like AS. The filter allows to
directly display the error messages paired with a line specification
delivered by AS in the editor window. A new entry has to be added to
the tools menu to achieve this (Options/Tools/New). Enter the
following values:
I assume that AS and AS2MSG are located in a directory listed in
the PATH variable. After pressing the appropriate hotkey (or
selecting AS from the tools menu), AS will be called with the name of
the file loaded in the active editor window as parameter. The error
messages generated during assembly are redirected to a special window
that allows to browse through the errors. Ctrl-Enter jumps
to an erroneous line. The window additionally contains the statistics
AS prints at the end of an assembly. These lines obtain the dummy
line number 1.
TURBO.EXE (Real Mode) and BP.EXE (Protected Mode)
may be used for this way of working with AS. I recommend however BP,
as this version does not have to 'swap' half of the DOS memory
before AS is called.
Here is a list of all error messages emitted by AS. Each error
message is described by:
The following error messages are generated not only by AS, but also
by the auxiliary programs, like PLIST, BIND, P2HEX, and P2BIN. Only
the most probable error messages are explained here. Should you meet
an undocumented error message, then you probably met a program bug!
Please inform us immediately about this!!
In this chapter, I tried to collect some questions that arise very
often together with their answers. Answers to the problems presented
in this chapter might also be found at other places in this manual,
but one maybe does not find them immediately...
This appendix is designed as a quick reference to look up all pseudo
instructions provided by AS. The list is ordered in two parts: The
first part lists the instructions that are always available, and this
list is followed by lists that enumerate the instructions
additionally available for a certain processor family.
To be exact, boolean symbols are just ordinary integer symbols with
the difference that AS will assign only two different values to them
(0 or 1, corresponding to False or True). AS does not store special
symbols in the symbol table. For performance reasons, they are
realized with hardcoded comparisons directly in the parser. They
therefore do not show up in the assembly listing's symbol table.
Predefined symbols are only set once at the beginning of a pass. The
values of dynamic symbols may in contrast change during assembly as
they reflect settings made with related pseudo instructions. The
values added in parentheses give the value present at the beginning
of a pass.
The names given in this table also reflect the valid way to reference
these symbols in case-sensitive mode.
The names listed here should be avoided for own symbols; either one
can define but not access them (special symbols), or one will receive
an error message due to a double-defined symbol. The ugliest case is
when the redefinition of a symbol made by AS at the beginning of a
pass leads to a phase error and an infinite loop...
The distribution of AS contains a couple of include files. Apart from
include files that only refer to a specific processor family (and
whose function should be immediately clear to someone who works with
this family), there are a few processor-independent files which
include useful functions. The functions defined in these files shall
be explained briefly in the following sections:
This file defines a couple of bit-oriented functions that might be
hardwired for other assemblers. In the case of AS however, they are
implemented with the help of user-defined functions:
This include file is similar to the C include file ctype.h
which offers functions to classify characters. All functions deliver
either TRUE or FALSE:
If one decides to rewrite a chapter that has been out of date for two
years, it is almost unavoidable that one forgets to mention some of
the good ghosts who contributed to the success this project had up to
now. The first ''thank you'' therefore goes to the people whose names
I unwillingly forgot in the following enumeration!
The concept of AS as a universal cross assembler came from Bernhard
(C.) Zschocke who needed a ''student friendly'', i.e. free cross
assembler for his microprocessor course and talked me into extending
an already existing 68000 assembler. The rest is history... The
microprocessor course held at RWTH Aachen also always provided the
most engaged users (and bug-searchers) of new AS features and
therefore contributed a lot to today's quality of AS.
The internet and FTP have proved to be a big help for spreading AS
and reporting of bugs. My thanks therefore go to the FTP admins
(Bernd Casimir in Stuttgart, Norbert Breidor in Aachen, and
Jürgen Meißburger in Jülich). Especially the last one
personally engaged a lot to establish a practicable way in
Jülich.
As we are just talking about the ZAM: Though Wolfgang E. Nagel is not
personally involved into AS, he is at least my boss and always puts
at least four eyes on what I am doing. Regarding AS, there seems to
be at least one that smiles...
A program like AS cannot be done without appropriate data books and
documentation. I received information from an enormous amount of
people, ranging from tips up to complete data books. An enumeration
follows (as stated before, without guarantee for completeness!):
Ernst Ahlers, Charles Altmann, Rolf Buchholz, Bernd Casimir, Gunther
Ewald, Stephan Hruschka, Peter Kliegelhöfer, Ulf Meinke,
Matthias Paul, Norbert Rosch, Steffen Schmid, Leonhard Schneider,
Michael Schwingen, Oliver Sellke, Christian Stelter, Oliver Thamm,
Thorsten Thiele.
...and an ironic ''thank you'' to Rolf-Dieter Klein and Tobias Thiel
who demonstrated with their ASM68K how one should not do it
and thereby indirectly gave me the impulse to write something better!
I did not entirely write AS on my own. AS contains the OverXMS
routines from Wilbert van Leijen which can move the overlay modules
into the extended memory. A really nice library, easy to use without
problems!
The TMS320C2x/5x code generators and the file STDDEF2x.INC
come from Thomas Sailer, ETH Zurich. It's surprising, he only needed
one weekend to understand my coding and to implement the new code
generator. Either that was a long nightshift or I am slowly getting
old...
As I already mentioned in the introduction, I release the source code
of AS on request. The following shall give a few hints on its
usage.
AS has been implemented in Turbo-Pascal. ''Ouch'', I hear the C
freaks crying, ''such a thing is easier done in C, and it would be
universally portable!''. Welllll, if things were that simple...AS is
a project I have been working on for several years now, and the
operating system world of a PC user at that time consisted of DOS,
DOS, and again - DOS. My experience that I had to juggle with 4
floppies to use Turbo-C and that Turbo-Pascal still fitted on one was
still very alive. The jungle of memory models offered by a DOS C
compiler also did not make the language more attractive. Pascal was
furthermore the language I knew best, and it was the language that
gave me the highest productivity. I had tried C a few times, and it
gave me an impression of chaos and archaism:
An important hint for users of version 7.0 of Turbo/Borland Pascal:
As one should know, this version has some bugs, which was also the
reason for Borland having to release a version 7.01. AS is affected
by the problem because it partially uses longint shifts of more than
16 bits; the version 7.0 contains a faulty 386 optimization for this
case. If you did not want to spend the money for the update (like me)
but have the runtime library's sources, you can fix the bug on your
own by replacing the erroneous routines in LONG.ASM, like
this:
Programs that have a size like AS necessarily have to be split into
several modules, not only to get things better structured, but also
to overcome DOS's eternal 64 Kbytes limit. In detail, AS consists of
the following modules:
This is an include file and not a module in the word's real sense. It
contains the unavoidable compiler switches that also unavoidably vary
from one platform to another.
This file contains the main module of AS and therefore has to be
entered as ''Primary File'' in the IDE. It contains the overall
control mechanisms for the individual passes, routines to read from
the source file(s), and parts of the macro processor. This part is
independent of the target processor.
This module only contains declarations of commonly used constants and
global variables.
This is the place where some frequently used subroutines are
collected. They mainly cover string manipulation and error handling.
This is the place where you really go into AS's guts: This module
stores the symbol tables (global and local symbols) in two binary
trees. It furthermore contains a rather large procedure
EvalExpression that analyzes and evaluates a formula expression.
The procedure delivers the result (integer, floating point or string)
as a variable record. One should however instead use the
functions EvalIntExpression, EvalFloatExpression,
and EvalStringExpression for code generation. No
modifications at this place are necessary to add a new processor.
Changes at this place should generally be done with great caution, as
you are working at the base of AS!
This module contains several routines to store and retrieve macros.
The real macro processor is part of AS.PAS!!
All routines that control conditional assembly are grouped in this
module. The only exported variable of importance is the flag
IfAsm that signifies whether code generation is currently turned
on or off.
This module contains the housekeeping necessary for the code output
file. It exports an interface for opening and closing a code file and
for writing code to the file (resp. retracting code already
written). An important task of this module is to buffer the write
operations. This buffering increases the speed of output by writing
the code in larger chunks.
This module deals with pseudo instructions defined for all target
processors, like EQU or ORG. The CPU
instruction that switches among different target processors is also
located in this module.
You will find at this place pseudo instructions that are used by a
subset of code generators. On the one hand, this is the Intel group
of DB..DT, and on the other hand their counterparts for 8/16
bit CPUs from Motorola or Rockwell. Someone who wants to extend AS by
a processor fitting into one of these groups can get the biggest part
of the necessary pseudo instructions with one call to this module.
This module implements the mechanism of command line parameters. It
needs a specification of all valid parameters, parses the command
line and makes calls to the appropriate callbacks. The mechanism will
in detail do the following:
This is by far the shortest module and exists only for one purpose
(which may make it also useful for other programs): it extends Pascal
by the ability known from C to write to other channels than the
standard output:
This module deals with an issue Borland and other programmers usually
pay very little attention to: the national language support DOS
offers starting with version 2.0. This unit delivers information
about things that are different from one country to another:
This is only a small ''hack'' to supply routines that deal with
linear lists of strings, something needed e.g. in the macro processor
of AS.
This is the place where some very common string manipulations are
located.
This module defines a data type that can store a list of address
ranges. AS uses this function to create usage lists, and P2BIN and
P2HEX use it to warn about overlaps in memory.
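The idea of such a list of address ranges can be illustrated with a small sketch. It is written in Python rather than Pascal, and the names ChunkList, add and overlaps are made up for this illustration; they are not the identifiers used in the AS sources. Ranges are stored with inclusive bounds, and adding a range reports whether it collides with memory that is already in use:

```python
from dataclasses import dataclass, field


@dataclass
class ChunkList:
    """Hypothetical list of inclusive (start, end) address ranges."""
    chunks: list = field(default_factory=list)

    def overlaps(self, start, end):
        """True if [start, end] shares at least one address with a stored range."""
        return any(s <= end and start <= e for s, e in self.chunks)

    def add(self, start, end):
        """Store [start, end]; return True if it overlapped existing ranges.

        Overlapping or directly adjacent ranges are merged so that the
        list stays short and sorted.
        """
        hit = self.overlaps(start, end)
        kept = []
        for s, e in self.chunks:
            if s <= end + 1 and start <= e + 1:
                # Touching or overlapping: grow the new range.
                start, end = min(s, start), max(e, end)
            else:
                kept.append((s, e))
        kept.append((start, end))
        self.chunks = sorted(kept)
        return hit
```

A tool in the style of P2BIN or P2HEX would call add for every record it reads and print a warning whenever the result is True.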
This module defines the list structure that enables AS to print a
list about which include files have been included together with their
nesting.
AS tokenizes file names in many situations, i.e. they are stored as
numerical codes instead of their full name. This module keeps lists
that allow a translation between file names and tokens.
Last but not least, these files contain the code generators for the
different target processors. They are built according to the same
scheme (but obviously with different contents!) which is explained in
the following section.
The most common reason for a modification of AS's source
code is probably the extension to a new target processor. The way to
do this has dramatically changed with version 1.39p5: it has become a
lot more indirect but it has the advantage that there is only
one place in the standard modules left that has to be modified.
The method heavily relies on indirect references and procedural
variables, so a bit more advanced knowledge about (Turbo) Pascal is
required. Nevertheless, it should not pose any fundamental problems. I
will describe the steps necessary to introduce a new target CPU in a
cookbook style:
The name chosen for the new processor has to fulfill two criteria:
The unit's name that shall be responsible for the new processor
should bear at least some similarity to the processor's name (just
for the sake of uniformity) and should be named in the style of
CODExxxx. The unit head with compiler switches and Uses
statements is best taken from another existing code generator.
The unit has to export neither variables nor procedures or functions,
as the complete communication is done at runtime via indirect calls.
The initializations necessary for this have to be done in the unit's
initialization part. They are simply done by a call to the
function AddCPU for each processor type that shall be
treated by this unit:
The switcher's task is to ''reorganize'' AS for the new target
processor. This is done by changing the values of several global
variables:
Apart from these variables, there are three procedure variables to be
set that form the link from AS to the ''active'' parts of the code
generator:
If needed, the unit may also use its initialization part to hook into
a list of procedures that are called prior to each pass of assembly.
Such a need for example arises when the module's code generation
depends on certain flags that can be modified via pseudo
instructions. An example is a processor that can operate in either
user or supervisor mode. In user mode, some instructions are
disabled. The flag that tells AS whether the following code executes
in user or supervisor mode might be set via a special pseudo
instruction. But there must also be an initialization that assures
that all passes start with the same state. The hook offered via
InitPassProc offers a chance to do such initializations. The
principle is similar to the redirection of an interrupt vector: the
unit saves the old value prior to pointing the procedure variable to
its own routine (the routine must be parameter-less and FAR
coded). The new routine first calls the old chain of procedures and
afterwards does its own operations.
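The chaining principle can be sketched independently of Pascal. The following Python fragment only models the mechanism: init_pass_proc stands in for the procedure variable InitPassProc, while state, call_log, base_init, install_hook and run_pass_init are names invented for this illustration and do not exist in AS.

```python
call_log = []                      # records the order of calls (illustration only)
state = {"supervisor": False}      # hypothetical per-CPU flag set by a pseudo instruction


def base_init():
    """Stands for whatever was hooked into the chain before."""
    call_log.append("base")


# Stand-in for the global procedure variable InitPassProc.
init_pass_proc = base_init


def install_hook():
    """What a code generator would do in its initialization part."""
    global init_pass_proc
    saved = init_pass_proc         # save the old value first

    def my_init_pass():
        saved()                    # call the old chain first ...
        state["supervisor"] = False  # ... then reset the unit's own flags
        call_log.append("mycpu")

    init_pass_proc = my_init_pass  # point the variable to the new routine


def run_pass_init():
    """Stands for the main program calling the chain before each pass."""
    init_pass_proc()
```

Whatever value a pseudo instruction left in the flag during the previous pass, every pass now starts with the same state.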
From time to time, a processor architecture provides symbols that do
not have to be defined explicitly. An example for this is the TMS370
series that allows referencing the first 256 memory addresses as
''registers'' R0..R255 resp. R0FF. AS provides the
procedure variable InternSymbol that may be set by the code
generator. The parser calls this procedure directly after the check
for constants whenever an expression has to be analyzed. The routine
has to store the result in a record of type TempResult
passed by reference when an internal symbol has been detected. The
record's Type field has to carry the result type
(TempInt for integral values, TempFloat for
floating point values, or TempString for strings) and the
result itself has to be stored in the respective fields
Int, Float, or Ascii. If the check was without
success, simply set the Type field to TempNone.
Error messages from this routine should be avoided as unidentified
names could signify ordinary symbols (the parser will check this
afterwards). Be extremely careful with this routine as it allows you to
intervene into the parser's heart!
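The contract of such a routine can be sketched as follows. The sketch is in Python and only handles the decimal register names R0..R255 of the TMS370 example; the hexadecimal spelling is left out. The function name intern_symbol and the representation of the result as a (type, value) pair are assumptions made for this illustration, only the type names TempInt and TempNone are taken from the description above.

```python
def intern_symbol(name):
    """Return ("TempInt", n) for R0..R255, otherwise ("TempNone", None).

    The routine never reports an error: a name it does not recognize
    may still be an ordinary symbol that the parser looks up afterwards.
    """
    if len(name) < 2 or name[0] not in "Rr":
        return ("TempNone", None)
    digits = name[1:]
    # Accept plain ASCII decimal digits only.
    if not (digits.isascii() and digits.isdigit()):
        return ("TempNone", None)
    value = int(digits)
    if value > 255:
        return ("TempNone", None)
    return ("TempInt", value)
```

A name like RESET or R256 simply falls through, so the normal symbol table lookup still gets its chance.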
By the way: People who want to become immortal may add a copyright
string. This is done by adding a call to the procedure
AddCopyright in the unit's initialization part (right next to
the AddCPU calls):
While it may be difficult to write a code generator that is formally
correct, its addition to AS becomes trivial: The only thing that has
to be done is to add the unit to the USES list of the main
module AS.PAS. The definition as resident or overlay module
is a decision between speed and free memory.
Now we finally reached the point where your creativity is challenged:
It is up to you how you manage to translate mnemonic and parameters
into a sequence of machine code. The symbol tables are of course
accessible (via the formula parser) just like everything exported
from ASMSUB. Some general rules (take them as advice and
not as laws...):
PLIST's array of code headers has to be expanded to enable it to list
code files containing code for the new processor.
In case one of the segments should have a granularity other than one
byte per address (see the discussion of the Gran variable above), the
function Granularity in TOOLS.PAS has to be
extended. One further has to decide which hex output format should be
used by default. Without an extension of the CASE statement
in ProcessFile, P2HEX will terminate with an error message if no hex
format is specified explicitly in the command line. P2HEX up to now
knows how to create Motorola S-Records (up to 32 bits), Intel-,
Tektronix-, and MOS hex.
It is necessary to modify all string constants AS uses in case one
wants to adapt AS to a different language. These strings are
collected in RSC files to make such a translation
simpler. IOERRORS.RSC contains all I/O error messages and is
used by AS and the tools. TOOLS.RSC contains string
constants that are used by all tools. Otherwise, every program has
its own RSC file. You do not have to care about RSC
files as long as you do not intend to translate AS to another
language.
4.40. MSP430
4.41. COP8 & SC/MP
4.42. 75K0
ADM sfr 0fd8h
SOC bit ADM.3
skt 0fd8h.3
skt ADM.3
skt SOC
AS distinguishes direct and symbolic bit accesses by the missing dot
in symbolic names; it is therefore forbidden to use dots in symbol
names to avoid misunderstandings in the parser.
The upper byte is set to 0, the lower byte contains the bit
expression coded according to [56]. The
long format in contrast only knows direct addressing, but it can
cover the whole address space (given a correct setting of MBS and
MBE). A long expression stores bits 0..7 of the address in the lower
byte, the bit position in bits 8 and 9, and a constant value of 01 in
bits 10 and 11. The highest bits make it easy to distinguish between
long and short addresses by checking whether the upper byte is 0. Bits
12..15 contain bits 8..11 of the address; they are not needed to
generate the code, but they have to be stored somewhere as the check
for correct banking can only take place when the symbol is actually
used.
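The layout of the long format can be written down as a few shifts and masks. The following Python sketch follows the bit assignment described above; the function names are made up for this illustration, and the short format is not covered.

```python
def pack_long_bit(address, bit):
    """Pack a 12-bit address and a bit position (0..3) into the long format."""
    if not 0 <= address <= 0xFFF:
        raise ValueError("address must fit into 12 bits")
    if not 0 <= bit <= 3:
        raise ValueError("bit position must be 0..3")
    return ((address & 0xFF)          # bits 0..7:   address bits 0..7
            | (bit << 8)              # bits 8..9:   bit position
            | (0b01 << 10)            # bits 10..11: constant 01
            | ((address >> 8) << 12)) # bits 12..15: address bits 8..11


def unpack_long_bit(value):
    """Inverse of pack_long_bit: return (address, bit)."""
    address = (value & 0xFF) | ((value >> 12) << 8)
    bit = (value >> 8) & 0x3
    return address, bit


def is_long(value):
    """A long expression never has an upper byte of 0 (constant 01 in bits 10..11)."""
    return (value >> 8) != 0
```

For the example above, ADM.3 with ADM at 0fd8h packs to F7D8 hex.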
4.43. 78K0
Under AS, these prefixes are only necessary if one wants to force a
certain addressing mode and the instruction allows different
variants. Without a prefix, AS will automatically select the shortest
variant. It should therefore rarely be necessary to use a prefix in
practice.
5.1. Code Files
FileRecord = RECORD CASE Header:Byte OF
$00:(Creator:ARRAY[] OF Char);
$01..
$7f:(StartAdr : LongInt;
Length : Word;
Data : ARRAY[0..Length-1] OF Byte);
$80:(EntryPoint:LongInt);
$81:(Header : Byte;
Segment : Byte;
Gran : Byte;
StartAdr : LongInt;
Length : Word;
Data : ARRAY[0..Length-1] OF Byte);
END
This description does not express fully that the length of data
fields is variable and depends on the value of the Length
entries.
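How a reader for these records might look is sketched below in Python. The sketch rests on several assumptions that are not stated in the record description: all multi-byte fields are taken as little endian (as Turbo Pascal stores them on a PC), LongInt addresses are read as unsigned 32 bit values, Length is taken as a byte count, and the Creator record is assumed to extend to the end of the data. Anything that may precede the first record in a real code file is outside the scope of this sketch.

```python
import struct


def parse_records(buf):
    """Split a sequence of code file records into a list of dictionaries."""
    records = []
    pos = 0
    while pos < len(buf):
        header = buf[pos]
        pos += 1
        if header == 0x00:
            # Creator string; assumed to run to the end of the data.
            records.append({"kind": "creator",
                            "text": buf[pos:].decode("latin-1")})
            break
        if header == 0x80:
            (entry,) = struct.unpack_from("<L", buf, pos)
            pos += 4
            records.append({"kind": "entry", "address": entry})
            continue
        if header == 0x81:
            # Extended record: processor header, segment and granularity follow.
            family, segment, gran = struct.unpack_from("<BBB", buf, pos)
            pos += 3
        elif 0x01 <= header <= 0x7F:
            # Simple record: the header byte itself names the processor family.
            family, segment, gran = header, None, None
        else:
            raise ValueError("unknown record header $%02x" % header)
        start, length = struct.unpack_from("<LH", buf, pos)
        pos += 6
        data = buf[pos:pos + length]
        if len(data) != length:
            raise ValueError("truncated data record")
        pos += length
        records.append({"kind": "data", "family": family, "segment": segment,
                        "gran": gran, "start": start, "data": data})
    return records
```

The variable length of the Data field, which the Pascal declaration cannot express, shows up here as the slice that depends on the Length value just read.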
header  family                header  family
$01     680x0, 6833x          $03     M*Core
$05     PowerPC               $09     DSP56xxx
$11     65xx/MELPS-740        $12     MELPS-4500
$13     M16                   $14     M16C
$19     65816/MELPS-7700      $21     MCS-48
$25     SYM53C8xx             $29     29xxx
$2a     i960                  $31     MCS-51
$32     ST9                   $33     ST7
$39     MCS-96/196/296        $3a     8X30x
$3b     AVR                   $3c     XA
$3f     4004/4040             $41     8080/8085
$42     8086..V35             $47     TMS320C6x
$48     TMS9900               $49     TMS370xxx
$4a     MSP430                $4c     80C166/167
$51     Z80/180/380           $52     TLCS-900
$53     TLCS-90               $54     TLCS-870
$55     TLCS-47               $56     TLCS-9000
$61     6800, 6301 or 6811    $62     6805/HC08
$63     6809                  $64     6804
$65     68HC16                $66     68HC12
$68     H8/300(H)             $69     H8/500
$6c     SH7000                $6e     SC/MP
$6f     COP8                  $70     PIC16C8x
$71     PIC16C5x              $72     PIC17C4x
$73     TMS-7000              $74     TMS3201x
$75     TMS320C2x             $76     TMS320C3x
$77     TMS320C5x             $78     ST6
$79     Z8                    $7a     µPD78(C)10
$7b     75K0                  $7c     78K0
$7d     µPD7720               $7e     µPD7725
$7f     µPD77230
number  segment       number  segment
$00     <undefined>   $01     CODE
$02     DATA          $03     IDATA
$04     XDATA         $05     YDATA
$06     BDATA         $07     IO
$08     REG           $09     ROMDATA
5.2. Debug Files
The second item is listed first in the file. A single entry in this
list consists of two numbers that are separated by a :
character:
<line number>:<address>
Such an entry states that the machine code generated for the source
statement in a certain line is stored at the mentioned address
(written in hexadecimal notation). With such an information, a
debugger can display the corresponding source lines while stepping
through a program. As a program may consist of several include files,
and due to the fact that a lot of processors have more than one
address space (though admittedly only one of them is used to store
executable code), the entries described above have to be sorted. AS
does this sorting in two levels: The primary sorting criterion is the
target segment, and the entries in one of these sections are sorted
according to files. The sections resp. subsections are separated by
special lines in the style of
Segment <segment name>
resp.
File <file name> .
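A debugger front end could read the source line info roughly as in the following Python sketch. It assumes that line numbers are written in decimal, that one entry occupies one line, and that the line info ends where the first symbol table section starts; the function name and the tuple layout are invented for this illustration.

```python
import re

# One entry: <line number>:<address>, the address in hexadecimal notation.
_ENTRY = re.compile(r"^(\d+):([0-9A-Fa-f]+)$")


def parse_line_info(lines):
    """Return a list of (segment, file, line number, address) tuples."""
    segment = None
    file_name = None
    entries = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("Symbols in Segment"):
            break                          # the symbol table starts here
        if line.startswith("Segment "):
            segment = line[len("Segment "):].strip()
            file_name = None               # files are listed per segment
        elif line.startswith("File "):
            file_name = line[len("File "):].strip()
        else:
            match = _ENTRY.match(line)
            if match:
                entries.append((segment, file_name,
                                int(match.group(1)),
                                int(match.group(2), 16)))
    return entries
```

Lines that fit none of the three patterns are skipped silently, which keeps the sketch tolerant of the other items stored in a debug file.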
The source line info is followed by the symbol table. Similar to the
source line info, the symbol table is primarily sorted by the
segments individual symbols are assigned to. In contrast to the
source line info, an additional section NOTHING exists which
contains the symbols that are not assigned to any specific segment
(e.g. symbols that have been defined with a simple EQU
statement). A section in the symbol table is started with a line of
the following type:
Symbols in Segment <segment name>
The symbols in a section are sorted according to the alphabetical
order of their names, and one symbol entry consists of exactly one
line. Such a line consists of 5 fields which are separated by at
least a single space:
This is a test
becomes
This\032is\032a\032test .
The numerical value always has three digits and has to be interpreted
as a decimal value. Naturally, the backslash itself also has to be
coded this way.
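The coding can be sketched in a few lines of Python. The sketch assumes that only spaces and the backslash itself are replaced when writing; the decoder accepts any backslash followed by exactly three decimal digits. The function names are made up for this illustration.

```python
import re


def encode_field(text):
    """Replace spaces and backslashes by a backslash plus a 3-digit decimal code."""
    return "".join("\\%03d" % ord(ch) if ch in " \\" else ch for ch in text)


def decode_field(text):
    """Turn every \\ddd sequence back into the character with that decimal code."""
    return re.sub(r"\\(\d{3})", lambda m: chr(int(m.group(1))), text)
```

A space has the decimal code 32 and therefore becomes \032; a backslash has the code 92 and becomes \092.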
Info for Section nn ssss pp
nn specifies the section's number (the number that is also
used in the symbol table as a postfix for symbol names),
ssss gives its name and pp the number of its parent
section. The last information is needed by a retranslator to step
upward through a tree of sections until a fitting symbol is found.
This first line is followed by a number of further lines that
describe the code areas used by this section. Every single entry
(exactly one entry per line) either describes a single address or an
address range given by a lower and an upper bound (separation of
lower and upper bound by a minus sign). These bounds are
''inclusive'', i.e. the bounds themselves also belong to the area. It
is important to note that an area belonging to a section is not
additionally listed for the section's parent sections (an exception
is of course a deliberate multiple allocation of address areas, but
you would not do this, would you?). On the one hand, this allows an
optimized storage of memory areas during assembly. On the other hand,
this should not be an obstacle for symbol backtranslation as the
single entry already gives an unambiguous entry point for the symbol
search path. The description of a section is ended by an empty line
or the end of the debug file.
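A reader for one such section description might look like the following Python sketch. It assumes that the section numbers are written in decimal, that the section name contains no spaces, and that the addresses are hexadecimal like the ones in the source line info; none of this is guaranteed by the description above. The function name and the dictionary keys are invented for this illustration.

```python
import re

_HEADER = re.compile(r"^Info for Section (\d+) (\S+) (-?\d+)$")


def parse_section_info(lines):
    """Parse one section description starting at its 'Info for Section' line."""
    lines = list(lines)
    match = _HEADER.match(lines[0].strip()) if lines else None
    if match is None:
        raise ValueError("not a section description")
    areas = []
    for raw in lines[1:]:
        line = raw.strip()
        if not line:
            break                           # an empty line ends the description
        low, sep, high = line.partition("-")
        start = int(low, 16)
        end = int(high, 16) if sep else start  # a single address is a range of one
        areas.append((start, end))          # both bounds are inclusive
    return {"number": int(match.group(1)),
            "name": match.group(2),
            "parent": int(match.group(3)),
            "areas": areas}
```

A retranslator would look up the section whose areas contain the address in question and then follow the parent numbers upward until a fitting symbol is found.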
6. Utility Programs
return code  error condition
0            no errors
1            error in command line parameters
2            I/O error
3            file format error
PLIST <file name>
The file name will automatically be extended with the extension
P if it doesn't already have one.
for %n in (*.p) do plist %n
PLIST prints the code file's contents in a table style, whereby
exactly one line will be printed per record. The individual rows have
the following meanings:
All outputs are in hexadecimal notation.
BIND <source file(s)> <target file> [options]
Just like AS, BIND regards all command line arguments that do not
start with a +, - or / as file specifications, of
which the last one must designate the destination file. All other
file specifications name sources, which may again contain wildcards.
For example, to filter all MCS-51 code out of a code file, use BIND
in the following way:
BIND <source name> <target name> -f $31
If a file name misses an extension, the extension P will be
added automatically.
If no target format is explicitly specified, P2HEX will automatically
choose one depending on the processor type: S-Records for Motorola
CPUs, Hitachi, and TLCS-900, MOS for 65xx/MELPS, DSK for the 16 bit
signal processors from Texas, Atmel Generic for the AVRs, and Intel
Hex for the rest. Depending on the start address's width, the
S-Record format will use records of type 1, 2, or 3; however, records
in one group will always be of the same type. The Intel, MOS and
Tektronix formats are limited to 16 bit addresses, the 16-bit Intel
format reaches 4 bits further. Addresses that are too long for a given
format will be reported by P2HEX with a warning; afterwards, they
will be truncated (!).
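The selection of the S-Record type and the truncation can be illustrated with a short Python sketch. The record types 1, 2 and 3 carry 16, 24 and 32 bit addresses. That truncation means dropping the upper address bits is an assumption of this sketch, and the function names are made up; this is not the code P2HEX actually uses.

```python
def srecord_type(address):
    """Smallest S-Record data type (1, 2 or 3) that can hold the address."""
    if address < 0:
        raise ValueError("address must not be negative")
    if address <= 0xFFFF:
        return 1          # S1: 16 bit address
    if address <= 0xFFFFFF:
        return 2          # S2: 24 bit address
    if address <= 0xFFFFFFFF:
        return 3          # S3: 32 bit address
    raise ValueError("address does not fit into 32 bits")


def group_record_type(start_addresses):
    """All records of one group share the type needed by the widest address."""
    return max(srecord_type(a) for a in start_addresses)


def truncate_address(address, bits=16):
    """Return (address cut to the given width, True if a warning is due)."""
    mask = (1 << bits) - 1
    return address & mask, address > mask
```

The second function shows why a group that contains a single address above 64 Kbytes is written with S2 records throughout.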
-m <0..3>
allows to generate the three different variants of the Intel Hex
format. Format 0 is INHX8M which contains all bytes in a Lo-Hi-Order.
Addresses become twice as large because the PICs have a
word-oriented address space that increments addresses only by one per
word. This format is also the default. With Format 1 (INHX16M), bytes
are stored in their natural order. This is the format Microchip uses
for its own programming devices. Format 2 (INHX8L) resp. 3 (INHX8H)
split words into their lower resp. upper bytes. With these formats,
P2HEX has to be called twice to get the complete information, like in
the following example:
p2hex test -m 2
rename test.hex test.obl
p2hex test -m 3
rename test.hex test.obh
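What the two formats 2 and 3 separate can be shown with a tiny Python sketch that splits a list of 16 bit words into the two byte streams. The function name is made up for this illustration, and the sketch only models the data split, not the hex file syntax.

```python
def split_words(words):
    """Return (lower bytes, upper bytes) of a sequence of 16 bit words."""
    low = bytes(w & 0xFF for w in words)          # content for INHX8L
    high = bytes((w >> 8) & 0xFF for w in words)  # content for INHX8H
    return low, high
```

Each of the two streams holds one byte per program word, which is why both files are needed to get the complete information.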
For the Motorola format, P2HEX additionally uses the S5 record type
mentioned in [6]. This record contains the
number of data records (S1/S2/S3) to follow. As some programs might
not know how to deal with this record, one can suppress it with the
option
+5 .
In case a source file contains code records for different processors,
the different hex formats will also show up in the target file - it
is therefore strongly advisable to use the filter function.
-r <start address>-<end address>
The start address is the first address in the window, and the end
address is the last address in the window, not the first
address that is out of the window. For example, to split an 8051
program into 4 2764 EPROMs, use the following commands:
p2hex <source file> eprom1 -f $31 -r $0000-$1fff
p2hex <source file> eprom2 -f $31 -r $2000-$3fff
p2hex <source file> eprom3 -f $31 -r $4000-$5fff
p2hex <source file> eprom4 -f $31 -r $6000-$7fff
By default, the address window is 32 Kbytes large and starts at
address 0.
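The windows for such an EPROM split follow a simple pattern, as the next Python sketch shows. A 2764 holds 8 Kbytes, so four of them cover the addresses 0 to 7fff hex. The function names and the file names eprom1, eprom2, ... are chosen to match the example above; the helper itself is not part of AS.

```python
def eprom_windows(base, chip_size, count):
    """Inclusive (first, last) address windows for a row of equal EPROMs."""
    return [(base + i * chip_size, base + (i + 1) * chip_size - 1)
            for i in range(count)]


def p2hex_commands(source, windows, family=0x31):
    """Build one P2HEX call per window, using the -f and -r options."""
    return ["p2hex %s eprom%d -f $%02x -r $%04x-$%04x"
            % (source, number, family, first, last)
            for number, (first, last) in enumerate(windows, start=1)]
```

Note the subtraction of 1 for the upper bound: the end address given to -r is the last address inside the window.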
-a .
A special value for start and stop address arguments is a single
dollar sign ($). This stands for the very first resp. last
address that has been used in the code file. So, if you want to be
sure that always the whole program is stored in the hex file, set the
address filter
-r $-$
and you do not have to worry about address filters any more. Dollar
signs and fixed addresses may of course be mixed. For example, the
setting
-r $-$7fff
limits the upper end to 32 Kbytes.
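The handling of the dollar sign can be sketched as follows in Python. The sketch only understands a single dollar sign, hexadecimal numbers with a leading dollar sign, and plain decimal numbers; the real tools may accept further notations. The function name and its parameters first_used and last_used are made up for this illustration.

```python
def parse_window(arg, first_used, last_used):
    """Resolve an address window like '$-$7fff' to (start, end)."""
    low_text, sep, high_text = arg.partition("-")
    if not sep:
        raise ValueError("expected <start address>-<end address>")

    def bound(text, used):
        if text == "$":
            return used                 # first resp. last address actually used
        if text.startswith("$"):
            return int(text[1:], 16)    # hexadecimal constant
        return int(text, 10)            # assumption: plain numbers are decimal

    return bound(low_text, first_used), bound(high_text, last_used)
```

With a code file that uses the addresses 100 to 8fff hex, the filter $-$ therefore resolves to exactly this range, while $-$7fff cuts off everything above 7fff hex.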
-d <start>-<end>
to designate the address range that should be written as data instead
of code. For this option, single dollar signs are not allowed!
While this switch is only relevant for the DSK format, the option
-e <address>
is also valid for the Intel and Motorola formats. Its purpose is to
set the entry address that will be inserted into the hex file. If
such a command line parameter is missing, P2HEX will search a
corresponding entry in the code file. If even this fails, no entry
address will be written to the hex file (DSK/Intel) or the field
reserved for the entry address will be set to 0 (Motorola).
0 :00000001FF
1 :00000001
2 :0000000000
By default, variant 0 is used which seems to be the most common one.
-l <count> .
The allowed range of values goes from 2 to 254 data bytes; odd values
will implicitly be rounded down to an even count.
-k
allows to instruct P2HEX to erase them automatically after
conversion.
P2HEX <name>
is possible, to generate <name>.hex out of
<name>.p.
To avoid confusions: If you use this option, the resulting binary
file will become smaller because only a part of the source will be
copied. Therefore, the resulting file will be smaller by a factor of
2 or 4 compared to ALL. This is just natural...
-S
activates this function. It expects a numeric specification ranging
from 1 to 4 as parameter which specifies the length of the address
field in bytes. This number may optionally be prepended with a
L or B letter to set the endian order of the address.
For example, the specification B4 generates a 4 byte address
in big endian order, while a specification of L2 or
simply 2 creates a 2 byte address in little endian order.
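How such a specification translates into bytes is shown by the following Python sketch. It accepts exactly the forms described above (an optional L or B followed by a digit from 1 to 4); the function names are made up for this illustration and do not appear in the tools' sources.

```python
def parse_address_spec(spec):
    """Split a specification like 'B4', 'L2' or '2' into (length, byte order)."""
    order = "little"                    # default without a letter
    digits = spec
    if spec[:1] in ("L", "B"):
        order = "big" if spec[0] == "B" else "little"
        digits = spec[1:]
    if digits not in ("1", "2", "3", "4"):
        raise ValueError("address length must be 1..4")
    return int(digits), order


def address_field(spec, address):
    """Encode the address as demanded by the specification."""
    length, order = parse_address_spec(spec)
    # int.to_bytes raises OverflowError if the address does not fit.
    return address.to_bytes(length, order)
```

For the address 1234 hex, B4 yields the bytes 00 00 12 34, while L2 and 2 both yield 34 12.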
- Title: ~m~acro assembler
- Program path: AS
- Command line:
-E !1 $EDNAME $CAP MSG(AS2MSG) $NOSWAP $SAVE ALL
- assign a hotkey if wanted (e.g. Shift-F7)
The -E option assures that Turbo Pascal will not become puzzled by
STDIN and STDERR.
A. Error Messages of AS
warning
680x0, 6809 and COP8 CPUs: an address displacement of 0 was
given. An address expression without displacement is
generated, and a convenient number of NOPs are emitted to
avoid phasing errors.
none
warning
680x0-, 6502 and 68xx CPUs: a given memory location can be
reached using short addressing. A short addressing
instruction is emitted, together with the required number of
NOPs to avoid phasing errors.
none
warning
680x0- and 8086 CPUs can execute jumps using a short or long
displacement. If a shorter jump was not explicitly requested,
in the first pass room for the long jump is reserved. Then
the code for the shorter jump is emitted, and the remaining
space is filled with NOPs to avoid phasing errors.
none
warning
A SHARED directive was found, but no command line
option to generate a shared file was
specified.
none
warning
The BCD-floating point format used by the 680x0-FPU allows
such a large exponent, but according to the latest databooks,
this cannot be fully interpreted. The corresponding word is
assembled, but the associated function is not expected to
produce the correct result.
none
warning
A Supervisor-mode directive was used, that was not preceded
by an explicit SUPMODE ON directive
none
warning
A short jump with a jump distance equal to 0 is not allowed
by 680x0 resp. COP8 processors, since the associated code
word is used to identify long jump instruction. Instead of a
jump instruction, AS emits a NOP
none
warning
The symbol used as an operand comes from an address space
that cannot be addressed together with the given
instruction
none
warning
The symbol used as an operand belongs to an address space
that cannot be accessed with any of the segment registers of
the 8086
The name of the inaccessible segment
warning
A symbol changed value, with respect to previous pass. This
warning is emitted only if the -r option is
used.
name of the symbol that changed value.
warning
The analysis of the usage list shows that part of the program
memory was used more than once. The reason can be an
excessive usage of ORG directives.
none
warning
A SWITCH...CASE directive without ELSECASE
clause was executed, and none of the CASE conditions
was found to be true.
none
warning
The symbol used as an operand was not found in the memory
page defined by an ASSUME directive (ST6,
78(C)10).
none
warning
The CPU allows to concatenate only register pairs, whose
start address is even (RR0, RR2, ..., only for Z8).
none
warning
The instruction used, although supported, was superseded by a
new instruction. Future versions of the CPU might no longer
implement the old instruction.
none
warning
The addressing mode used for this instruction is allowed,
however a register is used in such a way that its contents
cannot be predicted after the execution of the
instruction.
none
warning
A prefixed @ explicitly refers to the local symbols of the
current section. When the operator is used outside of a
section, there are no local symbols, so the operator is
useless in this context.
none
warning
The instruction used has no meaning, or it can be substituted
by another instruction that is shorter and executes
faster.
none
warning
AS expects a forward definition of a symbol, i.e. a symbol
was used before it was defined. A further pass must be
executed. This warning is emitted only if the -r
option was used.
none
warning
An address was used that is not an exact multiple of the
operand size. Although the CPU databook forbids this, the
address could be stored in the instruction word, so AS simply
emits a warning.
none.
warning
The addressing mode or the address used are correct, but the
address refers to the peripheral registers, and it cannot be
used in this circumstance.
none.
warning
A register is used in a series of instructions, so that a
sequence of instructions probably does not generate the
desired result. This usually happens when a register is used
before its new content was effectively loaded in it.
the register probably causing the problem.
warning
A register used for the addressing is used once more in the
same instruction, in a way that results in a modification of
the register value. The resulting address does not have a
well defined value.
the register used more than once.
warning
Via a SFRB statement, it was tried to declare a
memory cell as bit addressable which is not bit addressable
due to the 8051's architectural limits.
none
warning
At the end of a pass, a stack defined by the program is not
empty.
the name of the stack and its remaining depth
warning
A string constant contains a NUL character. Though this works
with the Pascal version, it is a problem for the C version of
AS since C itself terminates strings with a NUL character.
i.e. the string would have its end for C just at this
point...
none
warning
The parts of a machine statement partially lie on different
pages. As the CPU's instruction counter does not get
incremented across page boundaries, the processor would fetch
at runtime the first byte of the old page instead of the
instruction's following byte; the program would execute
incorrectly.
none
warning
A numeric value was out of the allowed range. AS brought the
value back into the allowed range by truncating upper bits,
but it is not guaranteed that meaningful and correct code is
generated by this.
none
warning
The repetition argument of a DUP directive was smaller than
0. Analogous to a count of exactly 0, no data is stored.
none
error
A new value is assigned to a symbol, using a label or a
EQU, PORT, SFR, LABEL, SFRB or BIT instruction:
however this can be done only using SET/EVAL.
the name of the offending symbol, and the line number where
it was defined for the first time, according to the symbol
table.
error
A symbol is still not defined in the symbol table, even after
a second pass.
the name of the undefined symbol.
error
A symbol does not fulfill the requirements that symbols must
have to be considered valid by AS. Please pay attention that
more stringent syntax rules exist for macros and function
parameters.
the wrong symbol
error
The instruction format used does not exist for this
instruction.
the known formats for this command
error
The instruction (processor or pseudo) cannot be used with a
point-suffixed attribute.
none
error
The attribute following a point after an instruction must not
be longer or shorter than one character.
none
error
The number of arguments issued for the instruction (processor
or pseudo) does not conform with the accepted number of
operands.
none
error
The number of options given with this command is not
correct.
none
error
The instruction can be used only with immediate operands
(preceded by #).
none
error
Although the operand is of the right type, it does not have
the correct length (in bits).
none
error
The operands used have different length (in bits)
none
error
It is not possible to determine the size of the operand from
the opcode and the operands (a problem with 8086
assembly). You must define it with a BYTE or
WORD PTR prefix.
none
error
an expression does not have a correct operand type
(integer/decimal/string)
the operand type
error
No more than 20 arguments can be given to any
instruction
none
error
An instruction was used that is neither an AS instruction, nor a
known mnemonic for the current processor type.
none
error
The expression parser found an expression enclosed by
parentheses, where the number of opening and closing
parentheses does not match.
the wrong expression
error
An expression on the right side of a division or modulus
operation was found to be equal to 0.
none
error
An integer word underflowed the allowed range.
the value of the word and the allowed minimum (in most cases,
maybe I will complete this one day...)
error
An integer word overflowed the allowed range.
the value of the word, and the allowed maximum (in most
cases, maybe I will complete this one day...)
error
The given address does not correspond with the size needed by
the data transfer, i.e. it is not an integral multiple of the
operand size. Not all processor types can use unaligned
data.
none
error
The displacement used for an address is too large.
none
error
The address of the operand is outside of the address space
that can be accessed using short-addressing mode.
none
error
the addressing mode used, although usually possible, cannot
be used here.
none
error
At this point, only even addresses are allowed, since the low
order bit is used for other purposes or it is reserved.
none
error
The addressing mode(s) used are allowed in sequential, but
not in parallel instructions
none
error
The branch condition used for a conditional jump does not
exist.
none
error
the jump instruction and destination are too far apart to execute
the jump with a single step
none
error
Since instructions must only be located at even addresses, the
jump distance between two instructions must always be even,
and the LSB of the jump distance is used otherwise. This
condition was not fulfilled here. The reason is usually the
presence of an odd number of data bytes or a wrong
ORG.
none
error
Only a constant or a data register can be used for defining
the shift size (only for 680x0).
none
error
Constants for the shift size or the ADDQ argument must lie
within the range 1..8 (only for 680x0).
none
error
(no longer used)
none
error
The register list argument of MOVEM or
FMOVEM has a wrong format (only for 680x0)
none
error
The operand combination used with the CMP
instruction is not allowed (only for 680x0)
none
error
The processor type used as argument for CPU command
is unknown to AS.
the unknown processor type
error
The control register used by a MOVEC is not (yet)
available for the processor defined by the CPU
command.
none
error
The register used, although valid, cannot be used in this
context.
none
error
A RESTORE command was found, that cannot be coupled
with a corresponding SAVE.
none
error
After the assembly pass, a SAVE command was left without a
corresponding RESTORE.
none.
error
A macro option parameter is unknown to AS.
the dubious option.
error
After assembly, some IF or CASE constructs were found
without their closing command.
none
error
The command structure in an IF or SWITCH sequence is
wrong.
none
error
In this program module a section with the same name already
exists.
the multiple-defined name
error
In the current scope, there are no sections with this
name
the unknown name
error
Not all the sections were properly closed.
none
error
The given ENDSECTION does not refer to the most
deeply nested one.
none
error
An ENDSECTION command was found, but the associated
section was not defined before.
none
error
A symbol declared with a FORWARD or PUBLIC
statement could not be resolved.
the name of the unresolved symbol.
error
A symbol was defined both as public and private.
the name of the symbol.
error
The number of arguments used for referencing a function does
not match the number of arguments defined in the function
definition.
none
error
At the end of the program, or just before switching to
another processor type, unresolved literals still
remain.
none
error
Although the instruction is correct, it cannot be used with
the selected member of the CPU family.
none
error
Although the addressing mode used is correct, it cannot be
used with the selected member of the CPU family.
none
error
Either the number of bits specified is not allowed, or the
command is not completely specified.
none
error
This pseudo command accepts as argument either ON
or OFF
none
error
A POPV instruction tried to access a stack that was either
never defined or already emptied.
the name of the stack in question
error
Not exactly one bit was set in a mask passed to the
BITPOS function.
none
error
An ENDSTRUCT instruction was found though there is
currently no structure definition in progress.
none
error
After end of assembly, not all STRUCT instructions
have been closed with appropriate ENDSTRUCTs.
the innermost, unfinished structure definition
error
The name parameter of an ENDSTRUCT instruction does
not correspond to the innermost open structure
definition.
none
error
What should I say about that? PHASE inside a record
simply does not make sense and only leads to
confusion...
none
error
Only EXTNAMES or NOEXTNAMES are allowed
as directives of a STRUCT statement.
the unknown directive
error
A BINCLUDE statement tried to read past the end of a
file.
none
error
The ROM table of the 680x0 coprocessor has only 64
entries.
none
error
The only function code arguments allowed are SFC, DFC, a data
register, or a constant in the interval of 0..15 (only for
680x0 MMU).
none
error
Only a number in the interval 0..15 can be used as function
code mask (only for 680x0 MMU)
none
error
The MMU does not have a register with this name (only for
680x0 MMU).
none
error
The level for PTESTW and PTESTR must be a
constant in the range of 0...7 (only for 680x0 MMU).
none
error
The bit mask used for a bit field command has a wrong format
(only for 680x0).
none
error
The register here defined cannot be used in this context, or
there is a syntactic error (only for 680x0).
none
error
An incomplete macro definition was found. Probably an
ENDM was forgotten.
none
error
EXITM is designed to terminate a macro expansion.
This instruction only makes sense within macros and an
attempt was made to call it in the absence of macros.
none
error
A macro cannot have more than 10 parameters
none
error
A macro was defined more than once in a program section.
the multiply defined macro name.
error
The command used has an influence on the length of the
emitted code, so that forward references cannot be resolved
here.
none
error
(no longer implemented)
none
error
An ELSEIF or ENDIF command was found that
is not preceded by an IF command.
none
error
(no longer implemented)
none
error
The function invoked was not defined before.
The name of the unknown function
error
The argument does not belong to the allowed argument range
associated to the referenced function.
none
error
Although the argument is within the range allowed to the
function arguments, the result is not valid
none
error
The base-exponent pair used in the expression cannot be
computed
none
error
No jumps can be performed by the selected CPU from this
address.
none
error
No jumps can be performed by the selected CPU to this
address.
none
error
Jump command and destination must be in the same memory
page.
none
error
An attempt was made to generate more than 1024 code or data
bytes in a single memory page.
none
error
The address space of the processor type currently in use was
filled beyond the maximum allowed limit.
none
error
Instructions that reserve memory, and instructions that
define constants cannot be mixed in a single pseudo
instruction.
none
error
A STRUCT construct is only designed to describe a
data structure and not to create one; therefore, no
instructions are allowed that generate code.
none
error
Either these instructions cannot be executed in parallel, or
they are not close enough to each other for parallel
execution.
none
error
The referenced segment cannot be used here.
The name of the segment used.
error
The segment referenced with a SEGMENT command does
not exist for the CPU used.
The name of the segment used
error
The segment referenced here does not exist (8086 only)
none
error
The string has an invalid format.
none
error
The referenced register does not exist, or it cannot be used
here.
none
error
The command used cannot be performed with the
REP-prefix.
none
error
Indirect addressing cannot be used in this way
none
error
(no longer implemented)
none
error
This register can be used only in minimum mode
none
error
This register can be used only in maximum mode
none
error
The prefix combination here defined is not allowed, or it
cannot be translated into binary code
none
error
The special character defined using a backslash sequence is
not defined
none
fatal
An error was detected while trying to open a file for
input.
description of the I/O error
fatal
An error happened while AS was writing the listing file.
description of the I/O error
fatal
An error was detected while reading a source file.
description of the I/O error
fatal
While AS was writing a code or share file, an error
happened.
description of the I/O error
fatal
The memory available is not enough to store all the data
needed by AS. Try using the DPMI or OS/2 version of AS.
none
fatal
The program stack overflowed because overly complex formulas or
an unfavorable arrangement of symbols and/or macros were used.
Try again, running AS with the option -A.
none
The file requested does not exist, or it is stored on another
drive.
The path of a file does not exist, or it is on another
drive.
There are no more file handles available to DOS. Increase their
number changing the value associated to FILES= in the
file CONFIG.SYS.
Either the network access rights do not allow the file access, or
an attempt was made to rewrite or rename a protected file.
The required drive does not exist.
A file access tried to go beyond the end of file, although
according to its structure this should not happen. The file is
probably corrupted.
This is self-explanatory! Please clean up!
If you are not using a hard disk as your working medium, you should
occasionally remove the write-protect tab from your diskette!
You tried to access a peripheral unit that is unknown to DOS.
This should not usually happen, since the name should be
automatically interpreted as a filename.
Close the disk drive door.
A bad read error on the disk. Try again; if nothing changes,
reformat the floppy disk or start taking care of your hard
disk!
The diskette/hard disk controller has not found a disk track. See
no. 154!
DOS cannot read the diskette format.
As no. 156, but this time the controller could not find a disk
sector in the track.
You probably redirected the output of AS to a printer. Assembler
printout can be veeery long...
The operating system detected an unclassifiable read error.
The operating system detected an unclassifiable write error.
The operating system has absolutely no idea of what happened to
the device.
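Several of the conditions in the error list above reduce to simple integer tests. The following Python sketch models two of them: the alignment condition (an address must be an integral multiple of the operand size) and the BITPOS requirement that exactly one bit is set in the mask. All function names are hypothetical, and the assumption that BITPOS yields the zero-based position of the set bit is mine; the code illustrates the conditions only and is not taken from the AS sources.

```python
def is_aligned(address: int, operand_size: int) -> bool:
    """True if address is an integral multiple of the operand size."""
    return address % operand_size == 0


def has_exactly_one_bit(mask: int) -> bool:
    """True if exactly one bit is set in mask."""
    # mask & (mask - 1) clears the lowest set bit; the result is zero
    # only if there was a single set bit to begin with.
    return mask > 0 and (mask & (mask - 1)) == 0


def bit_position(mask: int) -> int:
    """Zero-based position of the single set bit (assumed BITPOS behaviour)."""
    if not has_exactly_one_bit(mask):
        raise ValueError("not exactly one bit set in mask")
    return mask.bit_length() - 1
```

With these definitions, a 4-byte access to address 1002 hex fails the alignment test, and a mask of 18 hex (two bits set) is rejected.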
D. Pseudo-Instructions Collected
Instructions that are always available
There is an additional SET resp. EVAL
instruction (in case SET is already a machine instruction).
= := ALIGN BINCLUDE CASE CHARSET CPU DEPHASE ELSE ELSECASE ELSEIF END ENDCASE ENDIF ENDM ENDSECTION ENDSTRUCT ENUM ERROR EQU EXITM FATAL FORWARD FUNCTION GLOBAL IF IFB IFDEF IFEXIST IFNB IFNDEF IFNEXIST IFNUSED IFUSED INCLUDE IRP LABEL LISTING MACEXP MACRO MESSAGE NEWPAGE ORG PAGE PHASE POPV PUSHV PRTEXIT PRTINIT PUBLIC READ RELAXED REPT RESTORE SAVE SECTION SEGMENT SHARED STRUCT SWITCH TITLE WARNING WHILE
Motorola 680x0: DC[.<size>] DS[.<size>] FULLPMMU FPU PADDING PMMU SUPMODE
Motorola 56xxx: DC DS XSFR YSFR
PowerPC: BIGENDIAN DB DD DQ DS DT DW SUPMODE
Motorola M-Core: DC[.<size>] DS[.<size>] REG SUPMODE
Motorola 68xx/Hitachi 6309: ADR BYT DC[.<size>] DFS DS[.<size>] FCB FCC FDB PADDING RMB
Motorola 6805/68HC08: ADR BYT DFS FCB FCC FDB RMB
Motorola 6809/Hitachi 6309: ADR ASSUME BYT DFS FCB FCC FDB RMB
Motorola 68HC12: ADR BYT DC[.<size>] DFS DS[.<size>] FCB FCC FDB PADDING RMB
Motorola 68HC16: ADR ASSUME BYT DFS FCB FCC FDB RMB
Hitachi H8/300(L/H): DC[.<size>] DS[.<size>] MAXMODE PADDING
Hitachi H8/500: ASSUME DC[.<size>] DS[.<size>] MAXMODE PADDING
Hitachi SH7x00: COMPLITERALS DC[.<size>] DS[.<size>] LTORG PADDING SUPMODE
65xx/MELPS-740: ADR ASSUME BYT DFS FCB FCC FDB RMB
65816/MELPS-7700: ADR ASSUME BYT DB DD DQ DS DT DW DFS FCB FCC FDB RMB
Mitsubishi MELPS-4500: DATA RES SFR
Mitsubishi M16: DB DD DQ DS DT DW
Mitsubishi M16C: DB DD DQ DS DT DW
Intel MCS-48: DB DD DQ DS DT DW
Intel MCS-(2)51: BIGENDIAN BIT DB DD DQ DS DT DW PORT SFR SFRB SRCMODE
Intel MCS-96: ASSUME DB DD DQ DS DT DW
Intel 8080/8085: DATA DS
Intel 8080/8085: DB DD DQ DS DT DW PORT
Intel i960: DB DD DQ DS DT DW FPU SPACE SUPMODE WORD
Signetics 8X30x: LIV RIV
Philips XA: ASSUME BIT DB DC[.<size>] DD DQ DS[.<size>] DT DW PADDING PORT SUPMODE
AMD 29K: ASSUME DB DD DQ DS DT DW EMULATED SUPMODE
Siemens 80C166/167: ASSUME BIT DB DD DQ DS DT DW
Zilog Zx80: DB DD DEFB DEFW DQ DS DT DW EXTMODE LWORDMODE
Zilog Z8: DB DD DQ DS DT DW SFR
Toshiba TLCS-900: DB DD DQ DS DT DW MAXIMUM SUPMODE
Toshiba TLCS-90: DB DD DQ DS DT DW
Toshiba TLCS-870: DB DD DQ DS DT DW
Toshiba TLCS-47(0(A)): ASSUME DB DD DQ DS DT DW PORT
Toshiba TLCS-9000: DB DD DQ DS DT DW
Microchip PIC16C5x: DATA RES SFR ZERO
Microchip PIC16C5x: DATA RES SFR ZERO
Microchip PIC17C42: DATA RES SFR ZERO
SGS-Thomson ST6: ASCII ASCIZ ASSUME BYTE BLOCK SFR WORD
SGS-Thomson ST7: DC[.<size>] DS[.<size>] PADDING
SGS-Thomson ST9: ASSUME BIT DB DD DQ DS DT DW REG
6804: ADR BYT DFS FCB FCC FDB RMB SFR
Texas TM3201x: DATA PORT RES
Texas TM32C02x: BFLOAT BSS BYTE DATA DOUBLE EFLOAT TFLOAT LONG LQxx PORT Qxx RES RSTRING STRING WORD
Texas TMS320C3x: ASSUME BSS DATA EXTENDED SINGLE WORD
Texas TM32C05x: BFLOAT BSS BYTE DATA DOUBLE EFLOAT TFLOAT LONG LQxx PORT Qxx RES RSTRING STRING WORD
Texas TMS9900: BSS BYTE PADDING WORD
Texas TMS70Cxx: DB DD DQ DS DT DW
Texas TMS370: DB DBIT DD DQ DS DT DW
Texas MSP430: BSS BYTE PADDING WORD
National SC/MP: DB DD DQ DS DT DW
National COP8: ADDR ADDRW BYTE DB DD DQ DS DSB DSW DT FB FW SFR WORD
NEC µPD78(C)1x: ASSUME DB DD DQ DS DT DW
NEC 75K0: ASSUME BIT DB DD DQ DS DT DW SFR
NEC 78K0: DB DD DQ DS DT DW
NEC µPD772x: DATA RES
NEC µPD772x: DS DW
Symbios Logic SYM53C8xx
E. Predefined Symbols
name | data type | definition | meaning
ARCHITECTURE | string | predef. | target platform AS was compiled for, in the style processor-manufacturer-operating system
BIGENDIAN | boolean | dyn.(0) | storage of constants MSB first?
CASESENSITIVE | boolean | normal | case sensitivity in symbol names?
CONSTPI | float | normal | constant Pi (3.1415.....)
DATE | string | predef. | date of begin of assembly
FALSE | boolean | predef. | 0 = logically ''false''
HASFPU | boolean | dyn.(0) | coprocessor instructions enabled?
HASPMMU | boolean | dyn.(0) | MMU instructions enabled?
INEXTMODE | boolean | dyn.(0) | XM flag set for 4 Gbyte address space?
INLWORDMODE | boolean | dyn.(0) | LW flag set for 32 bit instructions?
INMAXMODE | boolean | dyn.(0) | processor in maximum mode?
INSUPMODE | boolean | dyn.(0) | processor in supervisor mode?
INSRCMODE | boolean | dyn.(0) | processor in source mode?
FULLPMMU | boolean | dyn.(0/1) | full PMMU instruction set allowed?
LISTON | boolean | dyn.(1) | listing enabled?
MACEXP | boolean | dyn.(1) | expansion of macro constructs in listing enabled?
MOMCPU | integer | dyn.(68008) | number of target CPU currently set
MOMCPUNAME | string | dyn.(68008) | name of target CPU currently set
MOMFILE | string | special | current source file (including include files)
MOMLINE | integer | special | current line number in source file
MOMPASS | integer | special | number of current pass
MOMSECTION | string | special | name of current section or empty string if out of any section
MOMSEGMENT | string | special | name of address space currently selected with SEGMENT
PADDING | boolean | dyn.(1) | pad byte field to even count?
RELAXED | boolean | dyn.(0) | any syntax allowed integer constants?
PC | integer | special | curr. program counter (Thomson)
TIME | string | predef. | time of begin of assembly (1. pass)
TRUE | integer | predef. | 1 = logically ''true''
VERSION | integer | predef. | version of AS in BCD coding, e.g. 1331 hex for version 1.33p1
* | integer | special | curr. program counter (Motorola, Rockwell, Microchip, Hitachi)
$ | integer | special | curr. program counter (Intel, Zilog, Texas, Toshiba, NEC, Siemens, AMD)
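The BCD coding of VERSION can be unpacked digit by digit. The following Python sketch assumes the layout suggested by the example given for VERSION above (1331 hex for version 1.33p1): one hex digit for the major version, two for the minor version, and one for the patch level. The function name is hypothetical, and the digit layout is inferred from that single example only.

```python
def decode_as_version(value: int) -> str:
    """Turn a BCD-coded VERSION value into a string like '1.33p1'.

    Assumed layout (inferred from the example 1331 hex -> 1.33p1):
    bits 15..12 major, bits 11..4 two minor digits, bits 3..0 patch level.
    """
    major = (value >> 12) & 0xF
    minor_hi = (value >> 8) & 0xF
    minor_lo = (value >> 4) & 0xF
    patch = value & 0xF
    return f"{major}.{minor_hi}{minor_lo}p{patch}"


print(decode_as_version(0x1331))  # prints 1.33p1
```

Because each decimal digit occupies its own nibble, two VERSION values can also be compared directly as integers to find out which version is newer.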
''If I have seen farther than other men,
it is because I stood on the shoulders of giants.''
--Sir Isaac Newton
''If I haven't seen farther than other men,
it is because I stood in the footsteps of giants.''
--unknown
I. Hints for the AS Source Code
All this led to the result that I used Turbo Pascal to implement AS,
and it stayed that way until today. The times however have changed:
With the demise of Borland's Turbo/Borland Pascal line, the only
Pascal compiler that is still supported is Delphi, something that is
IMHO only good for programs that consist of 90% user interface,
completely unusable for a command-line driven program like AS. My operating
system focus also meanwhile changed into the direction of Unix, and
the ANSI standard is now widely accepted. A port of the AS source
code to C is almost finished, but the Pascal version is currently
still the ''reference''. Borland Pascal version 7 is recommended for
compilation; version 6 should also work, but you will not be able to
use some features (like protected mode).
; Longint shift right
; In DX:AX = Value
; CX = Shift count
; Out DX:AX = Result
; Correction 11.6.1994 AA
LongShr:
CMP Test8086,2
JB @@1
.386
SHL EAX,16
SHRD EAX,EDX,16
SHR EAX,CL
SHLD EDX,EAX,16
RETF
.8086
@@1: AND CX,1FH
JE @@3
@@2: SHR DX,1
RCR AX,1
LOOP @@2
@@3: RETF
; Longint shift left
; In DX:AX = Value
; CX = Shift count
; Out DX:AX = Result
; Correction 11.6.1994 AA
LongShl:
CMP Test8086,2
JB @@1
.386
SHL EAX,16
SHRD EAX,EDX,16
SHL EAX,CL
SHLD EDX,EAX,16
RETF
.8086
@@1: AND CX,1FH
JE @@3
@@2: SHL AX,1
RCL DX,1
LOOP @@2
@@3: RETF
People without the sources may alternatively use the ''emergency
fix'' of setting the predefined variable Test8086 to a value
smaller than 2 right at the main program's beginning, thereby
disabling the optimizations altogether...
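To see that both branches of the corrected routines compute the same result, the shifts can be modelled in Python. This is only a sketch of the arithmetic: the function names are hypothetical, registers are modelled as plain integers, and the 386 branch relies on the fact that the 386 masks shift counts to 5 bits, which the 8086 branch reproduces explicitly with AND CX,1FH.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def long_shr_386(dx: int, ax: int, cx: int) -> tuple:
    """Model of the .386 branch of LongShr: one 32-bit shift on DX:AX."""
    eax = ((dx & MASK16) << 16) | (ax & MASK16)   # SHL EAX,16 / SHRD EAX,EDX,16
    eax >>= cx & 0x1F                             # SHR EAX,CL (count masked to 5 bits)
    return (eax >> 16) & MASK16, eax & MASK16     # SHLD EDX,EAX,16 -> DX, AX


def long_shr_8086(dx: int, ax: int, cx: int) -> tuple:
    """Model of the .8086 branch of LongShr: shift bit by bit through carry."""
    dx &= MASK16
    ax &= MASK16
    for _ in range(cx & 0x1F):                    # AND CX,1FH / LOOP
        carry = dx & 1                            # SHR DX,1 moves bit 0 into carry
        dx >>= 1
        ax = (ax >> 1) | (carry << 15)            # RCR AX,1 rotates carry into bit 15
    return dx, ax


def long_shl_386(dx: int, ax: int, cx: int) -> tuple:
    """Model of the .386 branch of LongShl."""
    eax = ((dx & MASK16) << 16) | (ax & MASK16)
    eax = (eax << (cx & 0x1F)) & MASK32           # SHL EAX,CL
    return (eax >> 16) & MASK16, eax & MASK16


def long_shl_8086(dx: int, ax: int, cx: int) -> tuple:
    """Model of the .8086 branch of LongShl."""
    dx &= MASK16
    ax &= MASK16
    for _ in range(cx & 0x1F):
        carry = (ax >> 15) & 1                    # SHL AX,1 moves bit 15 into carry
        ax = (ax << 1) & MASK16
        dx = ((dx << 1) | carry) & MASK16         # RCL DX,1 rotates carry into bit 0
    return dx, ax
```

For example, shifting DX:AX = 1234:5678 hex right by 4 yields 0123:4567 hex in both models, and shifting it left by 4 yields 2345:6780 hex.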
STDINC.PAS
AS.PAS
ASMDEF.PAS
ASMSUB.PAS
ASMPARS.PAS
ASMMAC.PAS
ASMIF.PAS
ASMCODE.PAS
CODEALLG.PAS
CODEPSEU.PAS
DECODECM.PAS
This module is not only used by AS; the tools BIND, P2HEX, and P2BIN
also include it.
STDHANDL.PAS
These channels are predefined by DOS and are now accessible via
normal text variables. AS currently only uses STDERR.
NLS.PAS
AS currently does not use all information offered by this unit (what
should an assembler do with money ;-) ), but NLS support
will become better in the future. This is a unit that is especially
suited for use in other programs: simple inclusion delivers an
UpCase function that works correctly for all characters,
something that Borland did not offer though people have been
quarreling for years about it!
STRINGLI.PAS
STRINGUT.PAS
CHUNKS.PAS
INCLIST.PAS
FILENUMS.PAS
CODExxxx.PAS
I.3. A New Processor...And Now?
Choosing the Processor's Name
Definition of the Code Generator Module
CPUxxxx:=AddCPU('XXXX',SwitchTo_xxxx);
'XXXX' is the name chosen for the processor which later must
be used in assembler programs to switch AS to this target
processor. SwitchTo_xxxx (abbreviated as the ''switcher'' in
the following) is a procedure without parameters that is called by AS
when the switch to the new processor actually takes place.
AddCPU delivers an integer value as result that serves as an
internal ''handle'' for the new processor. The global variable
MomCPU always contains the handle of the target processor that
is currently set. The value returned by AddCPU should be
stored in a private variable of type CPUVar (called
CPUxxxx in the example above). In case a code generator module
implements more than one processor (e.g. several processors of a
family), the module can find out which instruction subset is
currently allowed by comparing MomCPU against the stored
handles.
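The handle mechanism described above can be pictured with a small Python model. It is a sketch only: the class CpuRegistry, the processor names, and the choice of list indices as handles are my assumptions for illustration, and whether AS sets MomCPU before or after calling the switcher is likewise assumed here; the real interface is the Pascal function AddCPU shown above.

```python
from typing import Callable, List, Tuple


class CpuRegistry:
    """Toy model of the processor table (hypothetical, not the AS data structure)."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, Callable[[], None]]] = []
        self.mom_cpu = -1  # handle of the target currently set; -1 means none yet

    def add_cpu(self, name: str, switcher: Callable[[], None]) -> int:
        """Register a processor name with its switcher and return a handle."""
        self._entries.append((name, switcher))
        return len(self._entries) - 1

    def select(self, name: str) -> None:
        """Model of switching targets: set the handle, then call the switcher."""
        for handle, (cpu_name, switcher) in enumerate(self._entries):
            if cpu_name == name:
                self.mom_cpu = handle
                switcher()
                return
        raise ValueError(f"unknown processor type: {name}")


registry = CpuRegistry()
switch_log: List[int] = []


def switch_to_xxxx() -> None:
    """Switcher shared by both members of the hypothetical XXXX family."""
    switch_log.append(registry.mom_cpu)


# One code generator module registering two family members.
cpu_xxxx = registry.add_cpu("XXXX", switch_to_xxxx)
cpu_xxxx_plus = registry.add_cpu("XXXXPLUS", switch_to_xxxx)


def extended_instructions_allowed() -> bool:
    """Decide on the instruction subset by comparing against a stored handle."""
    return registry.mom_cpu == cpu_xxxx_plus
```

The point of storing the handles privately is that the module never has to compare processor names at assembly time; a single integer comparison decides which instruction subset is allowed.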
Do not assume that any of these variables has a predefined value; set
them all!!
All these routines do not receive any parameters and have to be coded
as FAR procedures (like the switcher itself); assignment to
procedure variables would otherwise be impossible.
AddCopyright(
'Intel 80986 code generator (C) 2010 Jim Bonehead');
The string passed to AddCopyright will be printed upon
program start in addition to the standard message.
Inserting the Code Generator Module
Writing the Code Generator itself
Studying other existing code generators should also prove to be
helpful.
Modifications of Tools
68030 Assembly Language Reference.
Addison-Wesley, Reading, Massachusetts, 1989
AM29240, AM29245, and AM29243 RISC
Microcontrollers.
1993
AVR Enhanced RISC Microcontroller Data Book.
May 1996
8-Bit AVR Assembler and Simulator Object File Formats
(Preliminary).
(part of the AVR tools documentation)
G65SC802/G65SC816 CMOS 8/16-Bit Microprocessor.
Family Data Sheet.
CP/M 68K Operating System User's Guide.
1983
FasMath 83D87 User's Manual.
1990
DS80C320 High-Speed Micro User's Guide.
Version 1.30, 1/94
8-/16-Bit Microprocessor Data Book.
1986
Understanding HD6301X/03X CMOS Microprocessor
Systems.
published by Hitachi
H8/300H Series Programming Manual.
(21-032, no year of release given)
SH Microcomputer Hardware Manual (Preliminary).
SH7700 Series Programming Manual.
1st Edition, September 1995
H8/500 Series Programming Manual.
(21-20, 1st Edition Feb. 1989)
H8/532 Hardware Manual.
(21-30, no year of release given)
H8/534,H8/536 Hardware Manual.
(21-19A, no year of release given)
PPC403GA Embedded Controller User's Manual.
First Edition, September 1994
Embedded Controller Handbook.
1987
Microprocessor and Peripheral Handbook, Volume I
Microprocessor.
1988
80960SA/SB Reference Manual.
1991
8XC196NT Microcontroller User's Manual.
June 1995
8XC251SB High Performance CHMOS Single-Chip
Microcontroller.
Sept. 1995, Order Number 272616-003
80296SA Microcontroller User's Manual.
Sept. 1996
A memo on the secret features of 6309.
(available via World Wide Web:
http://www.cs.umd.edu/users/fms/comp/CPUs/6309.txt)
Microchip Data Book.
1993 Edition
Single-Chip 8-Bit Microcomputers.
Vol.2, 1987
Single-Chip 16-Bit Microcomputers.
Enlarged edition, 1991
Single-Chip 8 Bit Microcomputers.
Vol.2, 1992
M34550Mx-XXXFP User's Manual.
Jan. 1994
M16 Family Software Manual.
First Edition, Sept. 1994
M16C Software Manual.
First Edition, Rev. C, 1996
M30600-XXXFP Data Sheet.
First Edition, April 1996
Microprocessor, Microcontroller and Peripheral
Data.
Vol. I+II, 1988
MC68881/882 Floating Point Coprocessor User's
Manual.
Second Edition, Prentice-Hall, Englewood Cliffs 1989
MC68851 Paged Memory Management Unit User's
Manual.
Second Edition, Prentice-Hall, Englewood Cliffs 1989,1988
CPU32 Reference Manual.
Rev. 1, 1990
DSP56000/DSP56001 Digital Signal Processor User's
Manual.
Rev. 2, 1990
MC68340 Technical Summary.
Rev. 2, 1991
CPU16 Reference Manual.
Rev. 1, 1991
Motorola M68000 Family Programmer's Reference
Manual.
1992
MC68332 Technical Summary.
Rev. 2, 1993
PowerPC 601 RISC Microprocessor User's Manual.
1993
PowerPC(tm) MPC505 RISC Microcontroller Technical
Summary.
1994
CPU12 Reference Manual.
1st edition, 1996
CPU08 Reference Manual.
Rev. 1 (no year of release given in PDF file)
MC68360 User's Manual.
MCF 5200 ColdFire Family Programmer's Reference
Manual.
1995
M*Core Programmer's Reference Manual.
1997
DSP56300 24-Bit Digital Signal Processor Family
Manual.
Rev. 0 (no year of release given in PDF file)
SC/MP Programmier- und Assembler-Handbuch.
Publication Number 4200094A, Aug. 1976
COP800 Assembler/Linker/Librarian User's Manual.
Customer Order Number COP8-ASMLNK-MAN
NSC Publication Number 424421632-001B
August 1993
COP87L84BC microCMOS One-Time-Programmable (OTP)
Microcontroller.
Preliminary, March 1996
µpD70108/µpD70116/µpD70208/µpD70216/µpD72091 Data Book.
(no year of release given)
User's Manual µCOM-87 AD Family.
(no year of release given)
µCOM-75x Family 4-bit CMOS Microcomputer User's
Manual.
Vol. I+II (no year of release given)
Digital Signal Processor Product Description.
PDDSP.....067V20 (no year of release given)
µPD78070A, 78070AY 8-Bit Single-Chip Microcontroller
User's Manual.
Document No. U10200EJ1V0UM00 (1st edition), August 1995
Data Sheet µPD78014.
16-bit 80C51XA Microcontrollers (eXtended
Architecture).
Data Handbook IC25, 1996
8 Bit MCU Families EF6801/04/05 Databook.
1st edition, 1989
ST6210/ST6215/ST6220/ST6225 Databook.
1st edition, 1991
ST7 Family Programming Manual.
June 1995
ST9 Programming Manual.
3rd edition, 1993
SAB80C166/83C166 User's Manual.
Edition 6.90
SAB C167 Preliminary User's Manual.
Revision 1.0, July 1992
SAB-C502 8-Bit Single-Chip Microcontroller User's
Manual.
Edition 8.94
SAB-C501 8-Bit Single-Chip Microcontroller User's
Manual.
Edition 2.96
C504 8-Bit CMOS Microcontroller User's Manual.
Edition 5.96
Programmierung des 68000.
Sybex-Verlag Düsseldorf, 1985
Symbios Logic PCI-SCSI-I/O Processors Programming
Guide.
Version 2.0, 1995/96
Model 990 Computer/TMS9900 Microprocessor Assembly Language
Programmer's Guide.
1977, Manual No. 943441-9701
TMS9995 16-Bit Microcomputer.
Preliminary Data Manual 1981
First-Generation TMS320 User's Guide.
1988, ISBN 2-86886-024-9
TMS7000 family Data Manual.
1991, DB103
TMS320C3x User's Guide.
Revision E, 1991
TMS320C2x User's Guide.
Revision C, Jan. 1993
TMS370 Family Data Manual.
1994, SPNS014B
MSP430 Family Software User's Guide.
1994, SLAUE11
MSP430 Metering Application.
1996, SLAAE10A
MSP430 Family Architecture User's Guide.
1995, SLAUE10A
TMS320C62xx CPU and Instruction Set Reference
Manual.
Jan. 1997, SPRU189A
8-Bit Microcontroller TLCS-90 Development System
Manual.
1990
8-Bit Microcontroller TLCS-870 Series Data Book.
1992
16-Bit Microcontroller TLCS-900 Series Users
Manual.
1992
16-Bit Microcontroller TLCS-900 Series Data Book:
TMP93CM40F/ TMP93CM41F.
1993
4-Bit Microcontroller TLCS-47E/47/470/470A Development
System Manual.
1993
TLCS-9000/16 Instruction Set Manual Version 2.2.
10. Feb 1994
Bipolare Mikroprozessoren und bipolare
LSI-Schaltungen.
Datenbuch, 1985, ISBN 3-87095-186-9
Z8 Microcontrollers Databook.
1992
Discrete Z8 Microcontrollers Databook.
(no year of release given)
Z380 CPU Central Processing Unit User's Manual.
(no year of release given)