Solar Assembler Reference Manual

for Sol_Asm version 0.36.62.00

updated 09.05.2018

Chapter 1. Introduction
Chapter 2. Running Solar assembler
Chapter 3. Program Setup
Chapter 4. Language elements
Chapter 5. Data definitions
Chapter 6. General Code Syntax
Chapter 7. Directives
Chapter 8. Internal Resource compiler
Chapter 9. Listing
Appendix 1. Other issues
Appendix 2. Known keywords
Appendix 3. Sample programs

Chapter 1. Introduction

This document presents an overview of the syntax and usage of Solar Assembler. It assumes that the reader is familiar with assemblers and the ASM programming language.

Throughout this document the following terms and abbreviations are used:

Abbreviation	Description
Sol_Asm	Solar Assembler
OS	Operating system
Win32 or Win64	32-bit or 64-bit Windows operating system
PE32 or PE64	Portable Executable format - 32 or 64 bits
DLL	Dynamic Link Library
CDECL	C default calling convention
STDCALL	Win32 API default calling convention
OMF	Object Module Format - OBJ format specification
COFF	Common Object File Format - OBJ format specification
ELF	Executable and Linking Format - OBJ format specification
HLL	High Level Language

In syntax definitions, curly braces "{}" enclose a name, value or piece of text that you have to supply.

The one exception to this rule is the STRUCTURE initialization section, where "{}" are part of the syntax itself.

1.1 Design Goals

SOL_ASM is designed from the point of view of a programmer who uses ASM as their main programming language. Hence Sol_Asm tries to ease the development of huge, ASM-only projects.

However, Sol_Asm can also be used as a low-level assembler without any help from HLL directives.

Sol_Asm main features are:

modern multi-pass macro assembler
both a low-level and a high-level assembler
promotes easy-to-read code styles
portable across different OS architectures (Win98, WinXP, Win7, Win8, Win10 32/64, Linux, MacOSX)
very fast on huge and complex ASM projects
independent and self-contained
- with no or very little dependency on the OS
- with no dependency on any external library or code (*needs a minimal LibC on Linux)

In daily usage this means that:

Sol_Asm can directly generate 16/32/64-bit binary executables without a linker
Sol_Asm can generate object files for linkers and for interaction with other tools
Sol_Asm has a portability layer and runs on multiple OSes
Sol_Asm compiles 350,000 lines per second on average for huge projects
Sol_Asm can cross-compile 64-bit code on 32-bit systems, Unix code on Linux, etc.

SOL_ASM also contains a decent set of HLL features, such as:

Procedures with PROC/ARG/LOCALS/USES handling
INVOKE keyword to call PROCS and API functions
Structures and Unions and Enums: STRUC/ENUM/UNION
IF statements with logical expressions: .IF condition1 .and condition2 /.ELSEIF/.ELSE/.ENDIF
Macros with local variables and arguments: MACRO/MARG/EXITM
Loop statements: .WHILE/.ENDW and .REPEAT/.UNTIL

All HLL statements are implemented internally in SOL_ASM and their code is generated at compile time (not by user-included macros). This means that all of these features can be used from the start of development with minimal includes.

Of course Sol_Asm is written in assembly language and compiled by Sol_Asm itself. That is why it is named sol_asm2 ... because sol_asm is building sol_asm2 ;)

1.2 Targets

Short term targets

The short-term targets up to the alpha stage were:

Basic 32 bits encodings and HLL features (PROC, MACRO, .IF)
Compile itself
Compile Solar OS system32 module.
Run on two OSes: Win32 and Solar OS.

All of the short-term targets have been achieved.

Long term targets

The long term targets are:

Improve 16/32/64-bit encodings
Improve HLL features
Generate more OBJ and executable formats (ELF, etc.)
Run on more OSes: Windows, Unix, Linux, MacOSX
Add a resource compiler
Add a linker
Add an IDE for Sol_ASM
Make the syntax simpler and easier to understand
Add the latest SSE, virtualization and AVX extensions
Optimizations

Most (but not all) of the long-term targets have also been achieved.

1.3 Fair warning

Solar Assembler is stable and functional but still in development. This means that it still contains bugs and lacks a few features, so caution is advised.
However, it should be fine for personal and large ASM projects.

Recently I have used Sol_Asm to develop:

Sol_ASM itself
Solar_OS operating system 32/64 bits versions
Hostile Encounter RTS Game

These are all large and complex projects, and Sol_ASM has proved itself capable of handling them.

1.4 OS specific versions

This document assumes you are using the Win32 version of Sol_Asm. Other OS versions and details are not fully presented here.

However, the OS-specific versions of SOL_ASM are almost identical and share 99% of their code with the Win32 version.

The only differences are:

Windows

The Windows 32-bit version is always provided in binary form and is tested more often than all other versions.

Linux

The Linux 32-bit version is provided in ELF OBJ format, and you will have to link it on your OS with your system's LibC library in order to obtain the Sol_ASM executable.
- For example, the linking could be done with this command on Linux:
```
 gcc sol_asm2_unix_elf.obj -o sol_asm2 
```
On 64-bit Linux you need to install support for building and running 32-bit binaries.
- On 64-bit Ubuntu 16.04, this command normally installs 32-bit support:
```
 sudo apt-get install gcc-multilib 
```
- And then you can link sol_asm with this command:
```
 gcc -m32 sol_asm2_unix_elf.obj -o sol_asm2 
```
Warning: the Linux version still requires the .asm source code to use CR+LF line terminators.
Hence you must use an editor that can keep/insert CR+LF in source code on Linux
(this will be fixed in future versions).
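Until this is fixed, LF-only sources can be converted before assembling. The following Python script is only an illustrative sketch: it is not part of the Sol_Asm distribution, and the script name to_crlf.py is hypothetical. It rewrites every file given on its command line so that all line endings become CR+LF:

```python
import sys
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Return data with every line ending normalized to CR+LF."""
    # Collapse existing CR+LF pairs to LF first so they are not doubled,
    # then expand every LF to CR+LF. Lone CR characters are left as-is.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_file(path: Path) -> None:
    # Work on raw bytes so the source file's text encoding is untouched.
    path.write_bytes(to_crlf(path.read_bytes()))


if __name__ == "__main__":
    for name in sys.argv[1:]:
        convert_file(Path(name))
```

For example, `python3 to_crlf.py main.asm inc/*.asm` converts the main file and its include files in place. Running it twice is harmless, because already converted files come out unchanged.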

Recent versions also provide a Linux binary for Ubuntu 16.04 or 14.04.

MacOSX

On MacOSX or other Unix OSes you will have to follow the same procedure as for Linux above in order to link the executable to your local LibC.
On MacOSX the Mach-O object generator still has some bugs.
It is recommended that you compile to ELF32/64 OBJ files with Sol_Asm and then use a tool like objconv by Agner Fog to convert the ELF OBJ to a Mach-O OBJ.

Chapter 2. Running Solar assembler

2.1 Invocation

You can execute SOL_ASM from the command line like this:

Syntax:

	sol_asm2 {input_file} {output_file} {-options}

	sol_asm2 {-options} {input_file} {output_file}

Example:

	sol_asm2  -pe32  my_game.asm  my_game.exe

2.2 Options

All command line options must be specified with the "-" prefix character. On Windows you can also use the "/" prefix character.

Help options:

Option	Action	Obs.
-h, -help	This will print all help text and then exit
-h0, -h1, -h2, -h3, -h4	This will print a limited part of the help text and then exit	h0: help options; h1: format options; h2: PE sub-systems; h3: other options; h4: info options

Output options:

Option	Action	Obs.
-pe32	This will generate a Win32 Portable Executable
-pe64	This will generate a Win64 PE executable
-console	This will set the console sub-system (for Win32)
-dll	This will set the DLL characteristics of the PE (for Win32)	makes a DLL
-binary	This will generate a plain binary	useful for OS development or handcrafted formats
-omf32	This will generate an OBJ file in OMF format.	This OBJ can be linked with the ALINK linker.
-coff32	This will generate a 32-bit OBJ file in MS-COFF format.	This OBJ can be linked with MS link, Polink, GOlink and other linkers, and used in projects that link multiple OBJ modules together
-coff64	This will generate a 64-bit OBJ file in COFF format
-elf32	This will generate a 32-bit OBJ file in ELF format.	Can be linked with LD or GCC on Unix-like systems
-elf64	This will generate a 64-bit OBJ file in ELF format.	Can be linked with LD or GCC on Unix-like systems
-mac32	This will generate an OBJ file in Mach-O format.	Still experimental / unfinished

For MacOSX it is still recommended that you use the ELF32/64 format and then convert the ELF OBJ to Mach-O OBJ format before linking (e.g. with objconv by Agner Fog).

Other options:

Option	Action	Obs.
-q	Be quiet: only error messages are shown	useful for makefiles
-d equ_name	This will define the equ_name symbol at the command line.	The value of the symbol is 1 (one) and can later be tested in source code.
-size	This will optimize the output for size.	Using this option will usually result in more passes being done.
-dbg	This will generate debug info.	Works for PE32, ELF and COFF OBJ; other debug formats and levels will follow.
-list	This will generate a listing file named: output_filename_list.lst	One extra pass will be done for listing
-list_pass	This will generate a series of listing files named: _list_1.lst, _list_2.lst, ... one file for each pass.	Compile speed will be slower with this option
-bench	This will show a compiler/parser speed benchmark

Info options:

Option	Action	File name suffix
-info	This will generate OllyDbg specific info. This can be loaded into OllyDbg with the LabelMaster plugin.	_info.lst
-info_proc	This will generate a list of all PROCs and their arguments.	_proc.lst
-info_stru	This will generate a list of all STRUCs and their members.	_stru.lst
-info_equ	This will generate a list of all EQU items and their values.	_equ.lst
-info_enum	This will generate a list of all ENUM items and their values as EQU definitions.	_enum.inc
-info_tkn	This will generate a list of known opcodes and directives.	_tkn.lst
-info_files	This will generate a list of include files and folders.	_files.lst
-info_reloc	This will generate a list of relocations.	_reloc.lst
-info_sect	This will generate a list of sections for each pass.	_sections_N.lst
-info_all	This will generate all of the above info.

Notes

The input and output file names can contain spaces if quoted.
Command line options are promoted to EQUs and can be tested in source code.
-size will:
- optimize jump code size when possible (small versus large)
- align procedures closer together (4 versus 32 bytes)
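When Sol_Asm is driven from a build script, it helps to build the command line as a list of arguments, so that file names with spaces need no extra quoting. Below is a minimal Python sketch. It rests on three assumptions that this manual does not state explicitly:
- sol_asm2 is on the PATH;
- "-d" takes the symbol name as a separate argument;
- a failed build returns a non-zero exit code.

The names sol_asm_cmd and assemble are hypothetical helpers:

```python
import subprocess


def sol_asm_cmd(src, out, fmt="pe32", console=False, quiet=False,
                defines=(), extra=()):
    """Build an argv list for sol_asm2: options first, then the two files."""
    args = ["sol_asm2", "-" + fmt]
    if console:
        args.append("-console")
    if quiet:
        args.append("-q")
    for name in defines:
        # Each -d defines an EQU with value 1 that the source can test.
        args += ["-d", name]
    args += list(extra)
    # A list (not a shell string) keeps file names with spaces intact.
    args += [src, out]
    return args


def assemble(src, out, **options):
    # check=True raises CalledProcessError on a non-zero exit status
    # (assumption: sol_asm2 reports build errors through its exit code).
    subprocess.run(sol_asm_cmd(src, out, **options), check=True)
```

For example, `assemble("my game.asm", "my game.exe", console=True, defines=["DEBUG"])` would run sol_asm2 with `-pe32 -console -d DEBUG` followed by the two file names. This follows the second invocation form shown in section 2.1.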

The info_files option

If you have a main ASM file that includes all other files, Sol_ASM will parse this include tree and generate a list of all the files included in your project. The sub-folders of your project's main ASM file will become "groups".

The generated list of files is in RadASM INI format, and you can copy and paste it into a dummy project in order to convert your existing non-RadASM project into a RadASM project.

2.3 OllyDbg specific debug info

Sol_Asm can generate a file named output_filename_info.txt that contains a list of your application's LABELS, PROCs and their addresses. This file can be loaded into OllyDbg by the LabelMaster plug-in and helps symbolic debugging a lot: you will be able to see familiar code labels, PROC names, variable names and the call stack in OllyDbg.


The same result can be obtained with the -dbg command line option, which generates debug info inside the OBJ or PE32 files. However, this simple ASCII format has some advantages (for example, multiple addresses can have the same name) and can easily be used by your own custom debugging utilities.

Chapter 3. Program Setup

A few initial statements are required to make a valid program; this is usually called the "red tape" that wraps the package.

Here is the simplest possible Sol_Asm program:

; minimal test file
section "code" class_code

	nop
	ret

Only one declaration is absolutely required by SOL_ASM: the section declaration.

3.1 Sections

SOL_ASM divides a program into multiple sections.

You define a section like this:

Syntax:

	section {"section name"} {section_type}

Example:

	section "code" 	class_code
	section "data"  class_data
	section "idata" class_imports

At least one section must be defined before any code generation.

Section type	Description	Attributes
class_code	for code	CODE, EXECUTE, READ
class_data	for initialized data	INITIALIZED, READ, WRITE
class_bss	for uninitialized data	READ, WRITE, RAW size = 0
class_imports	for imports	INITIALIZED, READ, WRITE
class_relocs	for relocations	INITIALIZED, READ, WRITE
class_exports	for exports	INITIALIZED, READ
class_rsrc	for resources	work in progress
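The attribute names in the table probably correspond to the standard PE/COFF section "Characteristics" flags. That mapping is an assumption: this manual does not list the raw values Sol_Asm writes. Under that assumption, the resulting values can be computed with a small Python sketch. The flag constants come from the Microsoft PE/COFF specification; the dictionaries and the characteristics helper are hypothetical:

```python
# Section characteristics flags from the Microsoft PE/COFF specification.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

FLAG = {
    "CODE": IMAGE_SCN_CNT_CODE,
    "INITIALIZED": IMAGE_SCN_CNT_INITIALIZED_DATA,
    "EXECUTE": IMAGE_SCN_MEM_EXECUTE,
    "READ": IMAGE_SCN_MEM_READ,
    "WRITE": IMAGE_SCN_MEM_WRITE,
}

# Attributes copied from the section type table (class_rsrc is still
# "work in progress" and therefore left out).
SECTION_ATTRS = {
    "class_code": ["CODE", "EXECUTE", "READ"],
    "class_data": ["INITIALIZED", "READ", "WRITE"],
    "class_bss": ["READ", "WRITE"],
    "class_imports": ["INITIALIZED", "READ", "WRITE"],
    "class_relocs": ["INITIALIZED", "READ", "WRITE"],
    "class_exports": ["INITIALIZED", "READ"],
}


def characteristics(section_type: str) -> int:
    """OR together the PE flags for the attributes listed in the table."""
    value = 0
    for name in SECTION_ATTRS[section_type]:
        value |= FLAG[name]
    return value


if __name__ == "__main__":
    for section_type in SECTION_ATTRS:
        print(f"{section_type:14} 0x{characteristics(section_type):08X}")
```

A tool such as a PE header viewer can then be used to compare these values with what Sol_Asm actually emits.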

After you have defined your sections, you can switch between them in the program body with ".section_name", like this:

.code
	; enter your code here
.data
	; enter some data definitions here
.code
	; return / continue to code section

Notes:

CLASS_RELOCS: the presence of this section will automatically turn on the generation of relocations in output file.
CLASS_IMPORTS: section data will be overwritten by imports
CLASS_BSS: section data will not be emitted into output
CLASS_EXPORTS: section will be overwritten by exports
CLASS_RSRC: if it contains any data it will fix the corresponding PE data directory with section RVA and SIZE

3.1.2 Section Name Alias

When defining a section you can provide an alias name like this:

Syntax

	SECTION	{section_name_for_program} ALIAS {section_name_for_OS}

This is useful for linkers that have default section naming conventions or that merge sections based on their names.

For example:

	SECTION "code"	CLASS_CODE ALIAS ".text"

This will allow to use the familiar ".code" and ".data" section selectors in your program and still output a section name according to your linker's preferences.

Alternatively, you can name your section ".text" and select it with "..text".

3.2 Imports

If your program uses the PE32 or PE64 format, you can specify the imported DLLs and the functions imported from each DLL.

3.2.1 Imports Definition

You define imports like this:

Syntax:

	FROM	 	{dll_name}
	IMPORT		{function_name} {[param_count]} {calling convention} ALIAS {alias_name}

Example:

	from	 	kernel32.dll
	import		ExitProcess
	import		GetStdHandle [1] STDCALL ALIAS _GetStdHandle@4

	from		user32.dll
	import		MessageBox ALIAS MessageBoxA

The above example will import:

ExitProcess and GetStdHandle from kernel32.dll
MessageBoxA from user32.dll

Each "import" statement belongs to the previous "from" statement.

Notes:

If you import something then you must also provide a section with "class_imports" type.
If you do not need imports then you can omit imports declarations and section.
"from" DLL_name can contain a path
ALIAS, calling convention and param count can be omitted.

Alternative names for import keywords are:

FROM can also be named FROM_DLL or IMPORT_DLL
IMPORT can also be named IMPORT_FUNC

3.2.2 Imports Alias

When importing an API name you can provide an alias name like this:

Syntax

	IMPORT	{function_name_for_program} ALIAS {function_name_for_OS}

For example:

	import	MessageBox alias MessageBoxA

This will allow your program to refer to ASCII or UNICODE versions of API using a single API name across your code.

3.2.3 Calling convention for imports

By default Sol_Asm considers all imported functions to be STDCALL for binary and 32-bit formats, and WIN64 for 64-bit formats.

You can set a different calling convention for an imported function like this:

Example:

	import	Str_Printf CDECL	; or WIN64, STDCALL, LIN64

Additionally, you can add the "varg" statement to mark an imported function that takes a variable number of arguments. This is useful for the lin64 calling convention.

3.2.4 Argument count for Imports

Sol_Asm does not need procedure prototypes because it extracts this information from your PROC definitions, even if a definition appears after the procedure is used. However, imported APIs are not defined in your sources, so by default Sol_Asm does not check the argument count for imported functions.

You can define the argument count for imports like this:

Syntax

	IMPORT	{function_name} [{argument_count}]

Example:

	import	MessageBox [4] alias MessageBoxA

In this case Sol_Asm will check that each INVOKE of MessageBox passes 4 parameters; the IMPORT statement acts as a mini prototype.

3.3 EXTERN

With EXTERN you can define symbols that are external to your module (defined in other modules). This is useful when you generate OBJ files and link them together with an external linker.

Syntax

	EXTERN	{function_name} [{argument_count}] ALIAS {function_alias}

Example:

	extern AddAtom [1] alias _AddAtomA@4

Notes:

ALIAS and argument count can be omitted.

EXTERN is similar to IMPORT but it does not need a FROM_DLL statement

EXTERN symbols will be solved by the linker at link time. Depending on your linker configuration they can be linked in statically or dynamically.
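The aliases used in the examples above (_GetStdHandle@4, _AddAtomA@4) follow the usual 32-bit Microsoft STDCALL name decoration:
- a leading underscore;
- then the function name;
- then "@" and the number of bytes of arguments.

If every argument is a 4-byte DWORD, the alias can be derived from the argument count. That holds for these two APIs, but not for functions that take 64-bit or larger arguments by value. A Python sketch; stdcall_alias is a hypothetical helper:

```python
def stdcall_alias(name: str, arg_count: int, arg_size: int = 4) -> str:
    """32-bit STDCALL decorated name: _Name@<total argument bytes>.

    Assumes every argument occupies arg_size bytes on the stack.
    """
    return f"_{name}@{arg_count * arg_size}"


print(stdcall_alias("GetStdHandle", 1))   # _GetStdHandle@4
print(stdcall_alias("MessageBoxA", 4))    # _MessageBoxA@16
```

Keeping the [argument_count] and the decorated ALIAS consistent this way avoids link errors against import libraries that use decorated names.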

3.4 EXPORT

Using EXPORT you can export a procedure or a label from your program. It works for the PE32, DLL, COFF and ELF output formats.

For OBJ output formats it has the effect of making your symbol PUBLIC.

3.4.1 Define Exports

You define exported functions like this:

Syntax

	EXPORT {proc_or_label_name} ALIAS {export_name_for_output}

Notes:

ALIAS keyword can be omitted.
It works for procedures and labels.
For PE and DLL output, an error is generated if an EXPORT statement is present but no CLASS_EXPORTS section is defined; this is not required for OBJ output.
The -dll command line option can be omitted; the generated EXE will then still have exports (if a CLASS_EXPORTS section is present).
Normally you should also enable relocations (by defining a CLASS_RELOCS section) when generating a DLL.

3.5 Entry point

By default the entry point is at the start of the first section defined.

However, you can specify another location like this:

Syntax:

	.ENTRY {symbol_name}

Example:

	.entry App_Init
...
App_Init:
		call	Main
		invoke	ExitProcess,0
		ret		
...

PROC Main
	...
	ret 
ENDP

This will define "App_Init" label as the entry point of your program.

Notes:

for binary output the entry point is usually defined by your loader code
The entry point symbol is also made PUBLIC for OBJ output formats.
you can also use ENTRY (without dot) instead of ".ENTRY"

3.6 Base address, ORG and DISP

These keywords allow you to set up the address where your code will (or should) be placed in memory at run time.

3.6.1 Base address

The default base address is set up like this:

PE32 or PE64 will start at 0x40_1000
Binary will start at 0x0000
DLL will start at 0x1000_0000
OBJ will start where the linker decides (OMF,COFF,ELF)

You can set up a base address like this:

Syntax:

	BASE	{absolute_or_virtual_address}

Example:

	BASE	02000_0000h

This will make your executable base address start at 512 MB (2000_0000h). The base address is the address of the first section. Each section is aligned to 4 KB, and PE files also have an additional 4 KB header before the first section starts.

BASE has the same effect as an ORG and a DISP with the same value.
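The section placement described above can be sketched in Python. This is a simplified model that assumes each section simply starts at the next 4 KB boundary after the previous section ends. The helper names and the section sizes in the usage line are made up:

```python
SECTION_ALIGN = 0x1000  # 4 KB, as stated above


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def section_addresses(base: int, sizes: list[int]) -> list[int]:
    """Start address of each section, given the first section's address."""
    starts = []
    addr = base
    for size in sizes:
        starts.append(addr)
        # The next section begins at the next 4 KB boundary.
        addr = align_up(addr + size, SECTION_ALIGN)
    return starts


# Default PE32: 4 KB header at 0x40_0000, so the first section is at 0x40_1000.
print([hex(a) for a in section_addresses(0x40_1000, [0x1234, 0x10, 0x800])])
# ['0x401000', '0x403000', '0x404000']
```

The first section here is 0x1234 bytes long and ends inside the next page. The second section therefore starts one further page up, at 0x40_3000.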

3.6.2 ORG

If you want a piece of code to be located at an absolute address (for OS development) or if you want to "jump around" inside code positions you can use the ORG directive:

Syntax:

	ORG	{absolute_address}

The ORG directive moves both the current address counter and output pointer in output file to the specified address.

Code and data will be generated at the new address and offset in output file.

3.6.3 DISP

The DISP directive moves the output pointer backward by a given amount. Its purpose is to avoid having zeroes at the start of the output file after an ORG directive.

Syntax:

	DISP	{negative_move_size}

For example, these are the first lines of the Solar_OS System32 module:

	org	0B000h
	disp	0B000h

This means that code is made to run at absolute address 0xB000 but the output pointer remains at:

	output_position = + B000h (because of org) - B000h (because of disp) = 0

This way the generated binary does not start with 0xB000 bytes of zeroes or garbage.
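The counter arithmetic above can be modeled in a few lines of Python. This is a simplified sketch for plain binary output that starts at address 0. It assumes, as the text describes, that ORG sets both counters to the given address and that DISP moves only the output pointer. The class and method names are hypothetical:

```python
class Counters:
    """Simplified model of the address counter and the output pointer."""

    def __init__(self) -> None:
        self.address = 0      # run-time address of the next emitted byte
        self.output_pos = 0   # file offset of the next emitted byte

    def org(self, absolute_address: int) -> None:
        # ORG moves both the address counter and the output pointer.
        self.address = absolute_address
        self.output_pos = absolute_address

    def disp(self, amount: int) -> None:
        # DISP moves only the output pointer backward.
        self.output_pos -= amount

    def emit(self, size: int) -> None:
        # Emitting code or data advances both counters.
        self.address += size
        self.output_pos += size


c = Counters()
c.org(0xB000)
c.disp(0xB000)
c.emit(2)                               # e.g. two one-byte instructions
print(hex(c.address), c.output_pos)     # 0xb002 2
```

The code runs at 0xB000 and up, while its bytes start at offset 0 of the output file.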

3.7 Encoding Modes

Sol_Asm can encode 16bits, 32bits and 64bits ASM.

You switch between 16/32/64-bit encoding with:

	.USE16		- encode 16 bits
	.USE32		- encode 32 bits (default)
	.USE64		- encode 64 bits

Note:

by default SOL_ASM starts in 32-bit mode, even without .USE32
16-bit and 64-bit encodings are tested less often

3.8 Include files

Your main asm file can include other asm files and so on.

Syntax:

	include		{include_path_and_file_name}

Alternatively, you can include binary files:

Syntax

	incbin		{include_path_and_file_name}

You can also fine tune the binary include:

Syntax

	incfrom		{start_pos}, {size}, {include_path_and_file_name}

In this case all 3 parameters must be present.

Include size can be "?" if you want to include until the end of the file.

Examples:

	incfrom		512,1027,help2.txt	; skip 512 bytes, include 1027 bytes
	incfrom		128,?,help2.txt		; skip 128 bytes, include rest of file

Notes:

You must use quotes around include_path_and_file_name if it contains spaces.
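The byte range selected by incfrom corresponds to a simple slice of the file contents. A Python sketch; the function name is hypothetical. This manual does not say what Sol_Asm does when start_pos + size runs past the end of the file. Python slicing would silently return fewer bytes, which may not match Sol_Asm:

```python
def incfrom(data: bytes, start_pos: int, size) -> bytes:
    """Bytes selected by 'incfrom start_pos, size, file'.

    size is an int, or "?" to include everything up to the end of the file.
    """
    if size == "?":
        return data[start_pos:]
    return data[start_pos:start_pos + size]
```

For example, `incfrom(Path("help2.txt").read_bytes(), 512, 1027)` mirrors the first example above (with `from pathlib import Path`).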

Chapter 4. Language elements

4.1 Numbers

SOL_ASM accepts numbers in the following formats:

binary
decimal
hexadecimal
floating point numbers

Notes:

Decimal is the default base.
Binary numbers must have the "b" suffix
Hexadecimal numbers must have the "h" suffix
All numbers can contain an "_" (underscore) character as a visual group separator.

Examples:

	111_000_101b		- binary
	0FFFF_C0_00h		- hexadecimal
	1_000_000		- decimal

	10.2345			- floating point

Numbers do not have to start with "0" or a digit but that is good practice.

Limitations:

Floating point numbers can only be used with real4, real8 and real10 data definitions.
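The number rules above can be summarized as a small Python parser. It is an illustrative sketch of the documented suffix rules, not Sol_Asm's own code:

```python
def parse_number(text: str):
    """Parse a numeric literal using the rules listed in section 4.1."""
    s = text.replace("_", "").lower()   # "_" is only a visual separator
    if "." in s:
        return float(s)                 # floating point
    if s.endswith("h"):
        return int(s[:-1], 16)          # hexadecimal, "h" suffix
    if s.endswith("b"):
        return int(s[:-1], 2)           # binary, "b" suffix
    return int(s, 10)                   # decimal is the default base


for literal in ("111_000_101b", "0FFFF_C0_00h", "1_000_000", "10.2345"):
    print(literal, "=", parse_number(literal))
```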

4.2 Expressions

Examples of expressions:

	(5+4)*7
	((PCI_DEVICES_MAX*PCI_ITEM_SIZE)/(4096+4096))+1
	((ETH_RX_APPS_MAX*4)/4096)+1
	<1 SHL 5>

Expressions can contain numbers, operators, parentheses and symbols.

Operators are:

Operator	Description	Priority
x * y	multiplication	1
x / y	division	1
x + y	addition	2
x - y	subtraction	2
x SHL n	shift x left n times	1
x SHR n	shift x right n times	1
x ROL n	rotate x left n times	1
x ROR n	rotate x right n times	1
x XOR y	binary XOR	1
x AND y	binary AND	1
x OR y	binary OR	1
NOT x	binary NOT	2
-x	unary minus	1
RND N	obtain a random number in range [0...N]	1
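This manual does not specify the operand width or the rounding rules for these operators. The Python sketch below assumes 32-bit values and truncating integer division. It shows how the rotate operators and one of the sample expressions above behave under those assumptions. The PCI_* values are hypothetical:

```python
MASK32 = 0xFFFF_FFFF   # assumption: expressions are evaluated on 32-bit values


def rol32(x: int, n: int) -> int:
    """x ROL n: rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def ror32(x: int, n: int) -> int:
    """x ROR n: rotate a 32-bit value right by n bits."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


# Hypothetical values, only to evaluate one of the sample expressions.
PCI_DEVICES_MAX = 64
PCI_ITEM_SIZE = 256

# ((PCI_DEVICES_MAX*PCI_ITEM_SIZE)/(4096+4096))+1 with truncating division
pages = ((PCI_DEVICES_MAX * PCI_ITEM_SIZE) // (4096 + 4096)) + 1

print(1 << 5)                        # <1 SHL 5>, prints 32
print(hex(rol32(0x8000_0001, 1)))    # prints 0x3
print(pages)                         # prints 3
```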

Variables recognized in expressions

Variable	Description	Priority
$	current address	2
$adr	current address	2
$$	current section base address	2
$$$	format base address	2
$ofs	current offset in section	2
$rva symbol	RVA of symbol	2
$pass	current pass number	2
$style token	token style (token, string, ModRM)	2
$type token	token type (register, label, etc.)	2
$size token	token size (8, 16, 32, 64 bits)	2
$value token	token value (register code, label address)	2

Limitations:

Currently expressions may not contain spaces unless the whole expression is enclosed between < > or { }.
Symbols used in expressions must be already defined at time of evaluation.

For example:

	< 1 SHL 5 >

The above expression does contain spaces and was therefore enclosed in < and >

4.3 ModRM Expressions

These are expressions used by the CPU in complex effective address calculations. Sol_Asm handles them differently from normal expressions.

The generic layout is:

Syntax

	[{base_reg} + {scale}*{index_reg} + {displacement}]

Example

	mov	eax,[esi + 4*ecx + 1234h]
	mov	eax,[esi + INFO_CTX.name_len]

In the first example above:

	base_reg 	= esi
	scale		= 4
	index_reg	= ecx
	displacement	= 1234h

As per the CPU specification, the scale can be omitted (meaning 1) or be 2, 4 or 8 only.
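The address the CPU computes for such an operand can be checked by hand. Here is a small Python sketch of the calculation for 32-bit addressing. The register values are hypothetical, and an omitted scale counts as 1:

```python
MASK32 = 0xFFFF_FFFF


def effective_address(base: int, index: int = 0, scale: int = 1,
                      displacement: int = 0) -> int:
    """[base + scale*index + displacement] as computed in 32-bit mode."""
    if scale not in (1, 2, 4, 8):          # an omitted scale means 1
        raise ValueError("scale must be 1, 2, 4 or 8")
    # Address arithmetic wraps around at 32 bits.
    return (base + scale * index + displacement) & MASK32


# mov eax,[esi + 4*ecx + 1234h] with hypothetical esi = 1000h, ecx = 3
print(hex(effective_address(0x1000, 3, 4, 0x1234)))   # 0x2240
```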

Limitations:

the expression should be laid out exactly like this: [base_reg + scale*index_reg + displacement].
variations of the above layout might not be supported
a negative index cannot be encoded by the CPU: i.e. [esi - ecx] is invalid

Notes

during development the above limitations will be relaxed when possible.
Sol_Asm can extract argument size when structures are used in ModRM
Sol_Asm can encode DELTA trick expressions as relative in ModRM
multiple displacements are allowed

4.4 Small strings

This is a special kind of string that can be used as an instruction operand.

For example:

	mov	eax,"abcd"
	cmp	al,"-"

You can use the SWAP modifier to reverse the string

For example:

	cmp	eax," rox"		; compare with "xor " in reverse because of endian issues
	cmp	eax, swap "xor "	; same as above but much easier to read

Limitations:

The string can be at most 4 bytes long (4 ASCII characters) in 32-bit code.
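The SWAP example can be explained with a short Python model. The following is inferred from that example rather than stated by the manual: for `cmp eax," rox"` to match the bytes "xor " stored in memory, the first character of a small string must end up in the most significant byte. A little-endian load of "xor " from memory puts "x" in the least significant byte. The helper names are hypothetical:

```python
def small_string(text: str, swap: bool = False) -> int:
    """Immediate value of a small string operand (inferred byte order).

    Assumption: the first character becomes the most significant byte.
    """
    if swap:
        text = text[::-1]                 # SWAP reverses the string
    raw = text.encode("ascii")
    if len(raw) > 4:
        raise ValueError("small strings are limited to 4 bytes in 32-bit code")
    return int.from_bytes(raw, "big")


def dword_in_memory(raw: bytes) -> int:
    """Value EAX receives when loading 4 bytes from memory (little-endian)."""
    return int.from_bytes(raw, "little")


print(hex(dword_in_memory(b"xor ")))         # 0x20726f78
print(hex(small_string(" rox")))             # 0x20726f78
print(hex(small_string("xor ", swap=True)))  # 0x20726f78
```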

4.5 User defined symbols

They are used as names for labels, procedures, etc in the program.

For Example:

	System32_Start:

User-defined symbols are case sensitive and can contain underscores "_", digits and special characters, but cannot contain CR, LF, space, "<", ">" or comma.

They do not have to start with a letter (but that is good practice).

The maximum symbol size is 128 bytes.
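An editor plugin or code generator can check these naming rules up front. A Python sketch follows; the names are hypothetical. TAB is added to the forbidden characters because section 4.8 lists it as a token separator, which is an assumption since the rule above does not mention it. Length is measured in UTF-8 bytes, which is also an assumption:

```python
# CR, LF, space, "<", ">" and comma are forbidden by the rule above;
# TAB is added because section 4.8 lists it as a token separator.
FORBIDDEN = set("\r\n <>,\t")
MAX_SYMBOL_BYTES = 128


def is_valid_symbol(name: str) -> bool:
    """Check a symbol name (without a trailing ':') against the rules above."""
    size = len(name.encode("utf-8"))
    return 0 < size <= MAX_SYMBOL_BYTES and not (set(name) & FORBIDDEN)
```

Because symbols are case sensitive, such a checker should compare names exactly: Main and MAIN are different symbols. Keywords, by contrast, are case insensitive (section 4.7).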

4.5 Comments

4.5.1 Line comments

This kind of comment starts with the ";" character and extends to the end of the line.

For example:

	; this is a single comment on a line

	mov	eax,1		; this comment is at end of line

4.5.2 Block comments

Block comments are made with: "/*" and "*/"

For example:

; comment out this debug code
/*
	ODS_str	<13,10,"+++ Equ_Create">
	ODS_token
	
	; notice here how block comments can be nested
	/*
	ODS_fmt	<13,10,09,"equ_create: value=%x">, eax
	*/

*/

Note:

Block comments can be nested, and their "/*" and "*/" markers can be disabled with a ";" line comment.
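The nesting rule can be illustrated with a small Python scanner. It is a simplified model, not Sol_Asm's parser, and it makes two assumptions:
- A ";" hides the rest of its line both inside and outside block comments, so "; /*" does not open a block.
- String literals are not recognized, so a ";" or "/*" inside quotes would be mistaken for a comment marker.

```python
def strip_comments(source: str) -> str:
    """Remove ';' line comments and nested /* ... */ block comments."""
    out = []
    depth = 0            # current nesting level of block comments
    i = 0
    while i < len(source):
        if source[i] == ";":
            # Line comment: skip to the end of the line, keeping the newline.
            end = source.find("\n", i)
            i = len(source) if end == -1 else end
        elif source.startswith("/*", i):
            depth += 1
            i += 2
        elif depth > 0 and source.startswith("*/", i):
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(source[i])
            i += 1
    return "".join(out)
```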

4.6 Very Long Lines

You can continue a long line on the next line with the "\" symbol.

For Example:

invoke	CreateWindowExA,0,class_name,wnd_title,\
			WS_OVERLAPPED+WS_CAPTION+WS_SYSMENU+\
			WS_THICKFRAME+WS_MINIMIZEBOX+WS_MAXIMIZEBOX,\
			32,64,320,240,0,0,[module_handle],0

Note:

SOL_ASM does not have a text line size limit and can assemble huge line sizes.
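Line continuation can be modeled as joining each line that ends in "\" with the next one. A simplified Python sketch; the function name is hypothetical. It assumes only trailing whitespace may follow the "\". It also keeps the next line's leading tabs, which then act as ordinary token separators:

```python
def join_continued_lines(lines: list[str]) -> list[str]:
    """Merge each line ending in a backslash with the line that follows it."""
    result = []
    pending = ""
    for line in lines:
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Drop the backslash and keep collecting the logical line.
            pending += stripped[:-1]
        else:
            result.append(pending + line)
            pending = ""
    if pending:                       # file ended right after a "\"
        result.append(pending)
    return result
```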

4.7 Keywords

Keywords are:

OPCODE MNEMONICS
CPU Register names
SOL_ASM directives.

For Example:

	MOV, XOR, ADD, SUB, JMP, CALL		- are opcode mnemonics 
	EAX, ECX, ST0, MM0, RAX, XMM1		- are register names
	PROC, STRUC, .entry, ORG, INVOKE	- are SOL_ASM directives

Keywords are case insensitive.

4.8 Special symbols

SOL_ASM rarely treats a symbol in a special way. All symbols are born equal :P However, there are exceptions:

Special Character	Description	Notes
SPACE, TAB or ","	are used as separators for tokens	can not be part of user tokens
CR, LF	line end and separators for tokens	can not be part of user tokens
":"	it defines a code label when used as suffix after a user symbol	can be part of user tokens
"$"	means current address counter	can be part of user tokens
"?"	means "do not care" / "non initialized" in data define statements	can be part of user tokens
"."	hints a section selection when followed by a section name; also the structure member separator in {structure}.{member}	can be part of user tokens
" " (double quotes)	encloses a string	can be part of user tokens
' ' (single quotes)	encloses a string	can be part of user tokens
< >	means LITERAL: multiple tokens enclosed by < > and separated by spaces or commas are treated as one ;)	has use restrictions
"[" "]"	encloses a Mod_RM address expression	has use restrictions
"{" "}"	encloses structure initialization statements; also usable instead of < >	can be part of user tokens but has use restrictions

The following symbols have a special meaning only inside a MACRO body:

Special Character	Description	Notes
"@"	means MLOCAL when used as a prefix inside a MACRO	can be part of tokens
"&"	triggers MARG check and expansion even when inside a token	can be part of tokens

As you can see Sol_Asm is relatively tolerant toward the use of special symbols inside user defined tokens.

4.8.1 Build Time

The special symbol "$time" creates a data definition containing the current build time in OS_TIME format:

STRUC OS_TIME
	year		dw	?
	month		dw	?
	day_of_week	dw	?
	day_of_month	dw	?
	
	hour		dw	?
	minute		dw	?
	second		dw	?
	mili_sec	dw	?
ENDS

When this data definition is included in the source, SOL_ASM updates it on every build.

For example:

build_time:
	;-------------------------------------
	; compile time symbol, 
	; value is filled in by assembler
	;-------------------------------------
	$time				
	db	0
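The OS_TIME layout is eight consecutive 16-bit words, in the same field order as the Win32 SYSTEMTIME structure. The Python sketch below is a rough model of the 16 bytes that $time emits: it packs a datetime into that layout as little-endian words. The day_of_week convention (0 = Sunday, as in SYSTEMTIME) and the helper names pack_os_time / unpack_os_time are assumptions made for illustration. Neither comes from Sol_Asm.

```python
import struct
from datetime import datetime

# OS_TIME: eight little-endian 16-bit words, in declaration order
OS_TIME_FORMAT = "<8H"
OS_TIME_FIELDS = ("year", "month", "day_of_week", "day_of_month",
                  "hour", "minute", "second", "mili_sec")

def pack_os_time(t: datetime) -> bytes:
    """Hypothetical model of the 16 bytes that $time emits."""
    day_of_week = t.isoweekday() % 7       # assumption: 0 = Sunday, as in SYSTEMTIME
    return struct.pack(
        OS_TIME_FORMAT,
        t.year, t.month, day_of_week, t.day,
        t.hour, t.minute, t.second, t.microsecond // 1000,   # mili_sec
    )

def unpack_os_time(data: bytes) -> dict:
    """Decode 16 bytes back into named OS_TIME fields."""
    return dict(zip(OS_TIME_FIELDS, struct.unpack(OS_TIME_FORMAT, data)))

if __name__ == "__main__":
    blob = pack_os_time(datetime(2018, 5, 9, 14, 30, 15, 250000))
    print(len(blob), unpack_os_time(blob))
```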

Chapter 5. Data definitions

5.1 Define initialized data

You can define initialized data like this:

Syntax:

	{label_name}	db	{data_item}	; define byte 	8 bits
	{label_name}	dw	{data_item}	; define word	16 bits
	{label_name}	dd	{data_item}	; define dword	32 bits

	{label_name}	dq	{data_item}	; define qword	64 bits
	{label_name}	dt	{data_item}	; define tword	80 bits
	{label_name}	do	{data_item}	; define oword	128 bits

Notes:

Each data definition keyword accepts multiple items on the same line
{label_name} is optional

Additionally, "db" accepts ASCII strings.

For example:

	my_dwords	dd	1,2,7,0FACE_BABEh,1356789,11

	my_string	db	"This is a message",0
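On x86, multi-byte data items are stored little-endian, with the lowest byte first. The Python sketch below uses the standard struct module to show the bytes that the my_dwords and my_string lines above should produce. The helper name define_data is hypothetical. Python integer literals accept "_" digit separators, just like the Sol_Asm source.

```python
import struct

# struct codes for the data definition keywords (little-endian, unsigned)
DATA_FORMATS = {"db": "B", "dw": "H", "dd": "I", "dq": "Q"}

def define_data(keyword: str, *items) -> bytes:
    """Model of db/dw/dd/dq: integers are packed, strings (db only) are copied."""
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            if keyword != "db":
                raise ValueError("only db accepts ASCII strings")
            out += item.encode("ascii")
        else:
            out += struct.pack("<" + DATA_FORMATS[keyword], item)
    return bytes(out)

my_dwords = define_data("dd", 1, 2, 7, 0xFACE_BABE, 1356789, 11)
my_string = define_data("db", "This is a message", 0)
```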

You can use the "?" special character to define uninitialized data.

For example:

	my_var_db	db	?
	my_var_dw	dw	?
	my_var_dd	dd	?

Limitations:

DT and DO are not correctly initialized for now.

5.2 Define unicode strings

You can define a Unicode string like this:

Syntax:

	{label_name}	du		"type your utf-8 string here",0

The parser reads the UTF-8 encoded code points from the quoted string and converts them to 16-bit words.
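Here is a rough model of du. It assumes every code point in the string fits in a single 16-bit word, which holds for the Basic Multilingual Plane. This manual does not say how Sol_Asm handles characters above U+FFFF, so the sketch simply rejects them. For BMP text, the resulting words match what Python's utf-16-le codec produces.

```python
import struct

def define_unicode(text: str, terminator: int = 0) -> list[int]:
    """Model of du: one 16-bit word per code point, plus a terminating word."""
    words = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # assumption: only BMP characters are modeled here
            raise ValueError(f"U+{cp:X} does not fit in one 16-bit word")
        words.append(cp)
    words.append(terminator)
    return words

def as_bytes(words: list[int]) -> bytes:
    """Little-endian byte image of the words, as they would sit in memory."""
    return struct.pack(f"<{len(words)}H", *words)

words = define_unicode("\u00dcn\u00ef")    # "Ünï"
```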

5.3 Define Floating point data

Syntax:

	{label_name}	real4		{data_item}	; define REAL4 number	- 32bits
	{label_name}	real8		{data_item}	; define REAL8 number	- 64bits
	{label_name}	real10		{data_item}	; define REAL10 number	- 80bits

For example:

	test1		real4		10.2345
	test2		real4		0.7785
	test3		real4		1_277_789.534
	test4		real4		999_123_456_789.37

	test1b		real8		10.2345
	test2b		real8		0.77854321773
	test3b		real8		1_277_789.534
	test4b		real8		999_123_456_789.37

	test1c		real10		10.2345
	test2c		real10		0.77854321773
	test3c		real10		1_277_789.534
	test4c		real10		999_123_456_789.37

SOL_ASM converts real numbers at the highest floating point precision available (80 bits) and then stores the result in the requested format. Because of this, "test4" above (REAL4) cannot retain all the specified digits, while "test4c" (REAL10) can.
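Python floats are 64-bit doubles, and the standard struct module can round-trip a value through the 32-bit REAL4 and 64-bit REAL8 formats. It has no 80-bit format, so REAL10 is not modeled. The sketch below shows why test4 loses digits: a REAL4 keeps only a 24-bit significand, which is roughly 7 significant decimal digits.

```python
import struct

def round_trip(value: float, fmt: str) -> float:
    """Store value as REAL4 ('f') or REAL8 ('d') and read it back."""
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, value))[0]

test4 = 999_123_456_789.37
as_real4 = round_trip(test4, "f")
as_real8 = round_trip(test4, "d")

print(f"source : {test4!r}")
print(f"REAL4  : {as_real4!r}")   # adjacent REAL4 values are 65536 apart at this magnitude
print(f"REAL8  : {as_real8!r}")
```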

5.4 Reserve non initialized data

You can reserve data with the following keywords:

	{label_name}	rb	{count} 	; reserve byte(s)  =   8 x  bits
	{label_name}	rw	{count} 	; reserve word(s)  =  16 x  bits	
	{label_name}	rd	{count} 	; reserve dword(s) =  32 x  bits
	{label_name}	rq	{count} 	; reserve qword(s) =  64 x  bits
	{label_name}	rt	{count}		; reserve tbytes   =  80 x  bits
	{label_name}	ro	{count}		; reserve owords   = 128 x  bits

You can reserve structures like this:

	rs	{structure_name},{count} 	- reserve {count} structures

For example:

	rb	1024		; reserve 1024 bytes	
	rw	17		; reserve   17 words
	rd	23		; reserve   23 dwords
	rs	WNDCLASS,77	; reserve   77 WNDCLASS structures

5.5 Fill data buffers

You can fill initialized data buffers with the following keywords:

 {label_name}	fb	{count},{fill_value} 	; fill bytes	
 {label_name}	fw	{count},{fill_value} 	; fill words		
 {label_name}	fd	{count},{fill_value} 	; fill dwords	
 {label_name}	fq	{count},{fill_value} 	; fill qwords

And for structures:

 {label_name}	fs	{struc},{count},{fill_value}	; fill structures

5.6 Structure data definitions

Structure definitions are automatically promoted to data types, so you can define a structure variable like this:

Syntax:

	{label_name} {structure_name}	{data_item}

For example:

	my_class	WNDCLASS	?	
	my_ps		PAINTSTRUCT	?

This defines one WNDCLASS structure at label "my_class" with an unknown (uninitialized) value, and one PAINTSTRUCT structure at label "my_ps".

5.7. Structure Member data initializations

Considering the structure:

	STRUC POINT_3D
		x	dd	?
		y	dd	?
		z	dd	?
	ENDS

You can initialize structure members like this:

	my_pt_1		POINT_3D	{ 1 2 3 }				

	my_pt_2		POINT_3D	{  y = 2  z = 7  x = 1 }

	my_pt_3		POINT_3D {  
					x = 2  
					y = 7  
					z = 1 
				}

The first version initializes structure members in sequence of their definition.
The second version initializes structure members by name.
The third version shows how you can extend a structure initialization on multiple lines without the use of "\" character.

You can initialize sub structure members by name like this:

	STRUC POINT_2D
		x	dd	?
		y	dd	?
	ENDS
	
	STRUC CIRCLE_2D
		color		dd	?
		center		rs	POINT_2D,1
		radius		dd	?
	ENDS	
	
	my_circle_var	CIRCLE_2D {
					color = 00_7F_FF_3Fh
					center.x = 100
					center.y = 200
					radius = 77
				}

You can nest {} like this

	
	my_circle_var	CIRCLE_2D {
					color = 00_7F_FF_3Fh
					{ 100 200 }
					radius = 77
				}
	; or by name
	my_circle_var	CIRCLE_2D {
					color = 00_7F_FF_3Fh
					{ x = 100 y = 200 }
					radius = 77
				}

You can also use {} for members that are made of multiple items but are not typed as structures (RB, RW, RD, RQ, RO) like this:

	
struc GUID
	dd1	dd	?
	dw1	dw	?
	dw2	dw	?
	bytes	rb	8
ends

my_guid	GUID { aaaa_bbbbh,cccch,ddddh { 1 2 3 4 5 6 7 8 } }
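Python's ctypes structures offer the same two initialization styles: positional, in declaration order, and by member name. That makes them a convenient model of the POINT_3D, CIRCLE_2D and GUID initializers above. The sketch assumes that dd maps to a 32-bit unsigned integer, dw to 16 bits, and rb 8 to an 8-byte array. The class definitions below model the structures. They are not Sol_Asm code.

```python
import ctypes

class POINT_3D(ctypes.Structure):
    _fields_ = [("x", ctypes.c_uint32), ("y", ctypes.c_uint32), ("z", ctypes.c_uint32)]

class POINT_2D(ctypes.Structure):
    _fields_ = [("x", ctypes.c_uint32), ("y", ctypes.c_uint32)]

class CIRCLE_2D(ctypes.Structure):
    _fields_ = [("color", ctypes.c_uint32), ("center", POINT_2D), ("radius", ctypes.c_uint32)]

class GUID(ctypes.Structure):
    _fields_ = [("dd1", ctypes.c_uint32), ("dw1", ctypes.c_uint16),
                ("dw2", ctypes.c_uint16), ("bytes", ctypes.c_uint8 * 8)]

# { 1 2 3 } : members in declaration order
my_pt_1 = POINT_3D(1, 2, 3)
# { y = 2  z = 7  x = 1 } : members by name, in any order
my_pt_2 = POINT_3D(y=2, z=7, x=1)
# nested { 100 200 } for the center sub-structure
my_circle_var = CIRCLE_2D(color=0x007F_FF3F, center=POINT_2D(100, 200), radius=77)
# { 1 2 3 4 5 6 7 8 } for the rb 8 member
my_guid = GUID(0xAAAA_BBBB, 0xCCCC, 0xDDDD, (ctypes.c_uint8 * 8)(1, 2, 3, 4, 5, 6, 7, 8))
```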

Chapter 6. General Code Syntax

SOL_ASM follows the Intel style of ASM syntax, as opposed to AT&T syntax. The syntax reflects my personal preferences, shaped by writing extensive applications in ASM. The following subchapters present the most notable syntax issues.

6.1 Default ASM instruction syntax

Each ASM source line has this default layout:

Syntax:

	{label} {instruction} {parameter1} , {parameter2} {; comments}

For example:


read_pixel:	mov	eax,[esi]	; 32 bits ARGB format

All elements are optional, but if a {parameter} is present then {instruction} must also be present.

Directives do not have to follow this syntax.

6.2 Offset keyword and use of []

There is no "offset" keyword. The name of a variable or label automatically means "offset of". As a consequence, you must always use brackets to obtain the "contents of" a variable.

For example:

	.data
		my_var		dd	37h
	.code
		mov	esi,my_var
		mov	edx,[my_var]
		mov	edx,[esi]		
	...

In the above example the first MOV loads ESI with the offset (address) of my_var (i.e. with 0x402000, for example).

The second MOV loads EDX with the content of my_var (i.e. with 37h). The third MOV does the same, but uses ESI as a pointer to my_var.

Notice the similarity of the second MOV with: MOV EDX,[ESI]
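A label (an address) differs from [label] (the value stored there) in the same way that a pointer differs from what it points to. The Python sketch below uses ctypes to mimic the three MOVs above. my_var is a 32-bit variable holding 37h, and the register names are plain Python variables.

```python
import ctypes

my_var = ctypes.c_uint32(0x37)                          # my_var dd 37h

esi = ctypes.addressof(my_var)                          # mov esi,my_var   -> address of my_var
edx = my_var.value                                      # mov edx,[my_var] -> contents of my_var
edx_via_esi = ctypes.c_uint32.from_address(esi).value   # mov edx,[esi]    -> same contents

print(hex(esi), hex(edx), hex(edx_via_esi))
```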

6.3 Size overrides

When needed or wanted, you can override the operand size of an encoding.

Available overrides are:

	byte		- force   8 bits
	word		- force  16 bits
	dword		- force  32 bits
	qword		- force  64 bits
	tbyte		- force  80 bits
	oword		- force 128 bits

	small		- force low word of symbol

6.4 Structure members

Let us assume we have defined the following structure:

	STRUC INFO_CTX
		info_name		rb	128
		info_dword		dd	?
		info_word		dw	?
		info_byte		db	?	
	ENDS

And then we reserve a vector of 1024 such structures:

	my_info		rs	INFO_CTX,1024

Then the following rules apply for accessing structure members:

	mov	esi,my_info
	mov	eax,[esi + INFO_CTX.info_dword]		
	mov	[esi + INFO_CTX.info_word],2		; will move WORD 2
	mov	[esi + INFO_CTX.info_byte],1		; will move BYTE 1

	; go to next item in vector
	add	esi, size INFO_CTX

Observe how the structure member size gives instructions their operand size hint when possible. This greatly reduces the need for "dword / word / byte" overrides.

For example:

	movzx	eax,[esi + INFO_CTX.info_byte]
	movzx	eax,[esi + INFO_CTX.info_word]

is equivalent to:

	movzx	eax,byte [esi + INFO_CTX.info_byte]
	movzx	eax,word [esi + INFO_CTX.info_word]

You do not have to use the "byte" and "word" hints, because the structure provides this information.

However in this example:

	mov	byte [esi],4

SOL_ASM will require the "byte" size override / hint, because no structure member hint is available.
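Member offsets such as INFO_CTX.info_dword are running sums of the sizes of the preceding members. The sketch below computes them. It assumes that Sol_Asm places members back to back without alignment padding; the INFO_CTX example is consistent with this, but the manual does not state it. The member size stored next to each offset is what lets the assembler pick the operand size.

```python
# Size in bytes of each data keyword used inside a STRUC
KEYWORD_SIZE = {"db": 1, "dw": 2, "dd": 4, "dq": 8, "rb": 1, "rw": 2, "rd": 4, "rq": 8}

def layout(members):
    """members: (name, keyword, count). Returns ({name: (offset, item_size)}, total_size)."""
    offsets, offset = {}, 0
    for name, keyword, count in members:
        item_size = KEYWORD_SIZE[keyword]
        offsets[name] = (offset, item_size)
        offset += item_size * count          # assumption: no padding between members
    return offsets, offset

INFO_CTX, SIZE_INFO_CTX = layout([
    ("info_name", "rb", 128),
    ("info_dword", "dd", 1),
    ("info_word", "dw", 1),
    ("info_byte", "db", 1),
])

def member_address(base: int, index: int, member: str) -> int:
    """Address of a member in element 'index' of: my_info rs INFO_CTX,1024"""
    return base + index * SIZE_INFO_CTX + INFO_CTX[member][0]
```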

6.5 Multiple instructions on the same line

You can write multiple assembly instructions on the same line. Sol_Asm will know when one instruction ends and the next one starts.

For Example

	push ebx  	push esi	push edi

	; init
	mov eax,1 	mov ecx,17	mov ebx,3
next_round:
	xor ecx,ebx  	sub ebx,edx
	dec ecx 	jnz next_round
 
	pop edi 	pop esi 	pop ebx

Notes:

This is not considered good coding practice for ASM. However, it is sometimes useful, so use it with caution and moderation.
Directives that take an unknown number of parameters (INVOKE, DB, DD, etc.), as well as ";" comments, disable this feature for their line.

6.7 Empty Spaces

At this development stage Sol_ASM can be very annoying about white space requirements. This is partly because the parser always treats spaces as token separators, no matter what. This helps parsing speed and eases debugging, but it also causes some problems.

I intend to remove these limitations in later versions, but for now you have to know and respect them.

6.7.1. Expressions and spaces

The expression parser does handle white space, but the high level tokenizer breaks expressions on spaces. Because of this you must avoid spaces in expressions. If you need spaces, enclose the whole expression in < and > or in { and }.

;---------------
; this is OK
;---------------
mov	eax, (7*4)+(5*PACKET_SIZE)	; this is an expression with no spaces inside
mov	ecx, WND_CHILD+WND_MINIMIZE	; this is an expression	with no spaces inside
mov	ecx, size MY_STRU		; this is not an expression
mov	al, byte [esi]			; this is not an expression

;--------------------------------------------------------------
; this is NOT OK because expressions can not contain spaces
;--------------------------------------------------------------
mov	eax,(7*4) + (5 * PACKET_SIZE)
mov	eax,1 SHL 18					; this expression needs spaces
mov	ecx,WND_RESIZE OR WND_CHILD OR WND_MINIMIZE

;-----------------------------------------------
; this is made  OK by the use of < and >
;-----------------------------------------------
mov	eax, < (7*4) + (5 * PACKET_SIZE) >
mov	eax, < 1 SHL 18 >
mov	ecx, < WND_RESIZE OR WND_CHILD OR WND_MINIMIZE >

Notes for expressions

as a rule of thumb, whenever you need a space inside an expression you can use the < ... > operators to allow it
some expression operators (SHL, SHR, $RVA, etc.) DO need spaces and hence automatically require < > around the expression
the "(" and ")" used in expressions DO NOT need spaces around them
an alternative for < > is { }
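To see why spaces break expressions, here is a deliberately simplified tokenizer in Python that follows the rules above. Spaces, tabs and commas separate tokens, and text enclosed in < > or { } stays together as a single token. This is an illustrative model only. The real Sol_Asm tokenizer handles many more cases, such as strings, comments and nesting.

```python
SEPARATORS = " \t,"
GROUPS = {"<": ">", "{": "}"}

def tokenize(line: str) -> list[str]:
    """Split an operand list on spaces/tabs/commas, keeping <...> and {...} whole."""
    tokens, current, closing = [], "", None
    for ch in line:
        if closing:                          # inside a < > or { } group
            current += ch
            if ch == closing:
                closing = None
        elif ch in GROUPS and not current:   # a group can only start a new token
            current, closing = ch, GROUPS[ch]
        elif ch in SEPARATORS:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens

# an expression written without spaces survives as one token
print(tokenize("eax, (7*4)+(5*PACKET_SIZE)"))
# the same expression with spaces is torn apart
print(tokenize("eax,(7*4) + (5 * PACKET_SIZE)"))
```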

6.7.2 .IF and Spaces

Runtime conditionals such as .IF, .WHILE or .REPEAT do need spaces around parentheses, conditions and logical operators.

;---------------
; this is OK 
;---------------
.if ( eax == 1 .and. ebx == 5 ) .or. ( [status] == 1 .and. [errors] == 0 )
	...	
.endif


;------------------
; this is NOT OK
;------------------
.if (eax==1 .and. ebx==5).or.([status]==1.and.[errors]==0)
	...	
.endif

;-----------------------------------------------------------------
; here use {} because < and > are conditional operators also
;-----------------------------------------------------------------
.if ( eax < { 7FFFFh SHR 5 } ) .and. ( edx > { 1 SHL 7} )
	...	
.endif

Notes for .IF

All .IF conditions ( "==" "<=" "!=" ) and logical grouping operators ( ".or." ".and." ) DO need spaces around them
The "(" and ")" used in .IF logical grouping DO need spaces around them
the rules for no spaces in expressions still apply
< and > are also condition operators and can NOT be used in .IF conditions to group tokens
you can use { and } to group tokens with spaces inside .IF

Chapter 7. Directives

7.1 EQUATES

Syntax:

	{symbol_name}	EQU	{value or expression}

Examples:

	equ1			equ	40
	ETH_RX_APPS_MAX		EQU	1024
	ETH_MEM_BLOCKS		EQU	((ETH_RX_APPS_MAX*4)/4096)+1	
	equ_28			EQU	< 1 SHL 28 >

Equates cannot be redefined or defined twice. However, you can use the assignment operator for this:

Syntax:

	{symbol_name}	=	{value or expression}

Examples:

	x = y+1
	y = 7

Note

Sol_Asm will try to iteratively solve the value of symbols. Take care not to create infinite loops.

For example, the following code forces Sol_ASM to make 8 passes, until y reaches 7 and no longer changes its value:

#if $pass == 1
	y = 0
#endif

#if y < 7
	y = y+1
#endif

#echo " y=%x",y
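The pass count follows from a simple rule: keep assembling until no assigned symbol changes value between passes. The Python loop below replays the three directives above under that assumed stopping rule and reproduces the 8 passes. It models the behaviour described here. It is not Sol_Asm's actual pass engine.

```python
def simulate_passes(max_passes: int = 100):
    """Replay the #if / $pass example until y stops changing between passes."""
    y = None
    for pass_nr in range(1, max_passes + 1):
        previous = y
        if pass_nr == 1:          # #if $pass == 1 / y = 0
            y = 0
        if y < 7:                 # #if y < 7 / y = y+1
            y = y + 1
        print(f" y={y:x}")        # #echo " y=%x",y
        if y == previous:         # assumed rule: stop when nothing changed
            return pass_nr, y
    raise RuntimeError("symbol values never settled (infinite loop?)")
```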

7.2 LABELS

Labels can be defined in two ways:

1) by the ":" colon operator
2) by a data definition line

Syntax:

{label_name}:
{label_name}	{data definition keyword}	{data_items}

For example:

	mov	ecx,nr_of_items
	mov	esi,items_ptr
my_loop:
	; perform some actions here
	add	[esi+ITEM.quantity],1
	
	; next item
	add	esi,size ITEM
	dec	ecx
	jnz	my_loop

In the above code sequence "my_loop" is a code label and serves as a target for the JNZ instruction.

.data
	my_account_balance	dd	1234_5678h
.code	
	mov	ecx,nr_of_invoices
	mov	esi,invoices_ptr
my_loop:
	; perform some actions here
	mov	eax,[esi+INVOICE.total]
	sub	[my_account_balance],eax
	
	; next invoice
	add	esi,size INVOICE
	dec	ecx
	jnz	my_loop

In the above code sequence "my_account_balance" is a data label and serves as a parameter for the SUB instruction.

7.2.1 Labels scope

Labels defined outside of a procedure are global in scope. Global labels cannot be defined twice.

Labels defined inside a PROC ... ENDP construct are local to that procedure's namespace. Hence there can be multiple labels with exactly the same name, as long as they reside in different procedures.

7.3 STRUCTURES

Structures are defined like this:

Syntax:

STRUC {structure_name}
	{member_name1}	{data_definition_keyword}	{data_item}
	...
	{member_name2}	{data_reserve_keyword}		{count}
	...
ENDS

For example:

	
	STRUC ETH_PACKET
		packet_ptr		dd	?
		packet_id		dd	?
		packet_mac_src		rb	16
		packet_mac_dest		rb	16
	ENDS


	STRUC ETH_DRV
		drv_id			dd	?
		drv_name		rb	128
	
		status			dd	?
		
		packets_buff		rs	ETH_PACKET,1024
	ENDS

As you can see structures can contain other structures. Once a structure is defined it can be used in subsequent data definitions.

Its members can be accessed like this:

	.data
		my_eth	ETH_DRV 	?
	.code
	
	; via pointer
	mov 	esi, my_eth
	mov	eax,[esi + ETH_DRV.status]
	
	; or by direct access
	mov	[my_eth.status], 1
	mov	eax,[my_eth.status]

You can define a LOCAL variable in a PROC as having STRUC type and access it like this:

	PROC my_proc stdcall
		ARG arg1, arg2
		LOCAL my_eth :ETH_DRV, wc :WNDCLASSEX
		
		; note: the space between my_eth and :ETH_DRV is currently required
		mov	[my_eth.status],2
		mov	[wc.cbSize], size WNDCLASSEX
		
		; or via pointer
		lea	esi,my_eth
		mov	[esi+ETH_DRV.status],4
		ret
	ENDP

Structure size can be obtained like this:

	add	esi, SIZE ETH_DRV

You can also obtain the offset of a member inside a structure like this:

	mov	eax, ETH_DRV.status

Note:

The SIZE operator is accepted pro forma.
In fact, the name of a structure already means its size.
This matters in expressions, where the SIZE operator is not accepted but structure names are.

Hence this code is also valid:

	mov eax, ETH_DRV

And it will move the size of ETH_DRV structure into eax.

For clarity reasons the use of SIZE is recommended whenever possible.

You can also access members of nested structures like this:

Example:

.data
	my_driver 	rs	ETH_DRV,16
.code
	mov	esi,my_driver
	mov	eax,[esi + ETH_DRV.packets_buff.packet_id]
	...
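Because structures nest, a path such as ETH_DRV.packets_buff.packet_id resolves to the sum of the member offsets along the path. The Python sketch below computes the ETH_PACKET and ETH_DRV layouts. It again assumes that members are packed back to back with no padding. It is a tool for checking offsets by hand and is not part of Sol_Asm.

```python
def struct_layout(members):
    """members: list of (name, item_size, count, substructure_table_or_None)."""
    table, offset = {}, 0
    for name, item_size, count, sub in members:
        table[name] = (offset, sub)
        offset += item_size * count          # assumption: no padding
    return table, offset

ETH_PACKET, SIZE_ETH_PACKET = struct_layout([
    ("packet_ptr", 4, 1, None),
    ("packet_id", 4, 1, None),
    ("packet_mac_src", 1, 16, None),
    ("packet_mac_dest", 1, 16, None),
])

ETH_DRV, SIZE_ETH_DRV = struct_layout([
    ("drv_id", 4, 1, None),
    ("drv_name", 1, 128, None),
    ("status", 4, 1, None),
    ("packets_buff", SIZE_ETH_PACKET, 1024, ETH_PACKET),
])

def offset_of(table, path: str) -> int:
    """Resolve e.g. 'packets_buff.packet_id' to a byte offset inside the structure."""
    total = 0
    for part in path.split("."):
        offset, table = table[part]          # descend into the sub-structure, if any
        total += offset
    return total
```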

7.3.1 UNIONS

You can define unnamed UNIONS inside a structure.

Syntax:

	UNION
		{member_name1}	{data_definition_keyword}	{data_item}
		...
		{member_name2}	{data_reserve_keyword}		{count}
	ENDU

For Example:

	struc pixel_format
		flags1   dd   ?

		union
			r_mask   dd   ?
			y_mask   dd   ?
			union
				rx_mask      dd   ?
				ry_mask      dd   ?
			endu      
		endu

		flags2   dd   ?

		union
			g_mask   dd   ?
			u_mask   dd   ?      
		endu
	ends

And you can access any UNION member just like any other structure member.
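An unnamed UNION places all of its members at the same offset and takes the size of its largest member. Python's ctypes supports the same idea through Union plus _anonymous_, so the pixel_format structure can be modeled as shown below. The sketch assumes dd is a 32-bit unsigned integer. The helper union class names (_RxRy, _RY, _GU) are hypothetical.

```python
import ctypes

class _RxRy(ctypes.Union):                  # inner union: rx_mask / ry_mask
    _fields_ = [("rx_mask", ctypes.c_uint32), ("ry_mask", ctypes.c_uint32)]

class _RY(ctypes.Union):                    # union: r_mask / y_mask / inner union
    _anonymous_ = ("_inner",)
    _fields_ = [("r_mask", ctypes.c_uint32), ("y_mask", ctypes.c_uint32),
                ("_inner", _RxRy)]

class _GU(ctypes.Union):                    # union: g_mask / u_mask
    _fields_ = [("g_mask", ctypes.c_uint32), ("u_mask", ctypes.c_uint32)]

class pixel_format(ctypes.Structure):
    _anonymous_ = ("_u1", "_u2")            # unnamed unions: members reachable directly
    _fields_ = [("flags1", ctypes.c_uint32), ("_u1", _RY),
                ("flags2", ctypes.c_uint32), ("_u2", _GU)]

pf = pixel_format()
pf.r_mask = 0x00FF0000                      # writes the dword shared at offset 4
```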

7.4 PROCEDURES

Procedures are defined like this:

Syntax:

PROC {proc_name} {proc_call_convention_type}
	USES	{uses_list}
	ARG	{arg_list}
	LOCAL	{local_list}

	; some code

proc_label:	
	...

	ret
ENDP

For example:


	PROC Test_01 stdcall
		USES	esi,edi
		ARG	wnd_handle, wnd_action
		LOCAL	count, my_var1, my_var2


		mov	esi,[wnd_handle]
		mov	ecx,100

	loop_here:
		mov	eax,[esi]
		test	eax,eax
		jz	finish

		add	[count],eax

		dec	ecx
		jnz	loop_here

	finish:
		mov	eax,[count]
		ret
	ENDP

Known calling conventions

SOL_ASM automatically generates PROLOGUE and EPILOGUE code, as well as the code that handles USES, ARG and LOCAL variables, as needed.

Known calling conventions are:

STDCALL
CDECL
WIN64
LIN64
NOFRAME

Additionally, you can use the "varg" statement to mark a procedure that takes a variable number of arguments.

Notes:

INVOKE will check parameter count as defined by PROC statements
RET inside a PROC is a kind of macro that will trigger EPILOGUE code generation
you can avoid EPILOGUE code generation by using RETI pseudo mnemonic (return immediate)

For PROC's defined as NOFRAME Sol_Asm will not emit prologue and epilogue code but will emit PUSH/POP code for USES statements if present. In this case you should write the prologue and epilogue code yourself.

Default argument and local sizes:

word in 16-bit code
dword in 32-bit code
qword in 64-bit code

These sizes can be overridden by giving arguments and locals a structure type, like this:

	PROC Test_02 stdcall
		USES	esi,edi
		ARG	wnd_handle, wnd_action
		LOCAL	my_var2 :MCTX  l_point :POINT_3D
		...
		ret
	ENDP

7.4.1 PROC Local buffers

You can define a local procedure buffer like this:

	PROC Test_02 stdcall
		USES	esi,edi
		ARG	wnd_handle, wnd_action
		LOCAL	my_var1,  my_buff [32],  my_var2 :MY_CTX [32]
		...
		ret
	ENDP

This defines a buffer of 32 default-sized items (dwords in 32-bit code) starting at "my_buff", and a buffer / vector of 32 * SIZE MY_CTX bytes at "my_var2".

Notes:

Do not assume that local variables are laid out on the stack in "natural" order. For example, in PROC Test_02 above, incrementing the address from "my_buff" will hit "my_var1" and not "my_var2".
In x64 code with the WIN64 calling convention, a register "spill" is usually needed in non-leaf functions. For now you must write it by hand at the start of the PROC (this will change), as in this example:

	PROC Wnd_Proc1 win64
		ARG	hwnd, wmsg, wparam, lparam
		LOCAL	tmp_hdc

		;-------------------------
		; spill is usually needed
		;-------------------------
		mov     [hwnd],rcx
		mov     [wmsg],rdx
		mov     [wparam],r8
		mov     [lparam],r9
		
		...
		
		ret
	ENDP

7.5 INVOKE

Procedures or imported functions can be used with INVOKE syntax:

Syntax:

	INVOKE {function_name}, {param1},{param2}, ... {param_N}
	
	; or with dynamic function in register
	mov	rbx,[my_function]
	INVOKE	{abi_name},rbx,{param1},{param2}, ..., {param_N}

Known ABI names:

stdcall (default)
win64
lin64
cdecl

Notes:

INVOKE can be used with symbols defined AFTER it
INVOKE does NOT require a PROTO before use
INVOKE checks parameters against the PROC definition, even when that definition comes later in the source
INVOKE can be used with unknown targets (invoke [eax],...)
You can use STDCALL in x64 (for your own OS, for example)

For example:

	invoke	Str_Printf,ods_fname,ods_fname_fmt,[pass_nr]
	invoke	OS_File_Create,ods_fname
	mov	[ods_fhandle],eax

	mov	eax,[My_Dynamic_Proc]	
	invoke	stdcall,eax,ecx,edx
	
	; use ADDR to get the address of a local variable in a PROC
	PROC my_proc stdcall
		ARG 	arg1, arg2
		LOCAL	wc :WNDCLASSEX
		
		mov	[wc.cbSize],size WNDCLASSEX
		...
		invoke	RegisterClassA, ADDR wc
		...
		
		ret
	ENDP

Sol_ASM handles the calling convention details based on each procedure definition or on the import function hints.

CINVOKE

CINVOKE is a variant of INVOKE that assumes the CDECL convention and does not check the parameter count.
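For the 32-bit conventions, the usual expansion of a call is as follows. Push the parameters from last to first, then CALL the target. Under CDECL, the caller then removes the parameters from the stack; under STDCALL, the callee removes them with RET n. The Python sketch below generates such a sequence as text. It illustrates the conventions only, and the exact instructions Sol_Asm emits may differ. WIN64 and LIN64, which pass parameters in registers, are not modeled.

```python
def expand_invoke(abi: str, target: str, params: list[str]) -> list[str]:
    """Sketch of a 32-bit INVOKE / CINVOKE expansion for stdcall or cdecl."""
    if abi not in ("stdcall", "cdecl"):
        raise ValueError("only stdcall and cdecl are modeled here")
    code = [f"push {p}" for p in reversed(params)]   # last parameter is pushed first
    code.append(f"call {target}")
    if abi == "cdecl" and params:
        code.append(f"add esp,{4 * len(params)}")     # caller cleans the stack (cdecl)
    return code                                       # stdcall: callee ends with RET n

for line in expand_invoke("stdcall", "OS_File_Create", ["ods_fname"]):
    print(line)
```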

7.6 MACROS

SOL_ASM contains a MACRO processor that supports nested and recursive macros with VARARG and checked arguments.

7.6.1 Define MACRO

A MACRO is defined like this:

Syntax:

	MACRO {macro_name}
		MARG	{ marg_list [:REQ] [:VARARG] }

		; some code

	@macro_label:
		...

	ENDM

For example:

;---------------------------------------
; define and output a simple string
; note: @ means local symbol for macros
;---------------------------------------
MACRO ODS_str
	MARG	mpar1
	#ifdef SHOW_DEBUG

		jmp	@over1
			@mstring1	db	mpar1,0
		@over1:

		pushad
		invoke	Str_Len,@mstring1
		invoke	OS_File_Write_Dbg,[ods_fhandle],@mstring1,eax
		popad

	#endif
ENDM

And can then be used like this:

ODS_str	<13,10,"-------- Listing Sections -------">

Notes:

SOL_ASM does perform MACRO parameter count checking
:VARARG is needed for macros with an unknown number of arguments
Observe the #ifdef usage inside macros:
- you could change parameters with it
- you can use macro arguments as #ifdef parameters

Inside a MACRO the "@" prefix means that the symbol is local to this MACRO and will get a different name each time the MACRO is expanded.
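The effect of the "@" prefix can be modeled in two steps. First, each @symbol is renamed with a per-expansion counter. Second, MARG names are replaced by the actual arguments. The Python sketch below expands a small body twice. The naming scheme, in which @over1 becomes over1_1, over1_2 and so on, is a hypothetical choice: this manual does not document the names that Sol_Asm actually generates.

```python
import re

_expansion_counter = 0

def expand_macro(body: list[str], margs: list[str], args: list[str]) -> list[str]:
    """Expand a macro body: make @symbols unique per expansion, substitute MARG names."""
    global _expansion_counter
    _expansion_counter += 1
    suffix = f"_{_expansion_counter}"
    mapping = dict(zip(margs, args))
    out = []
    for line in body:
        # @name -> name_N (hypothetical unique name for this expansion)
        line = re.sub(r"@(\w+)", lambda m: m.group(1) + suffix, line)
        # whole-word MARG substitution
        line = re.sub(r"\b\w+\b", lambda m: mapping.get(m.group(0), m.group(0)), line)
        out.append(line)
    return out

BODY = ["jmp @over1", "@mstring1 db mpar1,0", "@over1:"]
first = expand_macro(BODY, ["mpar1"], ['"hello"'])
second = expand_macro(BODY, ["mpar1"], ['"world"'])
```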

Limitations:

:= is not yet supported for MARG (default argument values)
there are no string operators available yet (but you can use DEFINE / TEQU operators)

7.6.2 VARARG MACROS

A macro can have a variable number of arguments.

For example:

;---------------------------------------
; define and output a formatted string
; note: @ means local symbol for macros
;---------------------------------------
MACRO ODS_fmt
	MARG	mfmt, arg_list :VARARG

	jmp	@over1	
		@mstring1	db	mfmt,0
	@over1:

	pushad
	invoke	Str_Printf, sz_buff1, @mstring1, arg_list
	invoke	OS_File_Write_Dbg, [ods_fhandle], sz_buff1, eax
	popad

ENDM

And can then be used like this:

ODS_fmt	<13,10,"Section:%u RVA=%x VSIZE=%x Name=%s">,ecx,[esi+PE_SECT.rva],[esi+PE_SECT.vsize],esi

7.6.3 MACROS with :REQ

The ":REQ" MARG qualifier forces the MACRO argument count check up to a specific argument position.

For example:

MACRO MTEST
	MARG	a1,a2,a3,a4 :req , a5

	mov	eax,a1
	mov	ebx,a2
	mov	ecx,a3
	mov	edx,a4

ENDM

When the macro is invoked, Sol_Asm checks that at least 4 arguments are present. Because of this, "a5" can be missing but "a4" cannot.

7.6.4 Nested MACROS

You can define a macro inside another macro... and so on.

For example:

MACRO M2 
	MARG arg1,arg2

	mov	eax,arg1

	MACRO M3 
		MARG arg3,arg4
		mov	eax,arg3
		push	arg4
	ENDM
	
	push	eax
	push	arg2

ENDM

On the first invocation of M2 only its body is generated. M3 becomes defined but is not expanded.

7.6.5 Using "&" in MACROS

In MACRO body the "&" character will trigger a MARG check and expansion even if found in the middle of another token or string.

For example

MACRO M4
	MARG arg1 arg2

in_label_&arg1:
	mov	eax,<&arg1>
	db	" In strings: &arg2",0

ENDM

7.6.6 Recursive Macros

A macro can invoke itself recursively.

For example:

MACRO MPUSH
	MARG	p1,p2,p3,p4

	#ifnb <&p1>
		push	p1
		MPUSH	p2,p3,p4
	#endif
ENDM

Notes:

Take care to provide a mechanism to stop recursion.
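The recursion in MPUSH terminates because each expansion drops the first argument, and #ifnb stops once the first argument is blank. The Python model below applies the same logic. It treats a missing argument as blank, which is an assumption about how Sol_Asm fills MARGs that were not supplied.

```python
def mpush(*args: str) -> list[str]:
    """Model of MPUSH p1,p2,p3,p4: push p1, then recurse on p2,p3,p4."""
    p1 = args[0] if args else ""
    if p1.strip() == "":              # #ifnb <&p1> is false: stop the recursion
        return []
    return [f"push {p1}"] + mpush(*args[1:4])   # MPUSH p2,p3,p4
```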

7.6.7 Using EXITM in macros

EXITM can be used to return a token from a MACRO expansion.

For example

MACRO RV
	MARG func, params

	invoke func,params

	; return something from macro
	exitm eax
ENDM

; later on in code
	...
	mov	ecx,RV GetModuleHandle
	invoke	ExitProcess, < RV GetModuleHandleA >
	push	RV GetModuleHandleA
	...

Note:

EXITM can also be used to exit a macro prematurely

7.6.8 The REPT Macro

You can use REPT to repeat a series of instructions.

For example

	x = 7

	REPT 12
		shl	eax,x
		add	ecx,3
		x = x+1
	ENDM
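REPT is expanded at assembly time, so the assignment x = x+1 changes the shift count in each copy. The Python sketch below unrolls the example the same way. It shows that the 12 copies use shift counts 7 through 18. It models the expansion only, not the runtime effect of the instructions.

```python
def rept_expand(count: int, x: int) -> list[str]:
    """Unroll the REPT 12 example; x is an assembly-time symbol."""
    lines = []
    for _ in range(count):
        lines.append(f"shl eax,{x}")
        lines.append("add ecx,3")
        x = x + 1                      # x = x+1 is evaluated during expansion
    return lines

unrolled = rept_expand(12, 7)
```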

7.6.9 The FOR Macro

You can use FOR to repeat a series of instructions for each item in a list.

Syntax:

	FOR {item} IN: {items list} {REV} DO
		{ for macro body }
	ENDM

Sol_Asm will expand the {for macro body} for each element in {items list} and will replace any occurrence of {item} in the {macro body} with current {items list} element.

The "REV" keyword is optional and if present then the {items list} will be parsed in reversed order.

FOR can be used to iterate the variable parameters of a MACRO.

For example

MACRO my_invoke
	MARG func :req, params :vararg

	FOR item IN: params REV  DO
		push   item
	ENDM

	call	func
ENDM

The above sample defines your own INVOKE-like macro, which you can later use like this:

	my_invoke	My_Func,eax,0,1,"123",[ecx]
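FOR substitutes each list element for {item} in the body. With REV it walks the list from last to first, which is the order in which STDCALL and CDECL parameters must be pushed. The Python model below follows this behaviour. The helper name for_expand is illustrative.

```python
import re

def for_expand(item: str, items: list[str], body: list[str], rev: bool = False) -> list[str]:
    """Model of: FOR {item} IN: {items} [REV] DO {body} ENDM"""
    sequence = list(reversed(items)) if rev else list(items)
    pattern = re.compile(rf"\b{re.escape(item)}\b")
    out = []
    for element in sequence:
        for line in body:
            # replace every whole-word occurrence of {item} with the current element
            out.append(pattern.sub(lambda m: element, line))
    return out

def my_invoke(func: str, *params: str) -> list[str]:
    """Model of the my_invoke macro above: push params in reverse, then call."""
    return for_expand("item", list(params), ["push item"], rev=True) + [f"call {func}"]
```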

7.7 Conditional Assembly

You can conditionally eliminate a block of source code at compile time by using the following directives:

Directive	Description
#ifdef {symbol}	if symbol is defined
#ifndef {symbol}	if symbol is not defined
#ifb {token}	if token is blank
#ifnb {token}	if token is not blank
#if_used {symbol}	if symbol is used in code
#if_not_used {symbol}	if symbol is not used in code
#if {condition}	if condition is true

Syntax:


	#ifdef {symbol_name}

		; code block for true
		....
	#else
		; code block for false

	#endif

For example:


	;------------------------------------------------
	; this checks the command line /binary option
	;------------------------------------------------
	#ifdef /binary
		org	0B000h
		disp	0B000h
	#endif
	
	#if $ >= 512
		#echo "boot sector address overflow: %x", $
	#endif

Observe how command line options are automatically promoted to EQU symbols and can be tested with #ifdef.

#ifdef blocks can be nested on multiple levels, so the following example is also valid:

	#IFDEF INTEL
		mov	eax,1
		#IFDEF WIN32
			mov	esi,32h
			#ifdef	LUCKY
				mov	ecx,33h
			#else
				mov	ecx,11h
			#endif
		#ELSE
			mov	esi,16h
		#ENDIF
		mov	edi,88h
	#ELSE
		mov	eax,2

		#IFDEF WIN32
			mov	esi,32h
			#ifdef	LUCKY
				mov	ecx,33h
			#else
				mov	ecx,11h
			#endif
		#ELSE
			mov	esi,16h
		#ENDIF	
		mov	edi,77h
	#ENDIF
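Nested #ifdef blocks are usually handled with a stack. Each #ifdef pushes whether its block is active, #else flips the top entry, and #endif pops it. A line is kept only if every enclosing level is active. Below is a small Python model of that rule. It handles only #ifdef, #else and #endif, matches directives case-insensitively like Sol_Asm keywords, and is applied to the nested example above.

```python
def preprocess(lines: list[str], defined: set[str]) -> list[str]:
    """Keep the lines selected by nested #ifdef / #else / #endif blocks."""
    stack = []                            # one bool per open block: is this branch taken?
    kept = []
    for raw in lines:
        words = raw.split()
        directive = words[0].lower() if words else ""
        if directive == "#ifdef":
            stack.append(words[1] in defined)
        elif directive == "#else":
            stack[-1] = not stack[-1]
        elif directive == "#endif":
            stack.pop()
        elif all(stack) and raw.strip():  # active only if every level is active
            kept.append(raw.strip())
    return kept

SOURCE = """
#IFDEF INTEL
    mov eax,1
    #IFDEF WIN32
        mov esi,32h
        #ifdef LUCKY
            mov ecx,33h
        #else
            mov ecx,11h
        #endif
    #ELSE
        mov esi,16h
    #ENDIF
    mov edi,88h
#ELSE
    mov eax,2
#ENDIF
""".splitlines()
```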

7.8 Runtime High Level .IF and friends

You can use runtime high level .IF .ELSEIF .ELSE .ENDIF constructs in SOL_ASM.

Sol_ASM generates the needed compare and jump code, and the labels, internally. This internal code generation is much faster than an equivalent MACRO expansion.

Syntax:


	.IF {operand1} {condition_a} {operand2}

		; code block for {condition_a} true
		....

	.ELSEIF  {operand3} {condition_b} {operand4}

		; code block for {condition_b} true
		...

	.ELSE
		; code block for all above conditions false
		...

	.ENDIF

For example:

	.if [parse_mode] == 1
		.if [parse_status] == 1
			mov	ecx,1
		.elseif [parse_status] == 2
			mov	ecx,2
		.elseif [parse_status] <= 7
			mov	ecx,7
		.else
			mov	ecx,-1
		.endif
	.elseif [parse_mode] == 2
		mov	edx,2
	.elseif eax == swap "xor "
		mov	edx,7
	.else
		mov	edx,-1
	.endif

Known condition operators are:

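Operator	Description	Flag checked
"=="	equal	ZF = 1 (E)
"!="	not equal	ZF = 0 (NE)
"<"	unsigned less than	CF = 1 (B)
">"	unsigned greater than	CF = 0 and ZF = 0 (NBE)
"<="	unsigned less than or equal	CF = 1 or ZF = 1 (BE)
">="	unsigned greater than or equal	CF = 0 (NC)
"zero?"	zero flag set	ZF = 1 (Z)
"!zero?"	zero flag clear	ZF = 0 (NZ)
"carry?"	carry flag set	CF = 1 (C)
"!carry?"	carry flag clear	CF = 0 (NC)
"sign?"	sign flag set	SF = 1 (S)
"!sign?"	sign flag clear	SF = 0 (NS)
"overflow?"	overflow flag set	OF = 1 (O)
"!overflow?"	overflow flag clear	OF = 0 (NO)
"parity?"	parity flag set	PF = 1 (P)
"!parity?"	parity flag clear	PF = 0 (NP)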

Note:

There are NO LIMITS on the number of .IF statements in a project or file.

Limitations:

the maximum .IF nesting depth is 64
the maximum number of .ELSEIF per .IF level is 512
these limits are not conceptual but temporary
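Internally, a single .IF condition typically becomes a CMP followed by a conditional jump on the opposite condition, which skips the block when the condition is false. The Python sketch below generates that pattern for one .IF / .ELSE / .ENDIF. It uses the unsigned jump mnemonics that correspond to the table above. The label names (if1_else, if1_end) are hypothetical, and the code Sol_Asm actually generates may differ.

```python
# Jump taken when the condition is FALSE (unsigned forms, the .IF default)
INVERSE_JUMP = {"==": "jne", "!=": "je", "<": "jae", ">": "jbe", "<=": "ja", ">=": "jb"}

def gen_if(op1: str, cond: str, op2: str,
           then_body: list[str], else_body: list[str], n: int = 1) -> list[str]:
    """Sketch of code for: .if op1 cond op2 / then_body / .else / else_body / .endif"""
    else_label, end_label = f"if{n}_else", f"if{n}_end"   # hypothetical label names
    code = [f"cmp {op1},{op2}", f"{INVERSE_JUMP[cond]} {else_label}"]
    code += then_body
    code += [f"jmp {end_label}", f"{else_label}:"]
    code += else_body
    code.append(f"{end_label}:")
    return code

for line in gen_if("[parse_mode]", "==", "2", ["mov edx,2"], ["mov edx,-1"]):
    print(line)
```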

7.8.1 Using multiple conditions

You can use multiple conditions in .IF like this:

Example

	.if ( eax == 1 .or. ecx == 2 ) .and. esi != 7
		...
	.elseif dl == "a" .or. dl == "b" .or. dl == "s"
		... 
	.endif

Note:

if you use "(" and ")" to group conditions, you must put spaces around them
the recommended logical operators are ".or." and ".and."
"&&" and "||" are also supported for now, but they may not be in future versions

7.8.2 Using signed conditions

By default all comparisons in a .IF are unsigned.
You can use signed conditions in .IF by prefixing the condition with the signed keyword, like this:

Example

.if signed edx >= [edi + HTML_CTX.wnd_dy]
	; flag done		
	mov	eax,1
	ret
.endif
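The signed keyword matters whenever a value has its top bit set. The Python sketch below interprets the same 32-bit register value both ways. 0FFFFFFFFh is greater than 7 in the default unsigned comparison, but it is -1 in a signed comparison. This is why the unsigned jumps (JA/JB) and the signed jumps (JG/JL) can disagree. The value assumed for [edi + HTML_CTX.wnd_dy] is only an example.

```python
def as_unsigned32(value: int) -> int:
    """Interpret the low 32 bits as an unsigned number."""
    return value & 0xFFFF_FFFF

def as_signed32(value: int) -> int:
    """Interpret the low 32 bits as a two's complement signed number."""
    value = as_unsigned32(value)
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value

edx = 0xFFFF_FFFF          # e.g. after "mov edx,-1"
wnd_dy = 7                 # example value of [edi + HTML_CTX.wnd_dy]

unsigned_ge = as_unsigned32(edx) >= as_unsigned32(wnd_dy)   # .if edx >= ...
signed_ge = as_signed32(edx) >= as_signed32(wnd_dy)         # .if signed edx >= ...
```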

7.8.3 .REPEAT .UNTIL

You can use high level .REPEAT ... .UNTIL constructs. Sol_Asm will generate the needed code.

Syntax

	.REPEAT 
		{repeat body}
	.UNTIL {condition}

Notes

the syntax for {condition} is the same as for .IF

Example

	mov	ecx,17
	.repeat
		mov	edx,0
		.repeat 
			inc	edx
		.until edx > 7

		dec	ecx
	.until ecx == 0

7.8.4 .WHILE .ENDW

You can use high level .WHILE ... .ENDW constructs. Sol_Asm will generate the needed code.

Syntax

	.WHILE {condition} 
		{while body}
	.ENDW

Notes

the syntax for {condition} is the same as for .IF

Example

	mov	ecx,17
	.while ecx > 1
	
		mov	edx,0
		.while edx < 7 
			inc	edx
		.endw
		
		dec	ecx
	.endw
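.REPEAT tests its condition after the body, so the body always runs at least once. .WHILE tests its condition before the body. The Python translation below mirrors the .REPEAT and .WHILE examples above and counts how many times the inner INC EDX executes. The inner loop runs 8 times per outer pass for .UNTIL edx > 7, and 7 times per outer pass for .WHILE edx < 7.

```python
def repeat_example() -> int:
    """.repeat ... .until ecx == 0, with an inner .repeat ... .until edx > 7"""
    ecx, incs = 17, 0
    while True:                   # .repeat
        edx = 0
        while True:               # inner .repeat
            edx += 1
            incs += 1
            if edx > 7:           # .until edx > 7
                break
        ecx -= 1
        if ecx == 0:              # .until ecx == 0
            break
    return incs

def while_example() -> int:
    """.while ecx > 1 ... .endw, with an inner .while edx < 7"""
    ecx, incs = 17, 0
    while ecx > 1:                # condition tested before each pass
        edx = 0
        while edx < 7:
            edx += 1
            incs += 1
        ecx -= 1
    return incs
```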

7.9 ENUMS

ENUM is a kind of auto generated EQU sequence. Sol_Asm will auto increment the values and will check for limits.

You can define ENUMS like this:

Syntax

	ENUM {enum_name},{start_value},{max_value}
		{enum name items}
	ENDE

Example

ENUM Modes,77h,ffh
	MODE_1
	MODE_2
	MODE_3
	MODE_21
ENDE

Sol_ASM will generate: MODE_1 EQU 77h , MODE_2 EQU 78h ... and so on for each ENUM item in sequence and will check for limits.

Note:

Enum items become EQU's with the same name as the item
Enum items do not have to be on separate lines, but they must be separated with spaces, tabs or commas
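ENUM behaves like a counter that emits one EQU per item and reports an error when the counter passes the maximum. Below is a Python model of the Modes example. Whether the limit is inclusive is an assumption here.

```python
def expand_enum(name: str, start: int, max_value: int, items: list[str]) -> dict[str, int]:
    """Model of ENUM {name},{start},{max} ... ENDE: consecutive EQU values."""
    equs = {}
    for offset, item in enumerate(items):
        value = start + offset
        if value > max_value:            # assumption: max_value itself is allowed
            raise ValueError(f"ENUM {name}: {item} = {value:X}h exceeds {max_value:X}h")
        equs[item] = value
    return equs

modes = expand_enum("Modes", 0x77, 0xFF, ["MODE_1", "MODE_2", "MODE_3", "MODE_21"])
for item, value in modes.items():
    print(f"{item} EQU {value:X}h")
```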

7.10 DEFINE text equates

DEFINE creates symbolic constants for text or strings. It behaves like a kind of EQU for strings and tokens.

This allows you to:

define symbolic strings or token sequences
handle string operations inside MACRO's
redefine ASM keywords

Syntax

	DEFINE {symbolic_name},{text}

An alternative name for DEFINE is TEQU

Example

	define	text1	"planet earth"
	define	text2	< swap "ecx" >
	define	text3	ebx
	define	text4	[esi+4]
	define	text5	STRCUT has_ebx_inside,5,3

	define	and	xor

	...

.data
	my_string	db	text1	; in fact "planet earth"
.code

	mov	eax,text2	; in fact mov eax,swap "ecx"
	mov	eax,text3	; in fact mov eax,ebx
	mov	eax,text4	; in fact mov eax,[esi+4]

	mov	eax,text5	; in fact mov eax,ebx		

	and	eax,eax		; in fact XOR eax,eax

Notes

You can use < and > in order to define tokens that contain spaces
Text equates can be used anywhere in code. They are expanded before any other tests are made, and they are always re-evaluated.
Since text equates can eventually contain other text equates you should take care not to create an infinite loop of definitions.
For obvious reasons the first parameter of a DEFINE statement is not expanded as a text equate, but everything else is.

Textequ Types

Defined text equates have some subtle types attached:

String when quoted like: "planet earth"
ModRM when inside [...] like: [esi]
Token otherwise
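Text equates are expanded before anything else, and the result is expanded again. A substitution table plus repeated rescanning is therefore a reasonable mental model. The Python sketch below applies that rule to whole tokens, using some of the definitions above. It reports a definition loop instead of running forever. It ignores the String / ModRM / Token typing described above, and the depth limit is an arbitrary assumption.

```python
import re

TOKEN = re.compile(r"[A-Za-z_]\w*")

def expand_text_equates(line: str, defines: dict[str, str], max_depth: int = 32) -> str:
    """Repeatedly replace defined tokens until the line stops changing."""
    for _ in range(max_depth):
        expanded = TOKEN.sub(lambda m: defines.get(m.group(0), m.group(0)), line)
        if expanded == line:
            return expanded
        line = expanded
    raise RecursionError("text equates keep expanding (definition loop?)")

DEFINES = {"text3": "ebx", "text4": "[esi+4]", "and": "xor"}
```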

7.11 STRING Functions

String functions allow you to operate on strings in text equates.

The following functions are available:

Function	Description	Notes
STRCUT	Extract a sub string from a string
STRADD	Add two strings
STRLEN	Obtain the length of a string	the result is a numeric token

7.11.1 STRCUT

STRCUT will extract a sub string from a source string

Syntax

	STRCUT {source},{start_pos},{length}

Example

 define	ebx1	STRCUT has_ebx_inside,5,3	; ebx		type token 
 define	ebx2	STRCUT "has_ebx_inside",6,3	; "ebx"		type string
 define	ebx3	STRCUT [ebx+ecx],1,3		; [ebx]		type ModRM

The result of STRCUT has the same type as the source

7.11.2 STRADD

STRADD will add two strings together.

Syntax

	STRADD {string1},{string2}

Example

 define	txt1	STRADD "planet"," earth"	; "planet earth" 	string 
 define	txt2	STRADD in,voke			; invoke		token
 define	txt3	STRADD [ebx],[+ecx]		; [ebx+ecx]		ModRM

The result of STRADD has the same type as string2

7.11.3 STRLEN

STRLEN returns the length of a string.

Syntax

	STRLEN {string1}

Example

 len1	equ	STRLEN "planet"	
 len2	equ	STRLEN invoke
 len3	equ	STRLEN [ebx+ecx]

 len4	equ	STRLEN STRADD "planet"," earth"	
 define	txt1	STRCUT "has_ebx_inside",6, STRLEN "ebx"

Notes

The result of STRLEN is a numeric token (decimal)
For quoted strings the length includes the quotes
As the examples above show, string functions accept other string functions as parameters.
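The counting rule can be checked with a one-line Python model. It is a sketch that assumes STRLEN simply counts the characters of the value as written. That matches the documented rule for quoted strings and for tokens. Whether the brackets of a ModRM value are counted is not stated, so the sketch does not cover that case.

```python
def strlen(text):
    """Model of STRLEN: the number of characters as written, returned as a
    decimal numeric token. For quoted strings the quotes are counted."""
    return str(len(text))


print(strlen('"planet"'))        # 8
print(strlen("invoke"))          # 6
print(strlen('"planet earth"'))  # 14, the value of STRADD "planet"," earth"
print(strlen('"ebx"'))           # 5
```

Under this rule, STRLEN "ebx" in the last example above evaluates to 5 rather than 3, because the two quotes are counted.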

7.12 #ECHO

The #ECHO directive allows you to emit formatted message text at compile time. This can be used to debug macros or to inform the user about compile stages.

Syntax

	#ECHO {format string},{arg1},{arg2},...

Example

MY_EQU equ 1234
define my_str " this is a string message"

.code
 ...

 #echo "\n code end=%x section base=%x, my_equ=%u string=%s",$,$$$,MY_EQU,my_str

Notes

#ECHO can have a variable number of arguments
The format string must be present

The format string can contain the following format specifiers and escape sequences:

Format	Description
%x	Hexadecimal number
%u	unsigned decimal number
%d	signed decimal number
%s	an ASCII null terminated string
\n	new line (CR+LF)
\t	TAB
%%	the "%" ASCII char itself
\\	the "\" ASCII char itself
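The table can be illustrated with a small Python model of the formatter. It is a sketch, not Sol_Asm's implementation. It makes several assumptions: numeric arguments are 32-bit values, %x prints uppercase hexadecimal without leading zeros, arguments are consumed left to right, and unknown specifiers are copied unchanged. The function name `echo_format` is hypothetical.

```python
def echo_format(fmt, *args):
    """Hypothetical model of #ECHO formatting (not Sol_Asm's own code)."""
    values = iter(args)
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        nxt = fmt[i + 1] if i + 1 < len(fmt) else ""
        if ch == "%" and nxt in ("x", "u", "d", "s", "%"):
            if nxt == "%":
                out.append("%")                      # %% -> literal %
            elif nxt == "s":
                out.append(str(next(values)))        # string argument
            else:
                value = next(values) & 0xFFFFFFFF    # assume 32-bit numbers
                if nxt == "x":
                    out.append(format(value, "X"))   # hexadecimal
                elif nxt == "u":
                    out.append(str(value))           # unsigned decimal
                else:                                # %d: signed decimal
                    out.append(str(value - (1 << 32) if value & 0x80000000 else value))
            i += 2
        elif ch == "\\" and nxt in ("n", "t", "\\"):
            out.append({"n": "\r\n", "t": "\t", "\\": "\\"}[nxt])  # escapes
            i += 2
        else:
            out.append(ch)                           # ordinary character
            i += 1
    return "".join(out)


# Hypothetical values standing in for $, MY_EQU and my_str.
print(echo_format("\\n code end=%x my_equ=%u string=%s",
                  0x00401070, 1234, " this is a string message"))
```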

7.13 OPTION

The OPTION directive is used to set up optional compiler behaviour.

Syntax

	OPTION {option_type}, [ {option_value} ]

The following options are available:

Option	Description
list_on	activates listing output
list_off	deactivates listing output
proc_align { value }	sets up the alignment for PROC (default is 16 bytes)

7.14 #LOAD

This directive allows you to read a value from compiled code or data at compile time.

Syntax

	#LOAD {equ_name}, [byte/word/dword/qword] {address}

For Example:

	my_db	db	1
	
	#load	x,byte my_db
	#echo " x=%x",x

Notes

#LOAD creates {equ_name} and assigns it the value read from {address} at compile time.
{address} must be valid in one of the defined sections when #LOAD is reached.
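What #LOAD reads can be modelled in Python as a little-endian read from an in-memory section image. The order is little-endian because Sol_Asm targets x86. This is a hypothetical sketch. The `Section` class, the `load` helper and the base address 0x00402000 for `my_db` are assumptions made for illustration. Requiring an explicit size is also a choice of the sketch, since the default size is not documented here.

```python
SIZES = {"byte": 1, "word": 2, "dword": 4, "qword": 8}


class Section:
    """Hypothetical in-memory image of one section: a base address and bytes."""

    def __init__(self, base, data):
        self.base = base
        self.data = bytearray(data)


def load(sections, address, size):
    """Model of #LOAD: read `size` bytes at `address` as a little-endian value."""
    count = SIZES[size]
    for section in sections:
        offset = address - section.base
        if 0 <= offset and offset + count <= len(section.data):
            return int.from_bytes(section.data[offset:offset + count], "little")
    raise ValueError(f"address {address:#x} is not inside a defined section")


data = Section(0x00402000, [0x01, 0x34, 0x12])  # my_db at 0x00402000 (assumed)
x = load([data], 0x00402000, "byte")
print(f" x={x:x}")  # x=1
```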

7.15 #STORE

This directive allows you to write a value into compiled code or data at compile time.

Syntax

	#STORE {address}, [byte/word/dword/qword] {value}

For Example:

	my_db	db	1
	
	#store	my_db, byte 55h

Notes

Address must be valid in one of the defined sections when #store is reached
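The write performed by #STORE can be modelled in the same way. The sketch below is hypothetical: the `Section` class, the `store` helper and the address used for `my_db` are invented for illustration. It also assumes that a value wider than the operand size is truncated, which the text above does not specify.

```python
SIZES = {"byte": 1, "word": 2, "dword": 4, "qword": 8}


class Section:
    """Hypothetical in-memory image of one section: a base address and bytes."""

    def __init__(self, base, data):
        self.base = base
        self.data = bytearray(data)


def store(sections, address, value, size):
    """Model of #STORE: write `value` at `address` in little-endian order.

    Assumption: the value is truncated to the operand size.
    """
    count = SIZES[size]
    for section in sections:
        offset = address - section.base
        if 0 <= offset and offset + count <= len(section.data):
            mask = (1 << (8 * count)) - 1
            section.data[offset:offset + count] = (value & mask).to_bytes(count, "little")
            return
    raise ValueError(f"address {address:#x} is not inside a defined section")


data = Section(0x00402000, [0x01])       # my_db  db 1 (address assumed)
store([data], 0x00402000, 0x55, "byte")  # #store my_db, byte 55h
print(hex(data.data[0]))                 # 0x55
```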

Chapter 8. Resource compiler

Sol_Asm contains a mini resource compiler.

It can parse some RC script elements and generate "in memory" templates for them.

In resource scripts Sol_Asm supports C style hexadecimal constants (for example 0x10CF0000).

8.1 Resource ID's

You can define a resource ID like this:

Syntax:

	#define		{ID name}	{ID value}

For Example:

	#define IDD_DLG1 1000
	#define IDC_BTN1 1001
	#define IDC_EDT1 1002
	#define IDC_BTN2 1003

Note:

#defined constants become EQU's and can be used in your program.
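Turning #define lines into EQU values can be sketched in Python. This is a hypothetical model, not Sol_Asm's parser. It accepts decimal values and the C style hexadecimal constants mentioned above, and it simply ignores every other line.

```python
def parse_defines(script):
    """Collect `#define NAME VALUE` lines into a name -> integer table.

    Decimal and C style hexadecimal (0x...) values are accepted; every
    other line is ignored by this sketch.
    """
    equs = {}
    for line in script.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0].lower() == "#define":
            name, value = parts[1], parts[2]
            base = 16 if value.lower().startswith("0x") else 10
            equs[name] = int(value, base)
    return equs


rc = """
#define IDD_DLG1 1000
#define IDC_BTN1 0x3E9
"""
print(parse_defines(rc))  # {'IDD_DLG1': 1000, 'IDC_BTN1': 1001}
```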

8.2 Dialogs

You can define a DIALOG like this:

Syntax:

	{dialog_id} 	DIALOGEX {dlg_x},{dlg_y},{dlg_dx},{dlg_dy}
	CAPTION		{caption string}
	STYLE		{style value}
	BEGIN
		{ control definitions }
	END

You can define a CONTROL like this:

Syntax:

	CONTROL {caption},{id},{"class"},{flags},{x},{y},{dx},{dy},{flags_ex}

For Example:

#define IDD_DLG1 1000

#define IDC_BTN1 1001
#define IDC_EDT1 1002
#define IDC_BTN2 1003
#define IDC_STC1 1004

IDD_DLG1 	DIALOGEX 	57,7,258,158
CAPTION 	"Sol_Asm Dialog 01"
STYLE		0x10CF0000

BEGIN
 CONTROL "Save",	IDC_BTN1,"Button",	0x50010000,	134,114,50,13,	0x00000000
 CONTROL "Exit",	IDC_BTN2,"Button",	0x50010000,	196,112,42,15,	0x00000000
 CONTROL "Name",	IDC_STC1,"Static",	0x50000000,	12,24,22,8,	0x00000000
 CONTROL "Text Edit",	IDC_EDT1,"Edit",	0x50010000,	50,22,134,11,	0x00000200
END
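This chapter does not document the exact bytes Sol_Asm produces. Section 8.4, however, passes the result to DialogBoxIndirectParamA, so a reasonable assumption is that DIALOGEX yields a Win32 DLGTEMPLATEEX structure. The Python sketch below packs only the header part of that structure for the example dialog. It assumes no menu, the default dialog class and no font block (DS_SETFONT, 0x40, is not set in 0x10CF0000). The control entries (DLGITEMTEMPLATEEX) that follow the header, each aligned to a DWORD boundary, are left out.

```python
import struct

DS_SETFONT = 0x40


def dlgtemplateex_header(style, x, y, cx, cy, caption, item_count,
                         ex_style=0, help_id=0):
    """Pack the header of a Win32 DLGTEMPLATEEX (no menu, default class)."""
    if style & DS_SETFONT:
        raise ValueError("font block not modelled in this sketch")
    header = struct.pack("<HHIIIHhhhh",
                         1,           # dlgVer
                         0xFFFF,      # signature: marks the extended format
                         help_id,     # helpID
                         ex_style,    # exStyle
                         style,       # style
                         item_count,  # cDlgItems
                         x, y, cx, cy)
    header += struct.pack("<HH", 0, 0)                    # menu: none, class: default
    header += caption.encode("utf-16-le") + b"\x00\x00"   # title, NUL-terminated
    return header


hdr = dlgtemplateex_header(0x10CF0000, 57, 7, 258, 158, "Sol_Asm Dialog 01", 4)
print(len(hdr), hdr[:4].hex())  # 66 0100ffff
```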

8.3 MENUS

You can define a MENU like this:

Syntax:

	{menu_id} 	MENUEX 
	BEGIN
		POPUP {"text"},{id}

		BEGIN
			MENUITEM {"text"},{id}
		END
	END

For Example:

SEPARATOR EQU 0

#define IDR_MENU 	10000
#define IDM_File 	10001
#define IDM_File_Open 	10004
#define IDM_File_New 	10005
#define IDM_File_Exit 	10009
#define IDM_Edit	10002
#define IDM_Edit_Cut	10006
#define IDM_Edit_Copy	10007
#define IDM_Edit_Paste	10008

IDR_MENU MENUEX
BEGIN
	POPUP "File",IDM_File

	BEGIN
		MENUITEM "Open",IDM_File_Open
		MENUITEM "New",IDM_File_New
		MENUITEM SEPARATOR
		MENUITEM "Exit",IDM_File_Exit
	END

	POPUP "Edit",IDM_Edit
	BEGIN
		MENUITEM "Cut",IDM_Edit_Cut
		MENUITEM "Copy",IDM_Edit_Copy
		MENUITEM "Paste",IDM_Edit_Paste
	END	
END

8.4 Emit Compiled Resources

You can emit a compiled resource as a data item like this:

Syntax:

	EMIT_RSRC {resource_id}

For Example:

align 32

my_dialog:
	EMIT_RSRC IDD_DLG1

align 32

my_menu:

	EMIT_RSRC IDR_MENU

and in your code you can write:

	...
	invoke	DialogBoxIndirectParamA,[hInstance],my_dialog,0,Dlg_Proc,0

	...
	invoke	LoadMenuIndirectA,my_menu

Chapter 9. Listing

Sol_Asm can produce a listing file when the "-list" command line option is used.

Listing columns format:

{include_level} {macro_level} {flag} {address} {program text} {opcodes}

Include Level column

Shows the depth of include file nesting.

Macro level column

Shows the depth of macro expansion nesting

Flag column

It is an internal Sol_Asm flag used for debugging, and its meaning changes often. Currently it shows whether a new pass is needed to resolve a symbol.

Address column

Shows the address of the current line being assembled. For OBJ formats it shows the offset within the section, since the final address is set up by the linker.

Program text column

Shows the program source text.

This includes:

normal source code
#IFDEF #ELSE #ENDIF source bodies, even if the condition is false
MACRO definitions and macro expansion performed by Sol_ASM preprocessor
Internal macro expansions performed for INVOKE directive
All directives

Opcode Column

It shows the CPU opcodes or data generated by Sol_Asm for each source line as a series of hexadecimal bytes.

Opcode column is aligned to column 128 if possible and expands up to column 224.

If more opcodes are needed, a new row is generated. If more than 4 rows are needed, an ellipsis "..." is shown and the remaining opcodes are omitted.

Limitations:

Opcodes are fixed to column 128
Does not show source code for .IF .ELSEIF .ENDIF
Does not show PROLOGUE and EPILOGUE source code for PROC's
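The column layout described above can be sketched in Python. This is a hypothetical model, not Sol_Asm's code. It assumes that tabs expand to 8 columns and that "column 128" means 128 characters precede the opcodes. It also assumes that continuation rows are blank on the left and that the ellipsis is printed on a row of its own.

```python
OPCODE_COLUMN = 128  # characters before the opcode column (assumption)
LAST_COLUMN = 224    # opcode rows may extend up to this column
MAX_ROWS = 4         # at most 4 opcode rows, then "..."


def listing_rows(include_level, macro_level, flag, address, text, opcodes):
    """Hypothetical model of one listing entry; returns the printed rows."""
    left = f"{include_level} {macro_level} {flag} {address:08X}\t{text}".expandtabs(8)
    if not opcodes:
        return [left]
    per_row = (LAST_COLUMN - OPCODE_COLUMN) // 3  # each "XX " takes 3 characters
    chunks = [opcodes[i:i + per_row] for i in range(0, len(opcodes), per_row)]
    rows = []
    for number, chunk in enumerate(chunks[:MAX_ROWS]):
        prefix = left if number == 0 else ""
        padding = max(OPCODE_COLUMN - len(prefix), 1)  # align "if possible"
        rows.append(prefix + " " * padding + " ".join(f"{b:02X}" for b in chunk))
    if len(chunks) > MAX_ROWS:
        rows.append(" " * OPCODE_COLUMN + "...")
    return rows


for row in listing_rows(1, 0, 0, 0x00401047, "\tmov\tesi,build_time",
                        bytes([0xBE, 0x3C, 0xA4, 0x42, 0x00])):
    print(row)
```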

Listing Example:

1 0 0 00401047	
1 0 0 00401047		;--------------------------
1 0 0 00401047		; make up a build date 
1 0 0 00401047		;--------------------------
1 0 0 00401047		mov	esi,build_time				BE 3C A4 42 00 
1 0 0 0040104C		
1 0 0 0040104C		xor	eax,eax					33 C0 
1 0 0 0040104E		xor	edx,edx					33 D2 
1 0 0 00401050		xor	ecx,ecx					33 C9 
1 0 0 00401052		
1 0 0 00401052		mov	ax,[esi + OS_TIME.year]			66 8B 46 00 
1 0 0 00401056		mov	cx,[esi + OS_TIME.month]		66 8B 4E 02 
1 0 0 0040105A		mov	dx,[esi + OS_TIME.day_of_month]		66 8B 56 06 
1 0 0 0040105E	
1 0 0 0040105E		invoke	Str_Printf,sz_tmp1,sz_fmt_bld1,eax,ecx,edx
1 0 0 0040105E	push edx  						52 
1 0 0 0040105F	push ecx  						51 
1 0 0 00401060	push eax  						50 
1 0 0 00401061	push sz_fmt_bld1  					68 69 A0 42 00 
1 0 0 00401066	push sz_tmp1  						68 00 16 43 00 
1 0 0 0040106B	call Str_Printf  					E8 B0 07 00 00 
1 0 0 00401070	add esp, 00000014h  					83 C4 14

Appendix.1 Other issues

A1.1 Namespaces

Sol_Asm uses separate NAMESPACES for:

EQU's
LABELS
PROC's
STRUC's
IMPORTS
EXPORTS
SECTIONS

Because of this you can have a PROC with the same name as a STRUC but not two PROC's or two STRUC's with the same name.

However, this is currently not under the programmer's control, so it is advisable to avoid such coding practices: you cannot control the order in which Sol_Asm searches the separate namespaces.

It is my intention to give the user a mechanism for controlling and defining namespaces.
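Why duplicate names across namespaces are risky can be shown with a small Python model. It keeps one dictionary per namespace. It is a hypothetical sketch: the `SymbolTables` class is invented for illustration, and its default search order is an assumption, because the real order is not documented.

```python
# The namespaces listed above; the search order used by lookup() is an
# assumption, since the real order is not documented.
NAMESPACES = ("equ", "label", "proc", "struc", "import", "export", "section")


class SymbolTables:
    """Hypothetical model: one separate dictionary per namespace."""

    def __init__(self, search_order=NAMESPACES):
        self.tables = {namespace: {} for namespace in NAMESPACES}
        self.search_order = search_order

    def define(self, namespace, name, value):
        table = self.tables[namespace]
        if name in table:  # two PROC's (or two STRUC's) with one name: error
            raise KeyError(f"{name} is already defined in namespace {namespace}")
        table[name] = value

    def lookup(self, name):
        """Return (namespace, value) from the first namespace holding `name`."""
        for namespace in self.search_order:
            if name in self.tables[namespace]:
                return namespace, self.tables[namespace][name]
        raise KeyError(name)


symbols = SymbolTables()
symbols.define("proc", "Point", 0x00401000)  # a PROC named Point
symbols.define("struc", "Point", 8)          # a STRUC with the same name: allowed
print(symbols.lookup("Point"))               # ('proc', 4198400)
```

Changing `search_order` changes which `Point` is found, which is the uncontrolled behaviour the text warns about.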

A1.2 System requirements

CPU

Sol_Asm requires at least a 386 CPU and benefits from newer, more advanced CPUs.

Memory

Sol_Asm preallocates approximately 24 MB at startup.

Each section gets 1 MB when it is defined, and this buffer is reallocated if more space is needed.

Additional memory is allocated as needed for files, imports, macros, etc.

OS

Sol_Asm was tested on WinXP, Solar OS and WinXP64, but it should also work on Win95, Win98, Win2k, Win2003 and Vista.

Starting from version 14.02, Sol_Asm also runs on Linux and on UNIX-like OSes that can link the Sol_Asm OBJ file against a limited set of LIBC functions.

A version for Mac OS X is also available in ELF OBJ format. You can use Agner Fog's OBJCONV program to convert it to Mach-O and then link it against LIBC to obtain an executable for your Mac.

A1.3 Speed testing

Speed testing was performed on two big projects: Sol_Asm itself and Solar_OS.

Synthetic testing was performed on files with 10,000 or 100,000 PROC's.

For Example:

Solar Assembler version 0.10.01
Copyright (C) 2004-2008 Bogdan Valentin Ontanu, All rights reserved.
Build on 2008_2_23  at 7:14:23

Assembling file: sol_asm2.asm
Assembler  pass: 1
Assembler  pass: 2
Assembler  pass: 3
Assembler  pass: 4
Assembler lines: 67866
Output    bytes: 192512
Assembler  time: 406 ms
---------------------------

4 passes x 67,866 lines = 271,464 lines in 406 ms --> 668,630 lines per second

For Example:

Solar Assembler version 0.10.01
Copyright (C) 2004-2008 Bogdan Valentin Ontanu, All rights reserved.
Build on 2008_2_23  at 7:14:23

Assembling file: system_32.asm
Assembler  pass: 1
Assembler  pass: 2
Assembler  pass: 3
Assembler lines: 111403
Output    bytes: 534016
Assembler  time: 578 ms
---------------------------

3 passes x 111,403 lines = 334,209 lines in 578 ms --> 578,216 lines per second

These are real projects with many PROC's, STRUC's, MACRO's and code.

Testing was performed on a laptop with an Intel Core 2 Duo CPU at 2 GHz and 1 GB of RAM, running 32-bit WinXP.
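The throughput figures above are the total number of lines processed over all passes, divided by the elapsed time. The small Python helper below (its name is illustrative) reproduces both results, truncated to whole lines per second.

```python
def lines_per_second(passes, lines, milliseconds):
    """Lines processed over all passes per second, truncated to an integer."""
    return passes * lines * 1000 // milliseconds


print(lines_per_second(4, 67866, 406))   # 668630, sol_asm2.asm
print(lines_per_second(3, 111403, 578))  # 578216, system_32.asm
```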

Appendix.2 Known keywords

Warning:

In the current alpha stage, the fact that a keyword is recognized or accepted does not necessarily mean that it is also encoded correctly.

Registers:

	

8 bit registers
-------------------------------              
"al"    "r8l"       "spl"
"cl"    "r9l"       "bpl"
"dl"    "r10l"      "sil"
"bl"    "r11l"      "dil"
"ah"    "r12l"
"ch"    "r13l"
"dh"    "r14l"
"bh"    "r15l"
              
16 bit registers
-------------------------------
"ax"    "r8w"      "es" 
"cx"    "r9w"      "cs" 
"dx"    "r10w"     "ss" 
"bx"    "r11w"     "ds" 
"sp"    "r12w"     "fs" 
"bp"    "r13w"     "gs" 
"si"    "r14w" 
"di"    "r15w" 
       
32 bit registers
-------------------------------
"eax"     "r8d"  
"ecx"     "r9d"  
"edx"     "r10d" 
"ebx"     "r11d" 
"esp"     "r12d" 
"ebp"     "r13d" 
"esi"     "r14d" 
"edi"     "r15d" 
       
64 bit registers
-------------------------------
"rax"     "r0"      "r8" 
"rcx"     "r1"      "r9" 
"rdx"     "r2"      "r10"
"rbx"     "r3"      "r11"
"rsp"     "r4"      "r12"
"rbp"     "r5"      "r13"
"rsi"     "r6"      "r14"
"rdi"     "r7"      "r15"
       

MMX registers       
--------------------------------
"mm0"  
"mm1"  
"mm2"  
"mm3"  
"mm4"  
"mm5"  
"mm6"  
"mm7"  
       
FPU registers       
--------------------------------
"st0"  
"st1"  
"st2"  
"st3"  
"st4"  
"st5"  
"st6"  
"st7"  
       
XMM registers       
--------------------------------
"xmm0" 
"xmm1" 
"xmm2" 
"xmm3" 
"xmm4" 
"xmm5" 
"xmm6" 
"xmm7" 
       
"xmm8" 
"xmm9" 
"xmm10"
"xmm11"
"xmm12"
"xmm13"
"xmm14"
"xmm15"

Instructions and directives


0	mov					 
1	lea					 
2	movzx					 
3	movsx					
4	bswap					 
5	xchg					 
6	xor					 
7	cmp					 
8	add					 
9	sub					 
10	or					 
11	and					 
12	sbb					 
13	adc					 
14	shl					 
15	shr					 
16	sar					 
17	rol					 
18	ror					 
19	rcl					 
20	rcr					 
21	sal					 
22	shld					 
23	shrd					 
24	test					 
25	not					 
26	neg					 
27	inc					 
28	dec					 
29	div					 
30	idiv					 
31	mul					 
32	imul					 
33	call					 
34	jmp					 
35	loop					 
36	ret					 
37	retn					 
38	int					 
39	int3					 
40	into					 
41	iret					 
42	iretd					 
43	hlt					 
44	leave					 
45	push					 
46	pushad					 
47	pusha					 
48	pushfd					 
49	pushf					 
50	pop					 
51	popad					 
52	popa					 
53	popfd					 
54	popf					 
55	jo					 
56	jno					 
57	jc					 
58	jnc					 
59	jb					 
60	jnb					 
61	jnae					 
62	jae					 
63	jz					 
64	jnz					 
65	je					 
66	jne					 
67	jbe					 
68	jnbe					 
69	jna					 
70	ja					 
71	js					 
72	jns					 
73	jpe					 
74	jpo					 
75	jl					 
76	jnl					 
77	jnge					 
78	jge					 
79	jle					 
80	jnle					 
81	jng					 
82	jg					 
83	rep					 
84	movsb					 
85	movsd					 
86	movsw					 
87	stosb					 
88	stosd					 
89	stosw					 
90	lodsb					 
91	lodsd					 
92	lodsw					 
93	scasb					 
94	scasd					 
95	nop					 
96	clc					 
97	stc					 
98	daa					 
99	das					 
100	cbw					 
101	cdq					 
102	cld					 
103	cmc					 
104	aaa					 
105	aas					 
106	lahf					 
107	lock					 
108	cpuid					 
109	rdtsc					 
110	aad					 
111	aam					 
112	out					 
113	in					 
114	finit					 
115	fninit					 
116	fld					 
117	fild					 
118	fst					 
119	fstp					 
120	fistp					 
121	fadd					 
122	faddp					 
123	fiadd					 
124	fsub					 
125	fisub					 
126	fdiv					 
127	fdivrp					 
128	fmul					 
129	fmulp					 
130	fimul					 
131	fxch					 
132	fucompp					 
133	fclex					 
134	fnclex					 
135	fnop					 
136	fchs					 
137	fabs					 
138	ftst					 
139	fxam					 
140	fld1					 
141	fldl2t					 
142	fldl2e					 
143	fldpi					 
144	fldlg2					 
145	fldln2					 
146	fldz					 
147	f2xm1					 
148	fyl2x					 
149	fptan					 
150	fpatan					 
151	fxtract					 
152	fprem1					 
153	fdecstp					 
154	fincstp					 
155	fprem					 
156	fyl2xp1					 
157	fsqrt					 
158	fsincos					 
159	frndint					 
160	fscale					 
161	fsin					 
162	fcos					 
163	emms					 
164	sidt					 
165	lidt					 
166	lgdt					 
167	sgdt					 
168	cli					 
169	sti					 
170	wbinvd					 
171	xlat					 
172	db					 
173	dw					 
174	dd					 
175	dq					 
176	dt					 
177	do					 
178	real4					 
179	real8					 
180	real10					 
181	rb					 
182	rw					 
183	rd					 
184	rq					 
185	rt					 
186	ro					 
187	rs					 
188	equ					 
189	align					 
190	proc					
191	uses					 
192	arg					 
193	local					 
194	endp					 
195	.if					 
196	.elseif					 
197	.else					 
198	.endif					 
199	#ifdef					 
200	#ifndef					 
201	#else					 
202	#endif					
203	#ifnb					 
204	#ifb					 
205	#if_used				 
206	#if_not_used				
207	macro					 
208	endm					 
209	exitm					 
210	rept					 
211	invoke					 
212	cinvoke					 
213	cdecl					 
214	stdcall					 
215	include					 
216	incbin					 
217	incfrom					 
218	import_dll				 
219	from_dll				 
220	import_lib				 
221	from_lib				 
222	import_func				 
223	import					 
224	extern					 
225	export					 
226	alias					 
227	struc					 
228	struct					 
229	ends					 
230	enum					 
231	ende					 
232	.entry					 
233	org					 
234	disp					 
235	.use16					 
236	.use32					 
237	.use64					 
238	section					 
239	class_code				 
240	class_data				 
241	class_imports				 
242	class_relocs				 
243	class_bss				 
244	class_exports				 
245	class_rsrc				 
246	#define					 
247	begin					 
248	end					 
249	dialogex				 
250	caption					 
251	style					 
252	control					 
253	menuex					 
254	popup					 
255	menuitem				 
256	emit_rsrc				 
257	.echo					 
258	$time

Appendix.3 Sample programs

A win32 sample application

;------------------------------------------------------
; Sol_Asm assembler sample
; Copyright (c) 2004-2008, Bogdan Valentin Ontanu
; All rights reserved.
;------------------------------------------------------

;----------------------------
; define imports
;----------------------------
from_dll 	kernel32.dll
	import	ExitProcess
	import	GetStdHandle

from_dll	user32.dll
	import	MessageBox alias MessageBoxA

;-------------------
; define sections
;-------------------
section "code" 		class_code
section "data"  	class_data
section "idata" 	class_imports


.data
	sz_message	db	"First Win32 PE application",0
	sz_title	db	"Sol_ASM",0

.code
	;------------------------
	; define entry point
	;------------------------
	.entry Start

Start:
	;-----------------------------
	; the classical message box
	;-----------------------------
	invoke	MessageBox, 0, sz_message, sz_title, 3

	;--------------------------
	; done here, exit nicely
	;--------------------------
	invoke	ExitProcess,0
	ret

Assuming the file is named test_win32.asm and Sol_Asm is in your PATH, you can build this sample with the following command:

	sol_asm2  test_win32.asm test_win32.exe -pe32

The resulting executable should display a message box when run.
