for Sol_Asm version 0.36.62.00
updated 09.05.2018
©   Copyright 2007,2018 Bogdan Valentin Ontanu. All rights reserved.
This document presents an overview of the syntax and usage of Solar Assembler. It assumes that the reader is familiar with assemblers and with ASM programming.
Throughout this document the following terms and abbreviations are used:
Abbreviation | Description |
---|---|
Sol_Asm | Solar Assembler |
OS | Operating system |
Win32 or Win64 | Windows 32 or 64 bits operating system |
PE32 or PE64 | Portable Executable Format - 32 or 64 bits |
DLL | Dynamic Link Library |
CDECL | C default calling convention |
STDCALL | Win32 API default calling convention |
OMF | Object Module Format - OBJ format specification |
COFF | Common Object File Format - OBJ format specification |
ELF | Executable and Linkable Format - OBJ format specification |
HLL | High Level Language |
Also, in syntax definitions this document uses curly braces "{}" to enclose a text, name or value that you have to specify.
One exception to this rule is the STRUCTURE initialization section, where "{}" are part of the syntax.
Sol_Asm is designed from the point of view of a programmer who uses ASM as the main programming language. Hence Sol_Asm tries to ease the development of huge, ASM-only projects.
However, Sol_Asm can also be used as a low level assembler without the help of HLL directives.
Sol_Asm main features are:
In daily usage this means that:
It also means that Sol_Asm contains a decent amount of HLL features, such as:
All HLL statements are implemented internally in SOL_ASM and code is generated for them at compile time (not by user included macros). This means that all those features can be used to start development with minimal includes.
Of course Sol_Asm is written in assembly language and compiled by Sol_Asm itself. That is why it is named sol_asm2 ... because sol_asm is building sol_asm2 ;)
The short term targets until alpha stage have been:
All of the short term targets have been achieved.
The long term targets are:
Most (but not all) of the long term targets have also been achieved.
Solar Assembler is stable and functional but still in development.
This means that it still contains bugs and has a few missing features.
Caution is advised to the user.
However it should be ok for personal / big ASM projects.
Lately I have used Sol_Asm to develop:
Those are all big and complex projects and Sol_ASM has proved itself valid for them.
This document assumes you are using the Win32 version of Sol_Asm. Other OS versions and details are not fully presented here.
However SOL_ASM OS specific versions are almost the same and share 99% of code with Win32 versions.
The only differences are:
gcc sol_asm2_unix_elf.obj -o sol_asm2
sudo apt-get install gcc-multilib
gcc -m32 sol_asm2_unix_elf.obj -o sol_asm2
You can execute SOL_ASM from the command line like this:
sol_asm2 {input_file} {output_file} {-options}
or
sol_asm2 {-options} {input_file} {output_file}
sol_asm2 -pe32 my_game.asm my_game.exe
All command line options must be specified with the "-" prefix character. On Windows you can also use the "/" prefix character.
Option | Action | Obs. |
---|---|---|
-h, -help | This will print all help text and then exit |   |
-h0, -h1, -h2, -h3, -h4 | This will print a limited part of help text and then exit |   |
Option | Action | Obs. |
---|---|---|
-pe32 | This will generate Win32 Portable Executable |   |
-pe64 | This will generate Win64 PE executable |   |
-console | This will set console sub-system (for Win32) |   |
-dll | This will set DLL characteristics of PE (for Win32) | make a DLL |
-binary | This will generate a plain binary | useful for OS development or handcrafted formats |
-omf32 | This will generate an OBJ file in OMF format. | This OBJ can be linked with ALINK linker. |
-coff32 | This will generate a 32 bit OBJ file in MS-COFF format. | This OBJ can be linked with MS link, Polink, GOlink and other linkers, and can be used in projects that link multiple modules together as OBJ files. |
-coff64 | This will generate a 64 bit OBJ file in MS-COFF format. |   |
-elf32 | This will generate a 32 bit OBJ file in ELF format. | Can be linked with LD or GCC on Unix-like systems. |
-elf64 | This will generate a 64 bit OBJ file in ELF format. | Can be linked with LD or GCC on Unix-like systems. |
-mac32 | This will generate an OBJ file in Mach-O format. | Still experimental / unfinished. |
For macOS it is still recommended that you use the ELF32/64 format and then convert the ELF OBJ to Mach-O OBJ format before linking (e.g. by using objconv by Agner Fog).
Option | Action | Obs. |
---|---|---|
-q | Be quiet: only error messages are shown | useful for makefiles |
-d equ_name | This will define the equ_name symbol at the command line. | The value of the symbol is 1 (one) and can later be tested in source code. |
-size | This will optimize the output for size. | Using this option will usually result in more passes being done. |
-dbg | This will generate debug info. | Works for PE32, ELF and COFF OBJ; other debug formats and levels will follow. |
-list | This will generate a listing file named: output_filename_list.lst | One extra pass will be done for listing. |
-list_pass | This will generate a series of listing files named: _list_1.lst, _list_2.lst ... one file for each pass. | Compile speed will be slower with this option. |
-bench | This will show a compiler/parser speed benchmark |   |
Option | Action | Files name suffix |
---|---|---|
-info | This will generate OllyDbg specific info. This can be loaded into OllyDbg with the Labelmaster plugin. | _info.lst |
-info_proc | This will generate a list of all PROC's and their arguments. | _proc.lst |
-info_stru | This will generate a list of all STRUC's and their members info. | _stru.lst |
-info_equ | This will generate a list of all EQU's items and their values. | _equ.lst |
-info_enum | This will generate a list of all ENUM items and their values as EQU definitions. | _enum.inc |
-info_tkn | This will generate a list of known opcodes and directives. | _tkn.lst |
-info_files | This will generate a list of include files and folders. | _files.lst |
-info_reloc | This will generate a list of relocations | _reloc.lst |
-info_sect | This will generate a list of sections for each pass | _sections_N.lst |
-info_all | This will generate all above info. |   |
If you have a main ASM file that includes all other files then Sol_ASM will parse this tree and generate a list of all the files included in your project. The sub folders of your project's main ASM file will become "groups".
The generated list of files is in RadASM INI format. You can copy and paste it into a dummy project in order to turn your existing non-RadASM project into a RadASM project.
Sol_Asm can generate a file named: output_filename_info.txt that will contain a list of your application LABELS, PROC's and their addresses. This file can be loaded in OllyDbg by the LabelMaster plug-in and can help symbolic debugging a lot. You will be able to see familiar code labels, PROC names, variable names and call stack in OllyDbg.
You can obtain Labelmaster plugin here
The same thing can be obtained with the -dbg command line option, which generates debug info inside the OBJ or PE32 files. However, this simple ASCII format has some advantages (multiple addresses with the same name) and can easily be used by your own custom debugging utils.
A few initial statements are required to make a valid program; this is usually called the "red tape" that wraps the package.
Here is a sample of the simplest Sol_Asm program:
; minimal test file
section "code" class_code
nop
ret
Only one declaration is absolutely required by SOL_ASM: the section declaration.
SOL_ASM divides a program into multiple sections.
You define a section like this:
section {"section name"} {section_type}
section "code" class_code
section "data" class_data
section "idata" class_imports
At least one section must be defined before any code generation.
Section type | Description | Attributes |
---|---|---|
class_code | for code | CODE, EXECUTE, READ |
class_data | for initialized data | INITIALIZED, READ, WRITE |
class_bss | for uninitialized data | READ, WRITE, RAW size = 0 |
class_imports | for imports | INITIALIZED, READ, WRITE |
class_relocs | relocations | INITIALIZED, READ, WRITE |
class_exports | exports | INITIALIZED, READ |
class_rsrc | for resources | work in progress |
After you have defined your sections, in the program body you can switch in between sections with ".section_name" like this:
.code
    ; enter your code here
.data
    ; enter some data definitions here
.code   ; return / continue to code section
When defining a section you can provide an alias name like this:
SECTION {section_name_for_program} ALIAS {section_name_for_OS}
This is useful for linkers that have default section naming conventions or like to unite sections based on section name.
SECTION "code" CLASS_CODE ALIAS ".text"
This allows you to use the familiar ".code" and ".data" section selectors in your program and still output section names that match your linker's preferences.
Alternatively, you can name your section ".text" and select it with "..text".
If your output format is PE32 or PE64, you can specify the imported DLLs and the functions imported from each DLL.
You define imports like this:
FROM {dll_name} IMPORT {function_name} {[param_count]} {calling convention} ALIAS {alias_name}
from kernel32.dll
    import ExitProcess
    import GetStdHandle [1] STDCALL ALIAS _GetStdHandle@4
from user32.dll
    import MessageBox ALIAS MessageBoxA
The above example will import:
Each "import" statement belongs to the previous "from" statement.
Alternative names for import keywords are:
When importing an API name you can provide an alias name like this:
IMPORT {function_name_for_program} ALIAS {function_name_for_OS}
import MessageBox alias MessageBoxA
This allows your program to refer to either the ASCII or the UNICODE version of an API using a single API name across your code.
By default Sol_Asm considers all imported functions to be STDCALL for binary and 32-bit formats, and WIN64 for 64-bit formats.
You can establish a different calling convention of an imported function like this:
import Str_Printf CDECL ; or one of: WIN64, STDCALL, LIN64
Additionally, you can add the "varg" statement to mark an imported function that takes a variable number of arguments. This is useful for the LIN64 calling convention.
Sol_Asm does not need procedure prototypes because it extracts this information from your PROC definition, even if the definition appears after the procedure is used. However, imported APIs are not defined in your sources, and hence by default Sol_Asm will not check the argument count for imported functions.
You can define the argument count for imports like this:
IMPORT {function_name} [{argument_count}]
import MessageBox [4] alias MessageBoxA
In this case Sol_Asm will check an INVOKE statement to have 4 parameters and the IMPORT statement acts as a mini prototype.
With EXTERN you can define symbols that are external to your module (defined in other modules). This is useful when you generate OBJ files and link them together with an external linker.
EXTERN {function_name} [{argument_count}] ALIAS {function_alias}
extern AddAtom [1] alias _AddAtomA@4
EXTERN is similar to IMPORT but it does not need a FROM_DLL statement
EXTERN symbols will be solved by the linker at link time. Depending on your linker configuration they can be linked in statically or dynamically.
Using EXPORT you can export a procedure or a label from your program. It works for the PE32, DLL, COFF and ELF output formats.
For OBJ output formats it has the effect of making your symbol PUBLIC.
You define exported functions like this:
EXPORT {proc_or_label_name} ALIAS {export_name_for_output}
By default the entry point is at the start of the first section defined.
However, you can specify another location with this syntax:
.ENTRY {symbol_name}
.entry App_Init
...
App_Init:
    call Main
    invoke ExitProcess,0
    ret
...
PROC Main
    ...
    ret
ENDP
This will define "App_Init" label as the entry point of your program.
The following keywords allow you to set up the address where your code will be placed in memory at run time.
The default base address is set up like this:
You can set up a base address like this:
BASE {absolute_or_virtual_address}
BASE 02000_0000h
This will make your executable base address start at 512M. The base address is the address of the first section. Each section is aligned at 4K and PE files also have an additional 4K header before the first section starts.
BASE has the same effect as an ORG followed by a DISP with the same value.
If you want a piece of code to be located at an absolute address (for OS development) or if you want to "jump around" inside code positions you can use the ORG directive:
ORG {absolute_address}
The ORG directive moves both the current address counter and the output pointer in the output file to the specified address.
Code and data will be generated at the new address and at the new offset in the output file.
The DISP directive moves the output pointer backward by a given amount. The reason for this is to avoid having zeroes at the start of the output file after an ORG directive.
DISP {negative_move_size}
org 0B000h
disp 0B000h
These are the first lines of the Solar_OS System32 module.
This means that code is made to run at absolute address 0xB000 but the output pointer remains at:
output_position = + B000h (because of org) - B000h (because of disp) = 0
And this way the generated binary will not contain 0xB000 zeroes or garbage at start of file.
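The ORG / DISP bookkeeping above can be modelled in a few lines. This is a sketch under stated assumptions, not Sol_Asm's code. It assumes that ORG sets both the address counter and the output position, that DISP moves only the output position backwards, and that every emitted byte advances both.

```python
# Toy model of ORG / DISP (assumption: mirrors the rules described above,
# not Sol_Asm's actual implementation).

class OutputModel:
    def __init__(self):
        self.address = 0          # run-time address counter ($)
        self.out_pos = 0          # write position in the output file
        self.image = bytearray()  # the output file being built

    def org(self, absolute_address):
        # ORG: move both the address counter and the output position
        self.address = absolute_address
        self.out_pos = absolute_address

    def disp(self, size):
        # DISP: move only the output position backwards
        self.out_pos -= size

    def emit(self, data):
        end = self.out_pos + len(data)
        if end > len(self.image):
            # any gap before the write position is filled with zeroes
            self.image.extend(b"\x00" * (end - len(self.image)))
        self.image[self.out_pos:end] = data
        self.out_pos = end
        self.address += len(data)


with_disp = OutputModel()
with_disp.org(0xB000)
with_disp.disp(0xB000)
with_disp.emit(b"\x90\xC3")                          # nop, ret
print(hex(with_disp.address), len(with_disp.image))  # 0xb002 2

org_only = OutputModel()
org_only.org(0xB000)
org_only.emit(b"\x90\xC3")
print(len(org_only.image))                           # 45058: 0xB000 zeroes + 2 bytes
```

The code is still assembled for address 0xB002 in both cases. Only the file layout differs.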
Sol_Asm can encode 16-bit, 32-bit and 64-bit ASM.
You switch between 16/32/64-bit encoding with:
.USE16 - encode 16 bits
.USE32 - encode 32 bits (default)
.USE64 - encode 64 bits
Your main asm file can include other asm files and so on.
include {include_path_and_file_name}
Or alternatively you can include binary files.
incbin {include_path_and_file_name}
You can also fine tune the binary include:
incfrom {start_pos}, {size}, {include_path_and_file_name}
In this case all 3 parameters must be present.
Include size can be "?" if you want to include until the end of the file.
incfrom 512,1027,help2.txt ; skip 512 bytes, include 1027 bytes
incfrom 128,?,help2.txt    ; skip 128 bytes, include rest of file
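In other words, INCFROM selects a slice of the file. The following is a minimal sketch that assumes "?" means "to the end of the file". How Sol_Asm reacts to a range past the end of the file is not documented here, so this model simply rejects it.

```python
# Sketch of the INCFROM selection rule (assumption: "?" = until end of file;
# out-of-range requests are rejected by this model, not necessarily by Sol_Asm).

def incfrom(file_bytes, start_pos, size):
    if start_pos > len(file_bytes):
        raise ValueError("start_pos is past the end of the file")
    if size == "?":
        return file_bytes[start_pos:]
    if start_pos + size > len(file_bytes):
        raise ValueError("requested range is past the end of the file")
    return file_bytes[start_pos:start_pos + size]


help2 = bytes(range(256)) * 8          # a stand-in 2048-byte "help2.txt"
print(len(incfrom(help2, 512, 1027)))  # 1027
print(len(incfrom(help2, 128, "?")))   # 1920
```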
SOL_ASM accepts numbers in the following formats:
111_000_101b - binary
0FFFF_C0_00h - hexadecimal
1_000_000 - decimal
10.2345 - floating point
Numbers do not have to start with "0" or a digit but that is good practice.
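The following sketch shows how the literal formats above map to values. It assumes that "_" is a pure digit separator and that the b / h suffixes are case insensitive. The "h" suffix is checked first, so "0B000h" is not mistaken for a binary number.

```python
# Sketch of number literal parsing (assumptions: "_" is ignored,
# suffixes are case insensitive; not Sol_Asm's actual parser).

def parse_number(text):
    t = text.replace("_", "").lower()
    if t.endswith("h"):
        return int(t[:-1], 16)   # hexadecimal
    if t.endswith("b"):
        return int(t[:-1], 2)    # binary
    if "." in t:
        return float(t)          # floating point
    return int(t, 10)            # decimal


print(parse_number("111_000_101b"))       # 453
print(hex(parse_number("0FFFF_C0_00h")))  # 0xffffc000
print(parse_number("1_000_000"))          # 1000000
print(parse_number("10.2345"))            # 10.2345
```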
Expressions are statements like:
(5+4)*7
((PCI_DEVICES_MAX*PCI_ITEM_SIZE)/(4096+4096))+1
((ETH_RX_APPS_MAX*4)/4096)+1
<1 SHL 5>
Expressions can contain numbers, operators, parentheses and symbols.
Operator | Description | Priority |
---|---|---|
"*" | multiplication | 1 |
"/" | division | 1 |
+ | addition | 2 |
- | subtraction | 2 |
x SHL n | shift x left n times | 1 |
x SHR n | shift x right n times | 1 |
x ROL n | rotate x left n times | 1 |
x ROR n | rotate x right n times | 1 |
x XOR y | binary XOR | 1 |
x AND y | binary AND | 1 |
x OR y | binary OR | 1 |
NOT x | binary NOT | 2 |
- | unary minus | 1 |
RND N | obtain a random number in range [0...N] | 1 |
Variable | Description | Priority |
---|---|---|
$ | current address | 2 |
$adr | current address | 2 |
$$ | current section base addr | 2 |
$$$ | format base addr | 2 |
$ofs | current offset in section | 2 |
$rva symbol | RVA of symbol | 2 |
$pass | current pass nr | 2 |
$style token | token style (token, string modrm) | 2 |
$type token | token type (register, label, etc) | 2 |
$size token | token size (8,16,32,64 bits) | 2 |
$value token | token value (reg code, label addr) | 2 |
< 1 SHL 5 >
The above expression contains spaces and is therefore enclosed in < and >.
Mod_RM address expressions are used by the CPU in complex effective address calculations. Sol_Asm handles them differently from normal expressions.
The generic layout is:
[{base_reg} + {scale}*{index_reg} + {displacement}]
mov eax,[esi + 4*ecx + 1234h]
mov eax,[esi + INFO_CTX.name_len]
In the first example above:
base_reg = esi
scale = 4
index_reg = ecx
displacement = 1234h
As per CPU specifications, the scale can be missing (meaning 1) or can be 2, 4 or 8 only.
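The scale restriction comes from the x86 SIB byte, which has only two bits for it. The following sketch encodes `mov reg32,[base + scale*index + disp32]` (opcode 8Bh) using the standard ModRM/SIB layout. For simplicity it always uses a 32-bit displacement (mod = 10b). A real assembler would pick the shortest form, and Sol_Asm's exact choice is not described here.

```python
# Sketch of ModRM + SIB encoding for MOV r32, [base + scale*index + disp32].
# Standard x86 rules; always uses a 32-bit displacement for simplicity.

REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}   # only 2 bits available


def encode_mov_load(dest, base, scale, index, disp):
    if scale not in SCALE_BITS:
        raise ValueError("scale must be 1, 2, 4 or 8")
    if index == "esp":
        raise ValueError("esp cannot be used as an index register")
    modrm = (0b10 << 6) | (REG32[dest] << 3) | 0b100   # rm=100b: SIB follows
    sib = (SCALE_BITS[scale] << 6) | (REG32[index] << 3) | REG32[base]
    return bytes([0x8B, modrm, sib]) + disp.to_bytes(4, "little", signed=True)


# mov eax,[esi + 4*ecx + 1234h]
print(encode_mov_load("eax", "esi", 4, "ecx", 0x1234).hex(" "))
# 8b 84 8e 34 12 00 00
```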
This is a special kind of string that can be used as an instruction operand.
mov eax,"abcd"
cmp al,"-"
You can use the SWAP modifier to reverse the string:
cmp eax," rox"       ; compare with "xor " in reverse because of endian issues
cmp eax, swap "xor " ; same as above but much easier to read
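The "endian issues" can be checked numerically. The following sketch assumes, as the SWAP example implies, that a quoted immediate puts its first character in the most significant byte. A DWORD loaded from memory puts the first byte of the string in the least significant byte, because x86 is little endian.

```python
# Model of why SWAP is needed (assumption inferred from the example above:
# string immediates are read "big endian", memory loads are little endian).

def string_immediate(text, swap=False):
    data = text.encode("ascii")
    if swap:
        data = data[::-1]              # what the SWAP modifier does
    return int.from_bytes(data, "big")


def dword_from_memory(text):
    return int.from_bytes(text.encode("ascii"), "little")


in_memory = dword_from_memory("xor ")                    # mov eax,[ptr_to_"xor "]
print(hex(in_memory))                                    # 0x20726f78
print(string_immediate(" rox") == in_memory)             # True
print(string_immediate("xor ", swap=True) == in_memory)  # True
```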
These are used as names for labels, procedures, etc. in the program.
System32_Start:
User defined symbols are case sensitive and can contain underscores "_", digits and special characters, but cannot contain CR, LF, space, "<", ">" or comma.
They do not have to start with a letter... (but that is good practice).
The max symbol size is 128 bytes.
Single-line comments start with the ";" character and extend until the end of the line.
; this is a single comment on a line
mov eax,1 ; this comment is at end of line
Block comments are made with: "/*" and "*/"
; comment out this debug code
/*
ODS_str <13,10,"+++ Equ_Create">
ODS_token
; notice here how block comments can be nested
/*
ODS_fmt <13,10,09,"equ_create: value=%x">, eax
*/
*/
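Nesting works if the scanner keeps a depth counter, so that an inner "*/" does not end the outer comment. The following is a minimal sketch of that idea. It is a simplification: it ignores ";" comments and quoted strings, which a real tokenizer must respect.

```python
# Sketch of nested /* ... */ removal with a depth counter
# (simplified: quoted strings and ";" comments are not special here).

def strip_block_comments(source):
    out = []
    depth = 0
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(source[i])   # keep only text outside comments
            i += 1
    if depth != 0:
        raise ValueError("unterminated block comment")
    return "".join(out)


src = "mov eax,1\n/*\nODS_token\n/* ODS_fmt */\n*/\nret\n"
print(repr(strip_block_comments(src)))  # 'mov eax,1\n\nret\n'
```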
A long statement can be continued on the next line by ending the line with "\":
invoke CreateWindowExA,0,class_name,wnd_title,\
    WS_OVERLAPPED+WS_CAPTION+WS_SYSMENU+\
    WS_THICKFRAME+WS_MINIMIZEBOX+WS_MAXIMIZEBOX,\
    32,64,320,240,0,0,[module_handle],0
Keywords are:
MOV, XOR, ADD, SUB, JMP, CALL - are opcode mnemonics
EAX, ECX, ST0, MM0, RAX, XMM1 - are register names
PROC, STRUC, .entry, ORG, INVOKE - are SOL_ASM directives
Keywords are case insensitive.
Sol_Asm rarely treats a symbol in a special way. All symbols are born equal :P However, there are exceptions:
Special Character | Description | Notes |
---|---|---|
SPACE, TAB or "," | are used as separators for tokens | can not be part of user tokens |
CR, LF | line end and separators for tokens | can not be part of user tokens |
":" | it defines a code label when used as suffix after a user symbol | can be part of user tokens |
"$" | means current address counter | can be part of user tokens |
"?" | means "do not care" / "non initialized" in data define statements | can be part of user tokens |
"." | hints a section selection when followed by a section name; also the structure member name separator in {structure}.{member} | can be part of user tokens |
" " (double quotes) | encloses a string | can be part of user tokens |
' ' (single quotes) | encloses a string | can be part of user tokens |
< > | means LITERAL: multiple tokens enclosed by < > and separated by spaces or commas will be considered as one ;) | has use restrictions |
"[" "]" | encloses a Mod_RM address expression | has use restrictions |
"{" "}" | used to enclose structure initialization statements; also used instead of < > | can be part of user tokens but has use restrictions |
The following symbols have a special meaning only inside a MACRO body:
Special Character | Description | Notes |
---|---|---|
"@" | means MLOCAL when used as a prefix inside a MACRO | can be part of tokens |
"&" | triggers MARG check and expansion even when inside a token | can be part of tokens |
As you can see Sol_Asm is relatively tolerant toward the use of special symbols inside user defined tokens.
The special symbol "$time" means current build time in OS_TIME format and it creates a data definition.
STRUC OS_TIME
    year         dw ?
    month        dw ?
    day_of_week  dw ?
    day_of_month dw ?
    hour         dw ?
    minute       dw ?
    second       dw ?
    mili_sec     dw ?
ENDS
When included in your source, this data definition is updated by Sol_Asm at each compile.
build_time:
;-------------------------------------
; compile time symbol,
; value is filled in by assembler
;-------------------------------------
$time db 0
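A tool that reads the build time back out of a binary needs only the 16-byte layout of OS_TIME. The following sketch assumes that the eight DW members are stored consecutively, little endian, with no padding, as the STRUC above implies. The member order matches the Win32 SYSTEMTIME layout. The sample values are hypothetical.

```python
# Decode the 16-byte OS_TIME record (assumption: 8 consecutive little-endian
# words in the STRUC order shown above).

import struct

OS_TIME_FIELDS = ("year", "month", "day_of_week", "day_of_month",
                  "hour", "minute", "second", "mili_sec")
OS_TIME_FORMAT = "<8H"   # 8 unsigned 16-bit words, little endian, no padding


def decode_os_time(raw):
    return dict(zip(OS_TIME_FIELDS, struct.unpack(OS_TIME_FORMAT, raw)))


# hypothetical build time: 2018-05-09 13:45:30.250
raw = struct.pack(OS_TIME_FORMAT, 2018, 5, 3, 9, 13, 45, 30, 250)
print(len(raw))                     # 16
print(decode_os_time(raw)["hour"])  # 13
```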
You can define initialized data like this:
{label_name} db {data_item} ; define byte 8 bits
{label_name} dw {data_item} ; define word 16 bits
{label_name} dd {data_item} ; define dword 32 bits
{label_name} dq {data_item} ; define qword 64 bits
{label_name} dt {data_item} ; define tword 80 bits
{label_name} do {data_item} ; define oword 128 bits
Additionally, "db" accepts ASCII strings.
my_dwords dd 1,2,7,0FACE_BABEh,1356789,11
my_string db "This is a message",0
You can use the "?" special character to define uninitialized data.
my_var_db db ?
my_var_dw dw ?
my_var_dd dd ?
You can define a Unicode string like this:
{label_name} du "type your utf-8 string here",0
The parser reads and interprets UTF-8 encoded code points from the quoted string and translates them to 16-bit words.
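The following sketch shows the bytes a `du "...",0` line would produce. It assumes that each code point becomes one little-endian 16-bit word. The text does not say how code points above U+FFFF are handled, so this model rejects them instead of guessing.

```python
# Sketch of DU output (assumption: one little-endian word per code point;
# code points above U+FFFF are rejected here because the behaviour is unstated).

import struct


def du(text, terminator=True):
    words = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:X} does not fit in one 16-bit word")
        words.append(cp)
    if terminator:
        words.append(0)          # the trailing ",0"
    return struct.pack(f"<{len(words)}H", *words)


print(du("Ünï").hex(" "))   # dc 00 6e 00 ef 00 00 00
```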
{label_name} real4 {data_item}  ; define REAL4 number - 32 bits
{label_name} real8 {data_item}  ; define REAL8 number - 64 bits
{label_name} real10 {data_item} ; define REAL10 number - 80 bits
test1 real4 10.2345
test2 real4 0.7785
test3 real4 1_277_789.534
test4 real4 999_123_456_789.37
test1b real8 10.2345
test2b real8 0.77854321773
test3b real8 1_277_789.534
test4b real8 999_123_456_789.37
test1c real10 10.2345
test2c real10 0.77854321773
test3c real10 1_277_789.534
test4c real10 999_123_456_789.37
Sol_Asm performs real number conversions at the highest floating point precision available (80 bits) and stores the result in the requested format. Because of this, the REAL4 "test4" above cannot retain all of its defined digits, while the REAL10 "test4c" can.
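The precision loss can be seen with Python's struct module. Format "f" stores an IEEE single (REAL4, 24-bit significand, about 7 decimal digits) and "d" stores an IEEE double (REAL8). Python has no portable 80-bit type, so REAL10 is not modelled in this sketch.

```python
# REAL4 vs REAL8 storage of test4 / test4b, using IEEE single and double.

import struct


def as_real4(x):
    return struct.unpack("<f", struct.pack("<f", x))[0]


def as_real8(x):
    return struct.unpack("<d", struct.pack("<d", x))[0]


value = 999_123_456_789.37
print(as_real4(value))   # 999123451904.0  (REAL4 step at this size is 65536)
print(as_real8(value))   # 999123456789.37
```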
You can reserve data with the following keywords:
{label_name} rb {count} ; reserve byte(s), 8 bits each
{label_name} rw {count} ; reserve word(s), 16 bits each
{label_name} rd {count} ; reserve dword(s), 32 bits each
{label_name} rq {count} ; reserve qword(s), 64 bits each
{label_name} rt {count} ; reserve tbyte(s), 80 bits each
{label_name} ro {count} ; reserve oword(s), 128 bits each
You can reserve structures like this:
rs {structure_name},{count} - reserve {count} structures
rb 1024        ; reserve 1024 bytes
rw 17          ; reserve 17 words
rd 23          ; reserve 23 dwords
rs WNDCLASS,77 ; reserve 77 WNDCLASS structures
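The space taken by each reserve line is count × element size. The following sketch uses the element sizes listed above. It assumes WNDCLASS is the 40-byte Win32 layout (ten 4-byte members); substitute the real SIZE of any structure you reserve.

```python
# Bytes taken by reserve directives (element sizes from the list above;
# WNDCLASS = 40 bytes is an assumption for 32-bit Windows).

ELEMENT_SIZE = {"rb": 1, "rw": 2, "rd": 4, "rq": 8, "rt": 10, "ro": 16}
STRUCT_SIZE = {"WNDCLASS": 40}


def reserved_bytes(directive, count, struct_name=None):
    if directive == "rs":
        return STRUCT_SIZE[struct_name] * count
    return ELEMENT_SIZE[directive] * count


block = [("rb", 1024), ("rw", 17), ("rd", 23), ("rs", 77, "WNDCLASS")]
sizes = [reserved_bytes(*item) for item in block]
print(sizes, sum(sizes))   # [1024, 34, 92, 3080] 4230
```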
You can fill initialized data buffers with the following keywords:
{label_name} fb {count},{fill_value} ; fill bytes
{label_name} fw {count},{fill_value} ; fill words
{label_name} fd {count},{fill_value} ; fill dwords
{label_name} fq {count},{fill_value} ; fill qwords
And for structures:
{label_name} fs {struc},{count},{fill_value} ; fill structures
Structure definitions are automatically promoted to data types, so you can define a structure variable like this:
{label_name} {structure_name} {data_item}
my_class WNDCLASS ?
my_ps PAINTSTRUCT ?
This defines one WNDCLASS structure at label "my_class" and one PAINTSTRUCT structure at label "my_ps", both with unknown initial values.
Given the following structure:
STRUC POINT_3D
    x dd ?
    y dd ?
    z dd ?
ENDS
You can initialize structure members like this:
my_pt_1 POINT_3D { 1 2 3 }
my_pt_2 POINT_3D { y = 2 z = 7 x = 1 }
my_pt_3 POINT_3D { x = 2 y = 7 z = 1 }
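The following sketch shows how these initializers map to bytes. It assumes that positional values fill members in declaration order, and that named values go to the member with that name regardless of the order they are written in. Each DD member is treated as a little-endian 32-bit word. Unset members are written as 0 here, because this section does not describe Sol_Asm's handling of them.

```python
# Sketch of positional vs named structure initialization for POINT_3D
# (assumptions listed above; unset members default to 0 in this model).

import struct

POINT_3D = ("x", "y", "z")   # member order from the STRUC definition


def init_struct(members, positional=(), named=None):
    values = dict.fromkeys(members, 0)
    for name, value in zip(members, positional):
        values[name] = value
    for name, value in (named or {}).items():
        if name not in values:
            raise KeyError(f"unknown member {name}")
        values[name] = value
    # emit in declaration order, not in the order the names were written
    return struct.pack(f"<{len(members)}I", *(values[m] for m in members))


my_pt_1 = init_struct(POINT_3D, positional=(1, 2, 3))
my_pt_2 = init_struct(POINT_3D, named={"y": 2, "z": 7, "x": 1})
print(struct.unpack("<3I", my_pt_2))   # (1, 2, 7)
```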
You can initialize sub structure members by name like this:
STRUC POINT_2D
    x dd ?
    y dd ?
ENDS
STRUC CIRCLE_2D
    color dd ?
    center rs POINT_2D,1
    radius dd ?
ENDS
my_circle_var CIRCLE_2D { color = 00_7F_FF_3Fh center.x = 100 center.y = 200 radius = 77 }
You can nest {} like this:
my_circle_var CIRCLE_2D { color = 00_7F_FF_3Fh { 100 200 } radius = 77 }
; or by name
my_circle_var CIRCLE_2D { color = 00_7F_FF_3Fh { x = 100 y = 200 } radius = 77 }
You can also use {} for members that are made of multiple items but are not typed as structures (RB, RW, RD, RQ, RO) like this:
struc GUID
    dd1 dd ?
    dw1 dw ?
    dw2 dw ?
    bytes rb 8
ends
my_guid GUID { aaaa_bbbbh,cccch,ddddh { 1 2 3 4 5 6 7 8 } }
Sol_Asm follows Intel-style ASM syntax, as opposed to AT&T syntax. The syntax reflects my personal preferences, which result from writing extensive applications in ASM. The following subsections present the most notable syntax points.
Each ASM source line has this default layout:
{label} {instruction} {parameter1} , {parameter2} {; comments}
read_pixel: mov eax,[esi] ; 32 bits ARGB format
All elements can be missing but if a {parameter} is present then {instruction} must also be present.
Directives do not have to follow this syntax.
There is no "offset" keyword. The name of a variable or label automatically means "offset of". As a consequence you must always use brackets to obtain the "contents of" a variable.
.data
my_var dd 37h
.code
mov esi,my_var
mov edx,[my_var]
mov edx,[esi]
...
In the above example the first MOV fills the ESI register with the offset of my_var (e.g. 0x402000).
The second MOV fills EDX with the contents of my_var (i.e. 37h). The third MOV does the same, but uses ESI as a pointer to my_var.
Notice the similarity of the second MOV with: MOV EDX,[ESI]
When needed, you can override the operand size of an encoding with:
byte - force 8 bits
word - force 16 bits
dword - force 32 bits
qword - force 64 bits
tbyte - force 80 bits
oword - force 128 bits
small - force low word of symbol
Let us assume we have defined the following structure:
STRUC INFO_CTX
    info_name rb 128
    info_dword dd ?
    info_word dw ?
    info_byte db ?
ENDS
And then we reserve a vector of 1024 such structures:
my_info rs INFO_CTX,1024
Then the following rules apply for accessing structure members:
mov esi,my_info
mov eax,[esi + INFO_CTX.info_dword]
mov [esi + INFO_CTX.info_word],2 ; will move WORD 2
mov [esi + INFO_CTX.info_byte],1 ; will move BYTE 1
; go to next item in vector
add esi, size INFO_CTX
Observe how the structure member size will hint instructions for operand size when possible. This greatly reduces the need for "dword / word / byte" modifiers.
movzx eax,[esi + INFO_CTX.info_byte]
movzx eax,[esi + INFO_CTX.info_word]
is equivalent to:
movzx eax,byte [esi + INFO_CTX.info_byte]
movzx eax,word [esi + INFO_CTX.info_word]
But you do not have to use "byte" and "word" hints because of the structure that provides this information.
However in this example:
mov byte [esi],4
SOL_ASM will require the "byte" user size override / hint because there is no structure member hint available
You can write multiple assembly instructions on the same line. Sol_Asm will know when one instruction ends and the next one starts.
push ebx push esi push edi
; init
mov eax,1 mov ecx,17 mov ebx,3
loop_here:
xor ecx,ebx sub ebx,edx dec ecx jnz loop_here
pop edi pop esi pop ebx
At this development stage Sol_Asm can be very strict about white space. This is partly because the parser always considers spaces to be token separators, no matter what. This helps parsing speed and eases debugging, but it also causes some problems.
It is my intention to remove those limitations in later versions, but for now you will have to know and respect them. The expression parser does handle white space, but the high level tokenizer breaks expressions on spaces. Because of this you must avoid spaces in expressions, or, if you need spaces, enclose the whole expression in < and > or in { and }.
;---------------
; this is OK
;---------------
mov eax, (7*4)+(5*PACKET_SIZE)    ; this is an expression with no spaces inside
mov ecx, WND_CHILD+WND_MINIMIZE   ; this is an expression with no spaces inside
mov ecx, size MY_STRU             ; this is not an expression
mov al, byte [esi]                ; this is not an expression
;--------------------------------------------------------------
; this is NOT OK because expressions can not contain spaces
;--------------------------------------------------------------
mov eax,(7*4) + (5 * PACKET_SIZE)
mov eax,1 SHL 18                  ; this expression needs spaces
mov ecx,WND_RESIZE OR WND_CHILD OR WND_MINIMIZE
;-----------------------------------------------
; this is made OK by the use of < and >
;-----------------------------------------------
mov eax, < (7*4) + (5 * PACKET_SIZE) >
mov eax, < 1 SHL 18 >
mov ecx, < WND_RESIZE OR WND_CHILD OR WND_MINIMIZE >
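The following toy tokenizer reproduces the rule illustrated above. It splits on spaces, tabs and commas, except inside < > or { } literals. It is only a model: it ignores quoted strings and the use of < and > as comparison operators in .IF, both of which the real parser has to handle.

```python
# Toy model of space/comma tokenizing with < > and { } literals kept whole.

SEPARATORS = " \t,"
OPENERS, CLOSERS = "<{", ">}"


def tokenize(line):
    tokens, current, depth = [], [], 0
    for ch in line:
        if depth == 0 and ch in SEPARATORS:
            if current:                     # a separator ends the token
                tokens.append("".join(current))
                current = []
            continue
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS and depth > 0:
            depth -= 1
        current.append(ch)                  # separators inside literals stay
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("mov eax, (7*4)+(5*PACKET_SIZE)"))
# ['mov', 'eax', '(7*4)+(5*PACKET_SIZE)']
print(tokenize("mov eax,(7*4) + (5 * PACKET_SIZE)"))
# ['mov', 'eax', '(7*4)', '+', '(5', '*', 'PACKET_SIZE)']
print(tokenize("mov eax, < 1 SHL 18 >"))
# ['mov', 'eax', '< 1 SHL 18 >']
```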
Run-time conditionals such as .IF, .WHILE or .REPEAT need spaces around parentheses, conditions and logical operators.
;---------------
; this is OK
;---------------
.if ( eax == 1 .and. ebx == 5 ) .or. ( [status] == 1 .and. [errors] == 0 )
    ...
.endif
;------------------
; this is NOT OK
;------------------
.if (eax==1 .and. ebx==5).or.([status]==1.and.[errors]==0)
    ...
.endif
;-----------------------------------------------------------------
; here use {} because < and > are conditional operators also
;-----------------------------------------------------------------
.if ( eax < { 7FFFFh SHR 5 } ) .and. ( edx > { 1 SHL 7} )
    ...
.endif
{symbol_name} EQU {value or expression}
equ1 equ 40
ETH_RX_APPS_MAX EQU 1024
ETH_MEM_BLOCKS EQU ((ETH_RX_APPS_MAX*4)/4096)+1
equ_28 EQU < 1 SHL 28 >
Equates can not be redefined or double defined. However you can use the assignment operator for this:
{symbol_name} = {value or expression}
x = y + 1
y = 7
For example, the following code will force Sol_Asm to make 8 passes, until y = 7 and no longer changes its value:
#if $pass == 1
    y = 0
#endif
#if y < 7
    y = y+1
#endif
#echo " y=%x",y
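The pass count can be reproduced with a small simulation. This sketch assumes that values assigned with "=" carry over from one pass to the next, and that the assembler stops after a pass that leaves every such value unchanged. Under those assumptions it arrives at the 8 passes stated above.

```python
# Simulation of the multi-pass example (assumptions: "=" values persist
# across passes; assembly stops when a pass changes nothing).

def simulate_passes(max_passes=100):
    y = None
    previous = None
    for pass_nr in range(1, max_passes + 1):
        if pass_nr == 1:          # #if $pass == 1
            y = 0                 #     y = 0
        if y < 7:                 # #if y < 7
            y = y + 1             #     y = y+1
        print(f" y={y:x}")        # #echo " y=%x",y
        if pass_nr > 1 and y == previous:
            return pass_nr, y     # value is stable: stop
        previous = y
    raise RuntimeError("values did not converge")


print(simulate_passes())   # prints " y=1" ... " y=7", " y=7", then (8, 7)
```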
Labels are defined in two modes:
{label_name}:
{label_name} {data definition keyword} {data_items}
mov ecx,nr_of_items
mov esi,items_ptr
my_loop:
    ; perform some actions here
    add [esi+ITEM.quantity],1
    ; next item
    add esi,size ITEM
    dec ecx
    jnz my_loop
In the above code sequence "my_loop" is a code label and serves as a target for the JNZ instruction.
or
.data
my_account_balance dd 1234_5678h
.code
mov ecx,nr_of_invoices
mov esi,invoices_ptr
my_loop:
    ; perform some actions here
    mov eax,[esi+INVOICE.total]
    sub [my_account_balance],eax
    ; next invoice
    add esi,size INVOICE
    dec ecx
    jnz my_loop
In the above code sequence "my_account_balance" is a data label and serves as a parameter for the SUB instruction.
Labels defined outside of a procedure are global in name scope. Global labels can not be double defined.
Labels defined inside PROC ... ENDP construct are local in namespace to the procedure. Hence there can be multiple labels with the exact same name as long as they reside in different procedures.
Structures are defined like this:
STRUC {structure_name}
    {member_name1} {data_definition_keyword} {data_item}
    ...
    {member_name2} {data_reserve_keyword} {count}
    ...
ENDS
STRUC ETH_PACKET
    packet_ptr dd ?
    packet_id dd ?
    packet_mac_src rb 16
    packet_mac_dest rb 16
ENDS
STRUC ETH_DRV
    drv_id dd ?
    drv_name rb 128
    status dd ?
    packets_buff rs ETH_PACKET,1024
ENDS
As you can see structures can contain other structures. Once a structure is defined it can be used in subsequent data definitions.
Access to its members can be done like this:
.data
my_eth ETH_DRV ?
.code
; via pointer
mov esi, my_eth
mov eax,[esi + ETH_DRV.status]
; or by direct access
mov [my_eth.status], 1
mov eax,[my_eth.status]
You can define a LOCAL variable in a PROC as having STRUC type and access it like this:
PROC my_proc stdcall
    ARG arg1, arg2
    LOCAL my_eth :ETH_DRV, wc :WNDCLASSEX
    ; note: the space between my_eth and :ETH_DRV is currently required
    mov [my_eth.status],2
    mov [wc.cbSize], size WNDCLASSEX
    ; or via pointer
    lea esi,my_eth
    mov [esi+ETH_DRV.status],4
    ret
ENDP
Structure size can be obtained like this:
add esi, SIZE ETH_DRV
Also you can obtain the offset of a member inside a structure like this:
mov eax, ETH_DRV.status
Hence this code is also valid:
mov eax, ETH_DRV
And it will move the size of ETH_DRV structure into eax.
For clarity reasons the use of SIZE is recommended whenever possible.
You can also access members of nested structures like this:
.data
my_driver rs ETH_DRV,16
.code
mov esi,my_driver
mov eax,[esi + ETH_DRV.packets_buff.packet_id]
...
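A nested member offset such as `ETH_DRV.packets_buff.packet_id` is the sum of the outer and inner member offsets. The following sketch checks the numbers with Python's ctypes. It assumes Sol_Asm places members back to back with no padding. Every member here is already naturally aligned, so ctypes' default layout gives the same offsets.

```python
# Checking ETH_PACKET / ETH_DRV offsets with ctypes (assumption: no padding
# in Sol_Asm; these members are naturally aligned, so ctypes agrees).

import ctypes


class ETH_PACKET(ctypes.Structure):
    _fields_ = [("packet_ptr", ctypes.c_uint32),
                ("packet_id", ctypes.c_uint32),
                ("packet_mac_src", ctypes.c_uint8 * 16),
                ("packet_mac_dest", ctypes.c_uint8 * 16)]


class ETH_DRV(ctypes.Structure):
    _fields_ = [("drv_id", ctypes.c_uint32),
                ("drv_name", ctypes.c_uint8 * 128),
                ("status", ctypes.c_uint32),
                ("packets_buff", ETH_PACKET * 1024)]


# ETH_DRV.packets_buff.packet_id = offset of packets_buff + offset of packet_id
nested = ETH_DRV.packets_buff.offset + ETH_PACKET.packet_id.offset
print(ETH_DRV.status.offset)    # 132
print(nested)                   # 140
print(ctypes.sizeof(ETH_DRV))   # 41096, i.e. SIZE ETH_DRV
```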
You can define unnamed UNIONS inside a structure.
UNION
    {member_name1} {data_definition_keyword} {data_item}
    ...
    {member_name2} {data_reserve_keyword} {count}
ENDU
struc pixel_format
    flags1 dd ?
    union
        r_mask dd ?
        y_mask dd ?
        union
            rx_mask dd ?
            ry_mask dd ?
        endu
    endu
    flags2 dd ?
    union
        g_mask dd ?
        u_mask dd ?
    endu
ends
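The following sketch computes the member offsets of pixel_format. It assumes that every member of a UNION starts at the union's own offset, that a union is as large as its largest member, and that no padding is inserted.

```python
# Sketch of STRUC/UNION offset computation (assumptions stated above).

def layout(members, start=0, is_union=False):
    """members: (name, size) fields, and nested lists for UNION ... ENDU."""
    offsets = {}
    cursor = start
    union_size = 0
    for member in members:
        member_start = start if is_union else cursor
        if isinstance(member, list):                 # nested UNION
            sub_offsets, size = layout(member, member_start, is_union=True)
            offsets.update(sub_offsets)
        else:
            name, size = member
            offsets[name] = member_start
        if is_union:
            union_size = max(union_size, size)       # members overlap
        else:
            cursor += size                           # members follow each other
    return offsets, (union_size if is_union else cursor - start)


pixel_format = [
    ("flags1", 4),
    [("r_mask", 4), ("y_mask", 4), [("rx_mask", 4), ("ry_mask", 4)]],
    ("flags2", 4),
    [("g_mask", 4), ("u_mask", 4)],
]
offsets, size = layout(pixel_format)
print(offsets)   # r_mask, y_mask, rx_mask, ry_mask at 4; g_mask, u_mask at 12
print(size)      # 16
```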
And you can access any UNION member just like any other structure member.
Procedures are defined like this:
PROC {proc_name} {proc_call_convention_type}
    USES {uses_list}
    ARG {arg_list}
    LOCAL {local_list}
    ; some code
proc_label:
    ...
    ret
ENDP
PROC Test_01 stdcall
    USES esi,edi
    ARG wnd_handle, wnd_action
    LOCAL count, my_var1, my_var2
    xor eax,eax
    mov [count],eax       ; start the sum at zero
    mov esi,[wnd_handle]
    mov ecx,100
loop_here:
    mov eax,[esi]
    test eax,eax
    jz finish
    add [count],eax
    add esi,4             ; advance to the next dword
    dec ecx
    jnz loop_here
finish:
    mov eax,[count]
    ret
ENDP
SOL_ASM will automatically generate PROLOGUE and EPILOGUE code and will generate code for handling of USES, ARG and LOCAL variables as needed.
Known calling conventions are:
Additionally, you can use the "varg" statement to mark a procedure that takes a variable number of arguments.
For PROCs defined as NOFRAME, Sol_Asm will not emit prologue and epilogue code, but it will still emit PUSH/POP code for the USES registers if present. In this case you must write the prologue and epilogue code yourself.
The default size of a LOCAL variable can be overridden by giving it a structure type, like this:
PROC Test_02 stdcall
    USES esi,edi
    ARG wnd_handle, wnd_action
    LOCAL my_var2 :MCTX, l_point :POINT_3D
    ...
    ret
ENDP
You can define a local procedure buffer like this:
PROC Test_02 stdcall
    USES esi,edi
    ARG wnd_handle, wnd_action
    LOCAL my_var1, my_buff [32], my_var2 :MY_CTX [32]
    ...
    ret
ENDP
This defines a buffer of 32 dwords starting at "my_buff", and a buffer (vector) of 32 * SIZE MY_CTX bytes at "my_var2".
Note the layout order: in PROC Test_02 above, incrementing an address past the end of "my_buff" will reach "my_var1", not "my_var2".
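One stack frame layout consistent with the note above is to place each LOCAL below EBP in declaration order, so that the first declared local sits closest to EBP. The Python sketch below models that assumption only (it is not taken from Sol_Asm's source); `MY_CTX_SIZE` is a hypothetical value because MY_CTX is not defined in this document.

```python
def frame_layout(locals_):
    """locals_: (name, size_in_bytes) in declaration order.
    Returns each local's start offset relative to EBP (negative = below EBP)."""
    offsets, depth = {}, 0
    for name, size in locals_:
        depth += size
        offsets[name] = -depth  # occupies [ebp-depth, ebp-depth+size)
    return offsets

DWORD = 4
MY_CTX_SIZE = 8  # hypothetical: MY_CTX is not defined here

offs = frame_layout([
    ("my_var1", DWORD),
    ("my_buff", 32 * DWORD),        # my_buff [32]
    ("my_var2", 32 * MY_CTX_SIZE),  # my_var2 :MY_CTX [32]
])

# Walking upward from my_buff by its own size lands on my_var1.
end_of_buff = offs["my_buff"] + 32 * DWORD
print(offs, end_of_buff == offs["my_var1"])
```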
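With the win64 convention the first four arguments arrive in RCX, RDX, R8 and R9, so they usually need to be spilled to their ARG slots before use:
PROC Wnd_Proc1 win64
    ARG hwnd, wmsg, wparam, lparam
    LOCAL tmp_hdc
    ;-------------------------
    ; spill is usually needed
    ;-------------------------
    mov [hwnd],rcx
    mov [wmsg],rdx
    mov [wparam],r8
    mov [lparam],r9
    ...
    ret
ENDP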
Procedures or imported functions can be used with INVOKE syntax:
INVOKE {function_name}, {param1},{param2}, ... {param_N}
; or with a dynamic function address in a register
mov rbx,[my_function]
INVOKE {abi_name},rbx,{param1},{param2}, ..., {param_N}

invoke Str_Printf,ods_fname,ods_fname_fmt,[pass_nr]
invoke OS_File_Create,ods_fname
mov [ods_fhandle],eax

mov eax,[My_Dynamic_Proc]
invoke stdcall,eax,ecx,edx

; use ADDR to get the address of a local variable in a PROC
PROC my_proc stdcall
    ARG arg1, arg2
    LOCAL wc :WNDCLASSEX
    mov [wc.cbSize],size WNDCLASSEX
    ...
    invoke RegisterClassA, ADDR wc
    ...
    ret
ENDP
Sol_Asm handles the calling convention details based on each procedure's definition or on the hints given for imported functions.
CINVOKE is a variant of INVOKE that assumes the CDECL convention and does not check the parameter count.
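The listing sample later in this document shows how a 32-bit INVOKE of Str_Printf expands: the arguments are pushed right to left, the function is called, and `add esp, 00000014h` removes the five dword arguments. The Python sketch below reproduces that expansion shape. It is a model only: the `convention` parameter and the rule that only CDECL targets get the `add esp` cleanup are assumptions based on the usual CDECL/STDCALL semantics, and 64-bit conventions are not covered.

```python
def expand_invoke(func, args, convention="stdcall"):
    """Model of a 32-bit INVOKE expansion: push args right to left, call,
    and for CDECL let the caller remove the arguments from the stack."""
    lines = [f"push {arg}" for arg in reversed(args)]
    lines.append(f"call {func}")
    if convention == "cdecl" and args:
        lines.append(f"add esp, {4 * len(args):08X}h")  # 4 bytes per dword argument
    return lines

for line in expand_invoke("Str_Printf",
                          ["sz_tmp1", "sz_fmt_bld1", "eax", "ecx", "edx"],
                          convention="cdecl"):
    print(line)
```

With STDCALL the callee pops its own arguments (via `ret n`), which is why no `add esp` is emitted in that case.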
SOL_ASM contains a MACRO processor that supports nested and recursive macros with VARARG and checked arguments.
A MACRO is defined like this:
MACRO {macro_name}
    MARG { marg_list [:REQ] [:VARARG] }
    ; some code
@macro_label:
    ...
ENDM
;---------------------------------------
; define and output a simple string
; note: @ means local symbol for macros
;---------------------------------------
MACRO ODS_str
    MARG mpar1
#ifdef SHOW_DEBUG
    jmp @over1
@mstring1 db mpar1,0
@over1:
    pushad
    invoke Str_Len,@mstring1
    invoke OS_File_Write_Dbg,[ods_fhandle],@mstring1,eax
    popad
#endif
ENDM
It can then be used like this:
ODS_str <13,10,"-------- Listing Sections -------">
Inside a MACRO the "@" prefix means that the symbol is local to this MACRO and will get a different name each time the MACRO is expanded.
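The effect of the "@" prefix can be illustrated with a toy expander in Python: each expansion substitutes the MARG parameters and gives every @local symbol a fresh, expansion-specific name, so two uses of ODS_str never define the same label twice. The `__name_N` naming scheme and the `MacroExpander` class are hypothetical; Sol_Asm's internal names are not documented here.

```python
import re

class MacroExpander:
    """Toy model of MACRO expansion: MARG substitution plus per-expansion
    renaming of @local symbols (the __name_N scheme is hypothetical)."""

    def __init__(self):
        self.expansions = 0

    def expand(self, body, params, args):
        self.expansions += 1
        n = self.expansions
        for param, arg in zip(params, args):
            # lambda avoids re interpreting backslashes in the argument text
            body = re.sub(rf"\b{re.escape(param)}\b", lambda _m, a=arg: a, body)
        return re.sub(r"@(\w+)", lambda m: f"__{m.group(1)}_{n}", body)

BODY = "jmp @over1\n@mstring1 db mpar1,0\n@over1:"

expander = MacroExpander()
first = expander.expand(BODY, ["mpar1"], ['"first"'])
second = expander.expand(BODY, ["mpar1"], ['"second"'])
print(first)
print(second)
```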
A macro can have a variable number of arguments.
;---------------------------------------
; define and output a formatted string
; note: @ means local symbol for macros
;---------------------------------------
MACRO ODS_fmt
    MARG mfmt, arg_list :VARARG
    jmp @over1
@mstring1 db mfmt,0
@over1:
    pushad
    invoke Str_Printf, sz_buff1, @mstring1, arg_list
    invoke OS_File_Write_Dbg, [ods_fhandle], sz_buff1, eax
    popad
ENDM
It can then be used like this:
ODS_fmt <13,10,"Section:%u RVA=%x VSIZE=%x Name=%s">,ecx,[esi+PE_SECT.rva],[esi+PE_SECT.vsize],esi
The ":REQ" MARG qualifier makes every argument up to and including that position mandatory.
MACRO MTEST
    MARG a1,a2,a3,a4 :req , a5
    mov eax,a1
    mov ebx,a2
    mov ecx,a3
    mov edx,a4
ENDM
On invocation this checks that at least 4 arguments are present, so "a5" may be omitted but "a4" may not.
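The argument count rule described above can be sketched in Python: find the position of the last :REQ parameter, reject invocations with fewer arguments, and bind any remaining parameters to blank. The function name `bind_margs` and the error text are hypothetical, and the sketch checks only the count, as the text describes.

```python
def bind_margs(margs, args):
    """margs: list of (name, qualifier); qualifier is None or "req".
    Every parameter up to the last :REQ one must receive an argument;
    later parameters may be omitted and then expand to blank."""
    required = 0
    for position, (_, qualifier) in enumerate(margs, start=1):
        if qualifier == "req":
            required = position
    if len(args) < required:
        raise ValueError(f"macro needs at least {required} arguments, got {len(args)}")
    return {name: args[i] if i < len(args) else "" for i, (name, _) in enumerate(margs)}

MTEST = [("a1", None), ("a2", None), ("a3", None), ("a4", "req"), ("a5", None)]

print(bind_margs(MTEST, ["1", "2", "3", "4"]))  # a5 is blank
try:
    bind_margs(MTEST, ["1", "2", "3"])           # a4 missing
except ValueError as err:
    print("error:", err)
```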
You can define a macro inside another macro... and so on.
MACRO M2
    MARG arg1,arg2
    mov eax,arg1
    MACRO M3
        MARG arg3,arg4
        mov eax,arg3
        push arg4
    ENDM
    push eax
    push arg2
ENDM
On the first invocation of M2 only its body will be generated: M3 will be defined but not expanded.
In MACRO body the "&" character will trigger a MARG check and expansion even if found in the middle of another token or string.
MACRO M4
    MARG arg1, arg2
in_label_&arg1:
    mov eax,<&arg1>
    db " In strings: &arg2",0
ENDM
A macro can invoke itself recursively.
MACRO MPUSH
    MARG p1,p2,p3,p4
#ifnb <&p1>
    push p1
    MPUSH p2,p3,p4
#endif
ENDM
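The recursion in MPUSH can be traced with a short Python model: each level pushes p1 and re-invokes itself with the remaining parameters shifted left, and the #ifnb test on a blank p1 ends the recursion. The function `mpush` is only an illustration of the expansion, not Sol_Asm code.

```python
def mpush(*args):
    """Model of MPUSH (MARG p1,p2,p3,p4): missing params are blank."""
    p1, p2, p3, p4 = (list(args) + ["", "", "", ""])[:4]
    if p1 == "":  # #ifnb <&p1> is false: the recursion stops
        return []
    return [f"push {p1}"] + mpush(p2, p3, p4)  # push p1, then MPUSH p2,p3,p4

print(mpush("eax", "ebx", "ecx"))  # ['push eax', 'push ebx', 'push ecx']
```

Note that MPUSH pushes its arguments left to right, the opposite order from INVOKE.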
EXITM can be used to return a token from a MACRO expansion.
MACRO RV
    MARG func, params
    invoke func,params
    ; return something from the macro
    exitm eax
ENDM

; later on in code
...
mov ecx,RV GetModuleHandle
invoke ExitProcess, < RV GetModuleHandleA >
push RV GetModuleHandleA
...
You can use REPT to repeat a series of instructions.
x = 7
REPT 12
    shl eax,x
    add ecx,3
    x = x+1
ENDM
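Because `x` is a compile-time variable, each repetition of the REPT body sees the value left by the previous one. The Python sketch below expands the sample above the same way to show the generated instruction sequence; `expand_rept` is an illustrative helper, not part of Sol_Asm.

```python
def expand_rept(count, x):
    """Expand the REPT sample: x is updated at compile time after each round."""
    lines = []
    for _ in range(count):
        lines.append(f"shl eax,{x}")
        lines.append("add ecx,3")
        x = x + 1
    return lines, x

lines, x_after = expand_rept(12, 7)
print(lines[:4])   # ['shl eax,7', 'add ecx,3', 'shl eax,8', 'add ecx,3']
print(x_after)     # 19
```

So the block produces 24 instructions, with shift counts running from 7 to 18.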
You can use FOR to repeat a series of instructions for each item in a list.
FOR {item} IN: {items list} {REV} DO
    { for macro body }
ENDM
Sol_Asm expands {for macro body} once for each element in {items list}, replacing every occurrence of {item} in the body with the current element.
The "REV" keyword is optional and if present then the {items list} will be parsed in reversed order.
FOR can be used to iterate the variable parameters of a MACRO.
MACRO my_invoke
    MARG func :req, params :vararg
    FOR item IN: params REV DO
        push item
    ENDM
    call func
ENDM
The sample above defines your own INVOKE-like macro, which you can later use like this:
my_invoke My_Func,eax,0,1,"123",[ecx]
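With REV the FOR loop walks the VARARG list from the last item to the first, which gives the right-to-left push order that STDCALL and CDECL expect. This Python model shows the code the my_invoke call above should expand to; it is an illustration of the described semantics, not Sol_Asm output captured from a listing.

```python
def my_invoke(func, *params):
    """FOR item IN: params REV DO push item ENDM, then call func."""
    lines = [f"push {item}" for item in reversed(params)]  # REV: last item first
    lines.append(f"call {func}")
    return lines

for line in my_invoke("My_Func", "eax", "0", "1", '"123"', "[ecx]"):
    print(line)
```

Unlike INVOKE, this macro performs no stack cleanup, so it is only suitable for callee-cleans (STDCALL) functions.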
You can conditionally eliminate a block of source code at compile time by using the following directives:
Directive | Description |
---|---|
#ifdef {symbol} | if symbol is defined |
#ifndef {symbol} | if symbol is not defined |
#ifb {token} | if token is blank |
#ifnb {token} | if token is not blank |
#if_used {symbol} | if symbol is used in code |
#if_not_used {symbol} | if symbol is not used in code |
#if {condition} | if condition is true |
#ifdef {symbol_name}
    ; code block for true
    ....
#else
    ; code block for false
#endif

;------------------------------------------------
; this checks the command line /binary option
;------------------------------------------------
#ifdef /binary
    org  0B000h
    disp 0B000h
#endif

#if $ >= 512
    #echo "boot sector address overflow: %x", $
#endif
Observe how command line options are automatically promoted to EQU symbols and can be tested with #ifdef.
#ifdef can be nested on multiple levels, so the following example is also valid:
#IFDEF INTEL
    mov eax,1
    #IFDEF WIN32
        mov esi,32h
        #ifdef LUCKY
            mov ecx,33h
        #else
            mov ecx,11h
        #endif
    #ELSE
        mov esi,16h
    #ENDIF
    mov edi,88h
#ELSE
    mov eax,2
    #IFDEF WIN32
        mov esi,32h
        #ifdef LUCKY
            mov ecx,33h
        #else
            mov ecx,11h
        #endif
    #ELSE
        mov esi,16h
    #ENDIF
    mov edi,77h
#ENDIF
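Nested conditional blocks are naturally handled with a stack: each #ifdef remembers whether its parent was active and whether its own condition held, #else flips only the inner condition, and #endif restores the parent state. The Python sketch below applies that standard technique to the example above. It assumes directives are case insensitive (the example mixes #IFDEF and #ifdef) and treats symbol names as exact strings; `preprocess` is a hypothetical helper.

```python
def preprocess(lines, defined):
    """Toy model of nested #ifdef/#ifndef/#else/#endif selection."""
    out = []
    stack = []     # one (parent_active, condition) entry per open block
    active = True  # are lines currently being emitted?
    for raw in lines:
        tokens = raw.split()
        directive = tokens[0].lower() if tokens else ""
        if directive in ("#ifdef", "#ifndef"):
            cond = tokens[1] in defined
            if directive == "#ifndef":
                cond = not cond
            stack.append((active, cond))
            active = active and cond
        elif directive == "#else":
            parent, cond = stack[-1]
            active = parent and not cond
        elif directive == "#endif":
            active, _ = stack.pop()
        elif active and tokens:
            out.append(" ".join(tokens))
    return out

SAMPLE = """
#IFDEF INTEL
    mov eax,1
    #IFDEF WIN32
        mov esi,32h
        #ifdef LUCKY
            mov ecx,33h
        #else
            mov ecx,11h
        #endif
    #ELSE
        mov esi,16h
    #ENDIF
    mov edi,88h
#ELSE
    mov eax,2
    #IFDEF WIN32
        mov esi,32h
        #ifdef LUCKY
            mov ecx,33h
        #else
            mov ecx,11h
        #endif
    #ELSE
        mov esi,16h
    #ENDIF
    mov edi,77h
#ENDIF
""".splitlines()

print(preprocess(SAMPLE, {"INTEL", "WIN32"}))
```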
You can use runtime high level .IF .ELSEIF .ELSE .ENDIF constructs in SOL_ASM.
Sol_Asm generates the needed compare and jump code and labels internally. This internal code generation is performed much faster than a MACRO-based implementation could do it.
.IF {operand1} {condition_a} {operand2}
    ; code block for {condition_a} true
    ....
.ELSEIF {operand3} {condition_b} {operand4}
    ; code block for {condition_b} true
    ...
.ELSE
    ; code block for all above conditions false
    ...
.ENDIF

.if [parse_mode] == 1
    .if [parse_status] == 1
        mov ecx,1
    .elseif [parse_status] == 2
        mov ecx,2
    .elseif [parse_status] <= 7
        mov ecx,7
    .else
        mov ecx,-1
    .endif
.elseif [parse_mode] == 2
    mov edx,2
.elseif eax == swap "xor "
    mov edx,7
.else
    mov edx,-1
.endif
Known condition operators are:
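Operator | Description | Flag checked |
---|---|---|
"==" | equal | ZF = 1 (E) |
"!=" | not equal | ZF = 0 (NE) |
"<" | unsigned less than | CF = 1 (B) |
">" | unsigned greater than | CF = 0 and ZF = 0 (NBE) |
"<=" | unsigned less than or equal | CF = 1 or ZF = 1 (BE) |
">=" | unsigned greater than or equal | CF = 0 (NC) |
"zero?" | zero flag set | ZF = 1 (Z) |
"!zero?" | zero flag clear | ZF = 0 (NZ) |
"carry?" | carry flag set | CF = 1 (C) |
"!carry?" | carry flag clear | CF = 0 (NC) |
"sign?" | sign flag set | SF = 1 (S) |
"!sign?" | sign flag clear | SF = 0 (NS) |
"Overflow?" | overflow flag set | OF = 1 (O) |
"!Overflow?" | overflow flag clear | OF = 0 (NO) |
"parity?" | parity flag set | PF = 1 (P) |
"!parity?" | parity flag clear | PF = 0 (NP) |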
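A common way to lower a comparison-style .IF is a CMP followed by the inverse conditional jump, so the body is skipped when the condition is false. The Python sketch below shows that scheme for the comparison operators in the table, using the unsigned jumps by default and the signed family for conditions marked with the signed keyword. It is an illustration of the technique under that assumption, not a copy of Sol_Asm's generated code; the flag-only conditions such as "zero?" need no CMP and are not modelled.

```python
# Jump taken when the condition is FALSE, so it skips the .IF body.
UNSIGNED_SKIP = {"==": "jne", "!=": "je", "<": "jae", ">": "jbe", "<=": "ja", ">=": "jb"}
SIGNED_SKIP = {"==": "jne", "!=": "je", "<": "jge", ">": "jle", "<=": "jg", ">=": "jl"}

def lower_if(op1, cond, op2, body, end_label, signed=False):
    """Lower '.if op1 cond op2 ... .endif' into cmp + inverse jump + body + label."""
    skip = (SIGNED_SKIP if signed else UNSIGNED_SKIP)[cond]
    return [f"cmp {op1},{op2}", f"{skip} {end_label}", *body, f"{end_label}:"]

for line in lower_if("eax", "<=", "7", ["mov ecx,7"], "if_end_1"):
    print(line)
```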
You can use multiple conditions in .IF like this:
.if ( eax == 1 .or. ecx == 2 ) .and. esi != 7
    ...
.elseif dl == "a" .or. dl == "b" .or. dl == "s"
    ...
.endif
By default all comparisons in a .IF are unsigned.
You can use signed comparisons in .IF by prefixing the condition with the signed keyword, like this:
.if signed edx >= [edi + HTML_CTX.wnd_dy]
    ; flag done
    mov eax,1
    ret
.endif
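The signed keyword matters whenever a register holds a negative value: the same 32-bit pattern compares very differently as an unsigned or as a two's complement number. This Python sketch reinterprets the bit pattern of -1 both ways; `as_signed32` is an illustrative helper.

```python
def as_signed32(value):
    """Reinterpret the low 32 bits of value as a two's complement dword."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

edx = 0xFFFFFFFF  # the bit pattern of -1 in a 32-bit register
limit = 5

unsigned_result = edx >= limit             # default .if edx >= 5
signed_result = as_signed32(edx) >= limit  # .if signed edx >= 5
print(unsigned_result, signed_result)      # True False
```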
You can use high level .REPEAT ... .UNTIL constructs; Sol_Asm will generate the needed code.
.REPEAT
    {repeat body}
.UNTIL {condition}

mov ecx,17
.repeat
    mov edx,0
    .repeat
        inc edx
    .until edx > 7
    dec ecx
.until ecx == 0
You can use high level .WHILE ... .ENDW constructs; Sol_Asm will generate the needed code.
.WHILE {condition}
    {while body}
.ENDW

mov ecx,17
.while ecx > 1
    mov edx,0
    .while edx < 7
        inc edx
    .endw
    dec ecx
.endw
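The practical difference between the two loop forms is where the condition is tested: .REPEAT runs its body before testing .UNTIL, while .WHILE tests before each iteration. This Python simulation of the two samples above counts how often each body runs, assuming the usual bottom-tested and top-tested semantics for these constructs; the function names are illustrative.

```python
def repeat_sample():
    """Simulate the .repeat/.until sample (condition tested at the bottom)."""
    ecx, outer, inner = 17, 0, 0
    while True:          # outer .repeat
        edx = 0
        while True:      # inner .repeat
            edx += 1
            inner += 1
            if edx > 7:  # .until edx > 7
                break
        ecx -= 1
        outer += 1
        if ecx == 0:     # .until ecx == 0
            break
    return outer, inner

def while_sample():
    """Simulate the .while/.endw sample (condition tested at the top)."""
    ecx, outer, inner = 17, 0, 0
    while ecx > 1:       # .while ecx > 1
        edx = 0
        while edx < 7:   # .while edx < 7
            edx += 1
            inner += 1
        ecx -= 1
        outer += 1
    return outer, inner

print(repeat_sample())  # (17, 136)
print(while_sample())   # (16, 112)
```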
ENUM is a kind of auto generated EQU sequence. Sol_Asm will auto increment the values and will check for limits.
You can define ENUMS like this:
ENUM {enum_name},{start_value},{max_value}
    {enum name items}
ENDE

ENUM Modes,77h,0FFh
    MODE_1
    MODE_2
    MODE_3
    MODE_21
ENDE
Sol_Asm will generate MODE_1 EQU 77h, MODE_2 EQU 78h, and so on for each ENUM item in sequence, and will check each value against the limit.
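The ENUM rule can be modelled in Python: values are assigned consecutively from the start value in declaration order (the item names themselves carry no value, so MODE_21 simply gets the fourth value), and each value is checked against the maximum. The sketch assumes the limit is inclusive and uses a hypothetical error message; `expand_enum` is not a Sol_Asm name.

```python
def expand_enum(start, max_value, names):
    """Model of ENUM: consecutive values from start, checked against max_value
    (assumed inclusive)."""
    values = {}
    for offset, name in enumerate(names):
        value = start + offset
        if value > max_value:
            raise ValueError(f"ENUM item {name} = {value:X}h exceeds limit {max_value:X}h")
        values[name] = value
    return values

modes = expand_enum(0x77, 0xFF, ["MODE_1", "MODE_2", "MODE_3", "MODE_21"])
for name, value in modes.items():
    print(f"{name} EQU {value:X}h")
```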
DEFINE creates symbolic constants for text or strings. It behaves like a kind of EQU for strings and tokens.
This allows you to:
DEFINE {symbolic_name},{text}
An alternative name for DEFINE is TEQU
define text1 "planet earth"
define text2 < swap "ecx" >
define text3 ebx
define text4 [esi+4]
define text5 STRCUT has_ebx_inside,5,3
define and xor
...
.data
    my_string db text1     ; in fact "planet earth"
.code
    mov eax,text2          ; in fact mov eax,swap "ecx"
    mov eax,text3          ; in fact mov eax,ebx
    mov eax,text4          ; in fact mov eax,[esi+4]
    mov eax,text5          ; in fact mov eax,ebx
    and eax,eax            ; in fact XOR eax,eax
Defined text equates have some subtle types attached:
String functions allow you to operate on strings in text equates.
The following functions are available:
Function | Description | Notes |
---|---|---|
STRCUT | Extract a sub string from a string | |
STRADD | Add two strings | |
STRLEN | Obtain Length of string | the result is a numeric token |
STRCUT will extract a sub string from a source string
STRCUT {source},{start_pos},{length}
define ebx1 STRCUT has_ebx_inside,5,3    ; ebx type token
define ebx2 STRCUT "has_ebx_inside",6,3  ; "ebx" type string
define ebx3 STRCUT [ebx+ecx],1,3         ; [ebx] type ModRM
The result of STRCUT has the same type as the source
STRADD will add two strings together.
STRADD {string1},{string2}
define txt1 STRADD "planet"," earth"  ; "planet earth" string
define txt2 STRADD in,voke            ; invoke token
define txt3 STRADD [ebx],[+ecx]       ; [ebx+ecx] ModRM
The result of STRADD has the same type as string2
STRLEN will return the length of a string.
STRLEN {string1}
len1 equ STRLEN "planet"
len2 equ STRLEN invoke
len3 equ STRLEN [ebx+ecx]
len4 equ STRLEN STRADD "planet"," earth"
define txt1 STRCUT "has_ebx_inside",6, STRLEN "ebx"
The #ECHO directive allows you to emit a formatted message at compile time. This can be used to debug macros or to inform the user about compile stages.
#ECHO {format string},{arg1},{arg2},...
MY_EQU equ 1234
define my_str " this is a string message"
.code
...
#echo "\n code end=%x section base=%x, my_equ=%u string=%s",$,$$$,MY_EQU,my_str
As a format specifier you can use one of the following:
Format | Description |
---|---|
%x | Hexadecimal number |
%u | unsigned decimal number |
%d | signed decimal number |
%s | an ASCII null terminated string |
\n | new line (CR+LF) |
\t | TAB |
%% | the "%" ASCII char itself |
\\ | the "\" ASCII char itself |
The OPTION directive is used to setup compiler optional behaviour.
OPTION {option_type}, [ {option_value} ]
The following options are available:
Option | Description |
---|---|
list_on | activates listing output |
list_off | deactivates listing output |
proc_align { value } | setups alignment for PROC (default is 16 bytes) |
This directive allows you to read a value from compiled code or data at compile time.
#LOAD {equ_name}, [byte/word/dword/qword] {address}
my_db db 1
#load x,byte my_db
#echo " x=%x",x
This directive allows you to write a value into already compiled code or data at compile time.
#STORE {address}, [byte/word/dword/qword] {value}
my_db db 1
#store my_db, byte 55h
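Conceptually, #LOAD and #STORE read and patch bytes that have already been emitted into the output image. The Python sketch below models that with a `bytearray` and the standard `struct` module, assuming little-endian x86 encoding for word, dword and qword values; the image buffer and the offset of `my_db` are hypothetical stand-ins for Sol_Asm's internal output buffer.

```python
import struct

# Little-endian formats, matching x86 data layout.
FORMATS = {"byte": "<B", "word": "<H", "dword": "<I", "qword": "<Q"}

def load(image, address, size):
    """#LOAD model: read a value of the given size from the output image."""
    return struct.unpack_from(FORMATS[size], image, address)[0]

def store(image, address, size, value):
    """#STORE model: overwrite bytes already emitted into the output image."""
    struct.pack_into(FORMATS[size], image, address, value)

image = bytearray(8)
my_db = 0                                # hypothetical offset of my_db in the image
image[my_db] = 1                         # my_db db 1
print(hex(load(image, my_db, "byte")))   # 0x1
store(image, my_db, "byte", 0x55)        # #store my_db, byte 55h
print(hex(load(image, my_db, "byte")))   # 0x55
```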
Sol_Asm does contain a mini resource compiler.
It can parse some RC script elements and generate in-memory templates for them.
In resource scripts Sol_ASM does support C style hexadecimal constants.
You can define a resource ID like this:
#define {ID_name} {ID_value}

#define IDD_DLG1 1000
#define IDC_BTN1 1001
#define IDC_EDT1 1002
#define IDC_BTN2 1003
You can define a DIALOG like this:
{dialog_id} DIALOGEX {dlg_x},{dlg_y},{dlg_dx},{dlg_dy}
CAPTION {caption string}
STYLE {style value}
BEGIN
    { control definitions }
END
You can define a CONTROL like this:
CONTROL {caption},{id},{"class"},{flags},{x},{y},{dx},{dy},{flags_ex}
#define IDD_DLG1 1000
#define IDC_BTN1 1001
#define IDC_EDT1 1002
#define IDC_BTN2 1003
#define IDC_STC1 1004

IDD_DLG1 DIALOGEX 57,7,258,158
CAPTION "Sol_Asm Dialog 01"
STYLE 0x10CF0000
BEGIN
    CONTROL "Save", IDC_BTN1,"Button", 0x50010000, 134,114,50,13, 0x00000000
    CONTROL "Exit", IDC_BTN2,"Button", 0x50010000, 196,112,42,15, 0x00000000
    CONTROL "Name", IDC_STC1,"Static", 0x50000000, 12,24,22,8, 0x00000000
    CONTROL "Text Edit", IDC_EDT1,"Edit", 0x50010000, 50,22,134,11, 0x00000200
END
You can define a MENU like this:
{menu_id} MENUEX
BEGIN
    POPUP {"text"},{id}
    BEGIN
        MENUITEM {"text"},{id}
    END
END

SEPARATOR EQU 0

#define IDR_MENU      10000
#define IDM_File      10001
#define IDM_File_Open 10004
#define IDM_File_New  10005
#define IDM_File_Exit 10009
#define IDM_Edit       10002
#define IDM_Edit_Cut   10006
#define IDM_Edit_Copy  10007
#define IDM_Edit_Paste 10008

IDR_MENU MENUEX
BEGIN
    POPUP "File",IDM_File
    BEGIN
        MENUITEM "Open",IDM_File_Open
        MENUITEM "New",IDM_File_New
        MENUITEM SEPARATOR
        MENUITEM "Exit",IDM_File_Exit
    END
    POPUP "Edit",IDM_Edit
    BEGIN
        MENUITEM "Cut",IDM_Edit_Cut
        MENUITEM "Copy",IDM_Edit_Copy
        MENUITEM "Paste",IDM_Edit_Paste
    END
END
You can emit a compiled resource as a data item like this:
EMIT_RSRC {resource_id}
align 32
my_dialog:
    EMIT_RSRC IDD_DLG1

align 32
my_menu:
    EMIT_RSRC IDR_MENU
and in your code you can write:
...
invoke DialogBoxIndirectParamA,[hInstance],my_dialog,0,Dlg_Proc,0
...
invoke LoadMenuIndirectA,my_menu
Sol_Asm can produce a listing file when the "-list" command line option is used.
{include_level} {macro_level} {flag} {address} {program text} {opcodes}
Shows the depth of include file nesting.
Shows the depth of macro expansion nesting.
An internal Sol_Asm flag whose meaning changes often for debugging purposes. Currently it shows whether a new pass is needed to resolve a symbol.
Shows the address of the line being assembled. For OBJ formats it shows the offset within the section, since the final address is set by the linker.
Shows the program source text.
This includes:
It shows the CPU opcodes or data generated by Sol_Asm for each source line as a series of hexadecimal bytes.
Opcode column is aligned to column 128 if possible and expands up to column 224.
If more opcodes are needed then a new row is generated. If more than 4 rows are needed then an ellipsis "..." is shown and further opcodes are not shown anymore.
1 0 0 00401047
1 0 0 00401047 ;--------------------------
1 0 0 00401047 ; make up a build date
1 0 0 00401047 ;--------------------------
1 0 0 00401047 mov esi,build_time                      BE 3C A4 42 00
1 0 0 0040104C
1 0 0 0040104C xor eax,eax                             33 C0
1 0 0 0040104E xor edx,edx                             33 D2
1 0 0 00401050 xor ecx,ecx                             33 C9
1 0 0 00401052
1 0 0 00401052 mov ax,[esi + OS_TIME.year]             66 8B 46 00
1 0 0 00401056 mov cx,[esi + OS_TIME.month]            66 8B 4E 02
1 0 0 0040105A mov dx,[esi + OS_TIME.day_of_month]     66 8B 56 06
1 0 0 0040105E
1 0 0 0040105E invoke Str_Printf,sz_tmp1,sz_fmt_bld1,eax,ecx,edx
1 0 0 0040105E push edx                                52
1 0 0 0040105F push ecx                                51
1 0 0 00401060 push eax                                50
1 0 0 00401061 push sz_fmt_bld1                        68 69 A0 42 00
1 0 0 00401066 push sz_tmp1                            68 00 16 43 00
1 0 0 0040106B call Str_Printf                         E8 B0 07 00 00
1 0 0 00401070 add esp, 00000014h                      83 C4 14
Sol_Asm uses separate namespaces for:
Because of this you can have a PROC with the same name as a STRUC, but not two PROCs or two STRUCs with the same name.
However, this is not yet under the programmer's control, so it is advisable to avoid such coding practice: you cannot control the order in which Sol_Asm searches the separate namespaces.
It is my intention to provide a mechanism for controlling and defining namespaces to the user.
Sol_Asm requires at least a 386 CPU and benefits from newer, more advanced CPUs.
Sol_Asm pre-allocates approximately 24 MB at startup.
Each section gets 1 MB when it is defined, and this is reallocated later if needed.
Additional memory is allocated when needed for files, imports, macros, etc.
Sol_Asm was tested on WinXP, Solar OS and WinXP64, but it should also work on Win95, Win98, Win2k, Win2003 and Vista.
Starting from version 14.02, Sol_Asm also runs on Linux and on other UNIX-like OSes that can link the Sol_Asm OBJ against a limited set of LIBC functions.
A version for Mac OS X is also available in ELF OBJ format. You can use Agner Fog's OBJCONV program to convert it to MACH-O and link to LIBC to obtain the executable on your Mac.
Speed testing was performed on two big projects: Sol_Asm itself and Solar_OS.
Synthetic testing was performed on files with 10.000 or 100k PROC's
Solar Assembler version 0.10.01
Copyright (C) 2004-2008 Bogdan Valentin Ontanu, All rights reserved.
Build on 2008_2_23 at 7:14:23

Assembling file: sol_asm2.asm
Assembler pass: 1
Assembler pass: 2
Assembler pass: 3
Assembler pass: 4
Assembler lines: 67866
Output bytes: 192512
Assembler time: 406 ms
---------------------------
4 pass x 67.866 lines = 271.464 lines in 406 ms --> 668.630 lines per second
Solar Assembler version 0.10.01
Copyright (C) 2004-2008 Bogdan Valentin Ontanu, All rights reserved.
Build on 2008_2_23 at 7:14:23

Assembling file: system_32.asm
Assembler pass: 1
Assembler pass: 2
Assembler pass: 3
Assembler lines: 111403
Output bytes: 534016
Assembler time: 578 ms
---------------------------
3 pass x 111.403 lines = 334.209 lines in 578 ms --> 578.216 lines per second
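The throughput figures above are the number of passes times the source line count, divided by the wall time. This short Python check reproduces both numbers from the logged values (rounded down to whole lines per second).

```python
def lines_per_second(passes, lines, elapsed_ms):
    """Total lines processed across all passes, and lines per second."""
    total = passes * lines
    return total, total * 1000 // elapsed_ms

print(lines_per_second(4, 67866, 406))   # (271464, 668630)
print(lines_per_second(3, 111403, 578))  # (334209, 578216)
```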
These are real projects with many PROCs, STRUCs, MACROs and code.
Testing was performed on a laptop with an Intel Core 2 Duo CPU at 2 GHz and 1 GB of RAM, running WinXP 32.
8 bit registers
-------------------------------
"al"    "r8l"     "spl"
"cl"    "r9l"     "bpl"
"dl"    "r10l"    "sil"
"bl"    "r11l"    "dil"
"ah"    "r12l"
"ch"    "r13l"
"dh"    "r14l"
"bh"    "r15l"

16 bits registers
-------------------------------
"ax"    "r8w"     "es"
"cx"    "r9w"     "cs"
"dx"    "r10w"    "ss"
"bx"    "r11w"    "ds"
"sp"    "r12w"    "fs"
"bp"    "r13w"    "gs"
"si"    "r14w"
"di"    "r15w"

32 bits registers
-------------------------------
"eax"   "r8d"
"ecx"   "r9d"
"edx"   "r10d"
"ebx"   "r11d"
"esp"   "r12d"
"ebp"   "r13d"
"esi"   "r14d"
"edi"   "r15d"

64 bits registers
-------------------------------
"rax"   "r0"    "r8"
"rcx"   "r1"    "r9"
"rdx"   "r2"    "r10"
"rbx"   "r3"    "r11"
"rsp"   "r4"    "r12"
"rbp"   "r5"    "r13"
"rsi"   "r6"    "r14"
"rdi"   "r7"    "r15"

MMX registers
--------------------------------
"mm0" "mm1" "mm2" "mm3" "mm4" "mm5" "mm6" "mm7"

FPU registers
--------------------------------
"st0" "st1" "st2" "st3" "st4" "st5" "st6" "st7"

XMM registers
--------------------------------
"xmm0" "xmm1" "xmm2" "xmm3" "xmm4" "xmm5" "xmm6" "xmm7"
"xmm8" "xmm9" "xmm10" "xmm11" "xmm12" "xmm13" "xmm14" "xmm15"
0 mov 1 lea 2 movzx 3 movsx 4 bswap 5 xchg 6 xor 7 cmp 8 add 9 sub
10 or 11 and 12 sbb 13 adc 14 shl 15 shr 16 sar 17 rol 18 ror 19 rcl
20 rcr 21 sal 22 shld 23 shrd 24 test 25 not 26 neg 27 inc 28 dec 29 div
30 idiv 31 mul 32 imul 33 call 34 jmp 35 loop 36 ret 37 retn 38 int 39 int3
40 into 41 iret 42 iretd 43 hlt 44 leave 45 push 46 pushad 47 pusha 48 pushfd 49 pushf
50 pop 51 popad 52 popa 53 popfd 54 popf 55 jo 56 jno 57 jc 58 jnc 59 jb
60 jnb 61 jnae 62 jae 63 jz 64 jnz 65 je 66 jne 67 jbe 68 jnbe 69 jna
70 ja 71 js 72 jns 73 jpe 74 jpo 75 jl 76 jnl 77 jnge 78 jge 79 jle
80 jnle 81 jng 82 jg 83 rep 84 movsb 85 movsd 86 movsw 87 stosb 88 stosd 89 stosw
90 lodsb 91 lodsd 92 lodsw 93 scasb 94 scasd 95 nop 96 clc 97 stc 98 daa 99 das
100 cbw 101 cdq 102 cld 103 cmc 104 aaa 105 aas 106 lahf 107 lock 108 cpuid 109 rdtsc
110 aad 111 aam 112 out 113 in 114 finit 115 fninit 116 fld 117 fild 118 fst 119 fstp
120 fistp 121 fadd 122 faddp 123 fiadd 124 fsub 125 fisub 126 fdiv 127 fdivrp 128 fmul 129 fmulp
130 fimul 131 fxch 132 fucompp 133 fclex 134 fnclex 135 fnop 136 fchs 137 fabs 138 ftst 139 fxam
140 fld1 141 fldl2t 142 fldl2e 143 fldpi 144 fldlg2 145 fldln2 146 fldz 147 f2xm1 148 fyl2x 149 fptan
150 fpatan 151 fxtract 152 fprem1 153 fdecstp 154 fincstp 155 fprem 156 fyl2xp1 157 fsqrt 158 fsincos 159 frndint
160 fscale 161 fsin 162 fcos 163 emms 164 sidt 165 lidt 166 lgdt 167 sgdt 168 cli 169 sti
170 wbinvd 171 xlat 172 db 173 dw 174 dd 175 dq 176 dt 177 do 178 real4 179 real8
180 real10 181 rb 182 rw 183 rd 184 rq 185 rt 186 ro 187 rs 188 equ 189 align
190 proc 191 uses 192 arg 193 local 194 endp 195 .if 196 .elseif 197 .else 198 .endif 199 #ifdef
200 #ifndef 201 #else 202 #endif 203 #ifnb 204 #ifb 205 #if_used 206 #if_not_used 207 macro 208 endm 209 exitm
210 rept 211 invoke 212 cinvoke 213 cdecl 214 stdcall 215 include 216 incbin 217 incfrom 218 import_dll 219 from_dll
220 import_lib 221 from_lib 222 import_func 223 import 224 extern 225 export 226 alias 227 struc 228 struct 229 ends
230 enum 231 ende 232 .entry 233 org 234 disp 235 .use16 236 .use32 237 .use64 238 section 239 class_code
240 class_data 241 class_imports 242 class_relocs 243 class_bss 244 class_exports 245 class_rsrc 246 #define 247 begin 248 end 249 dialogex
250 caption 251 style 252 control 253 menuex 254 popup 255 menuitem 256 emit_rsrc 257 .echo 258 $time
;------------------------------------------------------
; Sol_Asm assembler sample
; Copyright (c) 2004-2008, Bogdan Valentin Ontanu
; All rights reserved.
;------------------------------------------------------

;----------------------------
; define imports
;----------------------------
from_dll kernel32.dll
    import ExitProcess
    import GetStdHandle

from_dll user32.dll
    import MessageBox alias MessageBoxA

;-------------------
; define sections
;-------------------
section "code" class_code
section "data" class_data
section "idata" class_imports

.data
    sz_message db "First Win32 PE application",0
    sz_title   db "Sol_ASM",0

.code
;------------------------
; define entry point
;------------------------
.entry Start

Start:
    ;-----------------------------
    ; the classical message box
    ;-----------------------------
    invoke MessageBox, 0, sz_message, sz_title, 3

    ;--------------------------
    ; done here, exit nicely
    ;--------------------------
    invoke ExitProcess,0
    ret
Assuming the file is named test_win32.asm and Sol_Asm is in your PATH, you can build this sample with the following command:
sol_asm2 test_win32.asm test_win32.exe -pe32
The resulting executable should display a message box when run.