To make a program fast each step in code generation has to be done as early as possible. In particular everything that can be done at compile-time should be done at compile-time. It is also possible that a specific algorithm uses input, different parts of which will be available at different stages of the program run-time. The data available at the early stage can be partially applied to existing algorithm, yielding a modified version of it - like curring - but resulting not in closure, containing unmodified code and partial data, but in some reductions applied to underlying code. Such reductions might include replacement the known variables(registers) with constants(immediate values), loops unrolling, elimination of unnecessary tests etc. All that might result in a better performance.
The intended object of this kind of reductions have to be low-level. It can be some kind of lambda-calculi or a compiler-generated intermediate language but the machine code or its assembly equivalent is practical enough too. Some interesting possibilities arise if the meta-language coincides with the object-language, .e.g. possibility of applying of a code-modifying procedure to itself. That's why the code generating functions used in lib45
themselves are written in assembly language. The downside of it is restricted portability and might-be-not-effective-on-next-generation-of-CPU issue.
String search is widely used and there are plenty of stuff to compare with. The intended use of routines comprising the library is the situation when the substring to be searched for (also referred as a pattern
) is known at earlier stage than string to be searched. For example a pattern might be obtained via the user input like in the grep
program. In all cases the generated routine have to run long enough to pay back for the time spent on its generation.
#include <lib45.h> void *buf = malloc(l45_searchfn_zt_size(strlen("fox"))); l45_searchfn searcher = l45_make_searchfn_zt(buf, "fox"); char *found = searcher("A quick red fox ..."); /* or */ char *found = ((l45_searchfn)buf)("A quick red fox ...");
l45_searchfn __stdcall l45_make_searchfn_zt(void* buf, char* pattern);
Generates routine for searching PATTERN in a zero-terminated string and places it in BUF buffer. The buffer has to be at least l45_searchfn_zt_size(strlen(pattern))
bytes large
Returns pointer to produced routine of type:
typedef char* (__stdcall *l45_searchfn)(char* string);
The produced routine will return a pointer to the found pattern, null pointer otherwise.
l45_searchfn_zt_size
function see section Determine the size of generation buffer to determine the buffer size.
l45_make_searchfn_n
or l45_make_searchfn
see section Searching a string with known length see section Searching an arbitrary buffer
#include <lib45.h> void *buf = malloc(l45_searchfn_n_size(strlen("fox"))); l45_searchfn_n searcher = l45_make_searchfn_n(buf, "fox"); char *found = searcher("A quick red fox ...", strlen("A quick red fox ...")); /* or */ char *found = ((l45_searchfn_n)buf)("A quick red fox ...",strlen("A quick red fox ..."));
l45_searchfn __stdcall l45_make_searchfn_n(void* buf, char* pattern);
Generates routine for searching PATTERN in a string with known(at the moment when the routine is run) length and places it in BUF buffer. The buffer has to be at least l45_searchfn_n_size(strlen(pattern))
bytes large.
Returns pointer to produced routine that is of the following type:
typedef char* (__stdcall *l45_searchfn_n)(char* string, unsigned int string_size);
The produced routine will return a pointer to the found pattern, null pointer otherwise.
l45_searchfn_n_size
function see section Determine the size of generation buffer to determine the buffer size .
#include <lib45.h> void *buf = malloc(l45_searchfn_size(strlen("fox"))); l45_searchfn searcher = l45_make_searchfn(buf, "fox"); char *found = searcher("A quick red fox ..."); /* or */ char *found = ((l45_searchfn)buf)("A quick red fox ...");
l45_searchfn __stdcall l45_make_searchfn(void* buf, char* pattern);
Generates routine for searching PATTERN in an arbitrary memory buffer. The generated code will be placed in BUF buffer.
The buffer has to be at least l45_searchfn_size(strlen(pattern))
bytes large
l45_searchfn __stdcall l45_make_searchfni(void* buf, char* pattern);
This is a case-insensitive variant of l45_make_searchfn
.
The buffer has to be at least l45_searchfni_size(strlen(pattern))
bytes large
Returns pointer to produced routine. The routine has the type as follows:
typedef char* (__stdcall *l45_searchfn)(char* string);
The generated routine returns pointer to the first occurrence of the PATTERN in the STRING.
l45_searchfn_size
( or l45_searchfn_size
for case-insensitive variant ) function see section Determine the size of generation buffer to determine the buffer size .
#include <lib45.h> void *buf = malloc(l45_searchfn_ln_size(strlen("fox"))); l45_searchfn searcher = l45_make_searchfn_ln(buf, "fox", 32); char *found = searcher("A quick red fox ..."); /* or */ char *found = ((l45_searchfn)buf)("A quick red fox ...");
l45_searchfn __stdcall l45_make_searchfn_ln(void* buf, char* pattern, char separating_char);
Generates routine for searching PATTERN or character SEPARATING_CHAR in an arbitrary memory buffer. The generated code will be placed in BUF buffer. The PATTERN should not contain SEPARATING_CHAR.
The buffer has to be at least l45_searchfn_ln_size(strlen(pattern))
bytes large
l45_searchfn __stdcall l45_make_searchfn_lni(void* buf, char* pattern, char separating_char);
This is a case-insensitive variant of l45_make_searchfn_ln
.
The buffer has to be at least l45_searchfn_lni_size(strlen(pattern))
bytes large
Returns pointer to produced routine. The routine has the type as follows:
typedef char* (__stdcall *l45_searchfn)(char* string);
The generated routine returns pointer to the first occurrence of the PATTERN or the SEPARATING_CHAR in the STRING.
l45_searchfn_ln_size
( or l45_searchfn_lni_size
for case-insensitive search ) function see section Determine the size of generation buffer to determine the buffer size .
#include <lib45.h> void *buf = malloc(l45_searchfn_l_size(strlen("fox"))); char *line; unsigned int lines; l45_searchfn searcher = l45_make_searchfn_l(buf, "fox", 32); char *found = searcher("A quick red fox ...", &lines, &line); /* or */ char *found = ((l45_searchfn_l)buf)("A quick red fox ...", &lines, &line);
l45_searchfn __stdcall l45_make_searchfn_l(void* buf, char* pattern, char separating_char);
Generates code for searching PATTERN and counting SEPARATING_CHARs in an arbitrary string. The code will be placed in a BUF buffer. The PATTERN should not contain SEPARATING_CHAR. The buffer has to be at least l45_searchfn_l_size(strlen(pattern))
bytes large
l45_searchfn __stdcall l45_make_searchfn_li(void* buf, char* pattern, char separating_char);
This is a case-insensitive variant of l45_make_searchfn_l
.
The buffer has to be at least l45_searchfn_li_size(strlen(pattern))
bytes large
Returns pointer to produced routine. The routine has the type as follows:
typedef char* (__stdcall *l45_searchfn_l)(char* string, unsigned int* c_count, char** last_c);
The parameters are :
l45_searchfn_l_size
function see section Determine the size of generation buffer to determine the buffer size .
The following five functions return the size in bytes required for generation buffer. The proper alignment of buffer start is also advisable. The functions are written (to be exact - generated)in C language.
int l45_searchfn_zt_size(int pattern_size)
int l45_searchfn_n_size (int pattern_size)
int l45_searchfn_size (int pattern_size)
int l45_searchfn_ln_size(int pattern_size)
int l45_searchfn_l_size (int pattern_size)
int l45_searchfni_size (int pattern_size)
int l45_searchfn_lni_size(int pattern_size)
int l45_searchfn_li_size (int pattern_size)
a function named l45_searchfn_XX_size
returns buffer size needed its l45_make_searchfn_XX
counterpart.
The next two functions are written in Assembler but do not use run-time generation.
unsigned int __stdcall l45_strlen(char* string);
Returns the length of the zero-terminated string STRING.
unsigned int __stdcall l45_strlen_n(char* buffer, unsigned int size);
Returns the length of the zero-terminated string STRING if the length is less than SIZE, SIZE otherwise. Useful e.g. when determining whether the given file is binary.
l45_bench
provides a simple wrapper around rdts
instruction.
#include <lib45.h> #include <stdio.h> unsigned int s[2]; l45_bench(s); ...do what you want to benchmark... printf("it took 0x%x0000 CPU cycles\n", l45_bench(s));
unsigned int __stdcall l45_bench(unsigned int[2] buffer);
Fills the contents of BUFFER with the number of CPU clocks counted from the moment of power-up.
Returns the difference with the previous contents of the buffer multiplied by 0x10000.
The elapsed time in seconds can be computed as <returned value> * 0x10000 / <frequency of CPU>.
gssp
has been written to demonstrate usage of lib45
functions. Command interface tries to resemble grep
as much as possible. It implements only a subset of grep
options, also there are certain restrictions on size of file processed, but it runs rather fast and can be used along with grep
e.g. as Emacs
inferior process.
gssp [options] substring [files..]
Where SUBSTRING is substring to be searched literally(Regular expressions are not supported). If no files specified stdin
will be searched.
mmap
system call to read input, instead of
the default read
system call. In some situations, `--mmap'
yields better performance.
mmap
option only about first 4MB of file will be searched.( The warning is printed). With mmap
gssp
will fail on extra large (about GB) files. Unlike grep
gssp
also uses mmap
on cygwin and non-cygwin NT.
stdin
will be searched. ( The warning is printed )
grep
no CRLF
translations are performed.
To use gssp
from within Emacs
along with grep
`misc/gssp.el' is provided. You have to manually copy this file to somewhere in Emacs lisp path, byte-compile it and include the line
load "gssp" in your `.emacs' file.
The provided commands M-x gssp
and C-u M-x gssp
are analogous to M-x grep
and C-u M-x grep
.
lib45
library can be used only on Intel 32-bit CPUs. Although it not uses instructions that are specific for Pentium IV, it optimized for use on it. As concerning various OS-es or object files formats lib45
is as portable as NASM
is. Currently the library has been build and tested for the following object file formats:
ELF
COFF/win32
OBJ/win32
The most important l45_make_searchfn_XX
functions are written in Assembly language.
The rationale is explained in see section Introduction. Technically the task of code generation is simple, e.g. all that is needed to generate instruction pop ebx
would be to write:
mov byte[proper address] ,0x5B
However, the resulting code will be hardly human-readable. To facilitate developing the special macro is used: if a line is prefixed with -
it will be not executed at generation time but translated to machine instruction and written to the proper place (the macros will keep track of it) in the generation buffer. For example the above sample would be written as:
- pop ebx
To visually discern register names in prefixed lines from their counterparts in the executing code
is advisable use the special asmi
Emacs
mode which can be found in the file `misc/asmi-mode.el' of the distribution directory.
To expand macro Perl
modules `Tools/instrgen.pm' and `Tools/codegen.pm' are used . For example to expand macros in file `foo.in' issue the following command from the distribution directory:
perl -Mlib=./Tools -Mcodegen -e"instrgen::transform 'foo.in'"
The result will be placed in `foo.out' and can be used as input for NASM
. Note that Perl
is needed only at build time.
To understand the content of this section properly, note that there are three distinct stages in operation of code generating functions:
-
, <
or >
are executed.
-
prefixThe prefixed line will be translated to machine code and placed into generation buffer.
+
prefix
The braced expression containing labels referring to generated(-
prefixed) code will be calculated. The line itself will execute at generation time.
The following generation-time instruction calculates the size of code contained between the labels start
and end
and puts the result into ecx
register
mov ecx,{end-start} -start: - xor eax,ebx - test eax,eax -end:
>
prefixForward jump in the generated code from the address that could be calculated at compile-time to the one that could not. Implemented as an fix-up at generation time.
<
prefix
Backward jump in the generated code from the address that could not be calculated at compile-time to the one that could. Note that both prefixes >
and are not needed if the distance between the jump origin and target can be calculated at compile-time.
*
prefixGeneration-time jump over some generated instructions. Used as hint to expansion program to properly update the assembly point.
-reset
commandSets the assembly point to the beginning of the generation buffer.
-REPEAT
and -END-REPEAT
commands
The generated(-
prefixed) code between these commands will be generated repeatedly repeatcnt
times. The register ebp
will be decremented from the repeatcnt
value to 1. repeatcnt
should be defined as NASM
macro expanding to a register or memory dword and assigned(loaded) to appropriate value before the first occurrence of -REPEAT
. This value should be constant within the generated file (or between -reset
commands. Due to this restriction the jumps between different repeat blocks are possible.
$<register>
expressions
Can be used in generated (-
prefixed) code. The value of generation-time register is inserted as immediate value in the generated instruction.
Example
the generating code below
mov edx,1 - cmp ecx,$edx
will result in:
cmp ecx,1
at run time.
-
prefixed) code can contain only the following instructions:
mov
, cmp
, xor
, any kind of jumps, ret
, retn
, retf
, dec
, inc
, test
, push
, pop
, sub
, add
, and
, ud0
, ud1
, ud2
, movsx
, movzx
, nop
, dd
, lea
, or
, call
. Not prefixed code can contain any kind of instructions recognized by NASM
-
prefixed) are used to support generation. Their use is restricted by the following rules:
esi
:assembly point(moving)
ebx
:start of assembly area
eax
,edx
:used to store intermediate values in
+
-
to label in -REPEAT
blocks or over -REPEAT
blocks.
-REPEAT
blocks but will lose their values
across the +
prefixed values or jumps prefixed with -
to label in -REPEAT
blocks or over -REPEAT
blocks.
edx
:used to store intermediate values in instructions prefixed with <
ebp
:loop counter in -REPEAT
blocks. Can be safely used outside of them but will lose its value across. In -REPEAT
blocks will be decremented from repeatcnt
to 1.
edi
, ecx
:are never changed or used implicitly
The "HELLO WORLD" example is included to demonstrate the use of some macros described above. Note that run-time generation of code that only prints a string does not bring you any gains in performance.
;; -*-asmi-*- ;; annotated "hello world" example ;; demonstrates: ;; * calling of external functions ;; * defining code and data in the same buffer global _l45_hello_world extern _printf section .text
Emacs
to employ asmi
mode which use makes
the text more readable.
global
is a NASM
directive telling that symbol l45_hello_world
is declared in this file and can be linked from outside
extern
is a NASM
directive telling that symbol printf
is defined elsewhere.
section
is a NASM
directive telling that the following lines contain code.
_l45_hello_world:
_l45_hello_world
(can be referred in C as l45_hello_world
) is a function receives the single parameter which is the address of generation buffer and returns the same value
push esi push ebx; mov esi,dword[esp+12] ; assembly point(moving) mov ebx,esi ; start of generation buffer
The lines above constitute a kind of standard prologue for lib45
generating routines. The registers esi
and ebx
will used by generating stage at run-time, when expanding the '-' prefixed routines, so they need to be saved
+ lea edx,[ebx+{string}] - push $edx
The code above calculates an absolute address of "HELLO WORLD" string and generates code pushing it on the stack.
mov edx, _printf + sub edx,{callsite} sub edx,ebx - call $edx -callsite:
The code above calculates the relative address of printf
function and generates calling code for it.
- add esp,4 - ret
;clean stack and return
- nop - nop - nop
some padding between code and data
-string: - dd 0x4C4C4548 ;HELL - dd 0x4F57204F ;O WO - dd 0x0A444C52 ;RLD - dd 0 ;terminating zero
The four lines above define four dwords containing the "HELLO WORLD" string.
mov eax,ebx
Generation buffer address is returned in eax
.
pop ebx pop esi ret
Restore the registers and return from the generating routine.
The produced code and data will look like:
0x0022fc08: push 0x22fc1c 0x0022fc0d: call 0x4015a0 <printf> 0x0022fc12: add esp,0x4 0x0022fc18: ret 0x0022fc19: nop 0x0022fc1a: nop 0x0022fc1b: nop 0x0022fc1c: 0x4c4c4548 0x0022fc20: 0x4f57204f 0x0022fc24: 0x0a444c52 0x0022fc28: 0x00000000
The full source for the example an be found in files `hello.c' and `l45_hello.in' of the distribution directory.
The "compare" example is included to demonstrate the use of some macros described above. For the sake of simplicity the comparison is performed on byte-by-byte basis. If performance is sought for the comparison on dword-by-dword basis should be used.
; -*-asmi-*- ;; annotated "compare" example ;; demonstrates: ;; * loop unrolling using '-REPEAT' global _l45_compare section .text ;NASM macro with this name should be defined if '-REPEAT' is used ;it can refer to register or dword in memory %define repeatcnt dword[esp-4] _l45_compare: ;receives two parameters ; * the address of generation buffer ; * the address of a pattern(zero-terminated string) to compare against ;returns the the address of generation buffer ; which might be cast to 'typedef int (*type)(char*)' type ; the generated function will return 0,1 or -1 like 'strcmp' does. push esi push ebx push ebp ;used in 'REPEAT' as loop counter mov esi,dword[esp+16] ; assembly point(moving) mov edi,dword[esp+20] ; pattern point mov ebx,esi ; start of generation buffer ;calculating the pattern length+1: xor ecx,ecx lloop: mov dl,byte[edi+ecx] add ecx,1 test dl,dl jne lloop mov repeatcnt,ecx; 'repeatcnt' has to be defined and contain the loop repetition count - mov eax,[esp+4];string pointer xor ecx,ecx -REPEAT mov al,byte[edi+ecx] - cmp byte[eax+$ecx],$al - jne near noneq add ecx,1 -END-REPEAT - xor eax,eax - ret -noneq: jl short less - mov eax,-1 - ret -less: mov eax,1 - ret mov eax,ebx pop ebp pop ebx pop esi ret
Given a second parameter "HELLO" l45_compare
produces the following code:
0x0022fc08: mov eax,DWORD PTR [esp+4] 0x0022fc0c: cmp BYTE PTR [eax],0x48 0x0022fc13: jne 0x22fc5d 0x0022fc19: cmp BYTE PTR [eax+1],0x45 0x0022fc20: jne 0x22fc5d 0x0022fc26: cmp BYTE PTR [eax+2],0x4c 0x0022fc2d: jne 0x22fc5d 0x0022fc33: cmp BYTE PTR [eax+3],0x4c 0x0022fc3a: jne 0x22fc5d 0x0022fc40: cmp BYTE PTR [eax+4],0x4f 0x0022fc47: jne 0x22fc5d 0x0022fc4d: cmp BYTE PTR [eax+5],0x0 0x0022fc54: jne 0x22fc5d 0x0022fc5a: xor eax,eax 0x0022fc5c: ret 0x0022fc5d: jl 0x22fc65 0x0022fc5f: mov eax,0xffffffff 0x0022fc64: ret 0x0022fc65: mov eax,0x1 0x0022fc6a: ret
It is easy to see that generated code does not contain loops.
The full source for the example an be found in files `compare.c' and `l45_compare.in' of the distribution directory.
To build the lib45
library the following tools are needed:
automake
v1.7 or higher and autoconf
v2.53 or higher. Some make
compatible with the above tools.
NASM
(Free Net-wide Assembler for Intel CPU) v0.98
Perl
v5.6 or higher
ar
like tool. See see section Portability for a detailed list
All the mentioned above tools have to be on the path.
tar -xzmf Lib45-0.01.tar.gz
cd
into this directory
automake --add-missing
aclocal
autoconf
configure
script e.g
./configurethe example above will search for the C compiler automatically. If You have several of them on path and want to use another , issue the above command with parameter e.g.
./configure CC=icl
make
make check
make install
. All stuff will be installed in `/usr/local/XXX' directories: header, lib, gssp
demo and info.
Tex
installed you can also make PDF version of the documentation issuing the command make pdf
.
Several tests are included. All the results of lib45
functions are compared with the results produced by standard C library functions like strstr
, strchr
, strlen
etc. Several tests also print out some benchmarking data. Some tests for some compilers can take longer time to run due to the odd implementation of strstr
function.
This is an alphabetical list of all gssp
commands, command-line
options, and environment variables.
Jump to: -
This document was generated on 12 October 2005 using texi2html 1.56k.