String Search Library Manual

version 0.11, 12 October 2005

Vadim Alexandrov


Introduction

Why generate code at run-time

To make a program fast each step in code generation has to be done as early as possible. In particular everything that can be done at compile-time should be done at compile-time. It is also possible that a specific algorithm uses input, different parts of which will be available at different stages of the program run-time. The data available at the early stage can be partially applied to existing algorithm, yielding a modified version of it - like curring - but resulting not in closure, containing unmodified code and partial data, but in some reductions applied to underlying code. Such reductions might include replacement the known variables(registers) with constants(immediate values), loops unrolling, elimination of unnecessary tests etc. All that might result in a better performance.

Why Assembler

The intended object of this kind of reductions have to be low-level. It can be some kind of lambda-calculi or a compiler-generated intermediate language but the machine code or its assembly equivalent is practical enough too. Some interesting possibilities arise if the meta-language coincides with the object-language, .e.g. possibility of applying of a code-modifying procedure to itself. That's why the code generating functions used in lib45 themselves are written in assembly language. The downside of it is restricted portability and might-be-not-effective-on-next-generation-of-CPU issue.

Why String Search

String search is widely used and there are plenty of stuff to compare with. The intended use of routines comprising the library is the situation when the substring to be searched for (also referred as a pattern) is known at earlier stage than string to be searched. For example a pattern might be obtained via the user input like in the grep program. In all cases the generated routine have to run long enough to pay back for the time spent on its generation.

Searching a zero-terminated string

Synopsis

 #include <lib45.h>
 void *buf = malloc(l45_searchfn_zt_size(strlen("fox")));
 l45_searchfn searcher = l45_make_searchfn_zt(buf, "fox");
 char *found = searcher("A quick red fox ..."); 

/* or */

 char *found = ((l45_searchfn)buf)("A quick red fox ..."); 

Description

l45_searchfn  __stdcall l45_make_searchfn_zt(void* buf, char* pattern); 

Generates routine for searching PATTERN in a zero-terminated string and places it in BUF buffer. The buffer has to be at least l45_searchfn_zt_size(strlen(pattern)) bytes large

Returns

Returns pointer to produced routine of type:

typedef char* (__stdcall *l45_searchfn)(char* string);

The produced routine will return a pointer to the found pattern, null pointer otherwise.

Notes

Searching a string with known length

Synopsis

 #include <lib45.h>
 void *buf = malloc(l45_searchfn_n_size(strlen("fox")));
 l45_searchfn_n searcher = l45_make_searchfn_n(buf, "fox");
 char *found = searcher("A quick red fox ...", strlen("A quick red fox ...")); 

/* or */

 char *found = ((l45_searchfn_n)buf)("A quick red fox ...",strlen("A quick red fox ...")); 

Description

l45_searchfn  __stdcall l45_make_searchfn_n(void* buf, char* pattern); 

Generates routine for searching PATTERN in a string with known(at the moment when the routine is run) length and places it in BUF buffer. The buffer has to be at least l45_searchfn_n_size(strlen(pattern)) bytes large.

Returns

Returns pointer to produced routine that is of the following type:

typedef char* (__stdcall *l45_searchfn_n)(char* string, unsigned int string_size);

The produced routine will return a pointer to the found pattern, null pointer otherwise.

Notes

Searching an arbitrary buffer

Synopsis

 #include <lib45.h>
 void *buf = malloc(l45_searchfn_size(strlen("fox")));
 l45_searchfn searcher = l45_make_searchfn(buf, "fox");
 char *found = searcher("A quick red fox ..."); 

/* or */

 char *found = ((l45_searchfn)buf)("A quick red fox ..."); 

Description

l45_searchfn  __stdcall l45_make_searchfn(void* buf, char* pattern); 

Generates routine for searching PATTERN in an arbitrary memory buffer. The generated code will be placed in BUF buffer. The buffer has to be at least l45_searchfn_size(strlen(pattern)) bytes large

l45_searchfn  __stdcall l45_make_searchfni(void* buf, char* pattern); 

This is a case-insensitive variant of l45_make_searchfn. The buffer has to be at least l45_searchfni_size(strlen(pattern)) bytes large

Returns

Returns pointer to produced routine. The routine has the type as follows:

typedef char* (__stdcall *l45_searchfn)(char* string);

The generated routine returns pointer to the first occurrence of the PATTERN in the STRING.

Notes

Searching for pattern or separating character

Synopsis

 #include <lib45.h>
 void *buf = malloc(l45_searchfn_ln_size(strlen("fox")));
 l45_searchfn searcher = l45_make_searchfn_ln(buf, "fox", 32);
 char *found = searcher("A quick red fox ..."); 

/* or */

 char *found = ((l45_searchfn)buf)("A quick red fox ..."); 

Description

l45_searchfn  __stdcall l45_make_searchfn_ln(void* buf, char* pattern, char separating_char);  

Generates routine for searching PATTERN or character SEPARATING_CHAR in an arbitrary memory buffer. The generated code will be placed in BUF buffer. The PATTERN should not contain SEPARATING_CHAR. The buffer has to be at least l45_searchfn_ln_size(strlen(pattern)) bytes large

l45_searchfn  __stdcall l45_make_searchfn_lni(void* buf, char* pattern, char separating_char);  

This is a case-insensitive variant of l45_make_searchfn_ln. The buffer has to be at least l45_searchfn_lni_size(strlen(pattern)) bytes large

Returns

Returns pointer to produced routine. The routine has the type as follows:

typedef char* (__stdcall *l45_searchfn)(char* string);

The generated routine returns pointer to the first occurrence of the PATTERN or the SEPARATING_CHAR in the STRING.

Notes

Searching for pattern and counting separating characters

Synopsis

 #include <lib45.h>
 void *buf = malloc(l45_searchfn_l_size(strlen("fox")));
 char *line;
 unsigned int lines;
 l45_searchfn searcher = l45_make_searchfn_l(buf, "fox", 32);
 char *found = searcher("A quick red fox ...", &lines, &line); 

/* or */

 char *found = ((l45_searchfn_l)buf)("A quick red fox ...", &lines, &line); 

Description

l45_searchfn  __stdcall l45_make_searchfn_l(void* buf, char* pattern, char separating_char); 

Generates code for searching PATTERN and counting SEPARATING_CHARs in an arbitrary string. The code will be placed in a BUF buffer. The PATTERN should not contain SEPARATING_CHAR. The buffer has to be at least l45_searchfn_l_size(strlen(pattern)) bytes large

l45_searchfn  __stdcall l45_make_searchfn_li(void* buf, char* pattern, char separating_char); 

This is a case-insensitive variant of l45_make_searchfn_l. The buffer has to be at least l45_searchfn_li_size(strlen(pattern)) bytes large

Returns

Returns pointer to produced routine. The routine has the type as follows:

typedef char* (__stdcall *l45_searchfn_l)(char* string, unsigned int* c_count, char** last_c);

The parameters are :

Notes

Determine the size of generation buffer

The following five functions return the size in bytes required for generation buffer. The proper alignment of buffer start is also advisable. The functions are written (to be exact - generated)in C language.

a function named l45_searchfn_XX_size returns buffer size needed its l45_make_searchfn_XX counterpart.

Faster strlen for long strings

The next two functions are written in Assembler but do not use run-time generation.

The length of zero-terminated string

unsigned int __stdcall l45_strlen(char* string);

Returns the length of the zero-terminated string STRING.

The length of zero-terminated string prefix

unsigned int __stdcall l45_strlen_n(char* buffer, unsigned int size);

Returns the length of the zero-terminated string STRING if the length is less than SIZE, SIZE otherwise. Useful e.g. when determining whether the given file is binary.

Performance benchmarking

l45_bench provides a simple wrapper around rdts instruction.

Synopsis

 #include <lib45.h>
 #include <stdio.h> 
 unsigned int s[2];
 l45_bench(s);

 ...do what you want to benchmark...
      
 printf("it took 0x%x0000 CPU cycles\n",  l45_bench(s));

Description

unsigned int __stdcall l45_bench(unsigned int[2] buffer);

Fills the contents of BUFFER with the number of CPU clocks counted from the moment of power-up.

Returns

Returns the difference with the previous contents of the buffer multiplied by 0x10000.

Notes

The elapsed time in seconds can be computed as <returned value> * 0x10000 / <frequency of CPU>.

Sample - gssp - greplike program

gssp has been written to demonstrate usage of lib45 functions. Command interface tries to resemble grep as much as possible. It implements only a subset of grep options, also there are certain restrictions on size of file processed, but it runs rather fast and can be used along with grep e.g. as Emacs inferior process.

Invocation

gssp [options] substring [files..]

Where SUBSTRING is substring to be searched literally(Regular expressions are not supported). If no files specified stdin will be searched.

Options

`-l'
`--files-with-matches'
Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning of every file will stop on the first match.
`-n'
`--line-number'
Prefix each line of output with the line number within its input file.
`-v'
`--invert-match'
Invert the sense of matching, to select non-matching lines.
`-r'
`--recursive'
For each directory mentioned in the command line, read and process all files in that directory, recursively.
`--mmap'
If possible, use the mmap system call to read input, instead of the default read system call. In some situations, `--mmap' yields better performance.
`-V'
`--version'
Print version information and exit.
`--help'
Display this help and exit.

Limitations

Emacs binding

To use gssp from within Emacs along with grep `misc/gssp.el' is provided. You have to manually copy this file to somewhere in Emacs lisp path, byte-compile it and include the line


load "gssp"

in your `.emacs' file.

The provided commands M-x gssp and C-u M-x gssp are analogous to M-x grep and C-u M-x grep.

Portability

lib45 library can be used only on Intel 32-bit CPUs. Although it not uses instructions that are specific for Pentium IV, it optimized for use on it. As concerning various OS-es or object files formats lib45 is as portable as NASM is. Currently the library has been build and tested for the following object file formats:

Internals

The most important l45_make_searchfn_XX functions are written in Assembly language. The rationale is explained in see section Introduction. Technically the task of code generation is simple, e.g. all that is needed to generate instruction pop ebx would be to write:


       mov  byte[proper address]    ,0x5B

However, the resulting code will be hardly human-readable. To facilitate developing the special macro is used: if a line is prefixed with - it will be not executed at generation time but translated to machine instruction and written to the proper place (the macros will keep track of it) in the generation buffer. For example the above sample would be written as:


-       pop ebx

To visually discern register names in prefixed lines from their counterparts in the executing code is advisable use the special asmi Emacs mode which can be found in the file `misc/asmi-mode.el' of the distribution directory.

To expand macro Perl modules `Tools/instrgen.pm' and `Tools/codegen.pm' are used . For example to expand macros in file `foo.in' issue the following command from the distribution directory:


        perl -Mlib=./Tools -Mcodegen -e"instrgen::transform 'foo.in'"

The result will be placed in `foo.out' and can be used as input for NASM. Note that Perl is needed only at build time.

Macro commands

To understand the content of this section properly, note that there are three distinct stages in operation of code generating functions:

- prefix

The prefixed line will be translated to machine code and placed into generation buffer.

+ prefix

The braced expression containing labels referring to generated(- prefixed) code will be calculated. The line itself will execute at generation time.

Example

The following generation-time instruction calculates the size of code contained between the labels start and end and puts the result into ecx register

       mov ecx,{end-start}   
-start:
-     xor eax,ebx
-     test eax,eax
-end:

> prefix

Forward jump in the generated code from the address that could be calculated at compile-time to the one that could not. Implemented as an fix-up at generation time.

< prefix

Backward jump in the generated code from the address that could not be calculated at compile-time to the one that could. Note that both prefixes > and are not needed if the distance between the jump origin and target can be calculated at compile-time.

* prefix

Generation-time jump over some generated instructions. Used as hint to expansion program to properly update the assembly point.

-reset command

Sets the assembly point to the beginning of the generation buffer.

-REPEAT and -END-REPEAT commands

The generated(- prefixed) code between these commands will be generated repeatedly repeatcnt times. The register ebp will be decremented from the repeatcnt value to 1. repeatcnt should be defined as NASM macro expanding to a register or memory dword and assigned(loaded) to appropriate value before the first occurrence of -REPEAT. This value should be constant within the generated file (or between -reset commands. Due to this restriction the jumps between different repeat blocks are possible.

$<register> expressions

Can be used in generated (- prefixed) code. The value of generation-time register is inserted as immediate value in the generated instruction.

Example

the generating code below

      mov edx,1
-     cmp ecx,$edx

will result in:

      cmp ecx,1

at run time.

Limitations

Annotated HELLO WORLD example

The "HELLO WORLD" example is included to demonstrate the use of some macros described above. Note that run-time generation of code that only prints a string does not bring you any gains in performance.

;;               -*-asmi-*-
;;	annotated "hello world" example
;;	demonstrates:
;;        * calling of external functions
;;        * defining code and data in the same buffer

  
	global _l45_hello_world
	extern _printf

	section .text

_l45_hello_world: 

_l45_hello_world (can be referred in C as l45_hello_world) is a function receives the single parameter which is the address of generation buffer and returns the same value

	push esi
	push ebx;                       
	mov esi,dword[esp+12]		; assembly point(moving)
	mov ebx,esi	                ; start of generation buffer

The lines above constitute a kind of standard prologue for lib45 generating routines. The registers esi and ebx will used by generating stage at run-time, when expanding the '-' prefixed routines, so they need to be saved


	

+       lea edx,[ebx+{string}]         
-	push $edx

The code above calculates an absolute address of "HELLO WORLD" string and generates code pushing it on the stack.

        mov edx, _printf
+	sub edx,{callsite} 
        sub edx,ebx
-	call $edx
-callsite:

The code above calculates the relative address of printf function and generates calling code for it.

-	add esp,4
-	ret

;clean stack and return

-	nop
-	nop
-	nop

some padding between code and data


-string:
-	dd 0x4C4C4548 ;HELL
-       dd 0x4F57204F ;O WO
-       dd 0x0A444C52 ;RLD
-       dd 0 ;terminating zero

The four lines above define four dwords containing the "HELLO WORLD" string.

	mov eax,ebx

Generation buffer address is returned in eax.

	pop ebx
	pop esi
	ret

Restore the registers and return from the generating routine.

The produced code and data will look like:


0x0022fc08:     push   0x22fc1c
0x0022fc0d:     call   0x4015a0 <printf>
0x0022fc12:     add    esp,0x4
0x0022fc18:     ret
0x0022fc19:     nop
0x0022fc1a:     nop
0x0022fc1b:     nop
0x0022fc1c:     0x4c4c4548
0x0022fc20:     0x4f57204f
0x0022fc24:     0x0a444c52
0x0022fc28:     0x00000000

The full source for the example an be found in files `hello.c' and `l45_hello.in' of the distribution directory.

Annotated COMPARE example

The "compare" example is included to demonstrate the use of some macros described above. For the sake of simplicity the comparison is performed on byte-by-byte basis. If performance is sought for the comparison on dword-by-dword basis should be used.

;             -*-asmi-*-
;;	annotated "compare" example
;;	demonstrates:
;;      * loop unrolling using '-REPEAT'

	global _l45_compare
	section .text

;NASM macro with this name should be defined if '-REPEAT' is used
;it can refer to register or dword in memory
%define repeatcnt dword[esp-4]

_l45_compare:
        ;receives two parameters 
        ;   * the address of generation buffer
        ;   * the address of a pattern(zero-terminated string) to compare against   
	;returns the the address of generation buffer 
        ;   which might be cast to 'typedef int (*type)(char*)' type
	;   the generated function will return 0,1 or -1 like 'strcmp' does.

	push esi
	push ebx                       
	push ebp ;used in 'REPEAT' as loop counter                       
	mov esi,dword[esp+16]		; assembly point(moving)
	mov edi,dword[esp+20]		; pattern point
	mov ebx,esi	                ; start of generation buffer

	;calculating the pattern length+1: 
	xor ecx,ecx
lloop:	mov dl,byte[edi+ecx] 
	add ecx,1
	test dl,dl
	jne lloop

	mov repeatcnt,ecx; 'repeatcnt' has to be defined and contain the loop repetition count

-	mov eax,[esp+4];string pointer	
	xor ecx,ecx
-REPEAT
	mov al,byte[edi+ecx]
-       cmp byte[eax+$ecx],$al
-	jne near noneq
	add ecx,1
-END-REPEAT 
-	xor eax,eax
-	ret
-noneq: jl short less
-	mov eax,-1
-	ret
-less:  mov eax,1
-	ret
	mov eax,ebx
	pop ebp                       
	pop ebx                       
	pop esi
	ret

Given a second parameter "HELLO" l45_compare produces the following code:


0x0022fc08:     mov    eax,DWORD PTR [esp+4]
0x0022fc0c:     cmp    BYTE PTR [eax],0x48
0x0022fc13:     jne    0x22fc5d
0x0022fc19:     cmp    BYTE PTR [eax+1],0x45
0x0022fc20:     jne    0x22fc5d
0x0022fc26:     cmp    BYTE PTR [eax+2],0x4c
0x0022fc2d:     jne    0x22fc5d
0x0022fc33:     cmp    BYTE PTR [eax+3],0x4c
0x0022fc3a:     jne    0x22fc5d
0x0022fc40:     cmp    BYTE PTR [eax+4],0x4f
0x0022fc47:     jne    0x22fc5d
0x0022fc4d:     cmp    BYTE PTR [eax+5],0x0
0x0022fc54:     jne    0x22fc5d
0x0022fc5a:     xor    eax,eax
0x0022fc5c:     ret
0x0022fc5d:     jl     0x22fc65
0x0022fc5f:     mov    eax,0xffffffff
0x0022fc64:     ret
0x0022fc65:     mov    eax,0x1
0x0022fc6a:     ret

It is easy to see that generated code does not contain loops.

The full source for the example an be found in files `compare.c' and `l45_compare.in' of the distribution directory.

Building

Prerequisites

To build the lib45 library the following tools are needed:

All the mentioned above tools have to be on the path.

Building the library

  1. untar the library archive to some directory e.g.
       tar -xzmf Lib45-0.01.tar.gz
    
  2. cd into this directory
  3. run automake --add-missing
  4. run aclocal
  5. run autoconf
  6. run generated configure script e.g
       ./configure
    
    the example above will search for the C compiler automatically. If You have several of them on path and want to use another , issue the above command with parameter e.g.
       ./configure CC=icl
    
  7. run make
  8. run tests: make check
  9. if on UN*X or Cygwin/NT make install. All stuff will be installed in `/usr/local/XXX' directories: header, lib, gssp demo and info.
  10. If you have Tex installed you can also make PDF version of the documentation issuing the command make pdf.

Tests

Several tests are included. All the results of lib45 functions are compared with the results produced by standard C library functions like strstr, strchr, strlen etc. Several tests also print out some benchmarking data. Some tests for some compilers can take longer time to run due to the odd implementation of strstr function.

Changes History

Version 0.01

  1. The following functions have been added
  2. The 'gssp' searching utility based on above functions has been added.

Version 0.10

  1. Case insensitive variants have been added for the following functions:
  2. The 'gssp' utility now accepts '--ignore-case' or '-i' option.
  3. A bug has been fixed in the 'l45_make_searchfn_l' function. Although the function managed to find the search pattern correctly, there might be errors when counting separating characters for the longer search strings.

Version 0.11

  1. Some minor bugs had been fixed in functions for case insensitive search.
  2. Options '--help' and '--version'(-V) had been added for the 'gssp' utility.

Index

Jump to: c - g - l - r

c

  • compile-time
  • g

  • generation-time
  • l

  • l45_bench
  • l45_make_searchfn
  • l45_make_searchfn_l
  • l45_make_searchfn_li
  • l45_make_searchfn_ln
  • l45_make_searchfn_lni
  • l45_make_searchfn_n
  • l45_make_searchfn_zt
  • l45_make_searchfni
  • l45_searchfn_l_size
  • l45_searchfn_li_size
  • l45_searchfn_ln_size
  • l45_searchfn_lni_size
  • l45_searchfn_n_size
  • l45_searchfn_size
  • l45_searchfn_zt_size
  • l45_searchfni_size
  • l45_strlen
  • l45_strlen_n
  • r

  • Rationale
  • run-time
  • Options Index

    This is an alphabetical list of all gssp commands, command-line options, and environment variables.

    Jump to: -

    -

  • --files-with-matches
  • --help
  • --invert-match
  • --line-number
  • --mmap
  • --recursive
  • --version
  • -l
  • -n
  • -r
  • -V
  • -v

  • SourceForge.net Logo



    This document was generated on 12 October 2005 using texi2html 1.56k.