AFIS Data Compression: An Example of How Domain Specific Compression Algorithms Can Produce Very High Compression Ratios
Givon Zirkind - B.Sc., Computer Science, Touro College; M.Sc. Computer Science, Fairleigh Dickinson University
This article describes the development and implementation of a data compression algorithm designed specifically for fingerprints, referred to as GBP compression. Data compression algorithms can be designed for general applications, where the input data is unknown; such input is more commonly referred to as generic data. [LI01] Alternatively, data compression algorithms can be designed for specific applications, e.g. AFIS [Automated Fingerprint Identification Systems]. "When the input is known, higher compression ratios can be achieved with the knowledge of the input data stream." To date, the highest compression ratio for an unknown input data stream, among all data compression algorithms, is achieved by JPEG, with an average compression ratio range of 1:17 – 1:23. [PEN03] The algorithm discussed herein has a compression ratio range of 1:68 – 1:92. There is a value, time and place for each design method, generic or specific, depending upon a variety of factors. Because AFIS is used for law enforcement, incrimination and criminal conviction, there are social issues that make data integrity of paramount concern. This factor influences algorithm selection and design: a lossless algorithm is a must! Also, the nature of AFIS is such that it operates across borders and between states, municipalities and jurisdictions. In addition to the usual resistance to accepting new technology, including software [e.g. resistance to change, fear of system failure, etc.], there are the issues of changing engineering standards [hardware and software], which are governmentally determined, as well as governmental policy decisions. Likewise, the portability required in implementing a new algorithm means it must deal with a variety of hardware and software, and must be designed to integrate into existing systems. This integration must include the ability to incorporate existing JPEG data files from existing police databases.
This requires a handshaking of standards and conversion programs that maintain data integrity. In addition, there is an in-depth discussion of the limits of compression from a novel perspective.
Categories and Subject Descriptors
C.4. [Computer Systems Organization]: Performance Of Systems – Performance attributes
D.2.11 [Software]: Software Architectures – Data abstraction
D.2.12 [Software]: Interoperability – Data mapping
E.4 [Data]: Coding And Information Theory – Data compaction and compression
E.4 [Data]: Coding And Information Theory – Nonsecret encoding schemes
F.2.0 [Theory of Computation]: Analysis Of Algorithms And Problem Complexity – General
F.2.1 [Theory of Computation]: Numerical Algorithms and Problems – Computation of transforms (e.g., fast Fourier transform)
F.2.1 [Theory of Computation]: Numerical Algorithms and Problems – Computations on matrices
F.2.2 [Theory of Computation]: Nonnumerical Algorithms and Problems – Pattern matching
F.2.m [Theory of Computation]: Miscellaneous
H.0 [Information Systems]: General
H.1.1 [Information Systems]: Systems and Information Theory – Value of information
H.1.2 [Information Systems]: User/Machine Systems – Human factors
H.1.2 [Information Systems]: User/Machine Systems – Human information processing
H.1.2 [Information Systems]: User/Machine Systems – Miscellaneous
H.2.8 [Information Systems]: Database Management -- Database Applications -- Image databases
H.3.4 [Information Systems]: Information Storage And Retrieval – Systems and Software – Performance evaluation (efficiency and effectiveness)
H.3.7 [Information Systems]: Digital Libraries – User issues
I.3.4 [Computing Methodologies]: Computer Graphics – Graphics Utilities – Application packages
I.3.4 [Computing Methodologies]: Computer Graphics – Graphics Utilities – Graphics packages
I.3.6 [Computing Methodologies]: Methodology and Techniques – Device independence
I.3.6 [Computing Methodologies]: Methodology and Techniques – Graphics data structures and data types
I.3.6 [Computing Methodologies]: Methodology and Techniques – Standards
I.4.0 [Image Processing And Computer Vision]: General – Image processing software
I.4.2 [Image Processing And Computer Vision]: Compression (Coding) – Exact coding
I.4.3 [Image Processing And Computer Vision]: Enhancement – Filtering
I.4.3 [Image Processing And Computer Vision]: Enhancement – Grayscale manipulation
I.4.3 [Image Processing And Computer Vision]: Enhancement – Sharpening and deblurring
I.4.4 [Image Processing And Computer Vision]: Restoration – Inverse filtering
I.4.6 [Image Processing And Computer Vision]: Segmentation – Edge and feature detection
I.4.6 [Image Processing And Computer Vision]: Segmentation – Pixel classification
J.1 [Computer Applications]: Administrative Data Processing – Law
K.4 [Computing Milieux]: Computers And Society – Public Policy Issues – Ethics
K.4 [Computing Milieux]: Computers And Society – Public Policy Issues – Transborder data flow
K.4.m [Computing Milieux]: Computers And Society – Miscellaneous
General Terms
Algorithms, Performance, Design, Reliability, Security, Human Factors, Standardization.
Keywords
AFIS, Automated Fingerprint Identification Systems, Double Compression, Compatibility, Compression, Data Compression, Data Encryption, Data Integrity, Fingerprinting, Graphics, Image Compression, Image Quality, Limits of Compression, Portability, Retrofitting, Serial Compression, Software Engineering.
1. INTRODUCTION
For approximately 20 years, the FBI [Federal Bureau of Investigation, United States] had an outstanding engineering prize of $10 million U.S. for anyone who could devise a compression algorithm for fingerprints that could compress fingerprint images at a significantly higher rate than JPEG, which has a maximum optimal compression ratio of 1:23 for fingerprints. Due to the nature of the application, image integrity and exactitude are a must. Many tried. The most significant attempt, prior to the method herein described, produced a compression ratio of 1:35. Recently, a new compression algorithm, named GBP, specific to fingerprint image data, was designed, tested and prototyped. This compression algorithm far exceeds the compression ratio of the conventionally standardized compression method, JPEG. While JPEG has a maximum compression ratio of 1:23 for fingerprint data, GBP has a maximum compression ratio of 1:92 for fingerprint data. GBP maintains image exactitude with lossless compression. The logic and design considerations of the compression scheme are herein discussed. Among the design considerations discussed are that the prototype had to be designed to work across platforms, to integrate into existing image databases as well as protocols and, possibly, to integrate into existing hardware. Testing and evaluation of the software are also discussed, as are future research and work. Overviews of various pertinent compression schemes are given, along with a comparison of those schemes, especially as applied to AFIS. Optimizing image compression is a ubiquitous topic throughout this paper. A small but relevant history of the domain, fingerprinting and AFIS, is included along with other pertinent and necessary background information. The social issues of image exactitude, and how image exactitude impacts development, testing and possible implementation of this new compression method, are debated.
2. HISTORICAL BACKGROUND -- A BRIEF HISTORY OF FINGERPRINT RECORD KEEPING
The history of fingerprint record keeping has always involved large amounts of graphical record keeping, with accuracy being a major concern. In addition, there has always been a large demand for data sharing. Although it is theoretically assumed that no two people have the same fingerprints, in reality this is unknown. The FBI maintains over 81 million fingerprint records, going back to the 1930s, with approximately 7,000 files added on a daily basis. [FBI04] Fingerprint records are not discarded, and original cards are kept in spite of the automated system in place. The RCMP, as of Fall 2005, had over 4 million fingerprints. [RCM01] In addition, police forces in various cities and countries have always had a need to share pictures of fingerprints. With the mobility available to criminals in today's technological world, quick sharing of fingerprint data is more of an issue than ever.
The FBI has amassed a large number of fingerprint records. In 1985, the number of fingerprints kept by the FBI was well into the millions. In addition to keeping the original records, these images are digitized for storage, transmission and sharing between different policing agencies throughout the United States as well as transmission to different FBI branch offices throughout the United States and the world. The large number of pictures alone generates a massive amount of data.
Having amassed large amounts of fingerprint data, which is constantly growing, there has always been a search for a better way to compress this data. As data integrity is essential, so that the right person is arrested and the innocent are not harassed, lossless compression is the only possible kind of algorithm.
Historically, as data compression came into vogue, JPEG was accepted as the best generic data compression method [PEN02], meaning that for any given unknown data stream, JPEG would compress the data by a significant, guaranteed compression ratio.
The JPG [also known as JPEG] file format has a compression ratio ranging from 1:17 to 1:23. [MIA01] [PEN01]
So, the FBI employed JPEG for its fingerprints. Most, if not all of the world’s policing services did likewise. In effect, this standardized the fingerprinting methodology for the entire world.
JPEG is the currently accepted picture format for the Federal Bureau of Investigation (FBI, United States) [FBI01] [NIS01] [SAG01], Royal Canadian Mounted Police (RCMP, Canada) [COG01] [NIS01] [RCM01], Scotland Yard (United Kingdom), Interpol (International Criminal Police Organization) [INT01], the Thai police [NEC01], the Australian police (Australian National Automated Fingerprint Identification System – NAFIS) [NSW01] [SAG01], the New Zealand police [NEC01], the Hong Kong police [COG02], and all state as well as local police forces within the United States that communicate with the FBI. [NIS01] (Effectively, that means all state and local police within the U.S. use JPEG as their fingerprint image standard.)
[The information about governmental contracts may not be up to date, as contracts are awarded and withdrawn. However, by reading public government standards documents, such as NIST standards [NIS01] and Interpol standards [INT01], it is clear that most governments require an AFIS to be NIST or Interpol protocol compliant. That means that JPEG is an integral part of the system.
In addition, by reviewing the long list of FBI approved AFIS vendors [FBI03], or the shorter list of Interpol approved vendors [INT02], one can see that most AFIS vendors are NIST compliant (which means they use JPEG as an integral part of their system.)]
The sheer size of the data, due to the number of images, which is constantly growing, prompted the FBI to post a ten million dollar [$10 million U.S.] engineering prize for the development of an effective data compression technique for fingerprints.
In response to the challenge, many tried to develop a method that would both compress fingerprint images and yet maintain data integrity, i.e. lossless compression. Prior to the GBP method, the proposed technique with the most significant improvement upon data compression was able to achieve a 1:35 compression ratio. While that ratio is significant and higher than JPEG, it was not dramatic enough and was deemed not to warrant the cost of implementation and conversion to a non-standard format.
As no worthwhile improved compression algorithm was presented as a solution in a 20 year period, the FBI withdrew its prize. However, the endeavor to find a better way to compress fingerprints has remained a challenge to computer and software engineers, much as finding new proofs of the Pythagorean theorem is to mathematicians.
Indeed, today, many entities other than police agencies employ AFIS: Voting, immigration, border crossings, asylum, citizen benefits, citizen rights, citizen identification, drivers’ license, private corporate identification systems, hospitals, military and others. [COG03] [MOT01] The utility of a better AFIS compression remains.
3. COMPRESSION ALGORITHM DESIGN CONSIDERATIONS
3.1 Image Compression Considerations
Compressing a picture usually involves some kind of data encoding, albeit a known encoding method. In implementing any such algorithm, there is a tradeoff between processing time and either the data integrity and/or the compression ratio achieved. Smaller pictures will be transmitted faster; there is simply less to send. Smaller pictures will also allow more pictures in a database limited to a fixed amount of physical storage: any given storage will house more pictures if the pictures are reduced in size. But will the processing time to reduce the picture size, and the processing time to extract the picture, be so great as to reduce operation time to a point of unacceptability? This is an application design consideration.
The most significant requirement of image compression is to maintain data integrity and the quality of the graphic image. Computer imaging treats a picture as a grid of tiny dots named pixels (Picture Elements). This method of picture generation with computer equipment is analogous to the artistic method of Pointillism, for which Georges Seurat is famous. Pointillism is a painting style in which many little dots create a picture. [TUR01] Using such a method of picture generation, a reduction in picture size can be achieved by simply removing every other pixel (dot), row (of pixels/dots) or column (of pixels/dots). The critical factor is, “How much detail can be removed while maintaining the integrity/clarity of the picture?” [GON01] [MIA01]
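The pixel-removal approach described above can be sketched in a few lines. The 4 x 4 grid of values below is hypothetical sample data, not real image data:

```python
# Naive size reduction by dropping every other row and, within each
# kept row, every other pixel -- the "remove every other dot" idea.
def downsample(image):
    """Keep every other row and every other pixel in each kept row."""
    return [row[::2] for row in image[::2]]

image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
smaller = downsample(image)
# smaller holds one quarter of the original pixels: [[1, 3], [9, 11]]
```

Whether the discarded three quarters of the pixels contained detail the viewer needed is exactly the integrity question posed above.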
The human eye is capable of perceiving only so much detail. Too much picture detail is not perceivable by human beings. Deleting detail from a picture that is beyond the ability of humans to perceive will cause no loss of valuable information and be of no consequence. [MIA01] People will still see the same picture. Also, there are details that are too fine to be recognized. E.g. There is a limit to the number of colors the human eye can see. [HEA01] Having a color palette with more colors than the human eye can possibly see is unnecessary. For example, if there are more colors in a picture than the human eye can see, then, if the number of colors in the picture is reduced to a number of colors the eye can see, human beings will still see the same picture.
In addition, the human brain can recognize full images from partial images and; complete the information in a message even though many information symbols are missing. [KAH01] When dealing with pictures, this means the sharpness or integrity of a picture need not be at a maximum for recognition of the indicated picture subject. [MIA01]
These two (2) above-mentioned factors of human cognition permit some graphic applications a larger loss of detail while maintaining the usable data, which permits reduction of the image size.
The ability to reduce the size of an image by eliminating unnecessary details while still maintaining an exact image demonstrates that, if there is prior knowledge about the input data stream, better compression can be achieved by eliminating unnecessary details. [MIA01]
In addition, different graphical images have different repetitive qualities. Some graphic images have areas of same-colored shapes. Such graphical images lend themselves to compression algorithms which identify those same-colored areas and indicate those large areas with a small amount of coding. E.g. Fax compression typically uses Run Length Encoding (RLE), because most faxes are long lines of only two colors, black and white, with long strings of white broken by small gaps of black.
3.2 Fingerprint Data Compression Considerations
Applying the above image data and image recognition considerations to the application of fingerprint images and Automated Fingerprint Identification Systems (AFIS):
Fingerprint images require exactness. Data integrity must be at a maximum for fingerprint images. For one, it would be a gross injustice to arrest the wrong person for a crime. Indeed, prior to using fingerprints for identification, skull measurements were used to identify criminals, but errors occurred. Incorrect identification had initially been one motivation for using fingerprints as a form of identification. Secondly, fingerprints have minutiae, individual points that uniquely differentiate one fingerprint from another. [GAL01] [HOO01] See Figure 1 for an example. In the top left and right, one can see “dots”, which are among the relevant minutiae of fingerprint images.
Figure 1. Sample fingerprint with “dots”. [Courtesy Craig Watson, National Institute of Standards and Technology, (NIST), USA. This fingerprint and other test data is available via anonymous FTP from NIST, “www.nist.gov”.]
Dots and other minutiae of fingerprints preclude the use of conventional edge sharpening techniques. Such techniques are prone to reducing or eliminating critical features from fingerprint images. (This is discussed more in detail later on in this article.)
Although fingerprints require exactness, there is a characteristic of fingerprint images that does not require a lot of information, and this characteristic lends itself to very large scale compression. Physical fingerprints are produced by dabbing a fingertip with ink and then pressing that fingertip on a sheet of paper or a card. Hence, a fingerprint image is, in essence, a two color image: a black and white picture. Technically speaking, the shades of gray in a fingerprint picture are merely noise: unwanted, unnecessary information produced by a lack of technical refinement. They are errors in detail that can be tolerated. I.e. If all we had was a sharp, clear, exact, black and white image of a fingerprint, that would provide sufficient information to discern one fingerprint from another. The basic reasons for preserving the shades of gray are the imprecision of scanning equipment and the fact that human beings can reassemble complete images from partial images. [MIA01] In reality, fingerprints are partial images, and matching fingerprints on file against those taken at a crime scene requires human intervention to reassemble the images from partial images. What the shades of gray do is help maintain proper edge definition. But most gray points in the fingerprint image can be considered as white space; it is only the black lines that are real fingerprint image data.
Indeed, previous AFIS systems have used just black and white images. But, it was found that eliminating the gray color significantly reduced image quality. [NSW01]
3.3 Compressing Generic Data vs. Specific Data
Another consideration in producing a method for reducing graphic data is the difference between a generic compression method and a specific compression method: A generic method for compressing any kind of data or picture vs. a specific method of compression for compressing a specific data or kind of picture.
For example, using a compression method that easily reduces large geometric shapes to compress a fax would be ridiculous. Faxes are mostly text, which is represented as long lines of alternating black and white boxes. Logos, which have large geometric areas, are a minor part of a fax image. Since there usually are no large geometrically shaped areas of the same color on a fax, compression based on geometric shapes would produce minimal compression, if any compression at all.
The selection of a generic or specific compression method relies on whether or not the data is generic or specific. Generic data is data that could be anything. E.g. Executable program code. Specific data has a known set of symbols. Examples of specific data would be: Text – must be an alphabetic symbol. Colors – a valid color or symbol representing a color. Numbers. Etc.
Of all known data compression techniques, JPEG is considered the best data compression technique for generic data, yielding the highest compression ratio for unknown data input. [MIA01] [WIK01]
Generic data requires that great attention be paid to detail or information will be lost. [LI01] Hitherto, unnecessary information had not been identified in fingerprint images. GBP, the AFIS process herein described, recognizes and eliminates unnecessary details, such as extraneous color information, thereby permitting the reduction of the graphical data for an image.
3.4 Compression vs. Computation Time Tradeoffs
Many, if not almost all, compression schemes involve computational time. The scan line or data must be inspected and evaluated for repetitions of symbols, which may be pixels of the same color, because the most often used repetitive string should be given the shortest symbol. [HUF01] This requires an analysis of all the data to be compressed [WIK02] [WIK04], which can consume time and require much processing. [WIK02] [WIK04] Inspecting for repetitive symbols, as RLE does, is a minor calculation compared to compression methods which assemble a dictionary of repetitive strings of symbols for reference. Those strings of symbols may be pixels of the same color. Other encoding methods have similar computational considerations. Sometimes complex mathematical equations are involved in assembling the dictionary, “code book” or reference scheme between symbol and actual representation.
If a significant amount of time is required to compress the image data, the reduction in file size should warrant the expense of the computing time.
4. A BRIEF DESCRIPTION OF RELEVANT, EXISTING COMPRESSION TECHNIQUES
4.1 A Brief Description of Run Length Encoding (RLE)
RLE is a compression method suited for data that has long strings of repetitive symbols. Bi-tonal images – images of only 2 colors – are a good example of where RLE applies. The classic example is a fax. Most faxes are simply black and white. When converted to a digitized image of pixels, a scan line of a fax is a row of black and white boxes, which can be broken up into “strings” or “sets” of black boxes or white boxes. See Figure 2 for a visualization of the flow of data in RLE compression. Notice that RLE encodes the pixels of a scan line into a row of alternating numbers and symbols: the first position indicates how many instances of a symbol are encoded, while the next, subsequent position indicates the symbol encoded. This pattern alternates for the entire line, for each line. [MIA01]
Figure 2. Visualization of RLE compression.
The main idea of RLE is to have a count followed by the repeated symbol, which could be a color value. If the value of the data byte is a color value, the color value may be from a color palette. For a bi-tonal image – i.e. black or white – a palette is not necessary. However, RLE can be used for any number of colors. If a multi-colored image is being compressed with RLE, then the count is followed by an index into the color palette, which contains the value. RLE may work well with images with large areas of the same color, such as cartoons. But there are other methods that work well with cartoons, such as GIF.
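The count-then-symbol scheme described above can be sketched as follows. This is a minimal illustration, with 'W' and 'B' standing for white and black pixels:

```python
def rle_encode(pixels):
    """Encode a scan line as alternating [count, symbol] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, p])       # start a new run
    return runs

def rle_decode(runs):
    """Expand [count, symbol] pairs back into the original scan line."""
    return [p for count, p in runs for _ in range(count)]

# A fax-like line: long white runs with a small black gap.
line = ['W'] * 6 + ['B'] * 2 + ['W'] * 4
encoded = rle_encode(line)            # [[6, 'W'], [2, 'B'], [4, 'W']]
assert rle_decode(encoded) == line    # lossless round trip
```

Twelve pixels collapse to three pairs here; a line with no repetition would instead expand, which is why RLE suits bi-tonal data with long runs.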
As fingerprints are, by nature, one big non-repetitive pattern, fingerprint images are unique, non-linear images without repetitive sub-patterns. There is little chance for repetition in a fingerprint image, so RLE compression is quite inefficient for AFIS.
4.2 A Brief Description of Graphics Interchange Format (GIF)
GIF coding divides a picture into images and stores each image separately. The placement of the images on the screen is recorded with the pictures. Multiple images and multiple positions per image are permitted. The data stream is compressed using a dictionary based sequence method, LZW. Repeated sequences are stored in a dictionary and referenced by a much shorter symbol each time the sequence is repeated. If sequences are repeated often, the compression rate is high. GIF permits only a small palette of 256 colors. This limitation is quite a drawback, especially compared to the multi-palette option of JPEG. [MIA03]
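The dictionary-building idea behind LZW can be sketched as follows. This is a minimal illustration of the general technique, not GIF's actual variable-width-code implementation:

```python
def lzw_encode(data):
    """Minimal LZW: grow a dictionary of seen sequences, emit codes.

    The dictionary starts with all 256 single-byte values; each new
    (longest-seen-sequence + next-byte) string gets the next free code.
    """
    dictionary = {bytes([i]): i for i in range(256)}
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                      # keep extending the match
        else:
            out.append(dictionary[w])   # emit code for longest match
            dictionary[wc] = len(dictionary)
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out

codes = lzw_encode(b"ABABABAB")
# Repeated "AB" sequences collapse to dictionary references:
# 8 input bytes become 5 codes: [65, 66, 256, 258, 66]
```

As with RLE, the gain depends entirely on repetition: highly repetitive input shrinks, while a non-repetitive stream like a fingerprint scan line yields almost no savings.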
GIF also works well with large geometric shapes, e.g. circles, squares, etc. These shapes have large repeated common sequences of image data. Cartoons have large geometric shapes filled with one single color, in just one hue, which lends itself to GIF compression.
Animated images require multiple frames and/or pictures. This feature is a basic component of GIF format. Most animated images use GIF format. This strong point of GIF is irrelevant to AFIS. [MIA04]
As fingerprints by nature create unique patterns, there is little chance for repetition in a fingerprint image. GIF compression, although popular, is quite inefficient for AFIS.
In addition, fingerprints do not have large geometric shapes such as circles or squares. Fingerprint patterns are complex, with multi-directional changes. The cross hatching creates small areas of different colors. GIF would produce little compression on such an image and is therefore not a reasonable choice for fingerprint compression and AFIS.
4.3 A Brief Description of JPEG Compression
JPEG is the most commonly accepted and used compression technique. JPEG uses the mathematical methods of the Discrete Cosine Transformation [DCT] and the Inverse Discrete Cosine Transformation [IDCT]. JPEG is a standard and a process; JPEG is not an implementation. There are many public and commercial implementations of JPEG available.
The JPEG method incorporates a color conversion method from color to black and white with shades of gray.
JPEG also has options for lossy and lossless compression.
To date, the best method of generic compression has been produced by the Joint Photographic Experts Group (JPEG). [FBI01] [ISO01] [MIA01] The crux of the JPEG method is the DCT. [ISO01] [MIA01] [PEN01] The practical implementation of the DCT assumes the picture is composed of 8 x 8 blocks of pixels. A palette of colors is created, with each color assigned a number. The color of each pixel is assigned its corresponding numerical value from the palette. The 8 x 8 block is now the equivalent of an 8 x 8 mathematical matrix. The numbers representing the colors can be treated as cosine values. Then, using sophisticated mathematics, this matrix can be transformed into a matrix in which most of the entries are zero [GOL01] [GOL02]. The end result is a long list of zeros, which can be indicated by only a few pieces of information: a one, a zero and the corresponding run length of the symbol (zero). (I.e. One number for how many zeroes.) Thus 64 (8 x 8 = 64) pixels can be represented with far fewer bytes.
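The energy-concentrating behavior of the DCT described above can be illustrated with a small sketch. This is a plain, unoptimized 2-D DCT-II in pure Python, not any particular JPEG implementation; note how an 8 x 8 block of one uniform gray value collapses to a single nonzero coefficient:

```python
import math

def dct2_block(block):
    """Unscaled 2-D DCT-II of an 8x8 block, as used conceptually by JPEG."""
    n = 8
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = s
    return out

# A block of one uniform gray value (128): every coefficient except the
# top-left (DC) one collapses to (near) zero -- the "long list of zeros".
flat = [[128] * 8 for _ in range(8)]
coeffs = dct2_block(flat)
nonzero = [(u, v) for u in range(8) for v in range(8)
           if abs(coeffs[u][v]) > 1e-6]
# nonzero == [(0, 0)]
```

Real fingerprint blocks are not uniform, but smooth regions still concentrate their energy in a few low-frequency coefficients, which is what makes the run of zeros compressible.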
In addition, there is an inverse process. IDCT which will take the long list of zeros and recreate the original matrix.
The JPEG standard has become a de facto international standard sanctioned by the International Standards Organization (ISO). [ISO01] It is important to note that JPEG is a standard and not a specific implementation. Many manufacturers and programmers, such as the Independent JPEG Group (IJG), have produced their own implementations of the JPEG standard.
As JPEG is designed for generic data and capable of lossless compression, JPEG is an excellent choice for AFIS.
5. OTHER DESIGN CONSIDERATIONS
5.1 Standardization Considerations
In modern society, fingerprints are employed at some level by all governments and policing agencies, and are commonly employed in many identification systems. There is also a need to share this data. It should come as no surprise that there are widely used standards for fingerprint data and fingerprint data compression, the FBI and Interpol standards being the most widely employed. Even between the FBI and Interpol standards, there is much overlap. If a new method is successfully developed, it must be able to interface with pre-existing standards.
Most current standards use JPEG as the file format [NIS01] [INT01], which has made JPEG the standard format for inter-agency fingerprint image sharing. Therefore, any new AFIS must have a conversion method to and from JPEG.
5.2 Portability, Integration and Implementation Considerations
Developing such an innovative and revolutionary data compression method requires considering integration and cooperation with existing standards and with the multitude of systems in production worldwide. This requires an implementation that is very portable between platform types, both hardware and software: a design consideration addressed in the development of this compression algorithm.
6. EVALUATION OF VARIOUS COMPRESSION METHODS AS APPLIED TO AFIS
6.1 Run Length Encoding (RLE)
RLE will produce high rates of reduction, provided that there is at least one symbol that is used more than 50% of the time. [WIK04] The example of a fax is a good one. By simple examination, one can see that one color or the other, black or white, is dominant in a fax image. In text, such as a letter, the white background is the majority color.
This feature – one symbol appearing more than 50% of the time – does not hold for fingerprint images. In addition, if one considers a fingerprint image as a curvaceous grid superimposed upon a standard 90° X-Y grid, repetitions or strings of repetitive characters along an X or Y axis do not usually occur.
6.2 Graphics Interchange Format (GIF)
GIF uses a dictionary compression method. It will work well with an unknown set of symbols. [WIK02] It is suitable for images with straight edges and large areas of a single color. [MIA01] [WIK03] These features are lacking from fingerprint images. Hence, GIF is not a good choice for fingerprint compression; with fingerprint images, GIF would achieve little compression.
6.3 Huffman Coding
A full description of Huffman coding [HUF01] would be too tangential to this paper. It is addressed because it is a most popular compression method, and many consider it the most efficient, although this is disputed. [WIK02] The basic principle of Huffman coding is to use the shortest coded symbol for the most often used data symbol. E.g. Morse code: the smallest symbol, just one dot, is used to encode the most often used letter in the alphabet, 'e'. Huffman coding, like other dictionary compression methods, must first do a data analysis to determine which symbols are repeated and to calculate how often each symbol is repeated.
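The shortest-code-for-most-frequent-symbol principle can be sketched as follows. This is a minimal illustration, not a production codec:

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a Huffman code: most frequent symbols get the shortest codes."""
    freq = Counter(data)
    # Heap entries: (frequency, tiebreak, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prefix their codes.
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_code("eeeeeeeettaa")
# 'e' dominates the frequency count, so its code comes out shortest.
```

The frequency analysis in the first line is exactly the full pass over the data that the text above identifies as the computational cost of dictionary methods.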
Huffman coding works best with a known probability distribution [WIK02], which fingerprints do not have. Again, as with RLE and GIF, the inherent lack of repetition in fingerprint images precludes the use of Huffman coding.
6.4 JPEG
JPEG is the remaining contender. Employing the DCT, JPEG works well with generic data, can compress losslessly and has a high compression rate. JPEG incorporates multiple palettes, including full grayscale, as well as Huffman coding where applicable.
7. BASIC GBP OPERATION
7.1 How GBP Maximizes Fingerprint Image Compression
GBP is a compression method and file format designed specifically for fingerprints.
The GBP file format has the advantage over current file formats of a compression ratio of up to 1:92 for fingerprint data.
Innovatively, the GBP process reduces the bytes needed to store an image by recording image data in just three (3) colors while maintaining image integrity. The tricolor image data stream is then correlated bit-to-byte, thereby reducing the data stream to one quarter of its size. The resultant data stream is then compressed using the standard DCT method.
Contrast the compression necessary for a tricolor format with a multi-colored format. The standard JPG [JPEG] file format used for fingerprint images uses grayscale imaging of 256 colors, which requires one (1) full byte to represent the color of one (1) pixel. [FBI01] [INT01] However, the GBP method uses only three (3) colors. Hence, only two (2) bits are necessary to describe all the colors in the palette or referenced by a pixel. Two (2) bits are able to encode up to four (4) colors. Therefore, using the GBP image format, every byte represents 4 times as many pixels as in the JPG image format! [As 2 bits x 4 pixels = 8 bits = one byte; at 2 bits per pixel, one byte holds 4 pixels.]
The tricolor method uses just three (3) colors: Black, white and gray. This significantly reduces the amount of data necessary to record a fingerprint image; which significantly reduces the size of a fingerprint image.
7.2 Detailed Algorithm and Process Description
7.2.1 Algorithm Method
Accept a byte stream for a data image in a specific scan sequence. Maintain the scan sequence in all processing. Ascertain whether the scan is in color or grayscale. If the scan is in color, convert it to grayscale. Once in grayscale, convert the byte stream to tricolor: black, white and one shade of gray. While maintaining the specific scan sequence of the entire original byte stream, assemble a new “target” byte stream. Each byte in the new target byte stream represents 4 sequential bytes from the original byte stream: two bits in the target byte “correspond” to one byte in the original byte stream, encoding the color of that corresponding original byte.
7.2.1.1 Process for a quartet of 4 bytes from the original byte stream:
1. Place the value of the first byte from the original byte stream in the first 2 bit positions (high order bit positions) of the target byte.
2. Place the value of the second byte in the second 2 bit positions (next to the high order bit positions) of the target byte.
3. Place the value of the third byte into third 2 bit positions (next to the low order bit positions) of the target byte.
4. Place the value of the fourth byte in the fourth 2 bit positions (low order bit positions) in the target byte.
When finished processing 4 bytes, read 4 more bytes from the original scan data and repeat the process.
Upon reaching the end of the original byte stream, if the number of remaining bytes is not divisible by 4, e.g. only 3 bytes at the end of the original byte stream, add dummy values of all 1’s to fill the target byte.
Apply DCT compression to the tricolor compression byte stream.
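The packing steps above can be sketched in code. The article specifies no implementation language; the following Python sketch (with a hypothetical helper name, and the final DCT stage omitted) shows the quartet packing and the all-1’s padding:

```python
# Tricolor values: 0 = black, 1 = white, 2 = gray; 3 ('11') is the pad/end marker.
def pack_tricolor(pixels):
    """Pack one-pixel-per-byte tricolor data into 2-bit fields, 4 pixels per byte."""
    packed = bytearray()
    for i in range(0, len(pixels), 4):
        quartet = list(pixels[i:i + 4])
        quartet += [3] * (4 - len(quartet))      # pad a short final quartet with 3 ('11')
        byte = 0
        for value in quartet:                    # first pixel fills the high-order bits
            byte = (byte << 2) | (value & 0b11)
        packed.append(byte)
    return bytes(packed)

# Five pixels (black, white, gray, black, white) pack into two bytes.
print(pack_tricolor([0, 1, 2, 0, 1]).hex())      # '187f'
```

The second byte here, 0x7F, carries one valid pixel followed by three padding pairs of ‘11’, exactly as in Figure 2.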
Figure 2. A visualization of a tricolor byte stream and how the compressed data stream correlates to the initial data stream.
Detailed Explanation of Figure 2:
Using conventional hexadecimal notation, one byte is necessary to hold the number value for the color of one pixel. By using just the numbers 0, 1, & 2; for black, white and gray, respectively, the color values of four pixels can be contained in one byte.
On top, the sample byte stream in decimal notation with the numerical value of the color of the pixel.
In the middle, the corresponding detailed bit stream, with conventional hexadecimal notation, showing the value of each byte.
On bottom, the corresponding detailed bit stream, with compressed tricolor notation, showing the value of each individual bit. Notice that byte #1 contains the value of the first four bytes of the byte stream:
Positions 0 & 1 of byte #1 of the compressed tricolor data stream map to byte #1 of the expanded tricolor data stream.
Positions 2 & 3 of byte #1 of the compressed tricolor data stream map to byte #2 of the expanded tricolor data stream.
Positions 4 & 5 of byte #1 of the compressed tricolor data stream map to byte #3 of the expanded tricolor data stream.
Positions 6 & 7 of byte #1 of the compressed tricolor data stream map to byte #4 of the expanded tricolor data stream.
Positions 0 & 1 of byte #2 of the compressed tricolor data stream map to byte #5 of the expanded tricolor data stream.
Positions 2-7 of byte #2 of the compressed tricolor data stream are filled with trailing 1’s, which creates a run of repeating values of decimal 3, or binary “11”. As there are only three colors, indicated by the numerical values 0, 1 & 2, the number three is used as an end-of-valid-data marker.
Figure 3. A flowchart describing the compression process.
The reverse process of writing an expanded byte stream from the compressed tricolor byte stream is similar to the compression process.
7.2.1.2 IDCT & Expansion
Apply IDCT to the tricolor compressed byte stream.
Reading the compressed tricolor byte stream from the beginning, using a bit-to-byte correlation, construct a new data stream, with each byte containing the color value for only one pixel.
Convert the pixel value from a tricolor value to a 256 grayscale palette value.
7.2.1.3 Process for expanding one compressed byte into a quartet of 4 bytes:
1. Read bit positions 0 & 1 of byte #1 (the high order bit positions) of the compressed data stream. Write to a new byte stream with byte #1 containing an appropriately corresponding value for a 256 valued grayscale palette.
2. Read bit positions 2 & 3 of byte #1 (the next to high order bit positions) of the compressed data stream. Write to a new byte stream with byte #2 containing an appropriately corresponding value for a 256 valued grayscale palette.
3. Read bit positions 4 & 5 of byte #1 (next to low order bit positions) of the compressed data stream. Write to a new byte stream with byte #3 containing an appropriately corresponding value for a 256 valued grayscale palette.
4. Read bit positions 6 & 7 of byte #1 (low order bit positions) of the compressed data stream. Write to a new byte stream with byte #4 containing an appropriately corresponding value for a 256 valued grayscale palette.
When finished processing one byte from the compressed tricolor byte stream, proceed processing the next consecutive byte from the tricolor compressed byte stream.
When a high value of ‘11’ [in binary notation] has been read, the end of the scan has been reached. Close the files. Processing is complete.
Likewise, if there are no more bytes in the compressed byte stream, the process is complete. Close all files.
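Under the same assumptions (illustrative Python, hypothetical helper name), the expansion steps above can be sketched as follows; the final mapping of recovered tricolor values onto a 256-value grayscale palette is omitted:

```python
def unpack_tricolor(packed):
    """Expand packed 2-bit fields back to one tricolor value per byte."""
    pixels = bytearray()
    for byte in packed:
        for shift in (6, 4, 2, 0):               # high-order bit pair first
            value = (byte >> shift) & 0b11
            if value == 3:                        # '11' marks the end of valid data
                return bytes(pixels)
            pixels.append(value)
    return bytes(pixels)

print(list(unpack_tricolor(bytes([0x18, 0x7F]))))  # [0, 1, 2, 0, 1]
```

The sentinel check is what stops expansion partway through the final byte, matching the five-pixel example of Figure 4.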
Figure 4. A visualization of the decompression of a tricolor byte stream and how the compressed data stream is correlated to the initial data stream.
Detailed Explanation of Figure 4:
The bytes for the compressed tricolor byte stream are read in.
On top, the sample compressed tricolor byte stream in decimal notation with the binary value of each position displayed.
In the middle, the corresponding expanded tricolor byte stream. Notice how each byte carries a binary numerical value of 0, 1, 2, or 3.
Notice that byte #1 of the expanded data stream contains the value of positions 0 & 1 of byte #1 of the compressed tricolor data stream.
Notice that byte #2 of the expanded data stream contains the value of positions 2 & 3 of byte #1 of the compressed tricolor data stream.
Notice that byte #3 of the expanded data stream contains the value of positions 4 & 5 of byte #1 of the compressed tricolor data stream.
Notice that byte #4 of the expanded data stream contains the value of positions 6 & 7 of byte #1 of the compressed tricolor data stream.
Notice that byte #5 of the expanded data stream contains the value of positions 0 & 1 of byte #2 of the compressed tricolor data stream.
The trailing bits or pairs of ‘11’ are an invalid color value and used to indicate that there is no more valid picture information in the data stream. So, there are only 5 bytes in the expanded data stream.
On the bottom, the original tricolor data stream in decimal notation.
Figure 5. A flowchart describing the decompression process.
8. COMPARING GBP AND JPEG METHODOLOGIES WHEN APPLIED TO AFIS
8.1 Comparing the JPEG & GBP Algorithms
Both JPEG and GBP use the mathematical approach of DCT. With JPEG, the DCT compression is applied directly to a byte stream of 256 colors. GBP first compresses the byte stream with a bit-to-byte correlation, which reduces the byte stream by encoding four (4) bytes into one (1) byte. Only after this initial encoding and compression does GBP apply DCT compression. Initially, compression to one quarter (¼) is achieved, or a compression ratio of 1:4. Then, DCT compression is applied. Thus, the GBP file size is less than one quarter (¼) the size of the same image compressed with standard JPG compression techniques.
The GBP file format compresses data at a ratio of up to 1:92. (Explained below.) This is a previously unheard of rate for AFIS that far exceeds any currently available rate.
8.1.1 Mathematical Models
AFIS – Automated Fingerprint Identification System.
BW – Black and White image (bi-tonal, two colors only).
Compression rate – The compression ratio expressed as a percentage.
Compression ratio – The ratio of the number of parts (pixels when dealing with images) of the compressed object to the number of parts of the original object. Used to indicate the magnitude of compression.
DCT – Discrete Cosine Transform.
GBP – The newly developed compression method described herein: the application of DCT upon a picture that is in a compressed TRI (three colors: black, white & gray) palette.
Grayscale – An imaging method with a palette of black, white and shades of gray. There are no hues in grayscale imaging.
ISO – International Organization for Standardization.
JPEG – Joint Photographic Experts Group compression standard / format.
Reduction rate – The reduction ratio expressed as a percentage.
Reduction ratio – The complement of the compression ratio, i.e. the proportion of the original object eliminated by compression. Used to indicate how much smaller the compressed object will be than the original object.
RLE – Run Length Encoding.
TRI – Tricolor palette: three colors, black, white & gray.
8.1.1.1 Variables Defined
C – The compression ratio for a given compression type using the maximum compression prediction.
C’ – The compression ratio for a given compression type using the minimal compression prediction.
R – Reduction rate: the complement of the compression ratio (1 − C), using the maximum compression prediction.
R’ – Reduction rate: the complement of the compression ratio (1 − C’), using the minimal compression prediction.
The subscript indicates which compression type is being referred to.
8.1.1.1.1 Values for maximum expectations:
[Eq. 1.1.1] CDCT = CJPEG = 1:23 = 0.0434 ≈ 0.04 = 4%
[Eq. 1.1.2] CTRI = 1:4 = 0.25 = 25%
[Eq. 1.1.3] CGBP = CTRI × CDCT = (1:4) × (1:23) = 1:92 = 0.0108 ≈ 0.01 = 1%
[Eq. 1.2.1] RDCT = 1 − CJPEG = 1 − 0.04 = 0.96 = 96%
[Eq. 1.2.2] RTRI = 1 − CTRI = 1 − 0.25 = 0.75 = 75%
[Eq. 1.2.3] RGBP = 1 − [CTRI × CDCT] = 1 − [(1:4) × (1:23)] = 1 − 0.0108 ≈ 1 − 0.01 = 99%
8.1.1.1.2 Values for minimum expectations:
[Eq. 2.1.1] C’DCT = C’JPEG = 1:17 = 0.0588 ≈ 0.06 = 6%
[Eq. 2.1.2] C’TRI = 1:4 = 0.25 = 25%
[Eq. 2.1.3] C’GBP = C’TRI × C’DCT = (1:4) × (1:17) = 1:68 = 0.0147 ≈ 0.01 = 1%
[Eq. 2.2.1] R’DCT = 1 − C’JPEG = 1 − 0.06 = 0.94 = 94%
[Eq. 2.2.2] R’TRI = 1 − C’TRI = 1 − 0.25 = 0.75 = 75%
[Eq. 2.2.3] R’GBP = 1 − [C’TRI × C’DCT] = 1 − [(1:4) × (1:17)] = 1 − 0.0147 ≈ 1 − 0.01 = 99%
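As a quick check of the equations above, the combined GBP ratio can be computed as the product of the tricolor and DCT ratios (an illustrative Python sketch; the constants are the ratios cited above):

```python
C_TRI = 1 / 4                       # tricolor bit-to-byte packing
for label, C_DCT in [("max", 1 / 23), ("min", 1 / 17)]:
    C_GBP = C_TRI * C_DCT           # combined ratio is the product of the two stages
    print(f"{label}: 1:{round(1 / C_GBP)}, "
          f"C = {C_GBP:.4f}, reduction = {1 - C_GBP:.1%}")
```

This recovers the 1:92 and 1:68 figures and reduction rates of roughly 99% in both cases.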
9. SOFTWARE EVALUATION & QUALITY ASSURANCE TESTING
9.1 Did the Software Produced Meet the Design Specifications?
In regards to the engineering prize, the author feels the design specifications were met. However, the prize was withdrawn prior to completion of the project, and the finished product could not be submitted for testing. We will never truly know.
As is common with many problems, solutions are found with work-arounds. For 20 years, the FBI did not receive a satisfactory submission for the engineering prize. So, with the increase in the speed of transmission lines and the reduction in the size of memory and memory devices, i.e. disk drives, the FBI simply bought new hardware that had a lot more space for their files. Such solutions are not uncommon in the computer field. ‘Simply buy a faster processor.’
It should be noted that the National Institute of Standards and Technology [NIST] of the U.S. Government will test AFIS systems for compliance. As the project only developed the data compression engine, and the developer did not have a current AFIS system into which to integrate it, a submission to NIST for evaluation was not possible.
However, testing with publicly available images, specifically disseminated for testing AFIS systems, produced compressed files matching the mathematical model predictions. What is still necessary is a fingerprinting expert’s comparison of the restored images with the original images, an expertise the developer lacks.
9.2 Algorithm Results
The tricolor compression reduces the size of the number of bytes from the scan sequence by 75%.
The file produced by applying tricolor compression to the scan sequence is further reduced in size by 96% by applying DCT compression.
In relation to the original scan sequence, a tricolor compression with DCT applied to the tricolor compression, achieves a 99% reduction from the original scan sequence.
An original byte stream for a typical fingerprint of 300,000 bytes [300K], compressed with tricolor compression, yields a reduction to 75,000 bytes [75K]. Further reduction of the 75K byte stream with DCT yields a reduction to 3,000 bytes [3K].
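The arithmetic above can be checked directly (an illustrative Python sketch; the 1:17 and 1:23 DCT ratios are the JPEG figures cited earlier, which is why the best case lands near the reported 3K):

```python
original = 300_000                  # ~550 x 550 pixel fingerprint scan, 1 byte/pixel
tricolor = original // 4            # bit-to-byte packing: 4 pixels per byte
best = tricolor // 23               # DCT at 1:23 (best case)
worst = tricolor // 17              # DCT at 1:17 (worst case)
print(tricolor, worst, best)        # 75000 4411 3260
```

The result brackets the 4.5K and 3K file sizes reported in the test data below.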
9.3 Algorithm Improvements
With an initial byte stream of image data, using the JPEG compression method, which is predominantly a DCT method: the JPEG method produces a compression ratio between 1:17 and 1:23. This results in a compression rate of 4% to 6% of the original byte stream. [Eq. 1.1.1 & 2.1.1] Using the corresponding reduction rates [Eq. 1.2.1 & 2.2.1], the resultant reduction in file size will range from 94% to 96%. Under optimal conditions, using JPEG, the file reduction rate is 96%.
Using the same byte stream of image data, the GBP compression method initially produces a compression ratio of 1:4. This is then further enhanced by applying DCT compression, which has a compression ratio between 1:17 and 1:23; the combined compression ratio is approximately 1% [Eq. 1.1.3 & 2.1.3], with a corresponding reduction in the size of the original byte stream of 99%. [Eq. 1.2.3 & 2.2.3] In addition, the roughly 1% compression ratio of the GBP method is consistent across both predictions.
Hence, the GBP compression method produces results ranging from 3% to 5% improvement upon the JPEG method. Although this improvement seems small, that is only in proportional, percentage terms. As demonstrated in the sample below, in terms of the number of bytes the reduction is quite significant, especially when dealing with the large volume of data in fingerprint archives.
Given any JPEG file of a fingerprint image, the GBP method can further reduce the file to between roughly 25% and 40% of the JPEG file’s size.
9.4 Test Data and Results
Using an original byte stream for a typical fingerprint of 300,000 bytes [300K] which corresponds to an image of approximately 550 x 550 pixels [an average sized fingerprint image]:
Compressed with tricolor compression, the byte stream is reduced to 75,000 bytes [75K]. Further reduction of the 75K byte stream with DCT yields between 4,500 [4.5K] and 3,000 [3K] bytes.
Compressed with JPG compression, the byte stream is reduced to between 18,000 [18K] and 12,000 [12K] bytes.
Compare the reduced image size of 4.5K with the corresponding JPG reduction to 18K; or the reduced image size of 3K with the corresponding JPG reduction to only 12K.
These results were from actual fingerprint test files.
9.5 Computational Tradeoffs
GBP does not use a predictor. [HEW01] [PEN01] A predictor is an algorithm that “predicts,” by probability, what the next character in the data stream will be. The prediction shortens the data stream by allowing partial symbols, instead of whole symbols, to indicate a stream of symbols. E.g. an equation can easily define the next number in a numerical sequence or series. Knowing the values of all the members of a series only requires:
1. Knowing the value of the first member (also referred to as the lower bound);
2. The proposition defining the class (which can be an equation: All x such that x is greater than 1. Or, f(x)=x>1.) and;
3. Some definition of the end (also referred to as the upper bound) – either a quantity, e.g. the next 6 or; a proposition defining the end of the series. (All x such that x is greater than 1 and less than 10. Or, f(x)=10>x>1.) [LAR01] [RUS01]
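The three requirements above can be illustrated with a small sketch (illustrative Python; GBP itself does not use a predictor, and the arithmetic series is a hypothetical example): instead of transmitting every member, transmit the first member, the rule, and the bound, and regenerate.

```python
def expand(first, step, count):
    """Regenerate an arithmetic series from (first member, rule, length)."""
    return [first + step * i for i in range(count)]

# Six values collapse to the compact triple (10, 2, 6) and back.
print(expand(10, 2, 6))             # [10, 12, 14, 16, 18, 20]
```

The triple is the lower bound, the proposition defining the class, and the upper bound, in the terms used above.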
Although GBP does not use a predictor, logically, the GBP algorithm lends itself to further reduction by a predictor: a byte ordinarily has 256 possible values (2^8), but a GBP byte has only 81 possible values (3^4), because there are only 3 valid colors and 4 pixels per byte, for a total permutation of 81 possibilities. So, the author wanted to ascertain whether using a predictor would increase performance.
Conventional wisdom indicates that serially applying compression techniques is not very effective. Usually, compressing a file twice, especially with the same technique, will produce no additional compression. At most, an additional 2%–3% may be achieved. (Again, it is demonstrated that the computational effort is not worth the additional compression achieved.) This is in accordance with Shannon’s monumental work [SHA01], in which entropy sets a fundamental limit on compression. E.g. compressing any given file with ZIP™ compression more than once produces minimal additional compression. Compressing this article’s Word doc file with WinZip™ once yields 37% compression. Compressing that compressed file with WinZip™ again yields no further compression.
Using an additional compression technique with a predictor upon a tricolor compressed file produced measurable but minimal additional compression. However, the additional computational time involved was 60 milliseconds, while the computational time for tricolor compression is less than 1 millisecond. The computational time with a predictor was thus 60 times greater than using GBP compression alone. This is a real life, dramatic example of the tradeoff between computational time and reduction in file size.
9.5.1 The Limits of Compression
Although the author has read and heard much about sequential repetitive compression not being useful, the author has never read a valid explanation as to why and how this is so. The matter has remained more an urban legend of crackers and hackers than a scientific matter for computer scientists. This caused the author to meditate upon the subject and deduce a reasonable explanation, with a supporting mathematical model, for the hypothesis that repetitive compression does not increase compression, and why that is so. As this subject is pertinent and germane to the topics discussed in this article, since serial compression was tested both with and without positive results, and as it dramatically demonstrates the computational tradeoff involved in compression, a discussion of repetitive compression is included.
There is value to such theoretical discussions in the ability to solve a puzzling problem. Sometimes a breakthrough realizes a solution; sometimes a breakthrough proves an impossibility. For centuries it perplexed cartographers why four colors always sufficed for map making while only five could be proven sufficient, until 20th-century mathematicians proved the four color theorem. [APP01] [APP02] [APP03] [WIK07] [WIK08] Likewise, the Königsberg bridge problem is now solved and known to be impossible. [BAR01] More relevantly, one-time key encryption was proven to be undecipherable. [KAH01] In the case of one-time key encryption, the goal was to prove something impossible. In the instant case of AFIS compression, further compression is the goal; however, its infeasibility is worth proving. Infeasibility should not be mistaken for impossibility: while further compression is possible, it is not very feasible. The study of GBP and its ability to compress better than JPEG demonstrates that further compression, while possible, requires tremendous computational power; that this is true for compression in general, i.e. the more gains in compression you try to make, the more computational power is necessary; and that there are limits to compression, as well as what those limits are.
Indeed, at one point in the process, using 60 times the computational power produced negligible additional compression, while at another point, negligible additional computational power produced almost three times the compression. E.g. tricolor compression alone produces 25% compression (a 75% reduction). Followed by JPEG compression, this produced a 99% reduction. The difference, 74 percentage points, is almost three times the compression achieved by tricolor compression alone.
Even assuming repetitive compression works, it becomes apparent that when the additional compression achieved is too small, it is insignificant.
Whatever method of compression is used, even if it is designed for compressing generic data, once that algorithm is applied, those factors that are reducible are eliminated and can no longer be reduced. Thus, any compression algorithm is self-limiting. (Theoretical examples that have been substantiated by testing will be given later on.)
One can consider consecutive compression from a set theory point of view (to be discussed shortly), or as an infinite series or limit in the mathematical, calculus sense. Continually compressing the size of something has, philosophically and mathematically, a limit. There is a density limit. What is the end result, the limit, of constantly dividing something in half?
We can consider the compression ratio as a fraction, such as ¼ for a 1:4 ratio.
Now, let us consider repetitive compression. It can be treated as repeated multiplication: as each compression rate can be expressed as a fraction, repetitive compression can be expressed as multiplying a fraction by a fraction.
Thus, total repetitive compression = C1 × C2 × … × CN
(The letter ‘C’ denotes a compression rate.)
Eventually, there should be nothing left to compress. Or, the amount of compression will be so small that it will be insignificant or immeasurable, theoretically speaking.
9.5.2 Double and Serial Compression
Let us define two terms. These terms are arbitrary and merely chosen for the sake of clarity in our discussion.
Serial compression – repetitively compressing a data set with different algorithms. For example: Compressing data set ‘a’ with compression method ‘a’. Resulting in data set ‘b’. Then, compressing data set ‘b’ with compression method ‘b’. Resulting in data set ‘c’.
Double compression – repetitively compressing a data set with the same algorithm. For example: Compressing data set ‘a’ with compression method ‘a’. Resulting in data set ‘b’. Then, compressing data set ‘b’ with compression method ‘a’. Resulting in data set ‘c’.
9.5.2.1 The Results of Double Compression
Double compression may or may not result in further compression. Why? One must factor in: is the data generic or specific? And is the compression technique generic or specific?
Taking the previous example of the word processing file that compressed by 37%: many compression techniques that compress word processing files remove “white space”. White space is the empty lines, the indents, the space from the period at the end of a paragraph to the end of the line, etc. All these blanks are replaced with control codes. This achieves a significant reduction in file size. Attempting this type of compression twice in a row will result in 0% compression, as demonstrated, because once the white space is removed, there is no more white space to remove.
Another example: using RLE compression on a fax repetitively will not produce further compression. This is because the compression replaces strings of white characters and black characters with counters. Once this is done, there are no more strings of white or black characters to replace.
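A minimal run-length encoder makes the point concrete (an illustrative Python sketch; the fax line is synthetic): once runs are replaced by (value, count) pairs, a second RLE pass finds only runs of length one and actually expands the data.

```python
def rle(data):
    """Run-length encode as flattened (value, count) pairs."""
    out, i = bytearray(), 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        out += bytes([data[i], j - i])
        i = j
    return bytes(out)

line = bytes([255] * 40 + [0] * 8 + [255] * 52)   # white, black, white runs
once = rle(line)                                  # 100 bytes -> 6 bytes
twice = rle(once)                                 # 6 bytes -> 12 bytes: expansion
print(len(line), len(once), len(twice))           # 100 6 12
```

The second pass doubles the size because every byte of the encoded stream differs from its neighbor, leaving no runs to count.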
If the double compression uses a generic data compression technique, then further compression may be possible, because some new, reducible factors may have been introduced by the first compression. But if the compression scheme is good, newly introduced repetitive symbols should be few or non-existent. Hence, further compression should not be possible.
9.5.2.1.1 Double Compression of JPEG & DCT
Double compression of JPEG (which is predominantly based upon DCT), or of DCT itself, may or may not yield more compression. While the compression ratio of JPEG or DCT is 1:17 ≈ 6%, and double compression would in principle give (1:17):17 = 1/17² = 1/289 ≈ 0.003, DCT, and consequently JPEG, is lossy. [MIA01] [PEN01] This is because the numerical values of the pixels, when treated as cosines, have results that are not exact. Truncation and rounding errors are produced due to inaccuracy in the fractional parts. With repetitive compression and decompression, distortion of the exact original (cosine) values sets in. This prevents an inverse process of image restoration. When taking care to prevent the loss and distortion, lossless JPEG or DCT gains little if anything from repetitive compression. The scenario is similar to double compression of specific data: exact values of a set of cosines.
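The truncation problem can be illustrated in miniature (an illustrative Python sketch; the coefficient and quantization step are hypothetical stand-ins for DCT coefficient quantization):

```python
import math

coeff = 100 * math.cos(0.7)          # an exact cosine-derived value, ~76.484...
stored = round(coeff / 7)            # quantized with step 7 for compression
restored = stored * 7                # 77, not 76.484...: the fraction is gone
print(coeff, restored)
```

The fractional part discarded at quantization cannot be recovered, so each lossy compress/decompress cycle can only move the values further from the originals.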
9.5.2.2 The Results of Serial Compression
One would rightly conjecture that serial compression will achieve greater compression, provided that:
1. The subsequent compression techniques are based upon different algorithms looking for different patterns of repetition; and/or
2. The subsequent compression techniques are generic and not specific; and
3. The remaining repetitive patterns are still of sufficient size to be significant. (What counts as significant is debatable and will vary from domain to domain; what is medically significant may be judged more sensitively than what is financially significant. The author points out that, in our society and as human beings, we measure things in percentages. A percentage is a fraction and a relation. The author contends that anything less than 1% is usually considered insignificant by modern society; anything between 5% and 10% is considered significant, but needs to be pointed out; and anything over 10% is of such magnitude that its significance is considered obvious.)
To use the example of DCT and RLE compression upon a fax:
If DCT compression were applied first, the strings of white and black characters would be eliminated. Thus, there would be no strings of white and black characters for the RLE compression to operate on. The net result would be no additional compression.
However, if RLE compression were applied first and then followed with DCT compression, DCT compression would further compress the data set, however significantly or insignificantly.
Returning to the example of compressing a word processing document: If a compression technique reducing white space is used, followed by a different technique, such as Huffman compression; the result would be significant compression. Because, there is the definite possibility of Huffman compression finding often used words, such as ‘the’ and ‘and’, which will be replaced with smaller symbols.
As illustrated in the equations above, tricolor compression alone compresses far less than DCT; but in combination with DCT, it compresses more than DCT alone. While this is only an increase of about 5% compression, it brings the compression ratio to the threshold of 1%. Multiple DCT compressions, by contrast, cannot be applied, because of the loss of data (as explained in the previous section). In addition, it is clear that tricolor compression must be executed first and DCT second.
Even with serial compression, as each compression method reduces successive sets of repetitive symbols, eventually there will be no sets of repetitive symbols left [RUS01], or the sets of repetitive symbols left will be so small that no more significant compression can be achieved, if any compression can be achieved at all.
To elucidate: every set is a class. Every class is defined by a proposition specifying attributes, and we can consider a common repetitive symbol as an attribute. Every class can be divided into subclasses. Ultimately, propositions can be found that will describe each class, subclass and individual member of the class. [RUS01] Once that unity has been reached, no more subclasses can be found. [RUS01] In addition, since the set is finite, the number of members is finite. And the number of possible combinations of the members is likewise finite, possibly a very large number, but finite. Thus, the number of possible common repetitive patterns to remove is also finite. Therefore, compression has a limit.
In addition, there may be many unions between the members of the subsets. I.e. Members of one set of repetitive patterns may also be members of another set of repetitive patterns. The probability of this increases with the amount of members in the subset.
Also, the number of permutations of all the symbols in the set is finite. This is relevant because each compression will change some of the symbols in the set. Yet, each new symbol set, likewise, has a finite number of symbols, which has a limited number of combinations and permutations. And, each new limit of combinations and permutations, becomes smaller and smaller.
Remember that, in essence, we choose from a finite set of symbols as replacement symbols in our compression methods. [E.g. letters of the alphabet, cosines, whole integers, etc.] Eventually, a point will be reached where there are no more possible combinations and permutations.
To symbolize mathematically:
SN = n
(The number of symbols for the original data set is n.)
SN+1 < SN
(The number of symbols for each subsequent compressed data set is less than its predecessor.)
SN → 0
(With each repetition the count shrinks, so the number of symbols left to compress approaches zero.)
Note that data compression oftentimes, like encryption/decryption, is based upon a frequency analysis of repeated symbols. [HUF01] [KAH01] We replace the repeated symbols with a new symbol, with the goal of shortening the symbol length, meaning the number of symbols. [HUF01] [KAH01] This creates new symbols and/or a new symbol set. But, because of the correspondence with the original data set, the new substitution symbol set may not change the frequency of symbol repetition; it merely alters the length of the intermediate, corresponding symbol sets. So, at some point, subsequent compression is not possible, or subsequent compression rates do not change.
For example, using a dictionary method such as Huffman or LZW™ upon a word processing file, the word ‘the’ may be tabulated as the most frequently used word. This 3 character word may then be replaced with the number one. Then, the word ‘there’ may be reduced to ‘1re’. There are several ‘there’s in the file. So, there will be several ‘1re’s in the file. There are several ‘therefore’s in the file. So, there will be several ‘1refore’s in the file. New and shorter symbols have been created.
While there are now fewer ‘the’s, the referenced symbol set, e.g. ‘the’, retains the same frequency as in the original symbol set. [KAH01] The procedure will be repeated and, arguendo, the ‘1re’ will be replaced with a new number, e.g. ‘50’. So, all the ‘there’s now become ‘50’. And, all the ‘therefore’s become ‘50fore’. And, all the ‘50fore’s will become some new number, arguendo, ‘100’.
While new shorter symbols for subsets have been produced, their frequency, has not been changed, from their frequency in the original data set. At the same time, many symbols have been eliminated by reference to a subset. E.g. All the letters representing ‘the’ or ‘therefore’, have been removed.
However, the new symbols correspond to subsets of the original data set. And, these new symbols total less in number than the total number of symbols in the original data set. These new, different symbols, can also be combined and permutated, into new, different sets. But, again, as a finite set, there is a limit to the number of combinations and permutations.
Reiteratively compressed, eventually, there will be no more repetitions to be replaced by new symbols.
Representing this mathematically:
If S is the data set and N the number of sets of repetitive symbols within it: each serial compression will remove one set from all the different sets of repetitive characters. I.e. the first compression yields S-1, the subsequent compression yields S-2. The number of sets will eventually be exhausted after N compressions.
The resultant compression will be:
S-N
This is the maximum compression.
[S and (S-N) correspond to C and CN respectively; (S-1) and (S-2) correspond to C′ and C′′ respectively, where ‘C’ is a compression method.]
Using the class definition, the limit of compression will be a complex conjunction of propositions – each compression method can be considered as a proposition – that uniquely defines all the symbols in the data set.
So, the closer you get to 100% compression, the less there is to compress by subsequent compression. E.g. If you have achieved 90% compression, then there is only a possible 10% compression left. The probability of finding repeated symbols, and of achieving more compression, decreases as the compression already achieved increases. Serial compression has diminishing returns.
Ergo, it will become more and more complicated, and require more and more computing power, to inspect the data set and find repetitions to remove.
E.g. JPEG and DCT are considered the best generic compression methods with the highest reduction rate. DCT is not a simple algorithm like RLE or tricolor! Even Miano’s explanation [MIA01], in the author’s opinion the clearest and simplest explanation of JPEG and DCT, is involved and lengthy. The math is clearly more complicated than adding up strings of white and black characters, which can be done on the fly. Comparatively, DCT requires far more code to implement than RLE. Huffman coding and dictionary methods, which compress more than RLE, must inspect the entire data set, sometimes making several passes, to build a code table, and only then create a new data set with alternate symbols. Clearly, Huffman and dictionary compression techniques require much more computational power than RLE, which counts characters as it parses the data set just once.
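The single-pass, on-the-fly character counting attributed to RLE above can be sketched as follows; encoding runs as a symbol followed by its count in decimal digits is an illustrative convention, not the format of any particular RLE standard:

```c
#include <stdio.h>

/* Single-pass run-length encoding of a bilevel ('0'/'1') scan line:
 * emit each symbol followed by its run count.  Unlike Huffman or
 * dictionary methods, no code table is built and the input is parsed
 * exactly once.  Returns the length of the encoded string. */
int rle_encode(const char *in, char *out, int outcap)
{
    int n = 0;
    while (*in) {
        char c = *in;
        int run = 0;
        while (*in == c) { run++; in++; }   /* count the run as we go */
        n += snprintf(out + n, (size_t)(outcap - n), "%c%d", c, run);
    }
    return n;
}
```

For the line "0000011100" this yields "051302": five '0's, three '1's, two '0's.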
It should be clear now, that no matter how complicated the algorithm, SN0 can never exist and is only theoretical, because if SN0 did exist, that would mean that there is no data in our data set.
However, the limit to the number of repeating patterns that can be removed from a data set does exist: eventually, having removed repeated symbols of every kind, there will be no more repeated symbols to remove. Likewise, the limit to compression also exists, for the same reason.
Eventually, all sets of repetitive symbols will be exhausted and no more compression will be achievable.
As applied to the subject of this article: since JPEG achieves 94% compression, much further compression is not realistic. GBP achieves approximately 99% compression; further compression is even less realistic.
This applies to whatever the object of compression may be. I.e. The limits of compression apply equally to color images, black & white images, data transmissions: anything.
This can be visualized with a graph, Figure 6, plotting the rate of reduction for the compression rate:
Figure 6. A graph visualizing the reduction in the gains for reduction with the increase in the compression ratio.
Note that greater compression ratios correspond to smaller fractions: a smaller fraction is a greater reduction. Note how the reduction rate approaches the ultimate reduction rate of 100%. Note that the difference between the rates of reduction becomes smaller and smaller between greater and greater compression ratios. Also, note that the compression-ratio scale starts at 1.000, because no compression is a one-to-one correspondence, expressed as 1:1 or fractionally: 1/1 = 1.
Tricolor compression is at 0.250
JPEG compression is at 0.059
GBP compression is at 0.015
Tabulating this data, we find:

Method                      Fraction    Reduction    Increase in Reduction
Raw Data, Not Compressed    1.000        0.0%        --
Tricolor                    0.250       75.0%        75.0%
JPEG                        0.059       94.1%        19.1%
GBP                         0.015       98.5%         4.4%

Table 1. The increase in the rate of reduction between greater and greater rates of reduction.
Note the gains become less and less. Note how GBP’s gain over JPEG is less than a quarter of the gain that JPEG makes over tricolor compression.
From the graph in Figure 6, we can see that the rate of change in the reduction rate rises sharply and quickly tapers off. The curve is asymptotic. We also see that the rate of change in the reduction rate decreases quadratically. Ergo, between higher and higher compression ratios, the rates of reduction increase less and less. This can be understood with a mathematical analysis.
Considering the reduction rate as a function of the compression ratio, with the compression ratio expressed as a fraction:
f_Reduction(C) = 1/C = C^(-1)
Taking the derivative of this function shows that the rate of change of the reduction rate decreases quadratically:
f′_Reduction(C) = -1/C^2 = -C^(-2)
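These formulas can be checked numerically. A small sketch, using the ratios cited in this article (roughly 1:4 for tricolor, 1:17 for JPEG, 1:68 for GBP), reproduces the diminishing marginal gains of Table 1:

```c
/* f(C) = 1/C is the fraction of the original size remaining at a
 * compression ratio of 1:C; the reduction achieved is 1 - f(C).
 * Since f'(C) = -1/C^2 shrinks quadratically, each further increase
 * in C buys a smaller additional reduction. */
double remaining_fraction(double c)
{
    return 1.0 / c;
}

/* Additional reduction gained by moving from ratio 1:c1 to 1:c2. */
double marginal_gain(double c1, double c2)
{
    return remaining_fraction(c1) - remaining_fraction(c2);
}
```

Moving from 1:4 to 1:17 gains about 19.1% more reduction; moving from 1:17 to 1:68 gains only about 4.4%.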
Applied to the instant compression algorithms: notice that JPEG achieves a 94% reduction with a compression ratio of only 1:17, and that the additional 4.4% reduction achieved by GBP requires a compression ratio of 1:68! Four times the compression ratio of JPEG! And this achieves only a 4.4% gain in reduction!
[The gain in reduction by GBP over JPEG is 4.4%; divided by JPEG’s reduction rate of 94.1%, this is roughly 0.047, or about 4.7%. This is the gain relative to JPEG’s reduction, as opposed to the absolute gain in reduction; the two values are nearly the same in this instance only. The absolute gain in reduction of JPEG over tricolor is 19.1% (from Table 1), but the gain relative to JPEG’s reduction is 20.3%, i.e. 19.1% divided by 94.1%.]
As the saying goes, “Experience is a poor teacher. She takes a long time to teach her lessons.” It appears that the FBI and computer scientists at large learned by brute force that a large increase in the reduction rate for an AFIS algorithm is not realistic. As there had never been a formal discussion of this matter, the conclusion was not reached analytically. While it has perhaps been intuitive for computer scientists, it is now proven.
Nevertheless, it must be noted that in certain applications, even a small gain of 1%-2% at a large expense is considered worth the cost. For example: it is well known that gains in jet fighter performance come at a high cost. [WIK05] Yet the U.S. Air Force will spend millions of dollars to increase the flight speed of its jet fighters by even 1%-2%. [BRI01] [RUT01] The tactical advantage is considered sufficient justification for this strategy and policy. Also, it is sometimes deemed appropriate to perform “engineering stunts” in order to develop other necessary technological advancements. [BRI01] [RUT01] It remains to be seen if AFIS is such an application.
9.6 Algorithm Impact on Database Size and Transmission Speeds
Given the large number of fingerprints in any given fingerprint database, a 25% reduction is a boon. For example, the FBI, which maintains millions of fingerprint images, would see a large reduction in the actual storage size necessary to archive all fingerprint images. (Twenty-five percent of eighty million is a lot.)
In addition, less transmission time is needed for smaller images. This is important in transmitting to/from mobile units, e.g. police cars. This is also important when transmitting images to extra-jurisdictional areas via the internet. Smaller images will be transmitted in less time. Given that fingerprints are sent in sets of a minimum of ten (10), usually accompanied by additional palm prints, there is a volume factor to the reduction.
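The volume factor can be illustrated with a sketch; ten prints per set comes from the text, while the per-print byte size and link speed in any call are hypothetical placeholders:

```c
/* Transmission time for a set of prints over a given link: smaller
 * images transmit in proportionally less time, and the saving is
 * multiplied by the set size (a minimum of ten prints per set, per
 * the text, before any accompanying palm prints). */
double transmit_seconds(double bytes_per_print, int prints_per_set,
                        double bits_per_sec)
{
    /* bytes -> bits, times the number of prints, over the link speed */
    return (bytes_per_print * prints_per_set * 8.0) / bits_per_sec;
}
```

At a hypothetical 1 Mbit/s, a ten-print set of 125 KB images takes 10 seconds; compressed to a quarter of that size, 2.5 seconds.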
10. PROGRAMMING LANGUAGE SELECTION
10.1 The choice of the C language was a basic design consideration:
“Performance and portability are the two main reasons why programmers choose C over other programming languages.” [MIX01]
Many graphics programs utilize C for speed and for the ability to easily manipulate data down to the bit level. As this project required bit manipulation, C seemed a very logical choice. Certainly, a low-level language was much more appropriate than a high-level language. Also, the graphics library [IJG JPEG] that was used for JPEG functions is written in C. It was more expedient, and reduced runtime execution cost, to incorporate routines as object modules rather than as program calls.
In addition, the C language is designed for low-level access and operating system access. [KER01] [HOO02] The possible future need to communicate with the operating system, and to engage in inter-process communication, especially with web browsers and other programs or hardware devices, all leaned towards the need for low-level access ability. Again, the C language was considered the preferred choice.
The C language has only 70 commands and case constructs. [KER01] This is a minimal instruction set. (One must be careful not to include library functions in the list of programming commands. There are many library functions and the function libraries are quite large. But the command set is quite small.) Reduced instruction sets allow for maximum speed of execution, with tight code. [CLE01] (Although RISC programming is usually applied to microprogramming, the programming of microprocessors, the advantages of RISC programming can be applied to any programming task. Good engineering, including software engineering, includes simplicity, i.e. minimizing complexity. Unnecessary complexity should be avoided. [MCC01] [IEE01] Also, the greatest advantages of RISC come with moving data. [CLE01]) In addition, the C language avoids the overhead of extra code produced by compilers for languages with extended instruction sets.
The diversity or potential diversity of platforms for the multitude of users of AFIS is staggering. There is a wide range of possible operating systems and hardware in use, or that could be in use. Therefore, maximum portability to any potential platform was a fundamental design consideration. Only the C language presents a solution to such a problem. The C language allows for the greatest portability to different machine types and operating systems. [MIX01] [HOO03] Although the portability may not necessarily be as smooth as desired [HOO04], the availability of C compilers on many different platforms does make portability a possibility [HOO02] without major code rewrites.
10.2 Compiler Selection
Originally, an attempt was made to develop the project with Power C. However, getting Power C to work with the IJG JPEG library was very challenging; errors constantly cropped up. So, Microsoft™ Software Development Network™ (MSDN) Visual Studio 6.0 with Microsoft™ Visual C++ was used instead. With minor adjustments, MSDN Visual C++ was easily made to work with the IJG library.
11. Integration Into Current Systems
The software developed is just the compression engine. The compression engine is merely one module in an AFIS system, a subset of the entire graphics library and graphics functionality of the AFIS system, including scanning, database and indexing, etc. As a module within a system, the software was designed to send and receive images. In this manner, the software could be easily integrated into current systems with a simple program call. While the project as developed used data files, this methodology could be modified to accept and transmit data using the inter-process communication technique of pipes, or other methods, should the need arise.
11.1 Handshaking With Current Protocols
A converter to/from the JPEG format is required to handshake with the current, conventional standard (JPEG) used for AFIS. This was built into the prototype.
11.2 Backwards Compatibility
Backwards compatibility and integrating into current systems, has two components: hardware and software. As described in the immediately previous sub-section, an interface is necessary to be able to deal with JPEG images and software that uses JPEG images. Indeed, the massive existing digitized fingerprint libraries would have to be operable by the GBP software.
In addition, since only a data compression engine was developed, with the ability to compress and decompress images as well as send and receive those images, the modularity of the design lends itself to easy integration with existing systems. Implementation would be with a system call and either file passing or piping of image data.
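The file-passing or piping integration could be sketched as follows; the program name "gbp" and its "-c" switch are hypothetical, not the actual engine's interface. A caller would hand the resulting command to popen(cmd, "r") and read the compressed stream from the returned FILE pointer:

```c
#include <stdio.h>

/* Build the shell command that would pipe an image file through the
 * compression engine.  "gbp" and "-c" are hypothetical names used for
 * illustration only.  Returns the length of the command string. */
int build_compress_cmd(char *cmd, int cap, const char *infile)
{
    return snprintf(cmd, (size_t)cap, "gbp -c < %s", infile);
}
```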
As the software was written in C, it should be easily implemented on a variety of platforms.
12. SOFTWARE DEVELOPMENT & CODING
12.1 Development and Functionality:
The first step was to define the compression algorithm and theoretically evaluate for data integrity.
The logical modules of the program were defined as in the flowcharts in figure 3 & 5.
There was no overlap, i.e. reuse, of modules between the compression and decompression processes. Hence, the number of modules to code could not be reduced.
A graphics library, the IJG JPEG library, was selected. The IJG JPEG library was incorporated into the software to take advantage of existing math routines for DCT and IDCT, thus reducing development time. This did require procurement of the IJG JPEG package and its study, in order to know how to use the necessary routines and modules in the library.
Coding then followed from the design specifications.
For ease of deployment, compression and decompression were encoded into one program, with switches to indicate which process was being invoked.
Error control for incorrect input, failure to complete and other processing errors were incorporated.
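The one-program, two-process design with error control for incorrect input might look like this in outline; the "-c"/"-d" switch names are hypothetical, for illustration:

```c
#include <string.h>

/* One program, two processes: a command-line switch selects compression
 * or decompression, with an error path for incorrect input. */
enum gbp_mode { GBP_COMPRESS, GBP_DECOMPRESS, GBP_ERROR };

enum gbp_mode parse_mode(int argc, char **argv)
{
    if (argc < 2)                   return GBP_ERROR;      /* missing switch */
    if (strcmp(argv[1], "-c") == 0) return GBP_COMPRESS;
    if (strcmp(argv[1], "-d") == 0) return GBP_DECOMPRESS;
    return GBP_ERROR;               /* unrecognized switch */
}
```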
12.2 Tools Used:
Independent JPEG Group’s [IJG] JPEG library was used in the development of this software. This library was chosen because it is written in C, for portability, as C is considered the most portable language across platforms. These are the same reasons as listed above in Section 10 describing the choice of the C language for this project.
The software was written in C for maximum portability – a design consideration.
The software was designed to be easily recompiled on any machine and still operate.
Another example of the generic design was the handling of the issue of big endian and little endian byte order: again, creating a program where endianness became irrelevant and a non-issue in migration from one operating environment to another. As the program handles data at a low level, big endian versus little endian could have resulted in incorrect bit manipulation, which would have altered the image content.
As the operating system will be consistent in its handling of the placement of data bits, all that is required is that the program likewise be consistent in the placement and direction of processing of bits. This is what was done, for consistency and error avoidance.
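A sketch of the kind of endianness-neutral consistency described: by addressing the stream strictly byte-by-byte and always taking bits from the most significant end, the result is identical on big- and little-endian machines, because no multi-byte words are ever reinterpreted. (The MSB-first convention here is an assumption for illustration.)

```c
/* Extract bit `bitpos` (MSB-first within each byte) from a packed
 * buffer.  The buffer is addressed one byte at a time, so host byte
 * order never enters into the result. */
int get_bit(const unsigned char *buf, unsigned long bitpos)
{
    return (buf[bitpos >> 3] >> (7 - (bitpos & 7))) & 1;
}
```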
12.3 Prototype Testing:
Test files were secured from NIST.
Reduction in file size when testing sample fingerprint images, was as per mathematical model predictions.
Independent confirmation of these test results has yet to be made.
A metric had to be devised to assess the difference in processing time between JPEG and GBP. Both processes use DCT, except that GBP must first do a bit-to-byte correlation before the DCT; so the bit-to-byte processing time, plus all pre-processing needed to enable it, was measured as the differential. Clearly, since JPEG uses only DCT while GBP uses DCT plus a bit-to-byte conversion, GBP must take longer. The questions are: how much longer, and is the time differential so great as to detract from the advantage of the high compression ratio?
The estimated degradation in performance was expected to be minimal, as the bit-to-byte process can be implemented either as register shifting or masking. The worst case scenario would be in-memory, rather than in-register, bit shifting and masking. These are among the highest speed machine operations.
(While disk operations are slower, mechanical operations [DEI01], register operations, including shift register operations, are electronic operations. [MLV01] As electrical signals propagate at nearly the speed of light, so do electronic operations. [DOE01] [LOY01] Certainly, register operations are the fastest operations within a computer’s architecture: no mechanical operation, and very short, if not the shortest, wire/transmission length. The only limiting factor would be the number of operations. If the number of calculations and shifts is minimized, high speed performance is attainable.)
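The bit-to-byte correlation via shifting and masking can be sketched as follows; the MSB-first order and 0/1 output values are assumed conventions for illustration, not the documented GBP format:

```c
/* Bit-to-byte correlation implemented as shifting and masking, the
 * high-speed operations described above: each bit of a packed bilevel
 * image becomes one 0/1 byte for the DCT stage that follows. */
void bits_to_bytes(const unsigned char *in, unsigned long nbytes,
                   unsigned char *out)
{
    for (unsigned long i = 0; i < nbytes; i++)
        for (int b = 7; b >= 0; b--)
            *out++ = (unsigned char)((in[i] >> b) & 1);  /* one shift + mask per pixel */
}
```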
The differential measured between the processing time of JPEG and GBP, upon a given image, was negligible. Both processes use DCT; this was considered a constant, even though GBP is compressing a smaller byte stream. For compression or decompression, meaning byte stream conversions to/from tricolor from/to grayscale, the difference was less than a millisecond. The author does not have equipment sensitive enough to measure processing times of less than one millisecond.
Additional compression techniques employed, added 60 milliseconds to the processing. The author could not determine how much of this additional time was due to computational processing alone.
13. FUTURE DEVELOPMENT
Due to the high speed requirements and heavy computation necessary for graphics work, many graphics applications make use of a separate graphics chip or math processor. Developing an add-on graphics circuit board with the compression/decompression software burnt into a specialized onboard graphics chip would greatly reduce processing time. In addition, this should facilitate integration of this new software into PC platforms.
An add-on viewer for Internet browsers that would support this image type might be useful, should this new image type be put into production.
14. PROJECT GOAL SUMMARY
A compression / decompression algorithm was articulated and developed. A mathematical model was made that indicated the compression method would be most successful and achieve the results requested. The algorithm was coded and tested. Test results established satisfactory completion of engineering goals. Implementation required integration with existing systems and protocols. An interface was designed and built. While the interface has been tested internally, it has not yet been tested independently. However, a satisfactory prototype has been built for independent testing.
Upon being presented with an image compression problem, the author assessed the algorithms applied. By analyzing the data and thinking unconventionally, the author was able to resolve the problem. Using a combination of existing techniques and disregarded techniques employed in an advantageous way, the author was able to assemble a compression process that achieved new, high, compression rate for a given image type.
The author was presented with a real world problem that needed resolution with a new algorithm. Thereupon, the author set out independently to write an optimal image compression method for the image type.
The author accomplished the goal and produced the software that implements the algorithm.
Many software engineering techniques were applied, including obtaining sufficient domain knowledge to understand the problem and develop an algorithm. Cost, time and implementation considerations were evaluated prior to commencement of the project. Modular design methodology was used. Portability was considered a necessity from the inception of the project and the matter was addressed appropriately. The algorithm was completely understood and then transferred to a linear programming solution.
Author’s Bio: Givon Zirkind received his Bachelor’s in Computer Science from Touro College and; his Master’s in Computer Science from Fairleigh Dickinson University, both schools are located in the USA. His career has involved computer operations; design and management of business applications with extensive database programming and management; Internet, computer communications, data transfers and telecommunications; data conversion projects; reverse engineering of data and legacy software; being a published author and editor of a technical journal; teaching and; automated office support. In addition to his AFIS data compression work; he has done independent genetic database development and research. He may be reached at his email: GIVONZ@HOTMAIL.COM.
A demonstration copy is available for policing agencies; vendors to policing agencies; official government agencies; private entities that work with official government agencies or; legally recognized security services that work with fingerprints for forensics or biometric security systems.
To my grandfather for all his support in all my endeavors.
Allen Scott Gerner, colleague and friend, for his assistance with compiler operations, access to his library and encouragement. B.S. Computer Information Science, NJ Institute of Technology; M.S. Computer Science, NJ Institute of Technology.
To Ramona Brandt for donating resources and support to this project.
Dr. Larry T. Ray, Ph.D. Mathematics/Computer Science, Stevens Institute of Technology (NJ), formerly professor of computer science, Fairleigh Dickinson University, for his support in this project.
Dr. Tan, professor of computer science, Fairleigh Dickinson University, who assisted me in my understanding of DCT.
[APP01] Appel, Kenneth & Haken, Wolfgang & Koch, John, Every Planar map is Four Colorable, Illinois: Journal of Mathematics: vol.21: pp.439-567, December 1977.
[APP02] Appel, Kenneth & Haken, Wolfgang, Solution of the Four Color Map Problem, Scientific American, vol.237 no.4: pp.108-121, October 1977.
[APP03] Appel, Kenneth & Haken, Wolfgang, Every Planar Map is Four-Colorable. Providence, RI: American Mathematical Society, 1989.
[APP04] Appel, K. and Haken, W., Every planar map is four colorable, Contemp. Math. 98 (1989) 1-741.
[ARR01] Your Introduction to Morse Code, The American Radio Relay League, Inc., 1997-2006, ISBN: 0-87259-831-4
[BAR01] Linked: The New Science of Networks, Albert Laszlo Barabasi, Boston, Perseus, 2003
[BRI01] Bright, Charles D.; The Jet Makers: The Aerospace Industry From 1945 To 1972, Regents Press of Kansas, 1978, ISBN 0700601724
[CLE01] The Principles of Computer Hardware, Third Edition, by Alan Clements, Oxford University Press, 2000, Chapter 7
[COG01] Cogent Systems, corporate home web page, www.cogentsystems.com
[COG02] Cogent Systems’ client list from Cogent Systems corporate web site, cogt.client.shareholder.com/ReleaseDetail.cfm?ReleaseID=245826
[COG03] About Cogent, Cogent Systems’ corporate publication, www.cogentsystems.com
[DEI01] An Introduction to Operating Systems, Second Edition; Deitel, Harvey M.; Addison-Wesley Publishing Co., 1990
[DOE01] Speed of Light and Electricity, Tim Mooney, Department of Energy, www.newton.dep.anl.gov/askasci/phy99/phy99473.htm
[FBI01] ELECTRONIC FINGERPRINT TRANSMISSION SPECIFICATION, Federal Bureau of Investigation, Criminal Justice Information Services Division.
[FBI02] Integrated Automated Fingerprint Identification System, FBI, www.fbi.gov/hq/cjisd/iafis.htm
[FBI03] Products Certified For Compliance With The FBI’s Integrated Automated Fingerprint Identification System Image Quality Specifications, FBI, www.fbi.gov/hq/cjisd/iafis/cert.htm
[FBI04] Fingerprint Identification, FBI, www.fbi.gov/hq/cjisd/ident.pdf
[GAL01] Galton, Francis, F.R.S.; Finger Prints, London: Macmillan and Company, 1892
[GOL01] Goldstein, Larry J., Schneider, David I., Siegel, Martha J.; Finite Mathematics and Its Applications, Fourth Edition, Prentice Hall, 1980, ISBN 0-13-318221-5, Chapter 2: Matrixes
[GOL02] Goldstein, Larry J., Schneider, David I., Siegel, Martha J.; Finite Mathematics and Its Applications, Fourth Edition, Prentice Hall, 1980, ISBN 0-13-318221-5, pg. 78, identity matrix defined
[GON01] Gonzalez, Rafael C. and Woods, Richard E.; Digital Image Processing, Addison-Wesley Publishing Co., 1992
[HEA01] Hearn, Donald and Baker, M. Pauline; Computer Graphics, Second Edition, Prentice Hall, 1994, ISBN 0-13-161530-0
[HEW01] The LOCO-I Lossless Image Compression Algorithm: Principles and Standardization into JPEG-LS, Weinberger, M., Seroussi, G., Sapiro, G.; Hewlett-Packard Laboratories Technical Report No. HPL-99-193R1, November 1998, revised October 1999. IEEE Trans. Image Processing, Vol. 9, August 2000, pp. 1309-1324
[HIS01] The Hunt Fingerprinting and Ballistics, The History Channel, (video), A&E Television Networks
[HIS02] Modern Marvels: Forensic Science: The Crime-Fighter’s Weapon, The History Channel, (video), A&E Television Networks
[HOO01] Hoover, John Edgar; The Science of Fingerprints: Classification and Uses, Federal Bureau of Investigation, August 10, 2006, Project Gutenberg, EBook #19022
[HOO02] Write Portable Code: An Introduction to Developing Software for Multiple Platforms, by Brian Hook, No Starch Press, 2005, Chapter 7
[HOO03] Write Portable Code: An Introduction to Developing Software for Multiple Platforms, by Brian Hook, No Starch Press, 2005, Introduction
[HOO04] Write Portable Code: An Introduction to Developing Software for Multiple Platforms, by Brian Hook, No Starch Press, 2005
[HUF01] A Method for the Construction of Minimum-Redundancy Codes, David A. Huffman, Proceedings of the I.R.E., Volume 40, Issue 9, Sept. 1952, pgs 1098-1102, ISSN 0096-8390
[IEE01] Software Engineering Body of Knowledge, 2004 Version, Angela Burgess (Publisher), The Institute of Electrical and Electronics Engineers, Inc., 2004
[IEE02] LOCO-I: A Low Complexity, Context-Based, Lossless Image Compression Algorithm, Weinberger, M., Seroussi, G., Sapiro, G.; Proceedings IEEE Data Compression Conference, Snowbird, Utah, March-April 1996
[INT01] ANSI/NIST-ITL 1-2000, DATA FORMAT FOR THE INTERCHANGE OF FINGERPRINT, FACIAL & SMT INFORMATION, INTERPOL IMPLEMENTATION, The Interpol AFIS Expert Group, Version No. 4.22b – October 28, 2005
[INT02] Companies, Interpol AFIS Expert Group approved vendor list, www.interpol.int/Public/Forensic/fingerprints/workingParties/IAEG/companies.asp
[ISO01] International Standard DIS 10918-1, CCITT Recommendation T.81, Information Technology – Digital Compression and Coding of Continuous-Tone Still Images, International Organization for Standardization, www.iso.ch
[KAH01] Kahn, David; The Codebreakers: The Story of Secret Writing, Scribner, New York, 1996, ISBN 0684831309
[KER01] Kernighan, Brian W. and Ritchie, Dennis M.; The C Programming Language, 2nd Edition, Prentice Hall, 1988
[LAR01] Larson, Roland E. and Hostetler, Robert P.; Calculus with Analytical Geometry, Alternate 4th Edition, D.C. Heath and Company, 1990, Chapter 10
[LI01] Image Compression – the Mechanics of the JPEG 2000, Jin Li, Microsoft Research, Signal Processing
[LOY01] The Speed of Electricity, Jim Loy, 1999, www.jimloy.com/physics/electric.htm
[MAO01] Maor, Eli; e: The Story of a Number, Princeton University Press, 1994, ISBN 0691033900
[MCC01] After the Gold Rush: Creating a True Profession of Software Engineering, by Steve McConnell, Microsoft Press, 1999
[MOT01] Motorola Printrak Biometric Identification Solution, Fact Sheet, Motorola, www.motorola.com/governmentandenterprise/contentdir/en_US/Files/ISD/Biometrics/Motorola_Printrak_BIS.pdf
[MIA01] Miano, John; Compressed Image File Formats, Addison-Wesley Publishing Co., 1999
[MIA02] Compressed Image File Formats: JPEG, PNG, GIF, XBM, BMP, Your guide to graphics; by John Miano, ACM Press, SIGGRAPH Series, 1999, pgs. 10-11
[MIA03] Miano, John; Compressed Image File Formats, Addison-Wesley Publishing Co., 1999, pgs 178-180
[MIA04] ibid., pg 186
[MLV01] Digital Principles and Applications, Fourth Edition, Malvino, Albert Paul and Leach, Donald P.; Macmillan/McGraw Hill, 1986, ISBN 0-07-039883-6
[MIX01] Power C, ANSI Standard High Performance C Compiler; Mix Software Inc., 1993 Introduction
[NEC01] NEC AFIS Customer List, NEC’s corporate web site, www.necam.com/ids/afis/worldwideDeployment.cfm
[NIS01] Information Technology: American National Standard for Information Systems – Data Format for the Interchange of Fingerprint, Facial, & Other Biometric Information – Part I, McCabe, R. Michael and Newton, Elaine M.; NIST Special Publication 500-271, ANSI/NIST-ITL 1-2007, Information Access Division, Information Technology Laboratory, National Institute of Standards, fingerprint.nist.gov/standard/index.html
[NSW01] The Thin Blue Line, Information Section; New South Wales Police Information Page, www.policensw.com/info/fingerprints/finger15.html
[PEN01] Pennebaker, William B. and Mitchell, Joan L.; JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, 1993
[PEN02] ibid., Introduction, pg. 4
[PEN03] ibid., Chapter 15, Compression Performance, see Table 15-8.
[SAG01] Things to Ask When Choosing an AFIS, Sagem Morpho Inc., corporate web page, www.morpho.com/products_solutions/law_enforcement/ask_afis.html
[QUA01] Corporate Description of OmniTRACS, http://www.qualcomm.com/OmniTRACS/
[QUA02] Corporate history provided by Qualcomm, http://www.qualcomm.com/corporate/
[RCM01] RCMP Fact Sheet, Real Time Identification
[ROW01] Rowling, J.K.; Harry Potter and the Deathly Hallows (Book 7), Arthur A. Levine Books; 1st edition, July 21, 2007, ISBN 0545010225
[RUS01] Whitehead, Alfred North and Russell, Bertrand; Principia Mathematica, Vol. I, Cambridge University Press, 1910
[RUT01] Ruttan, Vernon W. ; Is War Necessary for Economic Growth?: Military Procurement and Technology Development, Oxford University Press, USA; December 9, 2005, ISBN 0195188047
[SAM01] Samuelson, P.A. and Nordhaus, W.D., Economics, 16th ed. New York: McGraw-Hill, 1998. ISBN 0070579474.
[SHA01] A Mathematical Theory of Communication, Shannon, Claude E.; Bell System Technical Journal, 1948
[WIK01] en.wikipedia.org/wiki/JPEG, Wikipedia entry for JPEG
[WIK02] en.wikipedia.org/wiki/Huffman_coding, Wikipedia entry for Huffman Coding
[WIK03] en.wikipedia.org/wiki/GIF, Wikipedia entry for GIF
[WIK04] en.wikipedia.org/wiki/Arithmetic_Coding, Wikipedia entry for Arithmetic Coding
[WIK05] http://en.wikipedia.org/wiki/Fighter_aircraft, Wikipedia entry for Fighter Aircraft
[WIK06] en.wikipedia.org/wiki/Four_color_map_theorem, Wikipedia entry for Four Color Map Theorem
[WIK07] http://en.wikipedia.org/wiki/Kenneth_Appel, Wikipedia entry for Kenneth Appel
[WIK08] http://en.wikipedia.org/wiki/Wolfgang_Haken, Wikipedia entry for Wolfgang Haken