Dark Theme

The QOIG Image Format

Parameterized Extensions to QOI

From QOI to QOIG

Opportunities left by QOI

Before reading this document, it wouldn't hurt to read Dominic Szablewski's blog post regarding the Quite OK Image Format. In that post, he made it clear what his goals were with the format:

  1. Remove complexity (e.g. stop trying to support palettes, greyscale, interlacing, error detection and correction, etc.)
  2. Speed up compression and decompression on a single thread without vectorization
  3. Provide "good enough" lossless compression somewhere in the "space between" RLE and LZW

We can also infer from the choices made in the format and his benchmarks for it that he also had the following goals in mind:

  1. Aim to compress well images selected from the distribution of images as they actually appear on the web
  2. Allow the format to be read as by a bytecode interpreter

However, he admits "I have no idea what I'm doing."1 And this is clear from how much outside input he took and how much he ended up changing the opcodes. However, there is a lot of room to shrink with what he ended up with:

  1. INDEX (1 byte): pixel color caching
  2. DIFF (1 byte): color computed from previous color
  3. LUMA (2 bytes): color computed from previous color
  4. RUN (1 byte): run-length encoding
  5. RGB (4 bytes): literal color
  6. RGBA (5 bytes): literal color

It's striking that he decided that if he can't find a way to encode a pixel in 1 or 2 bytes, he should jump all the way up to 4 or 5 bytes. There's a lot of room in the middle there for 2, 3, and 4 byte codes that could still result in some savings in the length of the data.2 It's also notable that only INDEX and RGBA can possibly be used to change the alpha value from one pixel to the next.3 This is a bit surprising given that he said the format arose from attempting to come up with an alternative to MPEG-1, but it definitely makes sense if you're targeting images that are common on the web or in games. It's uncommon to have a large variation in alpha value. As such, I have not attempted to address this lack. However, I do see enormous opportunities in exploiting the former. In fact, I have found room for more compression in all four categories of opcode without having to make any major sacrifices in performance:

  1. Run-length encoding improved via long runs
  2. Computed colors improved by indexed DIFFs and indexed LUMAs
  3. Pixel caching improved with longer secondary caches
  4. Literal colors compressed by runs of raw RGB and RGBA pixel color data

While exploiting these shortcomings to improve compression ratio does add some complexity to the compressor, they do not substantially compromise the spirit of the format while making it more flexible.

Goals of QOIG

The goals of QOI were to strip out a lot of the extras that PNG implementations provide but which lead to bloat and complexity in the encoder and decoder. One of the goals of QOIG is to add exactly one of those things back in, even if not by much: control over the time-space tradeoff. Here are my priorities:

  1. Consistently outcompress QOI without sacrificing compression or decompression speed
  2. Maintain backward compatibility—every valid QOI file should be a valid QOIG file
  3. Provide options to spend a little or a lot more time to squeeze out a little more compression.

QOIG's New Extensions

More Powerful Run-Length Encoding

The earliest version of QOI featured a two-byte RUN op, which shows that Szablewski understands there is room to encode longer runs. However, the final version can only encode runs up to 62 pixels long. A run of 62 pixels is represented by the byte 0xFD. Some of the benchmark images QOI was tested on result in QOI encodings with runs of thousands of 0xFD bytes. For example, this results from uniform backgrounds around an image (such as transparent backgrounds). Indeed, I would bet that the most likely byte to follow an 0xFD byte in a QOI file is...another 0xFD byte. This kind of redundancy shows there is room for multi-byte run encodings like that which was removed. It also demonstrates how we can introduce them without adding any new bit codes: we'll just let the 0xFD byte serve its original purpose as a run of 62 bytes while also being the opcode for long runs.

QOIG with the longrun option enabled uses the byte following an 0xFD byte to extend the 62 pixel run's length by up to 127 more pixels. Or, it uses the following two bytes to encode a run of up to 2^15+128 more pixels. So, a run of 70,000 pixels, which QOI encodes using over a kilobyte of 0xFD bytes, can be encoded with just 9 bytes in QOIG. See the comments in qoig.h for the full spec.

Colors Computed From More Than Just The Previous Pixel

The idea behind the DIFF and LUMA op codes in QOI is that even when a pixel's color is different from the one just before it, it will often still be very similar to it, such that we can just store the small difference between the two rather than the whole new pixel.4 I have decided to take this idea further: When the current pixel's color is not similar to the one before it, it is still likely to be similar to some other color we've seen before. And don't we already have an INDEX op that is used to refer to colors we have seen before? So why not use an INDEX to fetch a similar color followed by a DIFF or LUMA to transform it into the correct color?

It seems like a good idea, as it allows pre-existing code to be used in a new way. (In fact, in the decoder, implementing it amounts to adding just a few lines of code between the INDEX op case and the DIFF and LUMA op cases.) However, there are two problems:

  1. An INDEX op followed by a DIFF or LUMA already has a meaning in QOI: they encode two different pixels.
  2. INDEX was designed to match exactly the same pixel, which means the hash function into the cache was designed to map similar pixels to very different locations to reduce collisions and the ejections of needed colors before they can be referenced.

I elected to solve both problems the same way: split the cache into two parts. The first part is used exactly like QOI's cache: an INDEX into that section of the cache indicates an intent to use exactly that pixel. The second part, the "similarity cache," is entirely different:

  1. Indexing into this part of the cache means that the following DIFF or LUMA op is to be applied to the cached color. That is, the two operations are collapsed into a single "indexed DIFF" (2 bytes) or "indexed LUMA" (3 bytes).
  2. The hash function into this part of cache is designed to ensure that colors similar enough that a DIFF could encode their difference will hash to the same location.
  3. Because we don't need an exact match, we can actually potentially gain a little bit of compression via a linear search through this part of the cache for similar colors if we want to make that time-space tradeoff.

Of course it isn't at all clear how much of the cache should be reserved for exact matches and how much should be reserved for similar colors. In fact, I've found experimentally that different types of images achieve maximum compression by splitting in a wide variety of places. So, in keeping with the goals of this project, the place to draw that line is a tunable parameter to the encoder, and the value selected is stored in the file. See the comments in qoig.h for details.

Caching Even More Colors

From the beginning, Szablewski decided that the color cache for QOI would contain 64 colors exactly. But after playing around with the cache-splitting scheme above, it became clear to me that spending an extra byte indexing into larger caches would likely yield a great deal of savings. This was due to the fact that in practice, the cache is never really saving the last 64 colors. Thanks to hash collisions, it could be kicking out colors seen just a few pixels before.5 So I set about adding two-byte indexing options with some larger secondary caches, and, indeed, I was able to bring down compressed file sizes by 0.5% or more without affecting speed at all.

In QOIG, when the longindex option is enabled, two new op codes cut into the INDEX op the same way the RGB and RGBA op codes cut into the RUN op. The last two slots in the main cache are no longer used, as 0x3E and 0x3F are new op codes indicating to use the next byte to index into the 256 color secondary exact-match cache or the 256 color secondary similarity cache respectively. As with the main similarity cache, the latter is expected to be followed immediately by a DIFF or LUMA to modify the fetched color. See qoig.h for the full details.

One more thing: since these secondary caches are new to QOIG, they can use any hash function I want and come pre-populated with any colors I want. I have elected to fill them with all the web-safe colors plus a bunch of grays. The longer exact-match cache uses a new hash function that happens to minimize collisions among these popular colors. As a result, QOIG compresses web-safe images to less than 50% the size QOI does.

Shaving Opcodes From Blocks of Literal Colors

If none of the above techniques can find a way to compute the next color from a previously seen color, there is no recourse but to spell out the exact color to be inserted. QOI uses the opcodes RGB and RGBA to do this. Each is an entire byte marker followed by the color data for the next pixel. However, in many cases, there are several such literal colors listed one after another, especially near the beginning of an image before the commonly used colors for that image can be cached. With QOI's scheme, every single one of these new colors needs an additional byte to encode it over and above the raw color data. Once I noticed this, it became an obvious attack point to squeeze out more compression, and it turned out to be extremely effective. The basic idea is this: if there are going to be several literal pixels in a row, just mark off the next section of the file as a block of literal colors. Then each literal color does not need to be marked separately. In the QOIG "rawblock" scheme, a 2-byte marker is used to mark such a run. The first is a new opcode, RGBRUN. (It is assigned the bytecode 0x6A, which QOI never uses for anything.) The second byte encodes whether it is a run of three (RGB) or four (RGBA) channel colors and the length of the run (up to 129 colors). Following these two bytes, the colors are laid out one after another in the file without any extra markers. For encoding, this is done anywhere there would be at least two RGB or RGBA ops in a row, since we would need two extra bytes to encode two such colors anyway under the QOI scheme. However, for three or more colors, we start saving bytes: one byte saved for each color after the first two in the run. Refer to the comments in qoig.h for details.

It's worth noting that using this scheme actually speeds up compression slightly: the literal colors are buffered and written to disk all at once rather than separately and individually, saving on I/O overhead.

Results

First and foremost, as planned, encoding and decoding are both just as lightning fast as QOI.6 However, my first goal is to achieve better compression than QOI, so I took an assortment of different kinds of images, found the best way to split the cache for each, turned on longindex, longruns, and rawblocks, and compared the file sizes with those of plain QOI. Here's what I got:


	Image   QOI file size   QOIG file size  Ratio (QOIG/QOI)
	--------------------------------------------------------
	alphanoise      34630            32051             92.5%
	bricks        2122775          2075202             97.8%
	diamondtrust   231486           187120             80.9%
	mines            2523             1854             73.4%
	noise          817592           550065             67.3%
	panel          300948           286520             95.3%
	perlin         261999           197648             75.4%
	plant         1820118          1809457             99.4%
	rock          2283625          2238510             98.0%
	thinwall        60697            53100             87.6%
	unknown        240624           215780             89.7%
	ventwall        84118            74130             88.3%
	ventwall2      120759           114537             94.8%
	wspeppers      398999           184705             46.3%
                                             

Some notes: Raw blocks are the reason QOIG beats QOI on very noisy images like noise, alphanoise, and perlin.7 The low ratio on wspeppers is largely due to the prepopulation with web-safe colors: that image uses only such colors. The low ratio on mines is due to the expanded caches, as the image has a relatively limited palette, though still much larger than main cache. The low ratio on thinwall is due to both the expanded cache (narrow color range) and long runs (large areas with solid color).

Reference Implementation

The reference implementation can be found at https://github.com/quintopia/QOIG, but the format should not be considered final. It is an experiment, after all, and I might come up with additional tweaks in the future. However, it is still a good starting point for experiments along similar lines. However, the basic structure and interface of the application should remain consistent.

This implementation builds as a streaming converter between PNG and QOI/QOIG. "Streaming" means that it does not store the input or output files in RAM, which means it can handle very large files. I don't know exactly how large, but the main functionality should continue to work even as the inputs grow to many gigabytes in size. The PNG encoding and decoding is handled by libspng, a fast streaming PNG codec (compresses about 30% faster than libpng). If compiled as qoigconv, it can be called like this:


	qoigconv image.png image.qoi
										

This creates a plain QOI file with none of the optimizations described above. The same can be done with the -q option.


	qoigconv [options] image.png image.qog
	                                    

Encoding with a .qog extension allows the use of the extensions selected as options. A brief rundown of the options available:

  1. -q Use plain QOI
  2. -cNUM Set the size of the main exact-match cache. A parameter from 0 to 30 here maps to a cache length from 0-64 roughly double that value, but with the guarantee that it will not be a multiple of 3, 5 or 7 (as the QOI hash function would certainly underuse the cache)
  3. -r Enable longruns extension
  4. -i Enable longindex (secondary caches) extension
  5. -b Enable rawblocks (rgb/rgba runs) extension
  6. -s Enable linear search of the similarity caches for similar colors.
  7. -nNUM Test NUM different cache lengths on at least the first 10th of the input. The lengths are tested in an order that tries very different lengths first so that testing only a few will still find a decent result. Prints the best length found to stdout for use with future similar inputs.
  8. -mNUM Equivalent to -irbs -cNUM if NUM is given. Otherwise, equivalent to -irs -n31. In short, prioritize (but don't guarantee) maximum compression over speed.
  9. -fNUM Equivalent to -irb -cNUM if NUM is given. Otherwise, equivalent to -ir -c27. In short, balance speed with compression strength

	qoigconv image.qoi image.png
	qoigconv image.qog image.png
	                                    

This will decode the QOIG file back into a PNG.

Note that the filename endings must exactly match one of the above formats. One must be .png and the other must be .qoi or .qog.

Possible Future Changes to the Format and the Codec

  1. Improved hash function for long similarity cache
  2. Animated QOIG format (whenever libspng adds APNG support)
  3. Implement fast modulo for hashing speed

1To be clear, I have slightly more idea what I'm doing when it comes to compression, but I do not mean to imply anywhere that I have given the format more thought than Mr. Szablewski. We have slightly different goals, so I have taken his work in a different direction.

2Indeed, some of the op codes he ended up removing from the final spec fill in this space with 2 and 3 byte opcodes. I can guess why they were removed, and I have attempted to stick with the spirit of that choice while adding such codes back in.

3Some of the removed DIFF variants also allowed the alpha to be adjusted incrementally, so it is clear he was considering making it possible, but realized such opportunities didn't come up often enough in practice to be worth it.

4This is least likely to be true at the beginning of a row. It is more likely to match the pixel above it than the pixel at the end of the previous row. This is a potential tiny optimization that is not likely to yield enough savings to make it worthwhile to exploit and goes unexplored by QOIG.

5As a side note, I'm not particularly satisfied with QOI's hash function. It's not terrible, but I would have avoided using 3 and 5 as multipliers since 255 is a multiple of both. Subpixels are more likely to be multiples of these, which could make the distribution less uniform.

6I haven't actually benchmarked it, but in theory, longruns should speed up compression, longindex should double the amount of work done on a color at most, indexed DIFFs and LUMAs do similarly, and searchcache slows things down significantly, especially combined with longindex, and so should be avoided when speed is required. In short, I would be surprised if compression takes more than 4 times as long as QOI with all three major new features on. It should certainly still be beating stbi.

7In actuality, neither does particularly well on noisy images. For the perlin image, both QOI and QOIG produce a larger file than would leaving the image data uncompressed. However, QOIG's output is only 0.5% larger than uncompressed, while QOI's is 33% larger.