Chromium Internals: PAK Files
2022-5-3 23:59:27 Author: textslashplain.com

Web browsers are made up of much more than the native code (mostly compiled C++) that makes up their .exe and .dll files. A significant portion of the browser’s functionality (and bulk) is what we’d call “resources”, which include things like:

  • Images (at two resolutions, regular and “high-DPI”)
  • Localized UI Strings
  • HTML, JavaScript, and CSS used in Settings, DevTools and other features
  • UI Theme information
  • Other textual resources, like credits

In ancient times, this resource data was compiled directly into resource segments of Windows DLL files, but many years ago Chromium introduced a new format, called .pak files, to hold resource data. The browser loads resource streams out of the appropriate PAK files chosen at runtime (based on the user’s locale and screen resolution) and uses the data to populate the UI of the browser. PAK files are updated as a part of every build of the browser, because every change to any resource requires rebuilding the file.

High-DPI

Over the years, devices were released with ever-higher resolution displays, and software started needing to scale resources up so that they remain visible to human eyes and tappable by human fingers.

Scaling non-vector images up has a performance cost and can make them look fuzzy, so Chromium now includes two copies of each bitmap image, one in the 100_percent resource file, and a double-resolution version in the 200_percent resource file. The browser selects the appropriate version at runtime based on the current device’s display density.
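As a rough illustration of that runtime choice, here is a minimal Python sketch that maps a display scale factor to the nearest available resource pack. The nearest-match policy, the `pick_pack` name, and the file names in the table are assumptions made for illustration; Chromium's real selection logic lives in its C++ resource-loading code and may well differ.

```python
# Hypothetical table of available packs, keyed by the scale factor they target.
# The file names are illustrative assumptions, not taken from the text above.
AVAILABLE_PACKS = {
    1.0: "chrome_100_percent.pak",
    2.0: "chrome_200_percent.pak",
}


def pick_pack(device_scale: float) -> str:
    """Return the pack whose target scale is closest to the display's scale."""
    nearest = min(AVAILABLE_PACKS, key=lambda scale: abs(scale - device_scale))
    return AVAILABLE_PACKS[nearest]
```

With only two packs available, any display much denser than 1x ends up on the 200_percent images, which are then scaled to fit.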

Exploring PAK Files

You can find the browser’s resource files within Chrome/Edge’s Application folder.

Unfortunately for the curious, PAK is a binary format that is not easily human-readable. Beyond munging many independent resources into a single file, the format relies upon GZIP or Brotli compression to shrink the data streams embedded inside the file. Occasionally, someone goofs and forgets to enable compression on a resource, bloating the file but leaving the plaintext easy to read in a hex editor.
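To make the "many resources in one file" idea concrete, here is a hedged Python sketch of a PAK reader. The layout it assumes is not described in this post and should be verified against Chromium's source before you rely on it: a version 5 file with a 12-byte little-endian header (uint32 version, uint8 text encoding, 3 padding bytes, uint16 resource count, uint16 alias count), then a table of (uint16 id, uint32 offset) entries with one extra sentinel entry, then (uint16 id, uint16 entry index) aliases. `parse_pak` and `build_pak` are hypothetical helper names.

```python
import struct

HEADER_FORMAT = "<IB3xHH"   # assumed v5 header: version, encoding, pad, counts
HEADER_SIZE = 12
ENTRY_SIZE = 6              # uint16 resource id + uint32 file offset


def parse_pak(data: bytes) -> dict:
    """Return {resource_id: raw_bytes} for a PAK blob in the assumed v5 layout."""
    version, _encoding, resource_count, alias_count = struct.unpack_from(
        HEADER_FORMAT, data, 0)
    if version != 5:
        raise ValueError(f"unsupported PAK version {version}")

    # resource_count + 1 entries: the sentinel marks where the last stream ends.
    entries = [struct.unpack_from("<HI", data, HEADER_SIZE + i * ENTRY_SIZE)
               for i in range(resource_count + 1)]
    resources = {}
    for (res_id, start), (_next_id, end) in zip(entries, entries[1:]):
        resources[res_id] = data[start:end]   # still compressed, if it was

    # Aliases let several ids share one stored stream.
    alias_base = HEADER_SIZE + (resource_count + 1) * ENTRY_SIZE
    for i in range(alias_count):
        alias_id, index = struct.unpack_from("<HH", data, alias_base + i * 4)
        resources[alias_id] = resources[entries[index][0]]
    return resources


def build_pak(resources: dict) -> bytes:
    """Build a tiny synthetic PAK (no aliases) for exercising parse_pak."""
    ids = sorted(resources)
    body_start = HEADER_SIZE + (len(ids) + 1) * ENTRY_SIZE
    table, body = b"", b""
    for res_id in ids:
        table += struct.pack("<HI", res_id, body_start + len(body))
        body += resources[res_id]
    table += struct.pack("<HI", 0, body_start + len(body))  # sentinel entry
    return struct.pack(HEADER_FORMAT, 5, 1, len(ids), 0) + table + body
```

Round-tripping a synthetic file, for example `parse_pak(build_pak({100: b"hello"}))`, is a safe way to sanity-check the sketch before pointing it at a real resources.pak. The streams it returns are raw, so compressed resources still need decompressing afterwards.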

If you want a better look inside a PAK file, you can use the unpack.bat tool, but this tool does not yet support decompressing Brotli-compressed data (because Brotli support was added to PAK relatively recently). If you need to see a Brotli-compressed resource, use unpack.bat to get the raw file. Then strip 8 bytes off the front of the file and use brotli.exe to decompress the data.

brotli.exe --decompress --in extracted.br --out plain.txt --verbose
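The "strip 8 bytes" step is easy to script. The sketch below assumes that the 8-byte prefix is a 2-byte marker (0x1E 0x9B) followed by the uncompressed size as a 6-byte little-endian integer. That layout is an assumption, not something stated above, so treat the magic check and the size field as things to verify. `split_brotli_resource` is a hypothetical helper name.

```python
BROTLI_MAGIC = b"\x1e\x9b"   # assumed marker at the start of the prefix
PREFIX_SIZE = 8              # the 8 bytes mentioned in the text


def split_brotli_resource(raw: bytes) -> tuple:
    """Split a raw extracted resource into (declared_size, brotli_stream)."""
    if len(raw) < PREFIX_SIZE or raw[:2] != BROTLI_MAGIC:
        raise ValueError("resource does not start with the assumed Brotli prefix")
    # Assumed: bytes 2..7 hold the uncompressed length, little-endian.
    declared_size = int.from_bytes(raw[2:PREFIX_SIZE], "little")
    return declared_size, raw[PREFIX_SIZE:]
```

Writing the returned stream to extracted.br produces the input that the brotli.exe command expects, and the declared size gives you something to compare the decompressed output against.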

Despite the availability of more efficient image formats (e.g. WebP), many browser bitmap resources are still stored as PNG files. My PNGDistill tool offers a GROVEL mode that allows extracting all of the embedded PNGs out of any binary file, including PAK.

You can then run PNGDistill on the extracted PNGs and discover that our current PNG compression efficiency inside resources.pak is just 94%, with 146K of extra size due to suboptimal compression.
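The general idea behind pulling PNGs out of an arbitrary binary can be sketched in a few lines of Python. This is not PNGDistill's implementation, just one straightforward approach: look for the 8-byte PNG signature, then walk the chunk list until IEND. `find_pngs` is a hypothetical name, and the sketch does not verify chunk CRCs, so it can be fooled by data that merely resembles a PNG.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _png_end(blob: bytes, start: int):
    """Return the offset just past the IEND chunk, or None if truncated."""
    cursor = start + len(PNG_SIGNATURE)
    while cursor + 12 <= len(blob):
        # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack_from(">I4s", blob, cursor)
        cursor += 12 + length
        if cursor > len(blob):
            return None
        if chunk_type == b"IEND":
            return cursor
    return None


def find_pngs(blob: bytes) -> list:
    """Return every complete PNG byte string embedded in blob, in order."""
    pngs = []
    pos = blob.find(PNG_SIGNATURE)
    while pos != -1:
        end = _png_end(blob, pos)
        if end is None:
            # Truncated or bogus match: resume searching one byte later.
            pos = blob.find(PNG_SIGNATURE, pos + 1)
        else:
            pngs.append(blob[pos:end])
            pos = blob.find(PNG_SIGNATURE, end)
    return pngs
```

Because PNG data is already compressed, these images sit in the PAK as-is, which is why a simple signature scan over the whole file can find them.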

Fortunately, the PNGs are almost all properly stripped of metadata, with the exception of this cute little guy, whose birthday is recorded in a PNG comment.
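If you want to check an extracted PNG for leftover comments yourself, a small Python sketch like the following lists its uncompressed text chunks. It reads only tEXt chunks (a Latin-1 keyword, a NUL byte, then the text) and ignores the compressed zTXt and international iTXt variants, so an empty result does not prove that a file is metadata-free. `png_text_chunks` is a hypothetical helper name.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(png: bytes) -> list:
    """Return (keyword, text) pairs from the tEXt chunks of a PNG byte string."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    found = []
    cursor = len(PNG_SIGNATURE)
    while cursor + 12 <= len(png):
        # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack_from(">I4s", png, cursor)
        data = png[cursor + 8:cursor + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _sep, text = data.partition(b"\x00")
            found.append((keyword.decode("latin-1"), text.decode("latin-1")))
        if chunk_type == b"IEND":
            break
        cursor += 12 + length
    return found
```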

Have fun looking under the hood!

-Eric

Impatient optimist. Dad. Author/speaker. Created Fiddler & SlickRun. PM @ MSFT '01-'12, and '18-, presently working on Microsoft Edge. My words are my own.


Source: https://textslashplain.com/2022/05/03/chromium-internals-pak-files/