Iterating Over Zip File Entries with libzip

libzip is one of the most full-featured and widely used open source libraries for working with Zip archives and has been adopted by numerous open source projects as well as commercial products. The library is implemented in C, which also makes it ideal for implementing language-specific bindings.

While its documentation is very comprehensive, I often find the descriptions of its various APIs unclear, and there are no examples showing how common use cases could be implemented using the library.

One such example is iterating over file entries within a Zip archive. How this could be done is rather non-obvious from reading the documentation, but I was able to find a useful post in the libzip Mailing List Archive on this specific use case.

As a side note, the fact that the mailing list archive page is titled “List Archive”, combined with the lack of search functionality on the website, makes it annoyingly difficult to find information on this topic, because a lot of unrelated mailing list threads show up in the search results.

In short, this can be achieved using the zip_get_num_entries and zip_get_name functions. zip_get_num_entries returns the number of entries in the archive as a zip_int64_t (or -1 if the archive is NULL), and the name of each entry can then be fetched by its index:

// archive is a zip_t* returned by either zip_open or zip_open_from_source.
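zip_int64_t num_entries = zip_get_num_entries(archive, /*flags=*/0);
if (num_entries < 0) {
  // zip_get_num_entries returns -1 if archive is NULL. Handle error.
}
for (zip_int64_t i = 0; i < num_entries; ++i) {
  // zip_get_name takes an unsigned index; i is non-negative here.
  const char* name =
      zip_get_name(archive, static_cast<zip_uint64_t>(i), /*flags=*/0);
  if (name == nullptr) {
    // Handle error, e.g. by inspecting zip_get_error(archive).
    continue;
  }
  // Do work with name.
}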

Note that the Zip format does not have the concept of “directories” the way that file systems generally do. Each entry in a Zip archive has a name, and the entry names implicitly reflect the “directory structure” of the archive. The program can then reconstruct this directory structure when extracting the files onto the local file system. Generally, Zip file libraries treat “directories” within Zip archives as entries with names ending with '/', for example as described in the Javadoc for java.util.zip.ZipEntry.isDirectory().
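To make the trailing-slash convention a bit more concrete, here is a minimal sketch of how a program might classify entry names and recover the directories they imply. The helper names IsDirectoryEntryName and ImpliedDirectories are my own inventions for illustration and are not part of libzip; the code works purely on the strings that zip_get_name would hand back, and it assumes '/' is the only separator used in entry names.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a libzip function): returns true if an entry name
// denotes a "directory" under the trailing-'/' convention.
bool IsDirectoryEntryName(const std::string& name) {
  return !name.empty() && name.back() == '/';
}

// Hypothetical helper (not a libzip function): returns every directory prefix
// implied by an entry name, outermost first. If the name itself ends with
// '/', the name itself is the last element of the result.
std::vector<std::string> ImpliedDirectories(const std::string& name) {
  std::vector<std::string> dirs;
  for (std::string::size_type pos = name.find('/');
       pos != std::string::npos; pos = name.find('/', pos + 1)) {
    // Keep the trailing '/' so each prefix is itself a directory-style name.
    dirs.push_back(name.substr(0, pos + 1));
  }
  return dirs;
}
```

With these, an extraction loop could create each path returned by ImpliedDirectories(name) before writing the file, which also covers archives that contain no explicit directory entries at all.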

I plan to write a few follow-up posts on this topic, including some more details about how directories are handled for Zip files (such as the libzip zip_dir_add function and how Zip archive programs use these entries), the libzip zip_source data structure and its related APIs, and possibly a few others.