Not an exhibitionist thread ((un)zip and windows)



  • I inherited some C++ code that runs on Linux and I'm porting it to Windows. Most of my pain is that the code uses a lot of bash scripts to do bulk file system manipulation (it actually launches another process, does some setup/clean-up and reads the results of that other process), which as always with old code looks like a :wtf:, probably is a :wtf:, can be explained by hysterical :raisins: but still is a :wtf:, and is complicated to change and actually kind-of, sort-of makes sense so it's not gonna be changed despite being a :wtf:.

    Except that bash scripts on Windows are a non-starter (well, maybe not bash scripts by themselves, WSL is a thing now, but the scripts really assume too many things about the file system to make sense in Windows), so I'm rewriting them directly in C++/Qt. On the whole, it's painful but not hugely complicated, and I'm paid for that, so... that's life. 🤷

    Now the thing is that one of those scripts calls gzip (actually tar + gzip) and I'm not sure how to replace that by a simplish Windows C++ equivalent.

    I can see that there are tons of 3rd-party libraries to do that, of course, but I'd rather avoid adding one more dependency to our stack (especially since it means paper work to get it approved and :kneeling_warthog:).

    The code uses Qt, which is linked against zlib and has a nice and easy qCompress() function, but that only compresses a byte stream, so while I can trivially use it to compress a single file (as long as I don't care about preserving attributes, which is the case), it's more work to compress a directory (and at that point it looks a lot like reinventing the wheel!). It also has the drawback that the compressed file cannot be uncompressed by anything other than my application since it's not some sort of standard format, which isn't really a big issue but still, it would be nicer (for debugging) if that wasn't the case. So I'm using that for single files at the moment, but for directories... I'm still looking for something better!

    Since at least a few versions, Windows can natively handle ZIP archives (without buying WinZip), so there should be a C++ API for that, right? Right?

    I found some Microsoft doc for a Compression API (cmpapi) but from a quick glance it looks pretty low-level (i.e. at best equivalent to qCompress() in Qt, but not stuff I can use easily to compress a whole folder).

    Googling around, I see some code examples (e.g. here) that basically create an empty file, write a magic header into it, and then simply copy the source files into that file (opening it as a directory first). I assume that behind the scenes Windows detects that the target is a ZIP file (thanks to the magic header) and compresses stuff. Not quite sure how to list contents and extract from it though. But anyway, this sounds a bit weird.

    So, is it the official way to do that or am I supposed to do it some other way? Is there a usable doc somewhere?


  • Considered Harmful

    Hell knows when exactly it appeared, but Windows 10 also has native tar now.


    If that doesn't do, can you use Powerhell? There's a Compress-Archive that can easily create ZIP files for you.




  • @Applied-Mediocrity That probably wasn't very clear in my first post (it sounds obvious to me but of course it does...), but my preference would be to replace scripts entirely by C++ code (to avoid relying on chunks of code in other files, or at least in other languages).

    Though I guess at worst I'll write a Windows script to sit next to the Linux script and do things that way, but I'd rather avoid it if I can (for most scripts, since I can rewrite them in pure Qt I can make it so that they work in Linux as well, meaning I can remove the script entirely from the code base, which is IMO much cleaner overall).



  • Also, based on the examples I found, I can indeed create an (empty) ZIP file by writing the magic header, and I manage to call various functions down to CopyHere(), but that one fails with an "Invalid pointer" exception and I can't find out why.

    That still feels somewhat clunky to me as I have to do stuff with COM that I have no idea what it is/does/can fail etc. but maybe there is no way to avoid that...

    Code (hacked together just to see if I can get the correct function calls, don't worry about error handling etc. for now) currently looks like:

    	QString archive_file; /* something like "C:/foo/bar.zip" */
    	QDir dir_to_archive; /* something like "C:/foo/baz/", it contains a bunch of regular files (no subdirectory or fancy thing) */
    
    	FILE* f = fopen(qPrintable(archive_file), "wb"); // yeah qPrintable() is ugly
    	// Same value as registry key HKEY_CLASSES_ROOT\.zip\CompressedFolder\ShellNew\Data.
    	fwrite("\x50\x4B\x05\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 22, 1, f);
    	fclose(f); // At that point I have what Windows recognises as a valid ZIP file.
    
    	CoInitialize(NULL); // something something initialise COM... no idea actually, I'm just copy-pasting from random web pages... there's no way this can go wrong, right?
    	_COM_SMARTPTR_TYPEDEF(IShellDispatch, IID_IShellDispatch);
    	_COM_SMARTPTR_TYPEDEF(Folder, IID_Folder);
    	IShellDispatchPtr shell;
    	
    	// OK, let's get to work. If I get it right, we create a shell, then use it to point to a directory (our ZIP file), then copy stuff into it.
    	HRESULT hr = CoCreateInstance(CLSID_Shell, NULL, CLSCTX_INPROC_SERVER, IID_IShellDispatch, (void **)&shell);
    	if (SUCCEEDED(hr)) {
    		FolderPtr destFolder;
    		hr = shell->NameSpace(variant_t(qPrintable(archive_file)), &destFolder);
    		if (SUCCEEDED(hr)) {
    			variant_t options = FOF_SILENT | FOF_NOCONFIRMATION | FOF_NOERRORUI; // check correct flags to use here... probably not the most urgent of things to do...
    			foreach(QString entry, dir_to_archive.entryList(QDir::Files)) {
    				// I'm just picking regular files for now, don't worry about it.
    				// dir_to_archive.absoluteFilePath(entry) is e.g. "C:/foo/baz/file1"
    				try {
    					variant_t item = qPrintable(dir_to_archive.absoluteFilePath(entry));
    					hr = destFolder->CopyHere(item, options);
    				}
    				catch (const _com_error& ex) {
    					// And at this point I always get an "Invalid pointer" exception. But why?
    				}
    			}
    		}
    	}
    
    	CoUninitialize();
    

    ETA: file paths from Qt use / e.g. "C:/foo/bar" but if I translate those to proper Windows paths (with QDir::toNativeSeparators()) the result is the same, so that's not the issue.
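
    For reference, the 22 bytes written by the fwrite() call above are not arbitrary: they form a ZIP "end of central directory" record describing an archive with zero entries. Here is a minimal sketch that builds the same bytes field by field. The helper name emptyZipBytes is made up for illustration, and this only documents the magic header; it does not address the CopyHere() failure.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Builds the 22-byte "end of central directory" (EOCD) record of a ZIP
// archive that contains no entries. Every field after the signature is
// zero, which is why the string literal above is mostly \x00.
// emptyZipBytes is a hypothetical helper name, not part of any library.
std::array<std::uint8_t, 22> emptyZipBytes()
{
    std::array<std::uint8_t, 22> eocd{};   // value-initialised: all zeros

    // Signature "PK\x05\x06"
    eocd[0] = 0x50;
    eocd[1] = 0x4B;
    eocd[2] = 0x05;
    eocd[3] = 0x06;

    // Remaining 18 bytes (little-endian fields), all zero for an empty archive:
    //   [4..5]   number of this disk
    //   [6..7]   disk where the central directory starts
    //   [8..9]   central directory entries on this disk
    //   [10..11] total central directory entries
    //   [12..15] size of the central directory in bytes
    //   [16..19] offset of the central directory
    //   [20..21] length of the archive comment
    return eocd;
}
```

    Writing the returned bytes to a file opened in binary mode should give the same result as the fwrite() line in the snippet.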



  • To be honest I would just get some third party library and use that - the extra dependency is surely less pain than doing it manually.

    There's some stuff in .Net for this (https://docs.microsoft.com/en-us/dotnet/api/system.io.compression.ziparchive?view=net-5.0 - since .Net 4.5 so should be there for you) if you can use the C++/.Net bridge.



  • @bobjanova there's some .NET in our code base somewhere and it interacts with our C++ code so I guess in theory I could bridge to it but at that point it's adding so much stuff into my current project (which itself is pure C++) that it's indeed probably easier to get a 3rd-party library to do it, or use a (Windows) script.

    That's annoying, but unless I can find an easy alternative, that's going to be the only viable short-term (i.e. permanent) solution.



  • If you have access to the zlib stuff, you could invoke that directly. I have some old code, no idea if it works still:

    #include "unzip.h"
    ...
    
    	std::string archiveFile = ...
    	unzFile uf = unzOpen(m_pImpl->m_zipFile.c_str());
    	if (uf)
    	{
    		if (UNZ_OK == unzLocateFile(uf, archiveFile.c_str(), 0))
    		{
    			char filename_inzip[256];
    			unz_file_info64 file_info;
    			if (UNZ_OK == unzGetCurrentFileInfo64(uf, &file_info, filename_inzip, sizeof(filename_inzip), nullptr, 0, nullptr, 0))
    			{
    				uInt size_buf = BUFFER_SIZE;
    				char* buf = new char[size_buf];
    				if (UNZ_OK == unzOpenCurrentFile(uf))
    				{
    					rc = true;
    					int err;
    					do
    					{
    						err = unzReadCurrentFile(uf, buf, size_buf);
    						if (err < 0)
    							break;
    						else if (err > 0)
    						{
    							outData.write(buf, err);
    						}
    					} while (err > 0);
    				}
    				delete [] buf;
    			}
    		}
    		unzClose(uf);
    	}
    

    I currently use wxWidgets and that has support builtin via wxFileSystem and wxFSFile.



  • @dcon That's basically the same as rolling my own directory traversal and compressing each file one by one (with the added bonus, I guess, of getting a "proper" zip file rather than a custom half-baked thing, which is a big advantage!). So that's more or less what I called reinventing the wheel as I have to handle all the fishy file system stuff (in practice it doesn't matter for now, except of course until the day it will...).

    Still, thanks for the code, I'll have a look and see if I can do something from it. If I have to roll my own, it's better if I can get a valid zip archive that I can use anywhere, and I've never used zlib directly before so your code will help me get there.

    Btw your code is for extracting files, since I'm lazy do you have the same for zipping as well?

    (that'll be for next week, it's the end of the day for me...)


  • Considered Harmful

    Perhaps you could execute tar as windowless process from your C++ code and wait for the exit code?

    And if you really want ZIP from now on, when on Windows, I believe you can execute windowless Powershell the same way, just pass the command and its parameters in args.

    Now, waiting for a process still involves a fair helping (4-5 calls) of WinAPI, but it's simple. You set up two structs, call CreateProcess(), WaitForSingleObject() on it, then GetExitCodeProcess() on it and finally CloseHandle().
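
    The sequence described above can be sketched roughly as follows. This is an untested illustration, not a drop-in implementation: buildCommandLine and runHidden are hypothetical helper names, the quoting is deliberately naive (it assumes arguments contain no double quotes and do not end with a backslash), and the Windows part only compiles when _WIN32 is defined.

```cpp
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Joins arguments into a single command line, wrapping any argument that
// contains a space in double quotes. Assumption: no argument contains a
// double quote or ends with a backslash (real Windows quoting rules are
// more involved than this).
std::string buildCommandLine(const std::vector<std::string>& args)
{
    std::string line;
    for (const std::string& a : args) {
        if (!line.empty())
            line += ' ';
        if (a.find(' ') != std::string::npos)
            line += "\"" + a + "\"";
        else
            line += a;
    }
    return line;
}

#ifdef _WIN32
// Runs a command line without a console window and waits for it to finish.
// Returns the child's exit code, or -1 if the process could not be started.
int runHidden(std::string commandLine)   // by value: needs a writable buffer
{
    STARTUPINFOA si = {};
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi = {};

    if (!CreateProcessA(nullptr, &commandLine[0], nullptr, nullptr, FALSE,
                        CREATE_NO_WINDOW, nullptr, nullptr, &si, &pi))
        return -1;

    WaitForSingleObject(pi.hProcess, INFINITE);

    DWORD exitCode = 0;
    if (!GetExitCodeProcess(pi.hProcess, &exitCode))
        exitCode = static_cast<DWORD>(-1);

    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return static_cast<int>(exitCode);
}
#endif
```

    On Windows, something like runHidden(buildCommandLine({"tar", "-czf", archive, "-C", parentDir, dirName})) would then cover the tar case, with archive, parentDir and dirName being placeholders for the real paths.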


  • BINNED

    @remi said in Not an exhibitionist thread ((un)zip and windows):

    That still feels somewhat clunky to me as I have to do stuff with COM that I have no idea what it is/does/can fail etc. but maybe there is no way to avoid that...

    I had to use COM once about ten years ago (integrating a plugin into an Autodesk product). I’ve read a lot of stuff about it but (thankfully) already forgot most of it, but there’s like a crapload of things to be aware of, threading apartments, in-process / out of process servers, etc.



  • @remi said in Not an exhibitionist thread ((un)zip and windows):

    Btw your code is for extracting files, since I'm lazy do you have the same for zipping as well?

    No, I haven't needed that yet - it's still just a // TODO placeholder.


  • ♿ (Parody)

    @topspin said in Not an exhibitionist thread ((un)zip and windows):

    @remi said in Not an exhibitionist thread ((un)zip and windows):

    That still feels somewhat clunky to me as I have to do stuff with COM that I have no idea what it is/does/can fail etc. but maybe there is no way to avoid that...

    I had to use COM once about ten years ago (integrating a plugin into an Autodesk product). I’ve read a lot of stuff about it but (thankfully) already forgot most of it, but there’s like a crapload of things to be aware of, threading apartments, in-process / out of process servers, etc.

    Most of that stuff probably isn't necessary for something simple like he's looking to do. Still, it is a PITA to use from C++. It basically has a bunch of built in reflection type stuff that you need to use, but which VB (and VBA) did for you. Plus reference counting.


  • Trolleybus Mechanic

    I thought 7zip had some DLLs available that can work with archives.



  • @mikehurley said in Not an exhibitionist thread ((un)zip and windows):

    I thought 7zip had some DLLs available that can work with archives.

    In the original post:

    can see that there are tons of 3rd-party libraries to do that, of course, but I'd rather avoid adding one more dependency to our stack (especially since it means paper work to get it approved and :kneeling_warthog:).


  • Trolleybus Mechanic

    @dcon said in Not an exhibitionist thread ((un)zip and windows):

    @mikehurley said in Not an exhibitionist thread ((un)zip and windows):

    I thought 7zip had some DLLs available that can work with archives.

    In the original post:

    can see that there are tons of 3rd-party libraries to do that, of course, but I'd rather avoid adding one more dependency to our stack (especially since it means paper work to get it approved and :kneeling_warthog:).

    Fair enough. However zip/tar/gzip type things seem like obvious candidates for 3rd party libraries.


  • Notification Spam Recipient

    @remi said in Not an exhibitionist thread ((un)zip and windows):

    that basically create an empty file, write a magic header into it, and then simply copy the source files into that file (opening it as a directory first)

    Probably using Explorer shell objects. Yeah, that's possibly going to work, but it's just as kludgy as calling out to shell scripts regardless.

    @remi said in Not an exhibitionist thread ((un)zip and windows):

    a simplish Windows C++ equivalent.

    Unlikely. You're probably going to need to include an external library for this.



  • If you're going down the third party route, there's miniz. MIT license, single source+header, so it's somewhat easy to integrate in most projects. Not exhaustively documented, but the examples were sufficient to get me to the point where I could do what I needed.



  • @boomzilla said in Not an exhibitionist thread ((un)zip and windows):

    @topspin said in Not an exhibitionist thread ((un)zip and windows):

    @remi said in Not an exhibitionist thread ((un)zip and windows):

    That still feels somewhat clunky to me as I have to do stuff with COM that I have no idea what it is/does/can fail etc. but maybe there is no way to avoid that...

    I had to use COM once about ten years ago (integrating a plugin into an Autodesk product). I’ve read a lot of stuff about it but (thankfully) already forgot most of it, but there’s like a crapload of things to be aware of, threading apartments, in-process / out of process servers, etc.

    Most of that stuff probably isn't necessary for something simple like he's looking to do. Still, it is a PITA to use from C++. It basically has a bunch of built in reflection type stuff that you need to use, but which VB (and VBA) did for you. Plus reference counting.

    Yeah, as a program using COM objects from other programs/libraries on the same machine, you don't have to worry about too much. Do everything dealing with the COM objects in one thread, make sure you Release() object pointers when you're done with them, free BSTRs and SAFEARRAYs if you're their owner, and likely some other stuff I'm not remembering because it's been so long.

    I used the ATL when working with COM back in the day. It has wrapper classes that help you create data items you want to pass around and deal with owned objects that go out of scope.

    @remi, while this is an option I think you're better off finding a C++ library that does everything for you.



  • @Parody said in Not an exhibitionist thread ((un)zip and windows):

    while this is an option I think you're better off finding a C++ library that does everything for you.

    OK, thanks everyone for the comments on that approach (the COM/shell stuff). I wasn't comfortable with it anyway, but it really sounds like a bad idea.

    I'm currently looking at @dcon's code for directly using zlib (thanks for it!). It's already in our stack so I can use it easily and as I said I'd rather not have to go through the effort of getting yet another 3rd party library added to it just for that. So if zlib doesn't quite do it for me I'll do what @Applied-Mediocrity suggests and call tar directly from the code. Not quite as good as not calling any external process, but still better than a shell script, and I forgot that nowadays we can use tar on Windows without any special install. Actually I might go with that option directly as it's much less work for me (:kneeling_warthog:) and probably "good enough" for the code I'm dealing with.



  • Current conclusion:

    @dcon's code got me on the right track, I can easily make something that works from there, so thanks again.

    Unfortunately until I tried it I hadn't noticed that this relies on minizip, which is 3rd-party code bundled with zlib, but not part of zlib itself, and as a result it's not in our current stack. So using it would require adding that to our shared libraries etc. Not an impossible task (I've done it many times), and perhaps easier than a truly new 3rd-party lib since it's coming from the same source package as a lib that's already there, but still a bit of work. Also using it means I still need to handle creating the unzipped files etc. which again is not very hard but a bit more work.

    So I think I'll go for a call to tar (through a QProcess), it's good enough for how this code is used (plus, remember that before I took it over it was using bash scripts all over the place, and still is in some places that I haven't ported yet, so it's not like I'm making things worse! Just... a bit less better than I could have otherwise, but that's all).
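
    As a rough sketch of what such a call might pass to tar, the helpers below build the argument lists for packing and unpacking. The names tarCreateArgs and tarExtractArgs are hypothetical, and the flags are the common -c/-x/-z/-f/-C options; how the tar shipped with Windows copes with drive-letter paths has not been checked here.

```cpp
#include <string>
#include <vector>

// Arguments for "tar" to pack directory `dirName` (located in `parentDir`)
// into a gzip-compressed archive. -C makes tar change into parentDir first,
// so the archive stores "dirName/..." rather than the full absolute path.
// Hypothetical helper, for illustration only.
std::vector<std::string> tarCreateArgs(const std::string& archive,
                                       const std::string& parentDir,
                                       const std::string& dirName)
{
    return {"-czf", archive, "-C", parentDir, dirName};
}

// Arguments to unpack `archive` into the existing directory `destDir`.
std::vector<std::string> tarExtractArgs(const std::string& archive,
                                        const std::string& destDir)
{
    return {"-xzf", archive, "-C", destDir};
}
```

    With Qt, each string would go into a QStringList handed to QProcess::start("tar", args), followed by waitForFinished() and a check of exitStatus() and exitCode().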



  • @remi said in Not an exhibitionist thread ((un)zip and windows):

    which is 3rd-party code bundled with zlib, but not part of zlib itself, and as a result it's not in our current stack

    I didn't even think of that possibility...



  • @dcon I also didn't... when I googled the name of the functions in your snippet and saw it was in zlib I thought I would have it, and then was confused for a minute when I didn't find the includes even though zlib.h was there...

    Never mind, I learnt how to unzip by hand, which is always nice, even if I'm going for a cruder approach in the end.


  • Discourse touched me in a no-no place

    @remi said in Not an exhibitionist thread ((un)zip and windows):

    Now the thing is that one of those scripts calls gzip (actually tar + gzip) and I'm not sure how to replace that by a simplish Windows C++ equivalent.

    tar is messy; you don't really want to implement that yourself. Call out to a program or use someone else's library. The one really good thing (among the myriad bad things) is that it is a format that is totally designed to be streamed; you really don't have to put the uncompressed data on disk or hold it all in memory.

    gzip is trivially easy with zlib (which you say you already have) except that the zlib API is its own type of weird. The -z option to tar just passes the tar archive through gzip/gunzip as a stream so it is easy to replace.
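
    To give an idea of what "designed to be streamed" means in practice: each archive member is a 512-byte header followed by the file data padded to a multiple of 512 bytes, so an archive can be written or read front to back. The sketch below fills in a minimal ustar header for a regular file. It is an illustration only (makeTarHeader and TarBlock are made-up names, the mode/uid/gid/mtime values are placeholders, and long names, links, directories and all the other messy parts are ignored), not a recommendation to write your own tar.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

using TarBlock = std::array<char, 512>;

// Fills a 512-byte ustar header for a regular file.
// Assumptions: name is shorter than 100 bytes and size fits in 11 octal
// digits. Mode, owner and timestamp are fixed placeholder values.
TarBlock makeTarHeader(const std::string& name, std::uint64_t size)
{
    TarBlock h{};                                          // all zeros
    std::memcpy(h.data(), name.data(),
                name.size() < 100 ? name.size() : 99);     // name at offset 0
    std::snprintf(h.data() + 100, 8, "%07o", 0644u);       // mode
    std::snprintf(h.data() + 108, 8, "%07o", 0u);          // uid
    std::snprintf(h.data() + 116, 8, "%07o", 0u);          // gid
    std::snprintf(h.data() + 124, 12, "%011llo",
                  static_cast<unsigned long long>(size));  // size, octal
    std::snprintf(h.data() + 136, 12, "%011o", 0u);        // mtime
    h[156] = '0';                                          // type: regular file
    std::memcpy(h.data() + 257, "ustar", 6);               // magic, with NUL
    std::memcpy(h.data() + 263, "00", 2);                  // version

    // Checksum: sum of all header bytes, with the checksum field itself
    // counted as eight spaces. Stored as six octal digits, NUL, space.
    std::memset(h.data() + 148, ' ', 8);
    unsigned sum = 0;
    for (char c : h)
        sum += static_cast<unsigned char>(c);
    std::snprintf(h.data() + 148, 7, "%06o", sum);         // leaves h[155] == ' '
    return h;
}
```

    A complete archive would repeat header plus padded data for each file and end with two all-zero 512-byte blocks; passing that stream through gzip is what tar -z does.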



  • @dkf said in Not an exhibitionist thread ((un)zip and windows):

    tar is messy; you don't really want to implement that yourself.

    I don't really care about all the complications that tar can deal with, but nonetheless that was the part that truly bothered me from the start.

    Properly compressing a stream of data is Hard™ and you definitely don't want to reinvent the wheel here, but in practice it's actually easy since because it's hard (!) there are tons of functions to abstract it (e.g. qCompress() in Qt). File system handling, on the other hand, is so easy (!) that writers of algorithms don't care about it ("just compress each file with our handy compression functions..." 👋) and you have to write it by hand, making it hard (=lots of code) in practice.

    If that makes sense. It does in my head.

    Call out to a program

    That's what I ended up doing. The bit that I initially didn't think about is that nowadays tar can be assumed to exist in Windows, which definitely wasn't the case some years ago, so I didn't immediately go for it. Also if I could have done it with the libs I already have, it would have been nicer but really that's nitpicking. A call to a command is fine.

    The -z option to tar just passes the tar archive through gzip/gunzip as a stream so it is easy to replace.

    Yeah, I wasn't clear on that initially, tar -z is what the (old) script was doing, and what the (new) call-to-a-command does. Once I've decided to go through a call to tar, any way to compress other than passing it -z (or -j, I don't care) would be stupid.


  • Java Dev

    @dkf said in Not an exhibitionist thread ((un)zip and windows):

    @remi said in Not an exhibitionist thread ((un)zip and windows):

    Now the thing is that one of those scripts calls gzip (actually tar + gzip) and I'm not sure how to replace that by a simplish Windows C++ equivalent.

    tar is messy; you don't really want to implement that yourself. Call out to a program or use someone else's library. The one really good thing (among the myriad bad things) is that it is a format that is totally designed to be streamed; you really don't have to put the uncompressed data on disk or hold it all in memory.

    Yeah, back when that was first relevant for me I took one look at /usr/include/tar.h and noped out.

    gzip is trivially easy with zlib (which you say you already have) except that the zlib API is its own type of weird. The -z option to tar just passes the tar archive through gzip/gunzip as a stream so it is easy to replace.

    gzip using zlib isn't that bad IMO. It just means using a different set of IO functions. We used to have a bunch of code which read gzip files using popen( "zcat ...", "r" ); (decompressing in a different thread must be faster!), and we replaced it with zlib calls because gzip is cheap and pipes are expensive.
    A big plus of zlib is that you can use it to open a file which might be gzipped but might also be plain and it will work with it either way, saving you from writing your own abstraction on top of it. And on the command line as well, so much stuff Just Works™ with gzipped files.



  • @remi said in Not an exhibitionist thread ((un)zip and windows):

    Since at least a few versions, Windows can natively handle ZIP archives (without buying WinZip), so there should be a C++ API for that, right? Right?

    I'm late, but no, and this is by design (for licensing reasons):

    (the "Bonus chatter" part).



  • @Zerosquare TIL, thanks.

    I'm a bit surprised that across the years MS didn't find any resources to fix that, but as Chen says, "the minus 100 points deficit" is a very apt image for how priorities go, so maybe not so surprising in the end.

    The Powershell at the end is interesting. If I ever find that I truly need to use zip (rather than tar.gz), I can always replace my invocation of tar by that one, so thanks also.
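
    Should that day come, the swap could look something like the sketch below, which builds the arguments for a powershell invocation of the Compress-Archive cmdlet mentioned earlier in the thread. This is an assumption-laden illustration rather than the command from the linked article: psQuote and compressArchiveArgs are made-up names, and -Force (overwrite an existing archive) is a choice made here, not something the thread asked for.

```cpp
#include <string>
#include <vector>

// Quotes a value as a PowerShell single-quoted string; embedded single
// quotes are escaped by doubling them.
std::string psQuote(const std::string& s)
{
    std::string out = "'";
    for (char c : s) {
        if (c == '\'')
            out += "''";
        else
            out += c;
    }
    out += "'";
    return out;
}

// Arguments for "powershell" to zip `sourceDir` into `zipFile` using the
// Compress-Archive cmdlet. Hypothetical helper, for illustration only.
std::vector<std::string> compressArchiveArgs(const std::string& sourceDir,
                                             const std::string& zipFile)
{
    return {"-NoProfile", "-NonInteractive", "-Command",
            "Compress-Archive -Path " + psQuote(sourceDir) +
            " -DestinationPath " + psQuote(zipFile) + " -Force"};
}
```

    The resulting list can be passed to the same process-launching code as the tar arguments, with "powershell" as the program name.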

