Progress Report: 2019-04-11 Spiral 0

I managed to convert an earlier project of mine (odie) into the foundation for a new project (sobyk). Odie had the Tcl-based build system, and a first stab at the packages that a kit would need. There were apparently a few bugs that only came to light when I tried to spin up a new project from a clean build system. But they are all fixed (and the relevant changes pushed to Tcllib.)

In addition to the packages (sqlite, and rl_json in particular) that odie included, sobyk adds the tcltls extension in order to secure communications via SSL.

Now that I have a self-contained executable with all of the packages I need for the Mac (my current development platform), I'm going to start prototyping the package delivery system as well as the website that drives that package delivery system. I am also building in, from the start, the concept of a local mirror. Each sobyk executable will actually be able to:

Host its own complete copy of the sobyk package system and the website
Clone the web content (packages and docs) from an external source
Pull incremental updates
Perform all of the above from disk media as well as over a live internet feed

Now I have to work out the packaging system. Some friends have been steering me to replicate the teapot system of ActiveState. It's nice. Has its oddities. But by and large it trades in flat files largely generated by scripts.

But today, I was also asked to review how Python projects manage packages, and in particular Anaconda. I've done my fair share of maintenance on large collections of packages. So I cut to the chase and looked up how their website distributes data. It largely trades in flat files generated by scripts. The metadata is encoded in JSON. No problem. But then I realized... all of the metadata was encoded in JSON. One JSON file. Just to find out what the md5 sum is for a single file, you have to load that large and growing JSON file. And env(LC_DIETY) help you if one record is malformed, whether by a careless admin tweaking it by hand or by a corrupted download. The entire file is unreadable at that point.
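To make that failure mode concrete, here is a minimal sketch in Python. The index layout, package names, and checksum values are all made up for illustration; this is not the format any real repository uses. It only shows that a lookup against a monolithic JSON index has to parse the whole thing, and that one damaged tail takes every record down with it.

```python
import json

# Hypothetical stand-in for a monolithic package index (names and sums invented).
index = {
    "packages": {
        "alpha-1.0.tar.gz": {"md5": "0" * 32},
        "beta-2.1.tar.gz": {"md5": "1" * 32},
        "gamma-0.3.tar.gz": {"md5": "2" * 32},
    }
}
intact_text = json.dumps(index)

# Simulate a corrupted download by chopping a few bytes off the end.
damaged_text = intact_text[:-5]


def lookup_md5(index_text, filename):
    """Parse the entire index just to read one checksum."""
    data = json.loads(index_text)
    return data["packages"][filename]["md5"]


# With an intact file, the lookup works (after loading everything).
intact_result = lookup_md5(intact_text, "alpha-1.0.tar.gz")

# With a damaged file, even the first, untouched record is unreachable.
try:
    lookup_md5(damaged_text, "alpha-1.0.tar.gz")
    damaged_readable = True
except json.JSONDecodeError:
    damaged_readable = False
```

Note that the record being asked for sits at the very start of the file, nowhere near the damage, and it still cannot be read.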

Ok, not a design element I want to replicate. Especially if I want to offer incremental updates. And checksums really need to be available for random access. (I rather like what many sites do, and offer up the checksums right next to the file.)
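The "checksum right next to the file" idea can be sketched in a few lines of Python. This is only an illustration under some assumptions of mine: the `.sha256` suffix, the use of SHA-256 rather than md5, and the helper names `write_sidecar` and `verify` are all hypothetical, not something sobyk or any particular site defines.

```python
import hashlib
import tempfile
from pathlib import Path


def sidecar_path(package_path):
    # Assumed naming convention: "<file>.sha256" beside the file itself.
    return package_path.with_name(package_path.name + ".sha256")


def write_sidecar(package_path):
    """Store the package's digest in a small file next to it."""
    digest = hashlib.sha256(package_path.read_bytes()).hexdigest()
    sidecar_path(package_path).write_text(digest + "\n")


def verify(package_path):
    """Check one package against its own sidecar; no global index needed."""
    expected = sidecar_path(package_path).read_text().strip()
    actual = hashlib.sha256(package_path.read_bytes()).hexdigest()
    return actual == expected


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "example-1.0.zip"  # made-up package name
        pkg.write_bytes(b"pretend this is a package")
        write_sidecar(pkg)
        before = verify(pkg)
        # Simulate a corrupted download of this one file.
        pkg.write_bytes(b"pretend this is a pakage")
        after = verify(pkg)
        return before, after
```

Each checksum is available by random access, and a damaged file or a damaged sidecar only affects that one package.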

Years ago, I worked on a project that never went anywhere called "sherpa". Sherpa was supposed to be a teacup replacement that would build from source. (Never mind that teacup could build from source.) In the process I discovered all of the problems that would later be solved in what is now the practcl module in Tcllib. But I also found that you need a relational database, and not flat files, to manage any decent packaging system.

The problem goes thus:

When you build on a Mac, you have your choice of the Mac's native GUI or the old-fashioned UNIX X11 GUI. Once that decision is made, your selection of packages changes. Some packages are Mac native GUI only. Some Tk extensions flat out refuse to build on the Mac native GUI. And now there's a third way: building Tk inside of SDL. SDL packages more or less behave like the X11 packages, but you have to tell them about the fact that Tk was configured to run under SDL.

So here we are, one platform, 3 different graphics engines. But there's more! Apple has a tendency to alter the GUI between OS releases often enough to require the Tk maintenance team to re-code chunks. Sometimes the change is profound enough that it knocks Tk extensions out of orbit. Many times, the modifications to these extensions are not backward compatible with prior OSes. Plus you need to track a blanket prohibition that keeps every extension not yet modified to the new standard from trying to compile or install on machines it isn't compatible with anymore.

Yes, one could build an elaborate naming convention to capture all of that in one string. But after a while it starts to read more like a geek block on the sig of someone's email than it does a decent explanation of what the platform is.

Another interesting wrinkle is that many extensions now have competing implementations. Some work better than others in certain environments. Some simply have that one patch that makes it work for that one version of the OS. Other times the project forks, and the new version may go so far as to change the name of the package!

It's not enough to track the packages. You need to track the provenance.

Add enough complexity and the need for a relational database is clear. So I'm going to start with a relational database, and try to simplify things later. Much better than going the opposite way and starting with a flat file and thinking you can pretend to be a database.
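As a rough illustration of why this maps well onto tables, here is a minimal sketch using Python's built-in sqlite3 module. The schema, the column names, the package name "exampletk", the origin labels, and the use of plain integers for OS versions are all assumptions made up for this example; it is not the sobyk schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE package (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE build (
    id         INTEGER PRIMARY KEY,
    package_id INTEGER NOT NULL REFERENCES package(id),
    origin     TEXT NOT NULL,     -- provenance: which fork or maintainer
    os         TEXT NOT NULL,
    gui        TEXT NOT NULL,     -- 'native', 'x11' or 'sdl'
    min_os     INTEGER NOT NULL,  -- oldest OS release this build supports
    max_os     INTEGER            -- NULL means no known upper bound
);
"""


def open_catalog():
    """Build a tiny in-memory catalog with invented sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO package (id, name) VALUES (1, 'exampletk')")
    db.executemany(
        "INSERT INTO build (package_id, origin, os, gui, min_os, max_os) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "upstream", "macosx", "native", 10, 13),
            (1, "fork-a", "macosx", "native", 14, None),
            (1, "upstream", "macosx", "x11", 10, None),
        ],
    )
    return db


def candidates(db, name, os_name, gui, os_version):
    """Return the origins of every build usable on the given platform."""
    rows = db.execute(
        """
        SELECT build.origin
        FROM build JOIN package ON package.id = build.package_id
        WHERE package.name = ?
          AND build.os = ?
          AND build.gui = ?
          AND build.min_os <= ?
          AND (build.max_os IS NULL OR build.max_os >= ?)
        ORDER BY build.origin
        """,
        (name, os_name, gui, os_version, os_version),
    )
    return [row[0] for row in rows]
```

With these sample rows, asking for the native GUI on release 12 yields the upstream build, asking on release 14 yields only the fork, and asking for SDL yields nothing at all. The GUI engine, the OS range, and the provenance each get their own column instead of being squeezed into one platform string.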

More on this database, and on my past efforts to create a similar one, in my next blog post.