byte-stream to key feasibility/feature request
I’m doing some benchmarking to help decide whether umobj will be an appropriate tool for the CRoCCo lab to use for storing, archiving, and moving large CFD/turbulence data sets. As they are currently being written, these data sets consist of many, many small and medium-sized files (medium = 10–100 MB). Without delving into whether a more optimal storage scheme is appropriate, I was wondering:
Would it be feasible to create a utility that can read a byte stream from stdin (the output of `tar` being sent to stdout) and then write it as a file/key to a UMIACS object store bucket?
My thinking is that `tar` is pretty good at reading a whole bunch of files and shoving them into a streaming archive. If this streaming archive can be sent and checksummed as it’s being created, you could amortize some of the network/disk time associated with reading the files from disk and sending them over the network, and you would no longer need to checksum a million different small files, just a few larger ones.
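As a runnable sketch of the upload side (assuming GNU coreutils are available): here `tee` stands in for the hypothetical stdin-reading utility, which would forward the bytes to the bucket instead of a local file, and `md5sum` checksums the archive in flight rather than file by file:

```shell
# Build a small sample tree, then stream it through tar.
# tee is a placeholder for the proposed upload utility: it passes the
# byte stream through while keeping a copy, and md5sum checksums the
# stream as it flows -- no second pass over the individual files.
workdir=$(mktemp -d)
mkdir -p "$workdir/src"
echo "u velocity field" > "$workdir/src/u.dat"
echo "v velocity field" > "$workdir/src/v.dat"

tar -C "$workdir/src" -cf - . \
  | tee "$workdir/archive.tar" \
  | md5sum > "$workdir/inflight.md5"

# The in-flight checksum matches a checksum of the stored archive.
md5sum < "$workdir/archive.tar" > "$workdir/stored.md5"
diff "$workdir/inflight.md5" "$workdir/stored.md5" && echo "checksums match"
```

The file names and the use of MD5 here are just illustrative; the point is that the checksum is computed over the single archive stream while it is in transit.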
To fetch the files from the bucket, `catobj` could be piped into `tar`… This workflow would be similar to a back-to-back `tar`, with the intermediate archive living in the bucket. What do you think?
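For the fetch direction, a local pipe can stand in for the bucket hop to show the back-to-back `tar` shape; only the left half of the pipeline would change in the real workflow:

```shell
# Round-trip: stream a tree through a pipe and extract it on the other
# side. In the proposed workflow, the producing tar would be replaced
# by catobj reading the archive back out of the bucket.
workdir=$(mktemp -d)
mkdir -p "$workdir/src" "$workdir/dst"
echo "pressure field" > "$workdir/src/p.dat"
echo "restart file"   > "$workdir/src/restart.chk"

tar -C "$workdir/src" -cf - . | tar -C "$workdir/dst" -xf -

# Verify the extracted tree matches the original byte for byte.
diff -r "$workdir/src" "$workdir/dst" && echo "round-trip OK"
```

Nothing ever lands on an intermediate local archive file, which is the appeal: the disk only sees the original files on one end and the extracted copies on the other.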