Hi! I’ve recently become interested in digital archiving. I want to tidy up my own files, and I’m also building a home server that will act (among many other purposes) as storage for… everything, including archives: files I might never touch again but also don’t want to lose.

I would appreciate some descriptions of how Lemmings archive their files. I mostly mean personal files, not bought media. In particular:

  • family photos,
  • home-related documents,
  • job-related documents,
  • school materials,
  • medical documents,
  • abandoned projects (software or other),
  • travel-related stuff,
  • receipts, invoices,
  • and more!

Some example questions I’m interested in:

  • Do you ever delete anything or do you archive everything?
  • Do you use dedicated software or do you just store plain old files on disk?
  • Do you use archive formats? For instance ZIP, tar, etc.
  • Do you use compression? Like gzip, zstd, xz, etc.
  • What naming convention do you use?
  • Do you use spaces in the filenames?
  • What directory structure do you use?

FaceDeer@fedia.io · 19 days ago

    Answering your questions specifically:

    I almost never delete anything. Storage space is pretty cheap these days. The exception is stuff I’ve downloaded that’s large and likely to be easy to download again in the future, like popular TV shows or movies.

    I store them as plain old files in a plain old directory tree. I actually don’t like using zip files for this sort of thing, because if one gets corrupted somehow, that could destroy everything in it. Why take the risk for minimal benefit? Compression doesn’t gain much either; as I said, storage space is pretty cheap these days.

    No particular naming convention. I give the directories names that seem meaningful to me and I put them in a structure that seems meaningful. Some stuff is a bit more rigorously organized; for example, I keep audio logs as a personal journal, and those get automatically sorted into folders based on the date they were recorded. Same with photos. But the rest is organized however seems right to me. Spaces are fine; it’s the 2020s, and technology has advanced quite a bit since the olden days.
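
For anyone wanting to copy the date-based sorting described above, here is a rough Python sketch. It is not the commenter's actual setup: the function names, the `YYYY/YYYY-MM` folder layout, and the use of file modification time as a stand-in for the recording date are all assumptions made for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def dated_destination(archive_root: Path, file_path: Path, when: datetime) -> Path:
    """Return archive_root/YYYY/YYYY-MM/<original name> for the given timestamp."""
    return archive_root / f"{when:%Y}" / f"{when:%Y-%m}" / file_path.name


def sort_into_date_folders(inbox: Path, archive_root: Path) -> list[Path]:
    """Move each regular file in inbox into a date-based folder; return the new paths."""
    moved = []
    for file_path in sorted(inbox.iterdir()):
        if not file_path.is_file():
            continue
        # Assumption: the modification time (local time) approximates the recording date.
        when = datetime.fromtimestamp(file_path.stat().st_mtime)
        target = dated_destination(archive_root, file_path, when)
        target.parent.mkdir(parents=True, exist_ok=True)
        if target.exists():
            # Never overwrite something already archived; leave the file in the inbox.
            continue
        shutil.move(str(file_path), str(target))
        moved.append(target)
    return moved
```

Files whose name already exists in the target folder are left in the inbox instead of being overwritten, which seems like the safer default for an archive.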

    The result is that there’s a large amount of data that I would have no idea how to find or sort through easily. But I actually anticipated AI to some degree, so that never bothered me, and now I can be pretty confident that within a few years I’ll have an agent running locally that I can point at my archive and say “hey, what was the name of my neighbors ten years ago, again? I’ve forgotten.” And it’ll dig out everything relevant to that. I’m already almost there for my audio journal: I’ve transcribed it all and built a little search engine for it.
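
A "little search engine" over transcripts can be as simple as an inverted index. The sketch below is only a guess at the general idea, not the commenter's implementation: the function names are hypothetical, and it assumes the transcripts are already loaded as a mapping from file name to plain text.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of transcript names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the sorted names of transcripts containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

This only does exact word matching with AND semantics; ranking, stemming, or fuzzy matching would be the obvious next steps if plain lookups turn out to be too strict.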