Young people probably would not know that when watching VHS tapes anything could happen, because the tapes got reused.
So you'd be watching a movie and halfway through it would suddenly switch over to a shuttle launch or a music video or a documentary or something, because someone had decided to record something else at that point.
I once recorded a three hour British detective show. I watched it for three hours and it got to the final scene to reveal whodunnit and the tape ran out.
It's great that they are archiving the old content but I don't miss VHS in the least.
> I watched it for three hours and it got to the final scene to reveal whodunnit and the tape ran out.
I ran into one of these on Youtube just a couple of weeks ago. The reaction in the comments was something less than amused.
(Bittersweet memories of taping over some of Mom's seemingly non-important video in order to record...what was it, the first X-files episode? A _Wings_ episode about a favorite jet? Something like that. But man, she was not happy.)
You’re the first person in any forum online I’ve seen reference Wings. My dad’s friend used to call my dad to tell him it was on and he’d switch from whatever Ninja Turtles / Golden Girls / Simpsons episode we were watching. We despised Wings! But actually it’s a great show.
Funny. I hadn't searched for it online much but I can understand its lack of traction. It was like a lot of commercial promotional clips spliced together, in some ways. It left a lot of questions unanswered, too! But it was great for an overview and the theme music will be forever engraved in my memory, right next to the _Strike Commander_ theme. :-)
Why I miss that aspect of VHS: In '95 I took an unlabeled VHS tape from a pile in a shared apartment and recorded my friend joking about her bakery job. Then she popped the tape in and we watched it, and every time I paused the tape a few seconds of the porn underneath it showed through. It ended up launching my video art career and led to my working with Colin Campbell, who invented video art working with FARC (Colombia) tapes artefacts recorded for secret communication.
#!/usr/bin/env python3
# Spoil the ending the tape cut off: pick a random suspect, weapon and room.
from random import choice

suspects = ["Miss Scarlet", "Mr. Green", "Colonel Mustard", "Professor Plum", "Mrs. Peacock", "Mrs. White"]
weapons = ["candlestick", "dagger", "lead pipe", "revolver", "rope", "wrench"]
rooms = ["kitchen", "ballroom", "conservatory", "dining room", "cellar", "billiard room", "library", "lounge", "hall", "study"]

print("%s did it with the %s in the %s." % (choice(suspects), choice(weapons), choice(rooms)))
There's a great doc that came out last year on Marion Stokes, who recorded decades of television. Her footage is being added to The Internet Archive https://blog.archive.org/tag/marion-stokes/
How does one archive data for long term storage in 2020? From what little I've read, all of the media accessible to the layman has an archive lifetime of less than 30 years before physical degradation: NAND, tape, disc, whatever. That makes for a brittle civilization when the vast majority of our knowledge is stored on media and would be unrecoverable just 3 decades after a global calamity.
My files have gone from magtape to 8" floppy to various 5.25" floppy to 3.5" floppy to zip drives to CD-ROMs to DVD-ROMs, then to hard disks of ever-increasing size.
(My old hard drives are completely unreadable now.)
I'm sorry I never kept my punch card decks. I'm sure there was nothing but crap on them, but it would be fun to see what kind of crap it was.
Very little has survived on paper considering how many people have lived in the past.
It's just life. A lot of what may seem important with our backups really isn't. Whatever we have produced today will return to the earth, and will be reinvented by generations in the future.
Well, sure. But I have ancestors who go back in the US before the American Revolution, but all I know about them is a name and a date or two. I'm curious about more.
You also don't really know what future people might find interesting. Archaeologists like to sift through ancient trash dumps :-)
Wow, I had no idea that Empire went back that far. My friends and I had lots of fun staying up all night playing the PC version in high school. Glad to see many before us got to enjoy your work too.
The Fortran I used for Empire was FORTRAN-10, a DEC variant with many extensions to the then-Standard FORTRAN IV.
There was an earlier Empire I wrote in BASIC, but alas it was lost. Some of its vestiges remain in the Fortran version, such as a variable named Z6. (Variables in that BASIC were one letter and one digit.)
Storing the holes isn’t an issue. They compact infinitely, and don’t degrade under any conditions. It’s reapplying them to some card stock that’s time consuming and error prone.
I remember paper tape being dodgy when both punching and reading the holes, meaning that once a certain number of bytes are processed, random bit errors are a certainty.
If you want data to survive you've always needed to copy it. Digital storage just makes that easier to do in bulk. Copying Bibles was a full time task for huge teams of monks but you can (and should) make backups routinely on a daily or weekly basis with barely a thought.
The big issue isn't the technology, it's the vast amounts of data that are being created at this point. Storage is cheap, but the labor that goes into managing the longevity of datasets isn't: it's essentially continually keeping your infrastructure up-to-date whilst also ensuring the integrity and readability of the datasets as was intended when they were first created. It implies regular checks of bit integrity, readability of your data, checking that you can restore your data, ensuring that you can access the data, making sure that you can find the data and everything is catalogued, ensuring that you have the rights and license to use the data,...
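The "regular checks of bit integrity" part is the easiest piece to automate. A minimal sketch, assuming a plain directory of files and SHA-256 as the checksum (both are my assumptions, and the function names are made up for illustration): record a manifest of hashes once, then re-verify it on a schedule.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose hash changed."""
    root = Path(root)
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

This only tells you that something rotted or went missing. It does nothing for the harder parts of the list: readable formats, cataloguing, and rights.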
When it comes to physical archives of the past, you have to be aware of your own survivorship bias. We only have an idea of what is preserved to the extent that documents are archived, recorded and thus discoverable.
What we do not know is how much knowledge and information was lost to the past. When you look at documents, you're always limited to what's there. And when you hit the boundaries of what's there, then you may have indications that there was far more in the past, but you have to conclude: sadly that's lost. Either because it is physically lost, or because it might be somewhere in the archive but it's not registered yet in a catalogue and therefore not accessible.
That's why I think that making backups with "barely a thought" is only as effective as the extent to which you have organized your data and used accessible, readable data formats and filesystems.
For instance, most people these days generate endless streams of photos with their digital devices, which then get automagically uploaded to cloud services. And that's great. The downside of that is that your ability to find a specific picture from 5 years ago is entirely restricted to the extent that you were able to organize and add specific metadata to that picture. That is, if you took the opportunity to do so at all.
That's why I advise people to sit down, and take time to go through their digital albums to pick the nicest or most important pictures they have, print them out on quality photo paper in several copies and store them with labels in albums at different physical locations.
When it comes to longevity, your physical albums will still be accessible to your descendants some 70 or 100 years down the line. Something that isn't remotely guaranteed by cloud solutions.
And that's just photos. Consider e-mail or the countless closed messaging apps you have been using these past years. And then scale the problem beyond the personal to the entirety of large organizations, many of which are required by law to keep an archive of their documents, correspondence and so on, not just for decades but sometimes in perpetuity.
> The downside of that is that your ability to find a specific picture from 5 years ago is entirely restricted to the extent that you were able to organize and add specific metadata to that picture.
I disagree with the premise that we should spend time manually organizing and tagging our pictures all that much.
The metadata that the phone adds to pictures – time stamp and GPS coordinates – is already sufficient in a lot of cases for finding pictures that I look for.
And where that metadata is insufficient, improved search powered by machine learning will come to the rescue. And not just tomorrow but even today.
Just the other day, a few weeks back, I was standing in the kitchen that I share with two other people and I wondered to myself whether the kitchen knife in the dishwasher was mine (I’d bought a new one a few days prior but couldn’t remember what it looked like). I take a lot of pictures of random stuff and mundane things, most of which I never bother to organize or tag or anything. I pulled up my phone, searched my photo library for “knife” and lo and behold, I had taken a picture of it when I bought it, and my phone had recognized the object in the photo to be a knife, so it was able to find it for me.
Important files and photos I do organize. Specifically for three reasons:
1. Ease of access.
2. Grouping related data together.
3. Tying photos and other data to abstract concepts like ideas for possible games or products.
So I am not advocating no organization or tagging at all.
But I think a lot of people are unaware or at least haven’t really incorporated the distinction between information that is already present in the data, and information that must be manually added. So they spend a lot of time manually creating folder structures that encode information which could already be automatically derived from the data itself.
As for messages in closed apps, I just screenshot them. And I am relying on OCR technology to be or become good enough to refind those messages in the future. That way I still have them if the platform itself is gone by then, or if the messages are no longer on the platform or are hard to find there for whatever reason.
So far I haven’t even needed to use OCR. Because if I look for a message I often have other memories of where I was, when it was or something else that happened around that time. So I just jump back in time in my photo stream and either find the screenshot right away or I find pictures near-by in time and spend a tiny amount of time looking forwards and/or backwards in time and I find the screenshot.
I do wish though, that iOS would automatically tag screenshots with the name of the app that the screenshot was taken in. And I think it would be cool if the screenshots were stored as SVG with pure text and vector shapes plus embedded bitmaps, so that the whole potentially needing robust OCR in the future thing could be side-stepped.
I would like to agree with you. However, your vision hinges on this massive invisible infrastructure which is the cloud.
Your phone didn't recognize the object, you relied on a cloud service to do that for you.
When you use such services for free, you'll end up with all kinds of legal compromises that don't necessarily benefit you as an individual in the long run. Your personal convenience is subservient to other goals that don't necessarily align with public interests at large.
You could argue that the infrastructure will keep miniaturizing and one day you might not need those services. But that's not how things are currently evolving. Moreover, it will always take massive amounts of data to re-create the same models that are able to recognize patterns that are relevant to your specific context when you have a query.
At the end of the day, it's about what trade offs you are willing to accept. Cloud services based on machine learning do give you a good amount of convenience, but then you have to be willing to accept the hidden costs as well.
> Photos is enabled by powerful machine learning to deliver unique features like Memories, Search Suggestions, and For You. Photos analyzes every photo in a user’s photo library using on-device machine learning that delivers a personalized experience for each user. And this analysis is designed from the ground up with privacy in mind, with all of the processing done on device—and the results of this analysis are not shared with anyone, not even Apple.
> Photos uses on-device processing to analyze each photo and video in a number of ways, including:
> • Scene classification
> Identifies objects, like an airplane or a bike, and scenes, like a cityscape or a zoo, that visually appear in a photo, using a multilabel network with over a thousand classes.
> [...]
The training itself as you point out, happens not on the local device but on the servers that Apple own. So that part you are right about, but that is to be expected. Otherwise, manual tagging by each individual user (as well as significantly more processing power) would be required after all in order to train the models in the first place.
Yes, phones are now powerful enough that if the resulting ML core can be crushed down small enough you can just send that to the phone. My Pixel is set to passively identify every song it hears and display it on the lock screen. No Clown network service nonsense involved, Google built it for other reasons and went "Oh, this would fit on a phone. Cool, might as well".
Digital storage depends on a long and complicated chain of formats, standards, technologies, businesses, software, services etc which come and go every decade or two, or even more often. Any of it is lost, and your archive isn't an archive anymore. So to store something long-term, you have to eliminate single points of failure, such as encodings, formats, and even human languages. It can be completely non-obvious, and the physical media isn't the most important one.
Naturally, "civilizational scale archival" is only feasible for a proper archival organization such as a museum, a library or an archive. As a person, you can't have this. You can use the archival-grade media like M-Disc, but don't expect to put something on it and recover it 50 years later easily. You have to design the process to validate and migrate the data every once in a while. Digital storage can't offer something comparable to a simple printed photo.
> when the vast majority of our knowledge ... would be unrecoverable just 3 decades after a global calamity.
The vast majority of our knowledge is encoded in the societal and economic context. There's simply no way to translate it to any media, and any disruption would be the end of it.
How many times can you copy digital media before it degrades beyond error correction? I don't know the error rate per copied bit.
I think this is an excellent place for neural networks. They can preserve vast amounts of data compactly for many data types because they statistically compress high level abstract data which can then be used to fill in regions with high error rate, although if you did that at a large scale you'd probably end up with some constant error rate fluctuating around the average true value.
All indications point to the fact that we seem to be working against the unstoppable Force of entropy - indefinite error free preservation of data is ultimately impossible.
> How many times can you copy digital media before it degrades beyond error correction?
The idea is to act as a repeater, or like dynamically refreshed RAM, detecting and amplifying the signal at each copy before it degrades too far. You still have the error rate of reading and writing each time, but it's often cheaper to get the same error rate by refreshing cheap media than by using a more permanent medium.
Interleaved error correction (such as on a CD) kind of works as you describe, spreading out the error from a single physical point across the logical data so that it can be corrected by intact data elsewhere. This works because it's designed to recover from burst errors, corresponding to a localized scratch across the track.
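To make the burst-error point concrete, here is a toy sketch. It is not what a CD does: real discs use cross-interleaved Reed-Solomon coding, whereas this swaps in a triple-repetition code with majority voting because it fits in a few lines. The function names are made up for illustration. The only point is that the same code survives a burst when its copies are spread apart and fails when they sit next to each other.

```python
def encode(data: bytes, interleave: bool) -> bytearray:
    """Store three copies of every byte, either adjacent or spread apart."""
    if interleave:
        # Copy 0 of every byte, then copy 1 of every byte, then copy 2.
        return bytearray(data * 3)
    # Copies of the same byte sit side by side: AAABBBCCC...
    return bytearray(b for byte in data for b in (byte,) * 3)


def majority(copies):
    """Return the value that occurs most often among the copies."""
    return max(set(copies), key=copies.count)


def decode(stored: bytes, interleave: bool) -> bytes:
    """Recover each byte by majority vote over its three copies."""
    n = len(stored) // 3
    out = bytearray()
    for i in range(n):
        if interleave:
            copies = [stored[i], stored[i + n], stored[i + 2 * n]]
        else:
            copies = list(stored[3 * i: 3 * i + 3])
        out.append(majority(copies))
    return bytes(out)


def burst(stored: bytes, start: int, length: int) -> bytearray:
    """Simulate a scratch: corrupt `length` consecutive stored bytes."""
    damaged = bytearray(stored)
    for i in range(start, start + length):
        damaged[i] ^= 0xFF
    return damaged


if __name__ == "__main__":
    message = b"VHS tapes"
    for mode in (False, True):
        damaged = burst(encode(message, mode), start=4, length=3)
        print("interleaved" if mode else "adjacent", decode(damaged, mode) == message)
```

With the interleaved layout, any burst no longer than the message touches at most one copy of each byte, so the vote still comes out right.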
There are different schools of thought. And it depends on what the scope of the discussion is. If you’re talking about preserving PDF/A or PNG files, you’re probably talking about media preservation. If you’re worried about whether your important document written in some version of PFS:Professional Write (an obscure word processor from the 80s) will still be readable, conversion and access become the problem.
I recall working with a group of archivists where one camp was in favor of conversion to standard formats like PDF for certain documents, another wanted to preserve documents in the original format, and still another wanted to do both.
It’s a hard problem and purity of thought makes it worse.
Back in the 2007-2008 timeframe when I added the streaming install/patch/play functionality to World of Warcraft, we also made it operate as an automatic repair system - if a disk block was read that had a hash mismatch, it just downloaded the correct block from the server (and wrote the fixed block to disk, of course). I looked at a month's worth of download logs, and estimated that 1 bit per GB per year flipped on hard disk data for our customers (we had already implemented re-reads before this point, because we had the hashes in place before we did streaming download, and so were pretty sure it was bad data on disk). Even at that fairly low rate, you can imagine that this reduced daily mysterious crashes by a nice amount. And keep in mind that hard disks already have error correction, so bits on magnetic media aren't as stable as you want. And, of course, this was not data at rest; there was lots of WoW play time going on, so lots of reads. I've been told that tape is more durable than spinning hard disks; for the sake of all those enterprise backups, I hope so.
As a side effect, this made me increase how much redundancy I used for data I really cared about.
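The repair-on-read idea described above can be sketched in a few lines. This is not the actual World of Warcraft code: the block size, the use of SHA-256, and the `fetch_block` callback standing in for "download from the server" are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose, so the demo is easy to follow


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hashes(blocks):
    """Compute the trusted per-block hashes, done once from known-good data."""
    return [hashlib.sha256(block).digest() for block in blocks]


def read_with_repair(local_blocks, trusted_hashes, fetch_block):
    """Check every local block against its hash and replace the bad ones.

    fetch_block(index) is a hypothetical callback that returns a fresh copy
    of the block from a trusted source. Returns the repaired indices.
    """
    repaired = []
    for index, expected in enumerate(trusted_hashes):
        if hashlib.sha256(local_blocks[index]).digest() != expected:
            fresh = fetch_block(index)
            if hashlib.sha256(fresh).digest() != expected:
                raise IOError(f"block {index}: replacement also fails its hash")
            local_blocks[index] = fresh  # write the fixed block back
            repaired.append(index)
    return repaired
```

Logging the repaired indices over time is what would let you estimate a bit-flip rate the way the comment describes.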
Can't be zero. At the very least if bits can be randomly flipped during storage from wayward radiation, they can be flipped on copy. Error correction only works if the number of flipped bits is below a threshold. But I don't know the order of magnitude of the error rate, I assume it's quite small given that computers work.
Considering we can read parchments thousands of years old, and stone tablets going back even further, perhaps we should focus efforts on copying what we can to the time-tested preserved data formats.
Not THAT difficult - assuming you have enough stone tablets. I mean the program code exists. The problem I think you'd have (and true of most of these digital artifacts) is all the other hardware and software you would have to similarly encode to be able to recreate the machine and OS capable of running it...
Not to mention a large amount of the artifacts involved to play those games aren’t necessarily documented or accessible. The Nintendo 64’s CIC chip was only reverse engineered quite recently and required a serious reverse-engineering effort. [1]
After going down this rabbit hole, I concluded that M-Disc is the right trade off at the moment. They’re not too expensive, the writers are available, they’ll last, and they could be reverse engineered if discovered in the future.
The second choice is using hard drives (easily available) and every so often power them up and copy data to new drives.
If you have a small quantity of data, then encode and laser print onto paper, with a font designed for optical scanning or QR codes.
> They’re not too expensive, the writers are available, they’ll last, and they could be reverse engineered if discovered in the future.
IIRC, the writers for M-DISCs are special, but the reader can be any DVD or Blu-ray drive.
Honestly, I think the biggest consideration for digital archival media isn't so much the longevity of the media, but the future availability of equipment to read it.
That's one of the biggest benefits of paper, IMHO. Besides being very well-understood material, nearly everyone is born with the necessary reading equipment and the decoding software is very common.
I also came across M-Disc but iirc there is no real verification of the longevity claims outside of marketing from the manufacturer.
>The second choice is using hard drives (easily available) and every so often power them up and copy data to new drives.
I suppose the question is whether doing so could remain under the error correction threshold indefinitely, since there will be errors accumulating both during copying and over time in cold storage. If manufacture of new drives stops, it also isn't clear to me if only the data stored on them has a 30 year life or if the medium itself decays regardless of whether it is in use or not.
In theory I imagine keeping an unused NAND or even magnetic drive in cool dry storage should preserve its physical integrity indefinitely...
It may not be “accessible to the layman” (yet?), but this seems rather intriguing: “The memory crystal is capable of storing up to 360 terabytes of data for billions of years.”
Audio CDs can survive a bit of degradation to the odd data bits here or there; the music will just skip the millisecond of missing data and your ears won’t notice. Likewise with vinyl. Data discs on the other hand have the issue that if a single data bit is lost, a whole file could be corrupted, especially if it’s a zipped file.
In addition, printed audio CDs are of a different build than CD-Rs which have been found to not be as resistant to moisture and light.
It takes more than a single bit: there are error-correcting codes (cross-interleaved Reed-Solomon) built into the spec, but it is certainly possible for the bits to degrade enough to render a sector or even the whole disc unreadable.
If you're serious about using optical media as archival storage, you can mitigate this by incorporating your own error correcting codes into the data storage format.
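As a rough illustration of adding your own redundancy, here is a minimal sketch using a single XOR parity block. This is my choice for brevity, not something the comment specifies; serious archival tools typically use Reed-Solomon codes that tolerate more than one loss. It can rebuild exactly one block, and only if you already know which one is bad, for example from a per-block checksum. The function names are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def add_parity(blocks):
    """Return the data blocks followed by one XOR parity block."""
    return list(blocks) + [xor_blocks(blocks)]


def recover(stored):
    """Rebuild the single block marked None and return the data blocks."""
    missing = [i for i, block in enumerate(stored) if block is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can rebuild only one lost block")
    if missing:
        survivors = [block for block in stored if block is not None]
        stored = list(stored)
        # XOR of all surviving blocks (data and parity) equals the lost one.
        stored[missing[0]] = xor_blocks(survivors)
    return stored[:-1]  # drop the parity block


if __name__ == "__main__":
    stored = add_parity([b"tape", b"disc", b"NAND"])
    stored[1] = None  # pretend this sector became unreadable
    print(recover(stored))
```

All blocks must be the same length, so real data would need padding. Spreading the blocks across separate discs protects against losing a whole disc rather than just a sector.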
CDROM bit rot is mostly the paint flaking off, but the real data is stored in the plastic. DVD should be even harder to bitrot because data is entirely inside the plastic. Burned disks are another story.
It is primarily humid air diffusing through the platter and corroding the aluminum coating. The "bits" are sandwiched between two layers of plastic (one thin). The paint on the surface is irrelevant.
> I have 30 year old CDs that play as good as the first time, as well as older vinyl discs.
Mass produced CDs are produced using a different process than the one consumer CDR writers use, and they are thus a much more stable storage medium. The data-containing layer is literally formed out of metal using a kind of mold: https://en.wikipedia.org/wiki/Compact_Disc_manufacturing#Ele...
You can extend the lives of writable and rewritable optical media by proper storage. The key is to avoid light and heat as much as possible, to keep the dyes stable. I'm not sure about 30 years, but you can probably get 10 out of properly stored optical media. See the following advice:
>A disc should always be handled by grasping its outer edges, center hole or center hub clamping area. Avoid flexing the disc, exposing it to direct sunlight, excessive heat and/or humidity, handle it only when being used and do not eat, drink and smoke near it. Discs should be stored in jewel cases rather than sleeves as cases do not contact the discs’ surfaces and generally provide better protection against scratches, dust, light and rapid humidity changes. Once placed in their cases discs can be further protected by keeping them in a closed box, drawer or cabinet. For long-term storage and archival situations it is advisable to follow manufacturer instructions. For further information consult the international standards for preserving optical media (ISO 18925:2002, Imaging materials — optical disc media — storage practices). [0]
> You can extend the lives of writable and rewritable optical media by proper storage. The key is to avoid light and heat as much as possible, to keep the dyes stable. I'm not sure about 30 years, but you can probably get 10 out of properly stored optical media.
Though, if you care about longevity, it might be better to use a technology like M-DISC. It uses a different recording technology to "[burn or etch] a permanent hole in the material, rather than changing the color of a dye."
Yep...not to mention all discs are ultimately made of plastic which is subject to environmental degradation - primarily oxidation from atmosphere and UV damage, both of which can lead to yellowing/clouding of the transparent medium, and eventually brittleness and fracture.
quite. the writable discs don't last nearly as long. i had a pile of writable DVDs that were unreadable 8 years later when i went back to fetch the data from them.
Yeah, it's similar but Archive.org relies on servers to store information, whereas Arweave is a decentralised peer-to-peer network that offers permanent data storage. Arweave are in talks with them and are working with the IA to timestamp all of their collections.
It's flash memory, physically pushing electrons into a jar and hoping to count them later before they all leak out.
Endurance tests of SSD drives showed unplugging a worn drive for a couple of days was enough to lose all data.
Not to mention SD cards usually get the lowest grade of flash memory.
I never understood why SD cards are so unreliable (or at least have this reputation) as they are usually used in cameras to store pictures. Something that people don't really joke with.
Oh, I agree, but it's still a nice blast from the past watching it this way. I was 10 when this came on TV and I still remember the thrill of watching it. I was five when it came out in theaters. I had an older brother who got to go see it there and he kept telling me how awesome it was but I had to wait five years to see it!
So this is more of a "oh yeah, that's what TV used to look like" moment than an actual "I want to watch the original Star Wars" again...
I have a friend at work who actually scored a pristine, never opened VHS copy and of course we ripped that sucker open and watched it the day he got it. And we go back once in a while and watch it again, sometimes running it alongside the new ones so we can spot and discuss the differences. Fun stuff on a Saturday night!
My step mom was recently going through a bunch of old vhs tapes looking for wedding footage of her mother’s wedding. 99% of the tapes are television from the early 2000s. She was going to just throw them away! So now I’ve got 30 tapes of early 2000s gold. I’m going to digitize them and upload them to YouTube.
I prefer RedLetterMedia's use of VHS tapes. They use them to play a terrible version of Jenga, where they watch all the random VHS tapes collected by the "winner".
Also for some reason, Macaulay Culkin seems to be hanging out with them a lot. Maybe Milwaukee is just that much fun?
The best part about RLM's Best of the Worst is that they all seem to hate doing it. There was one episode where they got terribly drunk and had to stop filming and come back a few days later.
The "removed a rib" thing was started by Gabriele D’Annunzio, father of Italian fascism. Most likely untrue, but a rumor also most likely to be spread by its subject.
What is the copyright situation with these tapes? I get the value of archiving and preserving digitally but if I'd invested time and money creating a vhs catalog of titles I'd be unhappy to see them suddenly all over youtube (assuming they had some appeal of course) with no ability to profit from them.
Good question. At least on the surface it seems only fair that at some point, after the archivist has performed a reasonably thorough search for copyright maintenance, the burden of justification then switches over to the (latest?) copyright claimant. "Yes, we have the internet now. Yes, if you intend to enforce your copyright, your claim should be searchable via the internet. Either way you may still have to reach out and publish notice of your claim." That kind of thing.
Kind of like Gutenberg.org texts, with their disclaimer that "we checked and couldn't find any renewal of copyright," or whatever it says nowadays...
It's a tricky one. If I flew around the world with expensive camera gear and edited up some kind of big budget travel video I'd be a lot more upset about seeing it free than if I had made a 'origami how to' video in 1989 that was very low budget...
Presumably they are all copyrighted, and presumably IA will take them down if it gets a DMCA notification. Perhaps they could then use a library workaround and make them available to one person at a time, or whatever.
I hope they’re capturing this at full resolution and preserving the full 60i signal. Too many times, people think VHS is 320x240 and capture accordingly, dropping half the temporal resolution from the get go. Even though there’s not a ton of horizontal resolution, a full D1 resolution capture will represent the tape much better than a half-resolution capture (but half-D1 would still be better than 320 or 352 by 240).