
If you want data to survive you've always needed to copy it. Digital storage just makes that easier to do in bulk. Copying Bibles was a full-time task for huge teams of monks, but you can (and should) make backups daily or weekly with barely a thought.


This is a huge problem for any data: https://www.theatlantic.com/technology/archive/2015/02/how-t...

Even designing a simple warning message for Yucca Mountain and WIPP showed how hard it is to communicate danger to people 10,000 years from now. https://en.wikipedia.org/wiki/Long-time_nuclear_waste_warnin...


The LOCKSS project is an approach used in university networks around the world to preserve research data for the long term.

https://www.lockss.org/

The big issue isn't the technology, it's the vast amount of data being created at this point. Storage is cheap, but the labor that goes into managing the longevity of datasets isn't: you essentially have to keep your infrastructure up to date continually while also ensuring the integrity and readability of the datasets as intended when they were first created. That implies regular checks of bit integrity and readability, checking that you can restore and access the data, making sure that the data can be found and everything is catalogued, ensuring that you have the rights and license to use the data, and so on.
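To make "regular checks of bit integrity" concrete, here is a minimal Python sketch. It assumes a plain directory of files and SHA-256 checksums recorded in a manifest at archive time. The names `sha256_of`, `build_manifest` and `verify` are made up for illustration and are not taken from LOCKSS or any other tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return {relative_path: problem} for files that are missing or changed."""
    root = Path(root)
    problems = {}
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            problems[rel] = "missing"
        elif sha256_of(path) != expected:
            problems[rel] = "corrupted"
    return problems
```

You would run `build_manifest` once when the data is archived, keep the result somewhere separate from the data, and run `verify` on a schedule. Note that this only covers bit integrity; readability of the formats, cataloguing and licensing still need human attention.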

When it comes to physical archives of the past, you have to be aware of your own survivorship bias. We only have an idea of what is preserved to the extent that documents are archived, recorded and thus discoverable.

What we do not know is how much knowledge and information was lost to the past. When you look at documents, you're always limited to what's there. And when you hit the boundaries of what's there, you may have indications that there was far more in the past, but you have to conclude: sadly, that's lost. Either because it is physically lost, or because it might be somewhere in the archive but isn't registered in a catalogue yet and is therefore not accessible.

That's why I think that making backups with "barely a thought" is only as effective as the extent to which you have organized your data and used accessible, readable data formats and filesystems.

For instance, most people these days generate endless streams of photos with their digital devices, which then get automagically uploaded to cloud services. And that's great. The downside is that your ability to find a specific picture from 5 years ago is entirely restricted to the extent that you were able to organize and add specific metadata to that picture. And that assumes you took the opportunity to do so at all.

That's why I advise people to sit down, and take time to go through their digital albums to pick the nicest or most important pictures they have, print them out on quality photo paper in several copies and store them with labels in albums at different physical locations.

When it comes to longevity, your physical albums will still be accessible to your descendants some 70 or 100 years down the line. Something that isn't remotely guaranteed by cloud solutions.

And that's just photos. Consider e-mail or the countless closed messaging apps you have been using these past years. And then scale the problem beyond the personal to the entirety of large organizations, many of which are required by law to keep an archive of their documents, correspondence and so on, not just for decades but sometimes in perpetuity.


> The downside of that is that your ability to find a specific picture from 5 years ago is entirely restricted to the extent that you were able to organize and add specific metadata to that picture.

I disagree with the premise that we should spend time manually organizing and tagging our pictures all that much.

The metadata that the phone adds to pictures – time stamp and GPS coordinates – is already sufficient in a lot of cases for finding pictures that I look for.
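As a rough illustration of why those two fields go a long way, here is a small Python sketch. It assumes the timestamp and GPS coordinates have already been read out of each photo into a simple record. The `Photo` class and the `find_photos` function are hypothetical names for this example, not any photo app's API.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Photo:
    name: str
    taken: datetime
    lat: float  # degrees
    lon: float  # degrees


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula (Earth radius ~6371 km)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def find_photos(photos, start, end, lat, lon, radius_km):
    """Photos taken between start and end, within radius_km of (lat, lon)."""
    return [
        p for p in photos
        if start <= p.taken <= end
        and distance_km(p.lat, p.lon, lat, lon) <= radius_km
    ]
```

A query like "that trip in the summer of 2019, somewhere around Amsterdam" then becomes a date range plus a point and a radius, with no manual tagging involved.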

And where that metadata is insufficient, improved search powered by machine learning will come to the rescue. And not just tomorrow but even today.

A few weeks back, I was standing in the kitchen that I share with two other people and I wondered whether the kitchen knife in the dishwasher was mine (I’d bought a new one a few days prior but couldn’t remember what it looked like). I take a lot of pictures of random stuff and mundane things, most of which I never bother to organize or tag or anything. I pulled up my phone, searched my photo library for “knife” and lo and behold, I had taken a picture of it when I bought it, and my phone had recognized the object in the photo as a knife, so it was able to find it for me.

Important files and photos I do organize. Specifically for three reasons:

1. Ease of access.

2. Grouping related data together.

3. Tying photos and other data to abstract concepts like ideas for possible games or products.

So I am not advocating no organization or tagging at all.

But I think a lot of people are unaware or at least haven’t really incorporated the distinction between information that is already present in the data, and information that must be manually added. So they spend a lot of time manually creating folder structures that encode information which could already be automatically derived from the data itself.

As for messages in closed apps, I just screenshot them. And I am relying on OCR technology to be or become good enough to refind those messages in the future. That way I still have them if the platform is gone by then, or if the messages are no longer on the platform or are hard to find there for whatever reason.

So far I haven’t even needed to use OCR. If I look for a message, I often have other memories of where I was, when it was, or something else that happened around that time. So I just jump back in time in my photo stream and either find the screenshot right away, or I find pictures nearby in time and spend a tiny amount of time looking forwards and/or backwards until I find the screenshot.

I do wish though, that iOS would automatically tag screenshots with the name of the app that the screenshot was taken in. And I think it would be cool if the screenshots were stored as SVG with pure text and vector shapes plus embedded bitmaps, so that the whole potentially needing robust OCR in the future thing could be side-stepped.


I would like to agree with you. However, your vision hinges on this massive invisible infrastructure which is the cloud.

Your phone didn't recognize the object, you relied on a cloud service to do that for you.

When you use such services for free, you'll end up with all kinds of legal compromises that don't necessarily benefit you as an individual in the long run. Your personal convenience is subservient to other goals that don't necessarily align with public interests at large.

You could argue that the infrastructure will keep miniaturizing and one day you might not need those services. But that's not how things are currently evolving. Moreover, it will always take massive amounts of data to re-create the same models that are able to recognize patterns that are relevant to your specific context when you have a query.

At the end of the day, it's about what trade-offs you are willing to accept. Cloud services based on machine learning do give you a good amount of convenience, but then you have to be willing to accept the hidden costs as well.


> Your phone didn't recognize the object, you relied on a cloud service to do that for you.

In iOS 13 (which I am running), it is indeed my phone that does this.

https://www.apple.com/ios/photos/pdf/Photos_Tech_Brief_Sept_... (page 4) says:

> Photos is enabled by powerful machine learning to deliver unique features like Memories, Search Suggestions, and For You. Photos analyzes every photo in a user’s photo library using on-device machine learning that delivers a personalized experience for each user. And this analysis is designed from the ground up with privacy in mind, with all of the processing done on device—and the results of this analysis are not shared with anyone, not even Apple.

> Photos uses on-device processing to analyze each photo and video in a number of ways, including:

> • Scene classification

> Identifies objects, like an airplane or a bike, and scenes, like a cityscape or a zoo, that visually appear in a photo, using a multilabel network with over a thousand classes.

> [...]

The training itself, as you point out, happens not on the local device but on servers that Apple owns. So you are right about that part, but that is to be expected. Otherwise, manual tagging by each individual user (as well as significantly more processing power) would be required after all in order to train the models in the first place.


Yes, phones are now powerful enough that if the resulting ML core can be crushed down small enough you can just send that to the phone. My Pixel is set to passively identify every song it hears and display it on the lock screen. No Clown network service nonsense involved, Google built it for other reasons and went "Oh, this would fit on a phone. Cool, might as well".



