Camlistore – open-source personal storage system for life (camlistore.org)
190 points by hswolff on June 3, 2014 | 45 comments



I can't quite grasp what this is (as in, what the software is: a server, a client, or a combination of both), or how I could get started using it.

>Your data should be alive in 80 years, especially if you are

Is there any special significance to 80 years? And exactly how is it safe? The "Download" sections have me installing the server on my box, and my HDD certainly won't survive 80 years.

Under storage, it says that

>Implementations are trivial and exist for local disk, Amazon S3, Google Storage, etc.

How does encryption work? I can't seem to find out at which step the encryption is done (client side or server side?). The home page mentions it's "private by default". Considering that I don't know that much about security and cryptography in general, how safe would I be using this, for whatever purposes it could be used for?

Mentioned under potential use cases: filesystem backups and document management CMS. That's ... interesting.


I am not authoritative, but a glance at the documentation suggests encryption is not explicitly part of this spec. Since a blob may be anything, one instance of a blob can be a file encrypted by external means.

I believe "private by default" just indicates that objects are not exposed until they are shared: like a file system is private by default and a GitHub repo is not.


Looks like there is an encrypt module that uses AES-128: http://camlistore.org/pkg/blobserver/encrypt/. The modules are composable so you could encrypt your remote replicas of the main store (or hopefully all of them?)


Brad Fitzpatrick talks about it as a "data dump truck". You put everything you have into it, and it takes it where you want it to go.

What's cool is that you can abstract where it's stored... want to use S3? Point your data there. Google Drive? Point your data there. Instead of the user interface and middleware of syncing and storing being directly tied to a storage backend (a la Dropbox, Google Drive), Camlistore allows you to separate it.

At least that's how I think of it... I might be wrong. :)
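To make the "point your data there" idea concrete, here is a rough sketch of what separating the client from the storage backend could look like. This is not Camlistore's actual API; `BlobBackend`, `MemoryBackend`, `Replicated`, and `store` are all made-up names, and the in-memory dict just stands in for local disk, S3, or whatever else.

```python
import hashlib
from typing import Dict, Protocol


class BlobBackend(Protocol):
    """Hypothetical minimal interface any storage backend would satisfy."""

    def put(self, ref: str, data: bytes) -> None: ...

    def get(self, ref: str) -> bytes: ...


class MemoryBackend:
    """Stand-in for local disk, S3, etc.; keeps blobs in a dict."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, ref: str, data: bytes) -> None:
        self.blobs[ref] = data

    def get(self, ref: str) -> bytes:
        return self.blobs[ref]  # raises KeyError if the blob is missing


class Replicated:
    """Writes every blob to all backends; reads from the first that has it."""

    def __init__(self, *backends: BlobBackend) -> None:
        self.backends = backends

    def put(self, ref: str, data: bytes) -> None:
        for backend in self.backends:
            backend.put(ref, data)

    def get(self, ref: str) -> bytes:
        for backend in self.backends:
            try:
                return backend.get(ref)
            except KeyError:
                continue
        raise KeyError(ref)


def store(backend: BlobBackend, data: bytes) -> str:
    """Client code sees only the interface, never a concrete backend."""
    ref = "sha1-" + hashlib.sha1(data).hexdigest()
    backend.put(ref, data)
    return ref
```

Because `Replicated` satisfies the same interface as the backends it wraps, swapping or stacking storage targets does not touch the code that calls `store`.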


Just an FYI, it's still pretty unfinished, so that's partially why the website/documentation isn't clearer about its purpose.

I imagine that description getting better when they start wanting more end users to use it.

Right now they have a bunch of pages which are HTML with 0 CSS styling, cause it ain't done yet. Despite these things, I find camlistore extremely cool, and look forward to its growth.


If you're confused as to what this project is about or why it's important, I highly recommend watching the first ~10 minutes of the video posted on the main site. Brad Fitzgerald does a much better job of explaining the value proposition.

I personally think this is one of the most hugely important ideas/protocols to come out of the last decade. Even if Camlistore doesn't do it, it's hard to imagine software programmers of the future not agreeing on a shared protocol to fill this huge, huge need.

I actually wrote a proposal for a Mozilla grant recently outlining a couple of the reasons why decoupling storage from user interface is a fundamentally good thing for society[1].

[1]https://www.newschallenge.org/challenge/2014/submissions/sto...


I'm 18 minutes in and I have no idea what they're actually doing.

I get what they seem to think they're doing.

But "storing arbitrary data" is a problem everyone in the world is trying to solve. It's what a filesystem is. It's what backup tools do. It's a pretty well attacked problem.

A bunch of SHA1 hash addressed data is almost as useless as a raw disk with no filesystem.

Similarly, I'm seeing a lot of JSON. Who says JSON will be remembered in 80 years? It's ASCII! But that's not much of a guarantee.

There are a lot of claims here which don't seem to translate into something actually useful. For example, "immutable" sounds good till you run out of disk space because you have 1000 copies of a slightly different VM image in the system. It's pretty easy to build a system which stores "everything". It's a lot harder to build one which is useful.


You meant Brad Fitzpatrick :)



The fact that this exists is pretty hilarious!


Genius, although I feel sorry for all the other Fitz's out there that get left out..


wrongwrongfitz.com should still be available: wrongfitz as a service.


Original post from three years ago: https://news.ycombinator.com/item?id=2156374

Camlistore has always been a project that I'd like to try out someday but every time I've looked it's seemed more hypothetical than real. Kenneth Reitz (of python requests fame, among other things) put together a much smaller thing called Elephant[1] that I've been tempted to explore as well, and it's sort of in the same vein.

[1]: https://github.com/kennethreitz/elephant


From Elephant's GitHub README:

>> Suddenly, your data becomes as durable as S3

Seems slightly less ambitious.


This is a subject of great interest to librarians and archivists--how to store various types of data so that both it and its associated indexes remain fully accessible and searchable, with a minimum of maintenance, across decades or centuries, even as formats, maintainers, and institutions rise and fall.

Anyone know if these guys are working with existing archival groups and standards? It would be a shame if they're reinventing the wheel.


> how to store various types of data so that both it and its associated indexes remain fully accessible and searchable, with a minimum of maintenance, across decades or centuries, even as formats, maintainers, and institutions rise and fall

Very nicely written description of something useful. IMO much better than the words on that website.

What I don't like about the website is the "jargon". These words are from just the first paragraph:

   formats
   protocols
   modeling
   synchronizing
   post-PC
   objects
   FUSE
Huh?

The website text was probably written by an engineer. Reminds me of a great quote:

   "engineers are all basically high-functioning
   autistics who have no idea how normal people
   do stuff" - Cory Doctorow


You're assuming the website is targeted at normal people. Considering the state of the software, I find that a suspicious assumption. Most likely, the jargon is used because the intended audience can understand it.


You may be right. Maybe it's best to only have jargon until the software is ready for prime time.

My general counterargument to jargon-filled websites is the great ending of Trading Places:

   What about lunch?
   The lobster or the cracked crab?
   What do you think?

   Can't we have both?
Why can't a website have both a clear explanation for normal people, followed by all the jargon necessary for the target audience?


What's wrong with those words?


Believe it or not, words like "objects" and "FUSE" mean nothing to 99% of the world's population. Actually they do have clear meanings, just not the meanings that computer science people assign to them.

IMO there's a sine qua non that most websites should have, and that unfortunately far too few do have. Well-known companies like Apple and Google don't need it, but most others do. It's what people have called an elevator pitch. Here's how Wiki puts it [1]:

   An elevator pitch, elevator speech, or elevator
   statement is a short summary used to quickly and
   simply define a person, profession, product,
   service, organization or event and its value
   proposition.
That information should be at the top of a website. It's what tells people, immediately, what the product or site does and how it could be useful for them.

And, not coincidentally, the words I quoted from Wiki are the totality of the first paragraph for that Wiki entry. Simple, clear, easy to understand.

There's nothing wrong with having details on a company's website. But that jargon should not be the totality of the first paragraph on the site.

[1] http://en.wikipedia.org/wiki/Elevator_pitch


If you are learning / playing with Go(Lang) you should definitely take a look at this source.

As a tool it's a bit rough yet, but it's going to be awesome.

Disclaimer: I have a man crush on Brad Fitzpatrick. Well, mostly on his code.



Exactly. But keep it quiet because I'm going to meet Brad at a conference in a few months and I don't want to freak him out.


My understanding of this is that it's "A git style CMS for all your data"... so you can't nuke things as there's history of it, and you can put any data you want into it.

Where I struggle is that either the definition of "your data" is narrow, or I shouldn't be using it for all my data.

Back in 1999 when I first learned about MP3, I started ripping my CDs. I have several thousand CDs, so this took a lot of time. Before I completed the task, at a rate of a few CDs each evening, FLAC came into my life and I started back at the beginning. I deleted the MP3s as I replaced them with FLACs.

I really don't ever need to keep some data. But maybe it's not the kind of data that I should be putting in Camlistore? I think of it as my data, after all these are my CDs.

I struggle with the concept of Camlistore as I have an 18TB NAS in RAID6, 12TB usable... and it's 80% full. If I had history I'd have a storage problem today.

I'm perhaps an outlier: I chose to self-host my data locally rather than rely on cloud-based things. And I chose to keep everything... photos, documents, email, video, music. And everything I keep is in the highest possible quality: FLACs, DVD VOBs, raw photos, etc.

But then... who is Camlistore aimed at if not the people who like to store and have control over their own data?

I guess I just find delete too valuable a feature for the larger data I store.

And perhaps I'm just wrong on the use-case, maybe it's really "for all your data (that you cannot re-acquire)". I just don't want to ever rip those CDs again. But if I do, those old versions are dead to me.


It's not quite as much like git as you might think it is. It's git-style in that it stores data as blobs named by hash and tracks everything with pointers to those blobs, but isn't as committed to keeping everything forever. Git is designed to be able to reconstruct a set of data at any point in that data's history, so it makes sense to keep all previous data in its storage system.

However, even git will delete data if you delete the "tree" metadata, i.e., you nuke some branch that has no downstream dependencies because you never merged it or there are no branches off of it. In that case, if the blobs aren't reachable by any tree/graph, git can garbage collect those blobs.

Camlistore does the same thing: if you delete all pointers to the data, those blobs might eventually be reclaimed. As a matter of implementation, camlistore doesn't do that today, but it's not the case that camlistore can't or won't let you delete data.
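For anyone wondering what "reclaim blobs nothing points to" means mechanically, here's a minimal mark-and-sweep sketch over a hash-addressed store. It's an illustration of the general technique, not Camlistore's or git's implementation; the `"parts"` pointer field and all function names are assumptions I made up for the example.

```python
import hashlib
import json
from typing import Dict, Iterable, List, Set


def put(store: Dict[str, bytes], data: bytes) -> str:
    """Store bytes under a name derived from their SHA-1."""
    ref = "sha1-" + hashlib.sha1(data).hexdigest()
    store[ref] = data
    return ref


def children(data: bytes) -> List[str]:
    """Hypothetical schema: a JSON object with a "parts" list points at other blobs."""
    try:
        doc = json.loads(data)
    except ValueError:
        return []  # opaque bytes point at nothing
    if not isinstance(doc, dict):
        return []
    parts = doc.get("parts")
    if not isinstance(parts, list):
        return []
    return [p for p in parts if isinstance(p, str)]


def collect_garbage(store: Dict[str, bytes], roots: Iterable[str]) -> Set[str]:
    """Mark everything reachable from the roots, then delete the rest."""
    reachable: Set[str] = set()
    stack = [r for r in roots if r in store]
    while stack:
        ref = stack.pop()
        if ref in reachable:
            continue
        reachable.add(ref)
        stack.extend(c for c in children(store[ref]) if c in store)
    dead = set(store) - reachable
    for ref in dead:
        del store[ref]
    return dead
```

The roots play the role of git's branches: as long as some root still leads to a blob it survives, and once the last pointer is gone the sweep is free to drop it.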


I think you're an edge case. In fact, there is a very precise parallel with the fact that you store RAW/FLAC/VOB data, while the vast majority of people are perfectly content with JPG/MP3/AVI.

In addition to that, I think there is no such thing as space limits, at least in general (not always, of course).

Specifically, one would imagine that the general attitude is 'I have to store such and such - where do I find the space?'

Instead, I think that in general it is 'Oh, I have this much space for free or cheap... let's store stuff!'. Especially now that storage is measured in terabytes, I guess most of it is simply filled with movies, even if they're never watched, not even once.

Again, pay much attention to the bias. As an amateur photographer, I'm tempted to think that "raw is the law", but in fact, for the vast majority of people, it simply isn't.


I just tried to get this running using release 0.7, but I am puzzled by the web interface. I have not managed to upload a file through it, and once I did upload a file through the command-line tool camput, it did not seem to show up in the web interface. Seems like a cool concept, and I hope I'm doing something wrong since I would like this to work.


Spoiler: if you came here all excited about the name beginning with Caml, Camli stands for Content-Addressable Multi-Layer Indexed. It isn't related to the Categorical Abstract Machine Language. It's written in Go, not OCaml. (Yes, I know long ago Zinc was substituted for the Categorical Abstract Machine down in the depths of OCaml.)


Is OCaml used anywhere?

I've only met one OCaml programmer in my life, and they were a graduate student and rather strange.


It's used in a lot of places, including Facebook for Hack and Bloomberg as well as the oft cited Jane Street.

OCaml users: http://ocaml.org/learn/companies.html

FB Hack: http://cufp.org/2013/julien-verlaguet-facebook-analyzing-php...

JaneStreet: https://blogs.janestreet.com/category/ocaml/


This sounds like what git-annex does today, except with a front-end. If my understanding is right, how is it different?

git-annex stores things content addressed, gives me different views into the data (tags, etc), and supports different back-ends (S3, remote rsync, local filesystem, external disks, etc). Isn't this exactly what is described here?


http://nymote.org/ is running along similar lines and has some serious technical chops behind it (including some of the original Xen folks). I'm not so sure about their UX skills, but it's worth keeping an eye on.


Also reminds me of Tent [1]. Brendan Eich also mentioned something like this idea in the same post where he stepped down from being CEO of Mozilla [2]. It's good to see a lot of brains working on the concept because I think there's a huge amount of potential in putting data in the customer's warehouse. I think we can achieve a much greater level of composability of consumer services when each doesn't have to worry about being its own data warehouse.

[1] http://tent.io

[2] https://brendaneich.com/2014/04/the-next-mission/


(I'm the site author) Is the UX comment something about the site or the tools?

If it's the site, that's on me and I have a bunch of work queued up to better represent the tools. Specific feedback would be welcome.

If it's about the tools, the first UI is the command line as we expect these to be components that developers can use. In terms of the initial applications we refer to, they would effectively be CardDAV and CalDAV servers so you'd hook in your existing apps to them.


To phrase it better - I don't have any idea whether the UX skills of the team match up with the technical skills. I wasn't criticising anything in particular, just noting a lack of knowledge on my part.


"Lifelong control of your networked personal data" is ironically offline for me. Here is the Google Cache:

http://webcache.googleusercontent.com/search?q=cache:P_L5IWO...


Seems to be working for me (I'm the author). It's hosted on GitHub Pages so perhaps they had a transient issue.


You can't overwrite your data and can't delete it either? I am puzzled.


Similar semantics to git... it's a super robust approach.


How is it different to git? Or I guess bup, more appropriately.

What is this doing that solves so many problems that apparently they can't outline what it is actually doing on the frontpage of the website in clear language?


To everyone who doesn't understand what this is all about, I suggest you read the presentations and watch the videos [0]. They're going deeper into what camlistore is and can do.

[0] https://news.ycombinator.com/item?id=7842629


As a non-English speaker, I always associated this with caml/ocaml and the ML languages - and I still do somehow even after I have visited the site and read about it, and what it is.

What does "camli" mean?


It's an acronym: "Content-Addressable, Multi-Layer, Indexed" Storage. Nothing to do with ocaml.

Content Addressable: What things are named depends on their content. Two identical things have the same name. For example, the "name" or "key" for the data is the SHA-1 of the data, à la git.

Multi-Layer: The whole storage stack is built out of several layers. The blob store sits on the bottom, and only knows about bytes, and access is via the SHA-1 of those bytes. Things that you might store (files, directories, sets, collections of tweets, social graphs, etc.) build on top of the blob store by adding blobs that hold pointers to data blobs. Again, it's sort of like git. A front-end might sit on top of that abstraction.

Indexed: blobs of JSON that have a few special attributes are recognized and indexed. So, you might have a bunch of blobs with these special attributes (e.g., "tag") and be able to ask the indexer "Give me all blobs with tag equal to foo", rather than having to search through the blobs directly.
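If it helps, the three layers can be sketched in a few lines. This is a toy under my own assumptions, not Camlistore's real schema or code: the `"sha1-"` prefix, the JSON field names (`type`, `name`, `parts`, `tag`), and every function name here are invented for illustration.

```python
import hashlib
import json
from collections import defaultdict
from typing import Dict, List, Set


class BlobStore:
    """Bottom layer: bytes in, SHA-1 name out. Knows nothing about meaning."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        ref = "sha1-" + hashlib.sha1(data).hexdigest()
        self.blobs[ref] = data  # identical content always lands on the same key
        return ref

    def get(self, ref: str) -> bytes:
        return self.blobs[ref]


def put_file(store: BlobStore, name: str, chunks: List[bytes], tag: str) -> str:
    """Middle layer: a JSON blob holding pointers to the data blobs."""
    parts = [store.put(chunk) for chunk in chunks]
    meta = {"type": "file", "name": name, "parts": parts, "tag": tag}
    return store.put(json.dumps(meta, sort_keys=True).encode())


def read_file(store: BlobStore, ref: str) -> bytes:
    """Follow the pointers in a file blob and reassemble the content."""
    meta = json.loads(store.get(ref))
    return b"".join(store.get(part) for part in meta["parts"])


def build_tag_index(store: BlobStore) -> Dict[str, Set[str]]:
    """Index layer: scan the blobs once so tag queries need no further scanning."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for ref, data in store.blobs.items():
        try:
            doc = json.loads(data)
        except ValueError:
            continue  # not JSON, so just opaque bytes to the indexer
        if isinstance(doc, dict) and isinstance(doc.get("tag"), str):
            index[doc["tag"]].add(ref)
    return dict(index)
```

Asking for `build_tag_index(store)["foo"]` then answers "give me all blobs with tag equal to foo" from the index alone, and the blob store underneath never has to know what a file or a tag is.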


This sounds very similar to diaspora. The whole project is about replacing social networking's idea of giving all of your info to a 3rd party with only sharing some info with people you trust.


This is the perfect solution for the layman's constant loss of data due to Windows breakages that lead to complete reformatting of disks.



