
Great pains taken to preserve the full commit history... does anyone know how valuable it really is? (Not to imply that it is worthless, but how much effort went into preserving it vs how much effort would be spent if it were not available). More of a general VCS conversion question, I suppose.



In my experience with Django, the commit history has been incredibly critical. There's been a good number of bugs that required digging back to nearly the beginning of the public commit history (July 2005). There have even been a few bugs that we only really understood after prying into the old private repo (which was converted from CVS and goes back to 2003).


+1, you can tell how valuable it is because I always end up crying when I'm trying to track the history of something and it turns out the change came from a branch merge that wasn't properly tracked, so you can't get the precise commit that introduced it.


Depends on the project, but for a long-running project with a large code base and many contributors (many of whom are no longer actively involved in the project), it can be very valuable. In the case of PostgreSQL, the commit history goes back to ~1995, and most changes are described with a detailed commit message, which can be very helpful when understanding why a certain piece of code behaves the way that it does.


But shouldn't the comments from the source code provide an explanation?


Commit messages are a better place than comments for explaining the rationale behind cross-cutting changes.


I admit that I've been in situations where I had to find out which change introduced a piece of code, but then again the code and the comments weren't too great. One of the purposes was also to find out the author, so that I could pass the bug to him.

I'm curious in what situations a commit message would be more appropriate than a comment. A couple of examples would be great.

By the way, is there something like Perforce's Time-lapse View tool[1] for git?

[1] http://www.perforce.com/perforce/press/pr55.html


Example: "Gathered & cut this code from files A, B, and C, because Client X changed their mind about situation Y, and now we can drop those special cases." Anything involving moving / deleting a lot of code. Adding comments to explain deleted code in several places really doesn't make sense.

I don't know about the viewer, but I'm going to look into this soon. We've used p4 for years and are strongly considering switching to git, and if so, I will be doing a lot of helping/explaining during the transition. I really dislike p4 as a VCS (though it was probably the best choice at the time), but agree its diff-viewing tool is actually quite good.


I agree that you can't put that explanation into a comment and that a changelog would be fine, but the documentation of the project should also describe what's needed and what's not. I guess this is one more reason for the built-in wiki from the Fossil SCM.


Git has a limited version of that built in, 'git gui blame' (this is also integrated with the 'gitk' tool). However, that doesn't let you jump quickly to different places in history. If you need something that shows you the file 'blame' and lets you quickly jump around in history, I recommend 'qgit'.


Thanks for the qgit tip, but it's still no match for Perforce.


And knowing what files were touched for the various commits is a big advantage as well.


It's also worth mentioning that it's sometimes critical for open source projects to be able to identify all authors of a particular part of the code. There were several cases in the past when a project had to go through a painful process of tracking down all the contributors in order to ask for their permission to relicense the code. Without version control history this would have been much harder, if not impossible.

Besides that, it just doesn't sound right to throw away potentially valuable historical information just to avoid a manageable amount of work. After all, the conversion has to be done only once. Not to mention that if you don't migrate the history, there is a significant probability that you'll regret it when it's already too late.

There is also a precedent to the contrary. When Linux migrated to the brand-new Git, all the historical information was discarded. Probably the reason was that the metadata was stored in a proprietary BitKeeper repository and obviously there were no migration tools available at that time.
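
On the author-identification point above, here is a rough Python sketch of how you might tally who last touched each line of a file. It assumes you feed it the text produced by `git blame --line-porcelain <file>`; the function name and the sample input are made up for illustration, and the sample is trimmed to just the header fields that matter here. Note that blame only reports the last author of each line, so for relicensing it is a starting point rather than a complete contributor list.

```python
from collections import Counter


def count_blame_authors(porcelain: str) -> Counter:
    """Count lines per author in `git blame --line-porcelain` output."""
    counts = Counter()
    for line in porcelain.splitlines():
        # Header lines look like "author Jane Doe". File content lines
        # start with a TAB, and "author-mail" has no space after "author",
        # so neither of those matches this prefix.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


# Made-up, trimmed sample (real output carries more header fields per line).
sample = (
    "1111111111111111111111111111111111111111 1 1 2\n"
    "author Alice Example\n"
    "author-mail <alice@example.com>\n"
    "\tint main(void) {\n"
    "1111111111111111111111111111111111111111 2 2\n"
    "author Alice Example\n"
    "author-mail <alice@example.com>\n"
    "\tauthor of this hack: unknown\n"
    "2222222222222222222222222222222222222222 3 3 1\n"
    "author Bob Example\n"
    "author-mail <bob@example.com>\n"
    "\t}\n"
)

authors = count_blame_authors(sample)
for name, lines in sorted(authors.items()):
    print(name, lines)
```

With the full history available you could run something like this over every revision of a file instead of only the current one, which is exactly what a history-less import would make impossible.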


Well, since they wanted the ability to build any previous version of PostgreSQL from the version history, for them I'd say it was quite valuable. I would suspect this is the main driving issue for whether or not to preserve commit history when porting to a new VCS; some developers may almost never go back and rebuild old versions from scratch. For OSS developers, though, I can see that they would need to preserve "full disclosure" and, if possible, make the entire commit history available for others, if not the devs themselves, to analyze.


Yes, PostgreSQL supports major version releases for five years so preserving the full history is critically important. Developers regularly back-patched bugfixes as far back as 7.4 (released November 2003, and just EOL'd this month).


Perl kept their commit history back to 1987 when they migrated to Git: http://perl5.git.perl.org/perl.git/commit/8d063cd8450e59ea1c... Presumably it’s worth something even if it’s just historical interest.


Also, the alternative to a commit history in Git is not "no commit history at all", it's a CVS with a read-only filesystem.


Sometimes it's absolutely critical for tracking down bugs: going back through the previous versions to locate exactly which commit introduced an error can be a huge time saver, and sometimes it's the only way to track an issue down in a reasonable period of time. Even if that isn't something that happens for your project very often, I tend to think of commit history as being much like insurance: it's better to have it and not need it than to need it and not have it.
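
To make the "locate which exact commit introduced an error" part concrete, here is a minimal Python sketch of the binary search idea that tools like `git bisect` are built around. It is not how any particular tool is implemented. The commit names and the `is_bad` check are hypothetical, and it assumes the history is linear, ordered oldest to newest, ends in a bad commit, and stays bad once the bug appears.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the last one is bad,
    and once a commit is bad every later commit is bad too.
    """
    if not commits:
        raise ValueError("need at least one commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo]


# Hypothetical history: the bug appears in "c5" and persists afterwards.
history = [f"c{i}" for i in range(1, 9)]
tested = []


def is_bad(commit: str) -> bool:
    tested.append(commit)  # record which commits had to be built and tested
    return int(commit[1:]) >= 5


culprit = first_bad_commit(history, is_bad)
print(culprit, len(tested))  # prints "c5 3"
```

Eight commits need only three checks here, and the count grows with the logarithm of the history length. That only works if the individual commits survived the conversion.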


In another context, when making a point about code maturity, someone noted that people were still finding and fixing bugs in Postgres code that was over 10 years old. In those situations, I think preserving info about the provenance of every piece of code is extremely valuable. Particularly if you expect the project to live long enough to make another SCM system transition in the future.


Even after reading the comments here, surely just keeping a frozen CVS tree would have worked.



