o1 mini seems to get it on the first try (I didn't vet the code, but I tested it and it works on both examples provided in the notebook, `dates` and `gabe_dates`):
from collections import defaultdict

def find_cheryls_birthday(possible_dates):
    # Parse the dates into month and day
    dates = [date.split() for date in possible_dates]
    months = [month for month, day in dates]
    days = [day for month, day in dates]

    # Step 1: Albert knows the month and says he doesn't know the birthday
    # and that Bernard doesn't know either. This implies the month has no unique days.
    month_counts = defaultdict(int)
    day_counts = defaultdict(int)
    for month, day in dates:
        month_counts[month] += 1
        day_counts[day] += 1

    # Months with all days appearing more than once
    possible_months = [month for month in month_counts if all(day_counts[day] > 1 for m, day in dates if m == month)]
    filtered_dates = [date for date in dates if date[0] in possible_months]

    # Step 2: Bernard knows the day and now knows the birthday
    # This means the day is unique in the filtered dates
    filtered_days = defaultdict(int)
    for month, day in filtered_dates:
        filtered_days[day] += 1
    possible_days = [day for day in filtered_days if filtered_days[day] == 1]
    filtered_dates = [date for date in filtered_dates if date[1] in possible_days]

    # Step 3: Albert now knows the birthday, so the month must be unique in remaining dates
    possible_months = defaultdict(int)
    for month, day in filtered_dates:
        possible_months[month] += 1
    final_dates = [date for date in filtered_dates if possible_months[date[0]] == 1]

    # Convert back to original format
    return ' '.join(final_dates[0]) if final_dates else "No unique solution found."

# Example usage:
possible_dates = [
    "May 15", "May 16", "May 19",
    "June 17", "June 18",
    "July 14", "July 16",
    "August 14", "August 15", "August 17"
]
birthday = find_cheryls_birthday(possible_dates)
print(f"Cheryl's Birthday is on {birthday}.")
In addition to that, after they produced the first program with mistakes, the author should have shown them the invalid output and given them a chance to fix it. Humans, too, frequently fail to solve this on the first try without running the code.
"seems to" isn't good enough, especially since it's entirely possible to generate code that doesn't give the right answer. 4o is able to write some bad code, run it, recognize that it's bad, and then fix it, if you tell it to.
You can look at who owns what in their portfolios; none of this is especially private information. They publish it all online. I literally just googled "who owns the most commercial real estate" and "commercial real estate bond ownership amounts" and things like that. It's not subtle: companies tout their ownership percentages and REITs list their investors.
I meant evidence of them campaigning, or financing/instigating campaigns, against remote work, thereby influencing decisions of companies to implement "back to work" policies.
ETA: I agree that you did not say this was happening in your original comment, but it seems to me your comment implied that these companies were actually influencing major decisions (since that's the topic of the OP).
Blackrock is a major institutional investor in just about every company, so they have press and backchannel effects. I assume similar things happen with e.g. Vanguard and the big ibanks. I know Jamie Dimon has been railing about RTO for a while.
Thanks, though I'll note that that article is about Blackrock encouraging/forcing its own workers to do hybrid work, not arguing that other companies should do so.
It's more than that: if you read the part below the fold, they talk about "if we get more people back into offices the Fed's job will be easier", which I assume goes to a more systemic argument anyway. Cheers!
Can you link some think tank pieces arguing against remote work? I tried looking but couldn't find any. I found a few things, but clearly none of them are part of an anti-remote-work effort:
> Can you link some think tank pieces arguing against remote work?
That's not the argument I made in my comment. I simply noted that if anyone wanted to hire a group to argue for (or against) remote work then such groups already exist and have done for decades.
If there's a coordinated press placement of "back to work" articles, then the starting point would be to gather all the articles that make that case (or talk about that subject) and look at the authors, their bios, and whether these are staff-writer pieces (and if so whether they heavily quote vague "research shows" sources), opinion pieces, etc.
The hardest to spot, and the most common, are staff writers who cover all manner of things (no obvious bias) and who are 90% copy-pasting unacknowledged "press releases" and "media statements" handed to them on a plate by the Institute for Lazy Reporting.
US work from home isn't an area of any interest to me and I have no particular awareness of any of the US writing on the subject.
I'm an Australian that's largely worked remote (but not always from home) since the mid 1980s, largely for transnational resource companies.
Part of my professional career did involve tracing and sourcing released information intended to sway opinion, but that was all related to mineral and energy resources.
You were responding to a comment saying the world is not so coordinated by giving some examples of how coordination might happen. I gave some evidence that coordination of the type you mentioned does not seem to happen, at least for the topic being discussed, suggesting that the world is indeed not so coordinated (at least in this instance).
> I gave some evidence that coordination of the type you mentioned
was not readily apparent to yourself.
> does not seem to happen, at least for the topic being discussed,
to the best of your ability to discern such activity, if it exists.
> suggesting that the world is indeed not so coordinated (at least in this instance).
suggesting that you were unable to find such coordination in this instance; not in any way negating the point that such agencies do exist and do take on contracts to shape a public narrative to the degree possible with the resources given.
I have no knowledge of your skill levels at picking out such media shenanigans; while they absolutely do happen in general, I have no basis with which to weight your inability to find any specific evidence in this instance.
More to the dynamic of the exchange: you asked if I had any personal knowledge of US remote-work articles being dropped into the US public sphere to order, and I responded that I have no interest in such articles and thus have no such knowledge. That singular anecdotal fact has no bearing on whether such a thing is or isn't happening.
I've been a remote worker for many years prior to covid and it's been interesting to watch. There's a constant drip of negative articles about remote work which increases to a flood around headlines like the Amazon one.
"Think Tanks" "agencies" etc place articles in media outlets by a variety of means (if in fact this is what is taking place).
Media outlets in general are starved for income compared to yesteryear and are increasingly easy to place material with.
The first link is Euro-centric; the second is Forbes, with a contributed piece by an outside writer (Julian Hayes II) who has written a number of pro-return-to-the-office articles across a number of media outlets.
Is this a truly independent free opinion he is spruiking?
Is this an opinion he gets additional income from a third party for supporting?
I personally have no idea, but this is a hint of how to backtrace content sourcing.
It's not unlike working back through subsidiary shell corporations, etc.
"Sure, you could prepare for imagined eventualities, or you could do the actual work of improving efficiency, reducing waste and unnecessary middle-men, and removing centuries old bureaucracies that are now absurdly pointless in the face of the internet. There is an underlying _desire_ for apocalypse encoded in this type of thinking."
OP was written by the person who co-founded GiveWell[1] to make charitable giving more effective, and who, while running Open Philanthropy, oversaw lots of grants to things like innovation policy[2], scientific research[3], and land use reform[4].
Anyway, more broadly I think you present a false dilemma. You can both prepare for tail risks and also make important marginal and efficiency improvements.
"Grok" in AI doesn't quite describe generalization, it's more specific that that. It's more like "delayed and fairly sudden generalization" or something like that. There was some discussion of this in the comments of this post[1], which proposes calling the phenomenon "eventual recovery from overfitting" instead.
Part of the issue here is posting a LessWrong post. There is some good in there, but much of that site is like a Flat Earth conspiracy theory for neural networks.
Neural network training [edit: on a fixed-point task, as is often the case {such as image->label}] is always (always) necessarily biphasic, so there is no "eventual recovery from overfitting". In my experience, it is just people newer to the field, or just noodling around, fundamentally misunderstanding what is happening as their network goes through a very delayed phase change. Unfortunately these kinds of posts get significant amplification, as people like chasing the new shiny of some fad-or-another-that-does-not-actually-exist instead of the much more 'boring' (which I find fascinating) math underneath it all.
To me, as someone who specializes in optimizing network training speeds, it just indicates poor engineering of the problem on the part of the person running the experiments. It is not a new or strange phenomenon; it is a literal consequence of the information theory underlying neural network training.
> Part of the issue here is posting a LessWrong post
I mean, this whole line of analysis comes from the LessWrong community. You may disagree with them on whether AI is an existential threat, but the fact that people take that threat seriously is what gave us this whole "memorize-or-generalize" analysis, and glitch tokens before that, and RLHF before that.
I think you may be missing the extensive lines of research covering those topics. Memorization vs Generalization has been a debate since before LW even existed in the public eye, and inputs that networks have unusual sensitivity to have been well studied as well (re: chaotic vs linear regimes in neural networks). Especially the memorization vs generalization bit -- that has been around for... decades. It's considered a fundamental part of the field, and has had a ton of research dedicated to it.
I don't know much either way about RLHF in terms of its direct lineage, but I highly doubt that is actually what happened, since DeepMind is actually responsible for the bulk of the historical research supporting those methods.
It's possible, a la the broken-clock hypothesis, and LessWrong is obviously not a "primate at a typewriter" situation, so there's a chance of some people scoring meaningful contributions, but the signal-to-noise ratio is awful. I want to get something out of some of the posts I've tried to read there, but there are so many bad takes, written in such bombastic language, that it's really quite hard indeed.
Right now, it's an active detractor to the field because it pulls attention away from things that are much more deserving of energy and time. I honestly wish the vibe was back to people even just making variations of Char-RNN repos based on Karpathy's blog posts. That was a much more innocent time.
> I think you may be missing the extensive lines of research covering those topics. Memorization vs Generalization
I meant this specific analysis, that neural networks that are over-parameterized will at first memorize but, if they keep training on the same dataset with weight decay, will eventually generalize.
Then again, maybe there have been analyses done on this subject I wasn't aware of.
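For concreteness, the kind of experiment being described is usually a small network trained on a modular-arithmetic table with weight decay. Here is a minimal sketch of that setup, assuming PyTorch; the task (addition mod 97), the tiny architecture, and every hyperparameter are illustrative guesses rather than the settings from any particular paper, and real grokking runs typically need far more steps and tuning:

import torch
import torch.nn as nn
import torch.nn.functional as F

P = 97
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))  # all (a, b)
labels = (pairs[:, 0] + pairs[:, 1]) % P

# Train on a fraction of the table; hold out the rest to watch generalization.
perm = torch.randperm(len(pairs))
split = int(0.4 * len(pairs))
train_idx, test_idx = perm[:split], perm[split:]

class TinyNet(nn.Module):
    def __init__(self, p=P, d=128):
        super().__init__()
        self.embed = nn.Embedding(p, d)
        self.mlp = nn.Sequential(nn.Linear(2 * d, 256), nn.ReLU(), nn.Linear(256, p))

    def forward(self, ab):
        e = self.embed(ab)             # (batch, 2, d)
        return self.mlp(e.flatten(1))  # (batch, p) logits

model = TinyNet()
# Weight decay is the ingredient the claim hinges on.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()

for step in range(100_000):
    opt.zero_grad()
    loss = F.cross_entropy(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        # Train accuracy saturates early; test accuracy (if the delayed
        # generalization happens at all with these settings) climbs much later.
        print(step, round(accuracy(train_idx), 3), round(accuracy(test_idx), 3))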
Gotcha. I'm happy to do the trace as it likely would be fruitful for me.
Do you have a link to a specific post you're thinking of? It's likely going to be a Tishby-like (the classic paper from 2015 {with much more work going back into the early aughts, just outside of the NN regime IIRC}: https://arxiv.org/abs/1503.02406) lineage, but I'm happy to look to see if it's novel.
I originally thought the PAIR article was another presentation by the same authors, but upon closer reading, I think they just independently discovered similar results, though the PAIR article does cite "Progress measures for grokking via mechanistic interpretability", the arXiv paper by the authors of the alignmentforum article.
(In researching this I found another paper about grokking finding similar results a few months earlier; again, I suspect these are all parallel discoveries.)
You could say that all of these avenues of research are re-statements of well-known properties, e.g. deep double descent, but I think that's a stretch. Double descent feels related, but I don't think a 2018 AI researcher who knew about double descent would spontaneously predict "if you train your model past the point it starts overfitting, it will start generalizing again if you train it for long enough with weight decay".
But anyway, in retrospect, I agree that saying "the LessWrong community is where this line of analysis comes from" is false; it's more like they were among the people working on it and reaching similar conclusions.
That's true, and I probably should have done a better job of backing up, sorting out, and clarifying. I remember when that paper came out; it rubbed me the wrong way then too, because it is people rediscovering double descent from a different perspective and not recognizing it as such.
A better definition would be "a sudden change in phase state after a long period of metastability". Even then, it ignores that those sharp inflections indicate a poor KL divergence between some of the inductive priors and the data at hand.
You can think about it as the loss signal from the support of two Gaussians extremely far apart with narrow standard deviations. Sure, they technically have support, but in a noisy regime you're going to have nothing... nothing... nothing... and then suddenly something as you hit that point of support.
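A rough numerical illustration of that picture (the sigma and the grid of separations below are arbitrary choices, just to show the shape of the effect):

import numpy as np

# Overlap (Bhattacharyya coefficient) of two equal-width Gaussians as a
# function of the distance between their means: effectively zero until the
# means get close, then it switches on sharply.
sigma = 0.1
x = np.linspace(-20, 20, 200_001)
dx = x[1] - x[0]
for separation in [10, 5, 2, 1, 0.5, 0.25]:
    p = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    q = np.exp(-0.5 * ((x - separation) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    overlap = np.sum(np.sqrt(p * q)) * dx
    print(f"separation={separation:>5}: overlap ~ {overlap:.3e}")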
Little of the literature, or the definitions around the word, really takes this into account, which leads to the mass illusion that this is not a double-descent phenomenon, when in fact it is.
Hopefully this is a more appropriate elaboration, I appreciate your comment pointing out my mistake.
Singular learning theory explains the sudden phase changes of generalization in terms of resolution of singularities. Alas it's still associated with the LW crowd.
If it's any consolation, that post is... hot word-salad garbage. It's like they learned the words on Wikipedia and then proceeded to try to make a post that used as many of them as possible. It's a good litmus test for experience vs armchair observers: to someone who scans the article without decoding the phrasing enough to see how silly the argument is, it would certainly seem impressive, because "oooooh, fancy math". It's sort of why LW is more popular: it is basically white-collar flat-earthery, and many of the relevant topics discussed have already been discussed ad infinitum in the academic world and are accepted as general fact. We're generally not dwelling on silly arguments like that.
One of the most common things I see is people assuming something that came from LW is novel and "was discovered through research published there", and that's because over there it's heavily incentivized to make a lot of noise and sound plausible. Arxiv papers, by contrast, while there is some battle for popularity, are inherently more "boring" and formal.
For example, the LW post as I understand it completely ignores existing work and just... doesn't cite things which are rigorously reviewed and prepared. How about this paper from five years ago, part of a long string of research about generalization loss basins? https://papers.nips.cc/paper_files/paper/2018/hash/be3087e74...
If someone earnestly tried to share the post you linked at a workshop at a conference, they would not be laughed out of the room, but instead have to deal with the long, draining, and muffling silence of walking to the back of the room without any applause when it was over. It's not going to fly with academics/professionals who are academia-adjacent.
This whole thing is not terribly complicated either, I feel: a little information theory, the basics, and some time studying and working on it, and someone is 50% of the way there. I feel frustrated that this kind of low-quality content is parasitically supplanting actual research with meaning and a well-documented history. This is flashy nonsense that goes nowhere, and while I hesitate to call it drivel, it is nigh-worthless. This barely passes muster for a college essay on the subject, if even that. If I was their professor, I would pull them aside to see if there is a more productive way for them to channel their interests in the Deep Learning space, and how we could better accomplish that.
I appreciate the thoughts. In such a fast-moving field, it's difficult for the layman to navigate without a heavy math background. There's some more academic research I should have pointed to, like https://arxiv.org/abs/2010.11560
> Part of the issue here is posting a LessWrong post. There is some good in there, but much of that site is like a Flat Earth conspiracy theory for neural networks.
Indeed! It’s very frustrating that so many people here are such staunch defenders of LessWrong. Some/much of the behavior there is honestly concerning.
100% agreed. I'm pretty sure today was the first time I learned that the site was founded by Yudkowsky, which honestly explains quite a bit (polite 'lol' added here for lightheartedness)
To further clarify things: the reason there is no mystical "eventual recovery from overfitting" is that overfitting is a stable bound that is approached. Attaching this false label implies a non-biphasic nature to neural network training, and adds false information that wasn't there before.
Thankfully things are pretty stable in the over/underfitting regime. I feel sad when I see ML misinformation propagated on a forum that requires little experience but has high leverage, due to the rampant misuse of existing terms and the wholesale invention of an in-group language that has little contact with the mathematical foundations of what's happening behind the scenes. I've done this for 7-8 years at this point at a pretty deep level and have a strong pocket of expertise, so I'm not swinging at this one blindly.
As for memorization of individual examples -> generalization: I can't speak to what determines the switch, as that is (partially, to some degree) work I'm currently doing, and I have a personal rule not to share work in progress until it's completed (and then be very open and explicit about it). My apologies on that front.
However, I can point you to one comment I made earlier in this particular comment section about the MDL and how that relates to the L2 norm. Obviously this is not the only thing that induces a phase change, but it is one of the more blatant ones, and it's been covered a little more publicly by different people.
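For readers who haven't seen that connection before, the usual textbook reading (I'm only sketching that standard version here, not the earlier comment) is that an L2/weight-decay penalty is, up to a constant, the description length of the weights under a zero-mean Gaussian coding distribution:

import numpy as np

# Description length of weights under a zero-mean Gaussian "codebook" with
# width sigma: -log p(w) = sum(w_i^2) / (2 * sigma^2) + constant, i.e. an L2
# penalty plus a term that does not depend on w. Sigma and the random weights
# are arbitrary illustrative choices.
rng = np.random.default_rng(0)
w = rng.normal(size=1000)
sigma = 2.0

neg_log_prior = np.sum(0.5 * (w / sigma) ** 2 + np.log(sigma * np.sqrt(2 * np.pi)))
l2_penalty = np.sum(w ** 2) / (2 * sigma ** 2)
constant = len(w) * np.log(sigma * np.sqrt(2 * np.pi))

print(np.isclose(neg_log_prior, l2_penalty + constant))  # True

In that view, pushing down the L2 norm is pushing toward a shorter description of the network.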
No, actually this is just how language evolves. I'm glad we have the word "car" instead of "carriage powered by internal combustion engine", even if it confused some people 100 years ago when the term came to be used exclusively to mean something a bit more specific.
Of course the jargon used in a specific sub-field evolves much more quickly than common usage, because the intended audience of a paper like this is expected to be well read and current in the field already.
Language devolves just as it evolves. We (the grand we) regularly introduce ambiguity: words and meanings with no useful purpose, or that are worse than useless.
I'm not really weighing in on the appropriateness of the use "grok" in this case. It's just a pet peeve of mine that people bring out "language evolves" as an excuse for why any arbitrary change is natural and therefore acceptable and we should go with the flow. Some changes are strictly bad ones.
A go-to example is when "literally" no longer means "literally", but its opposite, or nothing at all. We don't have a replacement word, so now in some contexts people have to explain that they "literally mean literally".
Language only evolves; "devolving" isn't a thing. All changes are arbitrary. Language is always messy, fluid, and ambiguous. You should go with the flow, because being a prescriptivist about the way other people speak is obnoxious and pointless.
And "literally" has been used to mean "figuratively" for as long as the word has existed[0].
I'm going to take a rosier view of prescriptivists and say they are a necessary part of the speaking/writing public, doing the valuable work of fighting entropic forces to prevent making our language dumb. They don't always need to win or be right.
That's the first time I've seen literally-as-figuratively defended from a historical perspective. I still think we'd all be better off if people didn't mindlessly use it as a filler word or for emphasis, which is generally what people are doing these days that is the source of controversy, not reviving an archaic usage.
Also, it's kind of ironic you corrected my use of "devolves", where many would accept it. :)
Just as an added data point, some languages (e.g. Hungarian) do use double negatives “natively”, and I have definitely caught myself having to fight native expressions seeping into my English, including ‘irregardless’. For example, a Hungarian would say “I have never done nothing bad” rather than “anything bad”, but it is used not in a logical sense, but more as emphasis, perhaps?
(!)Regardless, what I’m trying to say is that due to the unique position of English as the de facto world language, it has to “suffer” some non-idiomatic uses seeping in from non-natives. Actually, I would go even further and say that most smaller languages will slowly stop evolving and only English will have that property going forward (most new inventions no longer get a native name in most languages; the English one is used).
Sure it's easy -- you can use benchmarks like HumanEval, which Stability did. They just didn't compare to Codex or GPT-4. Of course such benchmarks don't capture all aspects of an LLM's capabilities, but they're a lot better than nothing!
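For anyone curious what that looks like in practice, a run is usually wired up with OpenAI's human-eval harness roughly as below; generate_completion is a placeholder for whatever model is being benchmarked, and the sample count and filename are arbitrary. Scoring is then done with the harness's evaluate_functional_correctness command on the resulting file.

from human_eval.data import read_problems, write_jsonl

def generate_completion(prompt: str) -> str:
    # Placeholder: call the model under test and return the code that should
    # follow the prompt (i.e. the function body).
    raise NotImplementedError

problems = read_problems()
samples = [
    {"task_id": task_id, "completion": generate_completion(problems[task_id]["prompt"])}
    for task_id in problems
    for _ in range(1)  # one sample per task gives a rough pass@1 estimate
]
write_jsonl("samples.jsonl", samples)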