Hacker News

That, to some degree, is guaranteed to happen naturally. But why not use a sieve approach and narrow the pool exponentially, using a range of tools that goes from automatic and instantaneous at the start to manual, time-intensive interviews at the end? The potential upside for a company is relatively large: with a small amount of work, they have the opportunity to get approximately the best 30 people out of 100k graduating students. From the company's perspective, this isn't a problem at all, it's a massive windfall of an opportunity. They can afford to optimize it, and they're motivated to. As a hiring manager or company founder, I'd love to have this "problem".

That said, your comment reminds me of a Monte Carlo algorithm I think I’ve heard about. There is a way to have some statistical confidence in getting the top K out of a sample of N without examining all N samples. I’m blanking on what it’s called, and I think it’s related to Reservoir Sampling. I don’t know if I read this or am making it up, but my instinct is that you can get to high levels of confidence after looking at sqrt(N) samples.
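For the sampling technique the parent is half-remembering, the named relative is Reservoir Sampling (Vitter's Algorithm R). To be clear, Algorithm R gives a uniform random sample of k items from a stream, not the top K, and the sqrt(N) confidence claim is the commenter's own recollection, not something this sketch demonstrates. A minimal sketch of the reservoir part:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Algorithm R: keep a uniform random sample of k items from a
    stream of unknown length, using only O(k) memory."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Item i survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample
```

The nice property is that you never need to know N in advance, which is why it comes up for "candidates arriving as a stream" problems.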




The problem is noise.

Say you want to take 10% at each round. You have to do this three times to get 100 people, and then another round to get the top 30.

If you tighten the criterion, noise matters more, so you will drop some of your actual best candidates, since surviving a round requires a high skill + noise score. But if you try to keep the unlucky "good" candidates by using a wider band, you pay more, because you carry more people into the expensive later rounds.

Also, if you are successful at finding the best candidates in a batch, noise matters even more the next round.

I've been thinking about how to visualize this. I have it in my mind and I'll try to describe it.

You plot a point (S, N) for each candidate. The two are independent, so the scatter plot looks like a 2D Gaussian. Let's say skill is vertical and noise is horizontal.

You want to find the 30 highest points, but you aren't allowed to just look at the scatter plot and select the top points. You have to draw a line S + N = c and choose the people with the highest c: basically, you slide a ruler that crosses the S and N axes out as far from the origin as possible.

Observation: if N is high (wider distribution) compared to S, you just get the luckiest people. If S is high compared to N, you just get the most skillful.
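That thought experiment is easy to run as a Monte Carlo. Here's a minimal sketch (the function name, distributions, and parameters are my own assumptions, not the parent's): each candidate gets a true skill S ~ N(0, 1), we can only rank by the observed S + N, and we check how many of the true top 30 the observed top 30 actually contains.

```python
import random

def top30_recall(n=100_000, noise_sd=1.0, k=30, seed=0):
    """Candidates have true skill S ~ N(0, 1); we can only rank them
    by S + N with N ~ N(0, noise_sd). Returns the fraction of the
    true top-k that the observed top-k actually contains."""
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]
    observed = [s + rng.gauss(0, noise_sd) for s in skill]
    true_top = set(sorted(range(n), key=lambda i: skill[i])[-k:])
    picked = set(sorted(range(n), key=lambda i: observed[i])[-k:])
    return len(true_top & picked) / k
```

With a narrow noise band the overlap is high; widen the band and the "ruler" mostly picks the luckiest people, exactly as the observation above says.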


Noise is a reality and a potential problem in ranking candidates, for sure, 100%. Of course it's worse than just measurement noise; the noise in hiring is subjective, situational, and human. But the sampling method doesn't fix it: using the Secretary Problem approach doesn't give you less noise, it gives you more noise in the result. The benefit of stopping early is that it reduces the cost of sampling, but if the company doesn't care about the cost of sampling, there is no "problem" there to solve.

If you want to reduce noise, the way to do it is to have more independent measurements (interviewers), not to stop interviewing early.

The good news is that there's no actual "best" in this situation: people have many dimensions, so they can't be ranked perfectly, but they can still be ranked approximately. We also don't need to get the exact 30 people out of 100k with infinite precision; we will get an amazing set if we can take a random sample of the top 1000 candidates out of 100k. Having 100k candidates gives us the opportunity to end up with a selection from, say, the top 1%, whereas if the applications were 40 people for 30 jobs, you might be stuck accepting people who are below average.
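A quick toy calculation makes the pool-size point concrete (assumptions mine: a single "skill" dimension, standard normal):

```python
import random, statistics

def mean_hired_skill(pool_size, hires, seed=0):
    """Average skill (in SDs above the population mean) of the best
    `hires` candidates in a random pool of `pool_size` applicants."""
    rng = random.Random(seed)
    skills = sorted(rng.gauss(0, 1) for _ in range(pool_size))
    return statistics.mean(skills[-hires:])
```

Hiring the best 30 of 40 forces you to dip well into the middle of the pool, while any 30 drawn from the top 1000 of 100k sit around +2.7 SD, so imprecision within that top 1% costs almost nothing.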


I don't see how your method works better than a sieve with noise.

Instead of a "noisy" sieve, you're just using the time of application as your sieve, which, unless your job opening values the skill of submitting lightning-fast job applications, is pretty much 100% random noise.

You say a sieve approach drops the best candidates. Your approach also drops the best candidates.

And if the bar is actually so high that only the best candidate(s) qualify for the job, then your approach would imply you interview an expected half of the applicants before you fill the positions.
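That expectation follows because, in a uniformly random arrival order, the single best applicant is equally likely to sit anywhere in the queue. A quick sanity check, with made-up parameters:

```python
import random, statistics

def mean_position_of_best(n, trials, seed=0):
    """In a uniformly random arrival order, the best of n applicants is
    equally likely to be anywhere in the queue, so on average you
    interview about (n + 1) / 2 people before reaching them."""
    rng = random.Random(seed)
    positions = []
    for _ in range(trials):
        order = list(range(n))
        rng.shuffle(order)
        positions.append(order.index(n - 1) + 1)  # 1-based position of the best
    return statistics.mean(positions)
```

For n = 100 the simulated mean lands near 50.5, i.e. about half the queue.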

It's pretty obvious a "noisy" sieve (with some signal) is better than sieving by application time or sorting by arbitrary order and taking the top N applications. You don't have a magical solution. The people handling those 100000 applications are not that stupid.


The difference is that you don't need to wait for 100k people to show up.

If you're already reasonably confident from the first batch of people what the potential in the pool is, why would you wait? People have work to do and they want those people in as early as possible.

I'm not saying you will do much better sieving by time, in terms of quality. But you will save all the effort of looking at the people at the back of the queue, for little loss.

Sure, half the best people will be at the back, but what is your loss on taking the second-best people earlier? Chances are you won't even be able to tell.


If the problem is having to wait for applicants, then you're right: stopping early will help. In my experience, companies already do what you're suggesting simply by having a deadline or time window for filling a job. And even if a lot of companies would love it, most job posts don't get over three thousand applicants per available seat.

In this particular case, MS didn't have to wait for people to show up; they got 100k candidates before they could even have made a decision on a small subset. And in general, the problem being discussed in this thread isn't having to wait, it's how to deal with the flood of applications coming in too fast.

* huge edit to this comment after this rattled around my head a little more: actually, duh, sieve and early stopping are completely orthogonal. This is not an either-or situation, and you are probably assuming a sieve in your proposed approach.

You can sieve 1 candidate at a time, since it's a series of criteria ranked in order of how much time each one takes to evaluate. If someone doesn't meet minimum requirements, they're rejected quickly and not invited to interview. If you're going to stop early but you have a lot of candidates to evaluate, then you still have to sieve. Like, I'm pretty sure you're not proposing to conduct 37,000 interviews for 100k candidates, right? Even if we stop before looking at all candidates, each candidate will have some early criteria used to cull them, and only the ones that pass all the early criteria are invited for an interview. That's true whether or not we stop early.

The sieve is a given and unavoidable. The only question is whether to stop early, which we would only do if it solves a problem. We can, if we want, interview the same number of candidates either way. The sieve allows you to look at all candidates efficiently, but does not require it. If you do look at all candidates, then the more people in the top of the sieve, the statistically better the final interview pool will be. Stopping early is valuable only if the applications process is slow (which is not the case here) or if the early criteria take significant time to evaluate.
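The "series of criteria ranked by evaluation time" structure is easy to sketch. Everything here is invented for illustration (the field names, thresholds, and stage labels are mine, not anything from the thread):

```python
def sieve(candidates, stages):
    """Run cheap checks first so expensive ones only see survivors.
    `stages` is a list of (label, predicate) pairs, ordered from
    fastest to slowest to evaluate."""
    survivors = list(candidates)
    for label, keep in stages:
        survivors = [c for c in survivors if keep(c)]
    return survivors

# Hypothetical criteria, cheapest first:
stages = [
    ("meets minimum requirements", lambda c: c["degree"]),      # automatic
    ("screening score",            lambda c: c["score"] >= 70), # automatic
    ("interview",                  lambda c: c["interview"]),   # manual, slow
]
```

The ordering is the whole trick: by the time the manual stage runs, the automatic stages have already culled most of the pool, whether or not you also stop early.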



