In my part of Google, we use TPM for "Technical Program Managers" instead of PgM.
In general a TPM at Level N will have the technical skills of a Level N-1 SWE. So many TPM's have a CS background, and a good TPM is an amazing partner/resource for a TL to have, especially for a large, complex project which spans multiple teams and multiple departments.
For my Hybrid SMR project, my TPM came out of an HDD vendor, and was very well versed in the technologies of HDD internals. At the same time, he could navigate all of the bureaucracy and process to get test racks ordered, populated with servers, and installed in data centers. He could also create the capital budget plan and get it submitted and approved through finance so I could concentrate on the technology. A good TPM is critical for the success of a large project; I couldn't have done it without him.
I work at Google, so my perspective is bound to be biased, but that's not what I see.
I work on infrastructure, and so a few years back, when I proposed a major project, I had to demonstrate how it would save *many* times the fully loaded cost of the engineers on the team, by reducing the Storage TCO for all of Google (for example). It was not enough for the project to "break even" --- the benefits had to do more than just exceed the "nominal" SWE cost. It had to be multiple times the cost of the SWE's, to account for the opportunity cost of those SWE's --- SWE's are a constrained resource, which is why a project needs to save $$$ (or increase profits) by many multiples of the fully loaded SWE cost. (That project has since been completed, successfully, and I got a promotion to Sr Staff Engineer out of it.)
The reason why SWE's are a constrained resource is because finding good SWE's is non-trivial. As a TL, I don't want to waste my precious approved headcount on people who just want to rest and vest, or people who believe in the crazy talk of only needing to work 30 minutes each day. I'm trying to find highly motivated, smart, and talented SWE's who can also be team players. And if they need to have domain expertise (say, be proficient kernel engineers), it's super-duper difficult.
So I don't see any indication of people getting hired just to starve startups of talented engineers. We need every single talented engineer we can get for the projects that we want to accomplish. And in the time when we may need to slow down our growth, it may mean that we will need to slow, or shut down, some projects. That may suck, especially if it's a project that we had invested a lot of passion into. But it's certainly no reason to panic. Slowing down growth is not the same as layoffs, and there is no shortage of work for us to do.
You also wrote major parts of Linux, so it's possible your contact with Googlers might be biased towards the more functional parts of the organization. Your bar for "talented engineer" may also be higher than the average across Google.
> As a TL, I don't want to waste my precious approved headcount on people who just want to rest and vest, or people who believe in the crazy talk of only needing to work 30 minutes each day.
And yet these people exist and get hired, even at Google. Perhaps not on your team, but they’re definitely there.
SV talks the big talk about hiring the best of the best of the best, then hires 30 thousand people in three months. There probably aren't that many 10x programmers in the world.
Interview process gatekeeping notwithstanding, the idea that you can hire 30,000 people (admittedly a decent proportion is probably backfilling/growing administrative, etc. positions) who are all "the best and the brightest" doesn't pass the sniff test.
As so many people learned over the years in the storage org, the essential modus operandi was “just do your current shit, this new shit you are proposing is too hard and we don’t want to do it”. I lost count of the improvement proposals that ultimately did not get supported (there was a funny one with the (first) hosted NFS prototype: when they tried to turn it off, it turned out there were pissed-off customers already running business workloads on it). Instead we tried to fit square pegs into round holes “by using existing technology”.
Oh, you can certainly do big projects. My project[1] spanned 3 departments, and involved dozens of engineers, and required that we work with multiple hard drive vendors (our first two partners for Hybrid SMR were Seagate and WDC) on an entirely new type of HDD, as well as the T10/T13 standards committees so we could standardize the commands that we need to send to these HDD's. So this was all a huge amount of "new shit" that was not only new to Google, it was new to the HDD industry. You just have to have a really strong business case that shows how you can save Google a large amount of money.
On the production kernel team, colleagues of mine worked on some really cool and new shit: ghOSt, which delegates scheduling decisions to userspace in a highly efficient manner[3]. It was published in SOSP 2021/SIGOPS [4][5], so peer reviewers thought it was a pretty big deal. I wasn't involved in it, but I'm in awe of this cool new work that my peers in the prodkernel team created, all of which was not only described in detail in peer-reviewed papers, but also published as Open Source.
I'm not saying storage didn't do big projects. I'm saying that over time it got calcified and instead of doing proper stack refactoring and delivering features beneficial for customers, it continued to sadly chug along team boundaries.
For example:
RePD is just at the wrong level altogether. It should have been at the CFS/chunk level, and thus could have benefited other teams as well.
The BigStore stack is beyond bizarre. For years there were no object-level SLOs (not sure if there are now), which meant that sometimes your object disappeared and the BigStore SREs would be "la-la-la, we are fully within SLO for your project". Or you would delete something and your quota would not come back, and they would say "oh, the Flume job got stuck in this cell, for a week...".
Not a single cloud (or internal, for that matter) customer asked for a "block device"; they all just want to store files. Which means that cloud posix/nfs/smb should have been worked on from day 1 (of cloud); we all know how that went.
No one asked for a "block device"? Um, that's table stakes because every single OS in the world needs to be able to boot its system, and that requires a block device. Every single cloud system provides a block device because if it wasn't there, customers wouldn't be able to use their VM, and you can be sure they would be asking for it. Every single cloud system has also provided from day one something like AWS S3 or GCE's GCS so users can store files. So I'm pretty sure you don't know what you are talking about.
As far as "proper stack refactoring" is concerned, again, the key is to make a business case for why that work is necessary. Tech debt can be a good reason, but doing massive refactoring just because it _could_ help other teams requires much more justification than "it could be beneficial". Google has plenty of storage solutions which work across multiple datacenters / GCE zones, including Google Cloud Storage, Cloud Spanner and Cloud Bigtable. These solutions or their equivalent were available and used internally by teams long befoe they were available as public offerings for Cloud customers. So "we could have done it a different way because it mgiht benefit other teams" is an extraordinary claim which requires extraordinary evidence. Speaking as someone who has worked in storage infrastructure for over a decade, I don't see the calcification you refer to, and there are good reasons why things are done the way that are which go far beyond the current org chart. There have been a huge amount of innovative work done in the storage infrastructure teams.
I will say that the posix/nfs/smb way of doing things is not necessarily the best way to provide the lowest possible storage TCO. It may be the most convenient way if you need to lift and shift enterprise workloads into the cloud, sure. But if you are writing software from scratch, or if you are an internal Google product team which is using internal storage solutions such as Colossus, BigTable, Spanner, etc., it is much cheaper, especially if you are writing software that must be highly scalable, to use these technologies as opposed to posix/nfs/smb. All cloud providers, Google Cloud included, will provide multiple storage solutions to meet the customer where they are at. But would I recommend that a greenfield application start by relying on NFS or SMB today? Hell, no! There are much better 21st century technologies that are available today. Why start a new project by tying yourself to such legacy systems with all of their attendant limitations and costs?
> So I'm pretty sure you don't know what you are talking about.
Trust me, I intimately know what I’m talking about.
Without personal jabs, let me explain in a bit more detail:
App in VM (kinda posix) -> ext4 (repackaging of data to fit into “blocks”) -> NVMe driver -> (Google’s virtualization/block device stack, aka Vanadium/PD) -> CFS. The moment data gets into ext4, it goes through a legacy stack that only exists because many years ago there were hardware devices with 512-byte sectors (as an illustration, the upgrade to 4K took forever). All the repackaging and IO scheduling to work with the 4KB block abstraction is wasted performance and cycles.
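As a concrete (and purely hypothetical) illustration of how that block abstraction leaks all the way up into the application, here is a minimal userspace sketch in C: once you bypass the page cache with O_DIRECT, the buffer, the I/O size, and the file offset all have to be padded and aligned to the device's logical block size, no matter how few bytes of actual data you have. The path and the hard-coded 4096 below are made up; a real program would query the block size (e.g. via the BLKSSZGET ioctl).

    /*
     * Minimal sketch of the block abstraction leaking into userspace:
     * with O_DIRECT the buffer address, the I/O size, and the offset
     * all have to be aligned to the device's logical block size,
     * whether or not the application's data comes in such chunks.
     * ("/mnt/pd/scratch" and the 4096 constant are hypothetical.)
     */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            const size_t blksz = 4096;      /* assumed logical block size */
            void *buf;

            if (posix_memalign(&buf, blksz, blksz)) {
                    perror("posix_memalign");
                    return 1;
            }
            memset(buf, 0, blksz);
            strcpy(buf, "a few bytes of application data, padded out to 4096");

            int fd = open("/mnt/pd/scratch", O_WRONLY | O_CREAT | O_DIRECT, 0644);
            if (fd < 0) {
                    perror("open(O_DIRECT)");
                    return 1;
            }
            /* Must write a full, aligned block even for a tiny payload. */
            if (pwrite(fd, buf, blksz, 0) != (ssize_t) blksz) {
                    perror("pwrite");
                    return 1;
            }
            close(fd);
            free(buf);
            return 0;
    }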
From the customer’s perspective, all they want is a VM with a scalable file system. With Kubernetes, etc., they don’t ever want to think about volume size, which is a major hurdle to size and provision correctly. BTW, both small and large customers run into volume sizing issues all the time.
There are also internal customers that need posix-compliant storage “on borg” because they run oss lib/software.
Anyway, the optimal stack in this case is to plug into the VM at the file system level. Now, is it a hard problem to solve? Yes. Would it eliminate PD? No, that’s still required for legacy cases. Would it be enormously beneficial for modern containerized cloud workloads? Absolutely.
As I said, there are apps that need a Posix interface, although my contention is that the vast majority of them are "lift and shift" from customer data centers into the cloud. Sure, they exist. But from the standpoint of cost, efficiency, and ease of supporting cross-data center reliability and robustness, the Posix file system interface was designed in the 1970's, and it shows.
If you have an app which needs a NoSQL interface, then you can do much better by using a cloud-native NoSQL service, as opposed to using Cassandra on your VM and then hoping you can get cross-zone reliability by using something like a Regional Persistent Disk. And sure, you could use Cassandra on top of cifs/smbfs or nfs, but the results will be disappointing. These are 20th century tools, and it shows.
If customers want Posix because they don't want to update their application to use Spanner, or BigTable, or GCS, they certainly have every right to make that choice. But they will get worse price/performance/reliability as a result. You keep talking about ossification and people refusing to refactor the storage stack. Well, I'd like to submit to you that being wedded to a "posix file system" as the one true storage interface is another form of ossification. Storage stacks that feature NoSQL, relational databases, and object storage WITHOUT an underlying Posix file system might be the much more radical, and ultimately the "proper", stack refactoring. A "modern containerized cloud workload" is better off using Cloud Spanner, Cloud BigTable, or Cloud Storage, depending on the application and use case. Why stick with a 1970's posix file system with all of its limitations? (And I say this as an ext4 maintainer who knows about all of the warts and limitations of the Posix file interface.)
Of course, for customers who insist on a Posix file system, they can use GCE PD or Amazon EBS for local file systems, or they can use GCE Cloud Filestore or Amazon EFS if they want an NFS solution. But it will not be as cost effective, or as performant, as other cloud native alternatives.
Finally, just because you are using "oss lib/software" does not mean that you need "Posix-compliant storage". Especially inside Google, while those internal customers do exist, they are a super-tiny minority. Most internal teams use a much smarter approach, even if that means that an adaptation layer is needed between some particular piece of OSS software and a more modern, scalable storage infrastructure. (And many OSS libraries don't need a Posix-compliant interface at all!)
Posix-compliant means sticking with an interface invented 50 years ago, with technological assumptions which may not be true today. Sometimes you might need to fall back to Posix for legacy software --- but we're talking about "modern containerized cloud workloads", remember?
Don't get me started on how many times Google Cloud started and then killed cloud NFS (I see now they have an EFS-like product). Or how hard it was to buy a spindle.
Here are the latest development statistics from the just-released 5.19 kernel. (Please consider supporting Linux Weekly News by subscribing if you find content like this useful; one of the benefits is you can help share subscriber-only content to friends and colleagues via Subscriber Links):
If you scroll down to "Most active employers in 5.19 by commits", you'll see:
1. Intel 10.9%
2. (Unknown) 7.5%
3. Linaro 5.7%
4. AMD 5.5%
5. Red Hat 5.2%
6. (None) 4.3%
7. Google 4.1%
8. Meta 3.5%
9. SUSE 3.1%
10. Huawei 2.9%
The statistics are slightly different if you count by lines of code changed, but either way, it's not all FAANG companies, not by a long shot. There are plenty of people who get started coding via kernelnewbies.org and other resources.
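If you want a quick, unofficial approximation of that table for yourself, one way is to count non-merge commits in the release range by author e-mail domain. To be clear, this is only a sketch: LWN's real numbers are generated with gitdm plus a hand-maintained address-to-employer mapping (which is exactly where the "(Unknown)" and "(None)" rows come from), and the v5.18..v5.19 range below is just an example.

    /*
     * Rough, back-of-the-envelope approximation of a per-employer table:
     * count non-merge commits in a release range by author e-mail domain.
     * This will not match the published numbers, since it has no
     * address-to-employer mapping; it only gets you in the ballpark.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_DOMAINS 4096

    struct tally {
            char domain[128];
            long commits;
    };

    int main(void)
    {
            static struct tally t[MAX_DOMAINS];
            int ndomains = 0;
            long total = 0;
            char line[256];

            FILE *p = popen("git log --no-merges --pretty=%ae v5.18..v5.19", "r");
            if (!p) {
                    perror("popen");
                    return 1;
            }
            while (fgets(line, sizeof(line), p)) {
                    char *dom = strchr(line, '@');
                    if (!dom)
                            continue;
                    dom++;
                    dom[strcspn(dom, "\n")] = '\0';

                    int i;
                    for (i = 0; i < ndomains; i++)
                            if (!strcmp(t[i].domain, dom))
                                    break;
                    if (i == ndomains) {
                            if (ndomains == MAX_DOMAINS)
                                    continue;       /* table full; skip */
                            snprintf(t[i].domain, sizeof(t[i].domain), "%s", dom);
                            ndomains++;
                    }
                    t[i].commits++;
                    total++;
            }
            pclose(p);

            /* Print (unsorted) every domain with at least 1% of the commits. */
            for (int i = 0; i < ndomains; i++)
                    if (t[i].commits * 100 >= total)
                            printf("%6.1f%%  %s\n",
                                   100.0 * t[i].commits / total, t[i].domain);
            return 0;
    }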
I very much doubt whether the MIT Press Office cares about what Hacker News thinks of their brand. What they do care about is what Sarah J. Student's parents think when they are trying to encourage their progeny to attend Harvard vs Yale vs Stanford. And positive press is good towards achieving that mission, even if it is a bit click-baity. And since all universities are playing this game to one degree or another, it's asking quite a lot for one university to unilaterally agree to disarm.
The other "brand" that universities care about is the their reputation by their professor's peers when it comes to hiring the best talent for their departments, and with the granting agencies who are deciding which research proposals they should fund. And here, what matters is the peer-reviewed publications at various academic journals and conferences. Whether a university's press office puts out a press release, which then gets mangled by various newspapers, doesn't really have negative or positive effect when it comes to how a university's research work is measured by the People Who Really Matter --- namely, other professors and the people who dispense the cash. Hacker News falls into neither of these two categories.
Keep in mind that the MIT Press office != the professors at MIT. It's actually quite common for any University's press office to be clueless about the research being done at the university, and it is their job to make as big of a splash as possible.
This is much like the oft-cited issue that the reporters aren't responsible for the headlines --- that is picked by the editors to make as big of a splash as possible.
As far as "patenting a technology from the 60's", even an incremental improvement on an old idea is still patentable. The real test is whether the patent cited the prior art, and you can bet that MIT Press Office didn't read the patent application before breathlessly sending out the press release.
That is a good point. MIT professors are usually very aware of the context of their work and the actual contributions they are making. Nevertheless, the MIT press office is a particularly egregious organization in terms of its tendency to exaggerate - my dad's lab cured cancer about 10 times in the early 2000's if you believe the press releases.
In this case, I'm not sure the MIT professor was aware of the "crackpot" invention that effectively pre-empted him. Many professors don't read patents (and there are lots of reasons why they shouldn't), and I'm not sure there was a paper.
One thing about patents is that if an invention is obvious in light of other inventions, it might not be patentable. In this case, I'm pretty sure that this combination of a technology from the 60's, a boost converter, and a LiPo battery may be "obvious" given the prevalence of other drones: it is a combination of two past inventions that fit together naturally (a drone + a propulsion technology). Of course, it will take millions of dollars worth of arguing for someone to say either way.
Indeed, the statement that slavery did not provide any economic benefits as a whole is a country-level statement. That there existed certain large slaveholders, or business people in the North who made their family fortunes building ships for the slave trade (or, worse, trading slaves), and who did benefit from slavery, is not really subject to debate.
The hard question from a reparations perspective is: suppose that business person left the bulk of their family fortune to a particular church diocese. Let's further assume that the donation was made in the form of a trust fund, so it's very easy to identify that a particular trust fund would not have existed but for the fact that this business person was part of the slave trade. What moral obligation, if any, does the church diocese have to repairing the harms that this donor may have inflicted on a group of people more than a century ago, given that in the meantime this church diocese has been enjoying a continuing income stream that originally had its roots in the slave trade?
One could argue "none at all", and one could also point out that there is an awful lot of good being done by the works enabled by the income stream of that trust fund. Dismantling that trust fund (if it can be legally done; there might be donor restrictions that make this difficult/impossible) would eliminate the good being done via that trust fund. But one could argue that this is similar to the argument made by the British Museum when it was refusing to return the Elgin Marbles, and that it is a bogus one. Or someone could argue that no matter what the value of that trust fund might be, it pales in comparison to the harm that has been done, and so you shouldn't even try. Others might argue that at least admitting the truth of how an organization has benefited from past injustices is the important thing, and that reparations is not so much about money as it is about repair --- acknowledging and making at least some effort to repair the damage done to the community by past injustices.
After all, if someone burns down your home, what gets lost is far more than the monetary damages; it's also the emotional impact of having your home lost, and of losing objects of sentimental value, such as photographs, jewelry once owned by your mother, etc., which can't be compensated using mere money. All of that is true. And yet, having a true and sincere "I'm sorry" from someone who is genuinely sorrowful and repentant can mean an awful lot.
Bottom line is "reparations" is a deeply complex topic, and it is not just about writing checks. In fact, that's arguably the least important part of the whole process. It's unfortunate that this is the part that most people who are against reparations focus upon.
I don’t disagree that reparations is more than monetary, but it certainly includes monetary compensation and therefore it’s worth discussing.
The dilemma you described (using stolen goods to do otherwise good deeds) boils down to whether you want to take a utilitarian or deontological moral view. Having said that, no one (in our capitalist society, at least) would accept me stealing your car to drive patients to the hospital as OK, even if my car otherwise sits completely unused. And similarly, if I crashed your car while doing my good deed, there would be no question that I owe you a replacement.
I think the more interesting question about reparation is whether there is truly a “continuity” (of the institution, state, or country on the one hand, and of a family unit on the other). If there is, then obviously the institution owes everything needed to make you whole. If there isn’t, then they owe you the same as any other citizen.
> I think the more interesting question about reparation is whether there is truly a “continuity” (of the institution, state, or country on the one hand, and of a family unit on the other). If there is, then obviously the institution owes everything needed to make you whole. If there isn’t, then they owe you the same as any other citizen.
Can you explain this? What is continuity in this sense, and why is it obvious that they owe everything needed to make everyone whole?
Actually, it is only going to be enabled for "work mode". It is not being turned on for everyone by default. Certainly not if you are talking about the consumer version of Google Docs, and not even all paid versions of Google Workspace. So not only does your employer have to pay for the feature, they may have to pay extra if they are currently purchasing one of the cheaper tiers of service.
Specifically, it is only going to be enabled for these editions of Google Workspace: Business Standard, Business Plus, Enterprise Standard, Enterprise Plus, Education Plus
It is *not* going to be enabled for: Google Workspace Essentials, Business Starter, Enterprise Essentials, Education Fundamentals, Teaching and Learning Upgrade, Education Standard, Frontline, Nonprofits, G Suite Basic and Business customers
Even for those business accounts where this feature is enabled, the Workspace Admin for that domain can turn these stylistic suggestions on or off. (And you can turn off the inclusive language suggestions while leaving other suggestions, such as those for "concise language", turned on. There is a certain amount of granularity as to which classes of stylistic suggestions are enabled or not.) And users can also turn it on or off for themselves, regardless of what your Workspace Admin has decided about the defaults.
So sorry for bursting your righteous outrage bubble, but the intent is to enable this for those companies that might want to nudge their employees towards using a more professional style of language. And if you don't like that, you can always leave and go work for some other employer....
I thought this was a great explanation up until I got to the unnecessary quip apologizing for "bursting my righteous outrage bubble". I haven't exhibited any outrage in my simple question so I don't know why you felt it necessary to add that.
The way you phrased your "simple question": "Then why don't they give the option to enable this in a "work mode" and not turn it on for everyone by default?" assumed that Google had turned it on for everyone, and it read as if you thought that was unreasonable and outrageous.
It might be nice if people assumed good faith, as opposed to assuming that anything that $BIG_COMPANY might do is unreasonable and evil. Certainly many people on these threads immediately leapt to the assumption that it was enabled for everyone and was trying to coerce people into some kind of DEI hell that conservatives hate.
I think you read a little too far into my original question since there isn't any indicator of how I feel about this one way or another. Though, to your second point, $BIG_COMPANY doesn't always have the general public's best interests at heart, especially not Google, so it's not hard to see why people are skeptical or worried about this sort of a change.
In practice it's not that hard to solve if you are only supporting a limited number of CPU architectures (e.g., all the world's x86) or only one bootloader. Even if some of the BSD systems support multiple architectures, in practice they are mostly used on x86 servers --- and are mostly judged by how well or poorly they work on x86. In contrast, Linux has to work on a very large number of embedded architectures, and some of those CPU architectures don't even have a fine-grained CPU cycle counter, let alone something like RDRAND. And some architectures have practically no peripherals that provide unpredictable input, and some of them very cleverly generate RSA keys and x.509 certificates as the very first thing they do as part of their "out of box" experience.
If you can assume that you're always running on the x86 architecture, with RDRAND and RDSEED, and that pretty much all desktops, servers, and laptops have TPM chips (which have their own hardware random number generator) and are using UEFI boot (which also provides a random number generator) --- and while maybe one of these is either incompetently designed or backdoored by the NSA or the MSS, hopefully not all of them have been compromised --- it's really not that hard.
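To make "really not that hard" concrete, here's a minimal userspace sketch of pulling seed material from two of those sources: the CPU's RDSEED instruction, plus the kernel's getrandom(2). This is illustrative only: it assumes a CPU and toolchain with RDSEED support (build with gcc -mrdseed), and a real implementation (including the kernel's own) never trusts any single source, but feeds RDRAND/RDSEED, timer jitter, interrupt timings, etc. through a cryptographic mixing function.

    /*
     * Minimal sketch of gathering seed material on a modern x86 system:
     * try the CPU's RDSEED instruction, and fall back to (or, better,
     * mix with) the kernel's getrandom(2).
     */
    #include <immintrin.h>
    #include <stdio.h>
    #include <sys/random.h>

    /* Returns 1 on success; RDSEED can legitimately fail and must be retried. */
    static int rdseed64(unsigned long long *out)
    {
            for (int i = 0; i < 10; i++)
                    if (_rdseed64_step(out))
                            return 1;
            return 0;
    }

    int main(void)
    {
            unsigned long long hwseed = 0, kseed = 0;

            if (rdseed64(&hwseed))
                    printf("RDSEED:       %016llx\n", hwseed);
            else
                    printf("RDSEED unavailable or temporarily exhausted\n");

            /* The kernel's pool, which already mixes multiple sources. */
            if (getrandom(&kseed, sizeof(kseed), 0) == sizeof(kseed))
                    printf("getrandom(2): %016llx\n", kseed);

            return 0;
    }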
The challenge has always been on the crap embedded/mobile devices, where manufacturers live and die based on a tenth of a penny in BOM costs..... (and where they tend to have hardware engineers writing firmware and device drivers, and contractors implementing their Minimum Viable Product, and no one ever goes back to retrofit security....)
As does FreeBSD, and of course those non-x86 platforms are where the pain points are re: RNG -- same as Linux. Tytso is just unfamiliar with the BSD landscape.
Part of the whole point of this blog was to explain why it's not possible to explain and satisfy their curiosity, because the reasons can't be disclosed outside of the company (either because the reasons are under NDA, or involve lawyers' concerns over terms or conditions, or because the company doesn't want to call out some project leader as a toxic jerk).
The blog post is a general answer of why sometimes people's curiosity can't be satisfied; pamphlets are so 18th century, after all. Sure, that's what the authors of the Federalist Papers used, but in the 21st century, we use blog posts instead of pamphlets. :-)
1) I personally use Google, because it works for me. I will admit freely that part of this may be that I've gotten used to framing queries such that it works with the search engine that I'm used to using, and that's probably true for many people commenting on HN.
2) Every so often, these posts will inspire me to do my own non-scientific experiments, such as using the same query, say, "Dua Lipa Levitating" or "Go modules vs packages" on, say, DDG, Bing, and Google. When I did this experiment most recently, I generally found that Bing and Google are both more personally useful for me and DDG is less useful (but as the old Latin saying goes, "De gustibus non est disputandum"). I will note that DDG had the more obtrusive advertising at the top of the results (Stubhub and Urban Outfitters), and Bing and Google did not have any ads that I could find for that first query.
Given my personal evaluations, it tends to cause me to discount what many of the DDG enthusiasts have to say about DDG being way better, simply because it doesn't accord with my own quickie experiments. But hey --- maybe it's because they enter their search queries differently than I do.
3) Given the hyperbolic and/or highly emotive nature of some of the comments about "holding users hostage" and the complete lack of nuance over "violating users' privacy", again, it causes me to have a hard time taking everything else that they have to say seriously.
4) I sometimes suspect people are remembering the past with rose colored glasses. I remember the search quality of AltaVista (and with all due respect to the people who worked on AltaVista, some of whom were friends of mine), the results were pretty crappy, and Google's results were head and shoulders above some of the other competitors that were available back in the day.
5) All of this is my personal opinion; and users should feel free to use whatever search engine they want, and competition is a good thing.
Since you live in San Francisco, a number of companies (I know for sure Facebook and Google) have ways where, if you know someone who works at that company, and who can vouch for you, they can help you get back control of an account that has been lost or taken over by someone malicious. Maybe you know someone at those companies? The companies themselves generally don't advertise this, because it obviously doesn't scale, and they'd be concerned about people who try to strike up a "friendship" with an employee just so they can get backdoor access to an account --- this is something that can be used as a security attack vector as well! (So it works best for, "I've known this person for the last X years, and last month they completely lost control over their account. I can say for sure they are who they say they are and not a conman or a state-sponsored intelligence agent." sort of thing.)
Other than that, what I try to tell everyone is to use 2FA, and not just SMS text messages or TOTP's, but FIDO Security Keys, to protect your digital identity. Never reuse passwords and use a password manager, yadda, yadda, yadda.