Tagged Pointer Strings

weinzierl · on July 31, 2015

    I speculate that Apple originally intended to use a 
    fancier variable-length encoding, probably based on a 
    Huffman code. This turned out to be too difficult, or 
    not worth the effort, or they just ran out of time, and 
    so they dialed it back to the less ambitious version 
    seen above, where strings choose between constant-
    length encodings using eight, six, or five bits per 
    character. The weird table lives on as a leftover, and 
    a starting point if they decide to go for a variable-
    length encoding in the future. This is pure guesswork, 
    but that's what it looks like to me.

A much simpler explanation is that someone just copied and pasted from the Cocoa string character frequency table.

api · on July 31, 2015

I recall a long time ago reading something about potential uses of a 128-bit CPU architecture with 128-bit addressing.

Obviously 128-bit addressing isn't and probably never will be needed to address memory, at least not in this geological epoch. But that's not the point.

Instead, 128-bit addressing would allow every pointer to have 64 bits of tag information that would allow all kinds of VM optimizations. Not sure if the additional memory bloat would be worth it though. Like some RISC architectures it might die by memory bandwidth constraints.

rogerbinns · on July 31, 2015

> I recall a long time ago reading something about potential uses of a 128-bit CPU architecture with 128-bit addressing.

Go back to the IBM System/38 and the AS/400 and that is exactly what you have. Programs are actually 128 bit oriented bytecode, so the actual CPU architecture isn't visible, doesn't matter, and has been upgraded many times over the years.

They also did virtual memory differently. Everything is in the same address space, which means everything can (after access control) communicate and cooperate. Task switches are also very fast. They also use tagged pointers, but it is a single bit outside of what the apps can see, and marks a pointer as modified. Consequently pointers can be given out, and the kernel knowns if the app has tampered with them.

I highly recommend reading the architects book about how all this stuff worked, mixed in with how business machines should behave. Many things are very different than conventional Unix/Linux/Windows NT kernels, and a good way of broadening your mind. The second edition book is good - later editions may have removed earlier interesting detail. http://www.amazon.com/Inside-AS-400-Second-Edition/dp/188241...

amelius · on July 31, 2015

How about "interning" strings [1]? This is also useful for storing larger strings.

[1] https://en.wikipedia.org/wiki/String_interning

chc · on July 31, 2015

Constant strings have always been interned. I don't think any others are, though I haven't looked into it in a while.

mikeash · on Aug 1, 2015

It's not NSString, but it's interesting to note that Objective-C selectors are just interned strings. By interning them, they can be compared with a simple pointer comparison. Because they're just C strings, you can print a selector by casting it to char *. (But don't do this in production code, debugging only!)

jheriko · on Aug 1, 2015

this is an interesting trick, but it only really has benefit if used this way when things are dynamically late binding. the cost of the check if an pointer is tagged can even exceed the cost of any simple operation /in total/ on a static, compile-time known object (e.g. a comparison with no branch, small copies, simple arithmetic etc.). for something like NSNumber this is probably very measurable (compared with, say NSString)

if objective-c had more mechanisms to enable baking out the run-time lookups and operations at compile-time then i think that this would no longer be a valuable optimisation. maybe they are there but are just not easy to find or use... having the ability to use C or C++ does make it unnecessary.

sure, you can use C types in a lot of cases and the C library, but being able to do more with the object-orientation functionaliy would be nice, rather having to do OO by convention in C or using C++ to be seriously fast or efficient - swapping NSDictionary for an std::map or std::unordered_map is an optimisation i've ended up making multiple times in situations that aren't really even tight inner loops, where the difference has even been as great as making the difference between a severe, perceptible hitch (approaching a second run-time) and something you can't even notice (millisecond or less run-time).

as a rule i use Objective-C only as much as the platform forces it, and stick with C++. its enormously harder to make slow and inefficient C++ code by accident. e.g. you can loop over thousands of objects and do things with them in << 16ms without special efforts. on some devices this is impossible with objective-c even if what you are doing with the objects is a nop... this is why gamedevs, for instance, avoid it like a plague.

(however i'm sure there could be other, interesting, uses of tagged pointers...)

mikeash · on Aug 1, 2015

Tagged pointers really have nothing to do with late binding. They're useful anywhere you have a data that is sometimes small enough to fit within a single pointer-sized chunk, and sometimes isn't. Strings are a good example of this: short strings can be tagged pointers, and long strings can be dynamically allocated. As another commenter mentioned, some C++ implementations use tagged pointers for std::string.

Another example might be a big number library, where numbers smaller than the pointer size could be implemented as tagged pointers, and larger numbers would be dynamically allocated. A C++ big number library could easily do this.

Whether it's worth the cost depends on how expensive memory allocation is, how expensive pointer indirection is, how many objects are able to fit within a tagged pointer, and how expensive the tag check is. Typically the answers are, a lot, a lot, a lot, and a lot. And dynamism doesn't change the answer much.

jheriko · on Aug 4, 2015

erm. why are you even using a pointer if its not late-binding?

this is what static, stack, scopes and global variables are for with references as needed...

cpr · on July 31, 2015

Interesting--the kind of trick you'd normally see in an interpretive language.

Marshalling a string object ref to an actual value (before it can be used for string manipulations internally) must incur a fair bit of CPU overhead.

I guess they decided the unboxed memory savings were worth it.

bodyfour · on Aug 1, 2015

Memory savings isn't the important thing. Small strings are, after all, small. You can fit a lot of them in cache. What you're saving is the CPU cost of malloc()+free().

I'll trust that they had good performance data that made them decide to do this. Still, I'm always a bit skeptical about these clever string tricks. It's an optimization that looks amazing in microbenchmarks but has costs in the real world. That's because you're adding a (potentially hard-to-predict) branch to every access to the string.

If your application constantly allocates and deallocates lots of tiny ASCII strings, this clever representation will be fantastic for you. However, if you use a mix of string sizes, tend to keep them around for awhile, and you do a lot of manipulations on those strings, you pay the branch-misprediction cost over and over.

mikeash · on Aug 1, 2015

Apple's implementation actually adds a (potentially hard-to-predict) branch to every single Objective-C message send.

They're extremely careful about the performance of objc_msgSend, because it's so significant to the overall performance of apps. The running time for a hot path message send is down to single digit CPU cycles. Any addition to that shows up pretty loudly. I'm sure they checked to make sure the check doesn't add too much overhead in real-world use, and that the wins are worth it.

paulhodge · on July 31, 2015

It's not really CPU intensive. To turn a 5-bit char into an 8-bit one, just do a lookup into a tiny constant array. Doesn't even introduce any new branches. Roughly the same performance as iterating over every character. (which you're probably doing anyway if you need the conversion)

The savings aren't just memory, it's probably a performance improvement too. Fewer allocations, fewer pointer indirections, and some operations (like equality checking) become O(1) instead of O(n) for these tiny strings.

Someone · on July 31, 2015

The other way round (deciding whether a newly minted string of length 10 can be put in a tagged pointer) is slightly more complex. Also, for tiny strings (the only strings that this supports) on modern hardware, I'm not sure that O(n) takes much longer than that O(1).

I would think the avoidance of allocations and the pointer indirections are the big wins.

mockery · on Aug 1, 2015

Slightly pedantic observation: O(n) == O(1) under the specified constraint (n <= 11)

Bognar · on July 31, 2015

It's likely that the CPU overhead is lower than a cache miss.

nine_k · on July 31, 2015

Looks very similar to the way in which numbers / strings / objects are represented in Javascript (and earlier in some Lisps).

The nice thing is that for pointers, tagging is "free", for the lowest 4 bits of a 16-byte aligned pointer are not used anyway.

apaprocki · on July 31, 2015

It's more complicated than that -- x86-64 systems only use 48 bits of addressable space. JS engines use all the remaining bits for tagging, not just bits resulting from 8- or 16-byte alignment. This works great until you want to run on another CPU architecture that actually has 64-bits of addressable space and uses it regularly...

jhallenworld · on Aug 1, 2015

I'm surprised they need so many bits.. you really just need to know if the item is an object (so the type can be embedded in the object), or a primitive type. But there are not so many primitive types.

Some dynamically typed systems put the tag at the beginning of a page: so all items of a particular type are allocated out of a page tagged with that type. You don't waste any bits from the pointer or data, but you do have to do some masking and worse, a remote memory access to get the type.

Probably safest and fastest to do something like: struct { int type; union { char *s; int n; double d; }};

bdash · on Aug 1, 2015

Many JavaScript engines use IEEE-754 double-precision floats as their immediate value type, and fit pointers and integer within the NaN payload of the double representation. See https://wingolog.org/archives/2011/05/18/value-representatio... for a discussion of a few similar approaches and the tradeoffs of each.

cpr · on July 31, 2015

Sure, free except that every dereference has to clear out those bits, unless the target architecture has a "pointer deref ignoring lower n bits" operator/operand.

bdash · on July 31, 2015

The tag bits are zero when using the pointer as a pointer, so nothing special needs to be done in order to dereference them. The tag bits are only non-zero when non-pointer data is packed into the pointer, and in that case you inherently need special handling anyway.

mikeash · on Aug 1, 2015

It does mean that every dereference needs to check for taggedness, though. For example, newer versions of objc_msgSend check the low bit at the beginning of the call before it proceeds to the normal message send path (or not, if it actually is a tagged pointer).

userbinator · on Aug 1, 2015

Not necessarily every dereference, but only those in which the value could be something other than a pointer.

mikeash · on Aug 1, 2015

Do you mean because it already did the check before, and it could remember the results? Certainly it could. Unfortunately this isn't an option in Objective-C because the compiler can't make any assumptions about the tagged pointer implementation, and that means objc_msgSend has to start fresh at each invocation. In theory you could have sub-functions, like objc_msgSend_really_object and objc_msgSend_tagged_pointer that the compiler could generate calls to directly, but then you'd have to support whatever assumptions the compiler made forever.

I'm guessing/hoping that with modern branch prediction, the check at the top of objc_msgSend is a tiny penalty. I'm sure Apple measured the heck out of it, anyway.

cpr · on July 31, 2015

Ah, right. Brain infarction.

TazeTSchnitzel · on Aug 1, 2015

Note that 8 bits per character isn't ASCII, but ISO 8859-1, which Unicode is a superset of. I expect the string "née" would become a tagged pointer string, for example.