I posted a message on the Unicode mailing list, which eventually led to a proposal to accept a large number of new characters encoding symbols used in old 8- and 16-bit micros.
My original question was specifically about the C64 character set, but we managed to get several others covered as well, including several symbols from the Atari ST character set.
The proposal was accepted, and the work continues to create a new proposal covering the character sets of even more old computers.
I'm disappointed to find that the Atari ST character set doesn't contain a bomb symbol. Obviously my childhood mind conflated two entirely different things.
The Atari ST did display bombs when an application crashed, but they were never part of the character set. The bombs are a graphic drawn by the trap handler in the operating system. The number of bombs indicates the trap type, so three bombs means trap 3 (address error).
The symbols that do exist but were not included in the proposal were the Atari logo and the J.R. Dobbs picture. Both are copyrighted, which is why they were excluded.
Hmm. The Atari ST did display bombs when it crashed. The number of bombs was a hint about the cause. It always felt very alarming and had me scrambling for the 'off' switch, though.
This is absolutely fantastic. I was under the impression that the Unicode Consortium reviled box-drawing characters, but I find them incredibly useful for documenting code relating to grids.
If I'm mistaken about something, I would appreciate clarification. Is my mistake using box-drawing characters? Or is it thinking that they're disliked?
I've seen different opinions on the topic. I'm not sure there is any single opinion of the Unicode Consortium here, but then, I'm not a member of the Consortium.
What we can see is that these characters have become very popular and useful, so it doesn't really matter whether the original intent was to move these things to a higher level protocol. Today they are here, and they are useful.
There was a discussion on the mailing list some time ago when there was a suggestion to add codes for underline, bold, italics etc. I can tell you that that is not a very popular idea.
Sometimes characters were added to Unicode just because they already existed in another character set. The idea is that Unicode should support lossless round-trip conversions to any other character set. Box drawing characters were part of the DOS character sets that were carried over to Windows.
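To make the round-trip idea concrete, here is a minimal sketch using Python's built-in cp437 codec (the classic DOS code page):

    # Box-drawing characters map back to single DOS bytes and survive a
    # round trip unchanged, which is the compatibility guarantee at work:
    boxes = "┌─┬─┐│└┴┘"
    encoded = boxes.encode("cp437")          # one byte per character again
    assert encoded.decode("cp437") == boxes  # lossless in both directions
    print(encoded.hex())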
Edit: After seeing the image posted by @iruoy — https://i.imgur.com/OH3QTXQ.png — I have to disagree with you, because the circle represents a zero (low voltage, FALSE) and the vertical line represents a one (high voltage, TRUE). If you studied computer science, logic, or electrical engineering, or attended any basic course on electronics, you would know that ones and zeroes are the written representation of an electric current; in fact, they are the basis of computing: bits.
As the article mentions, the O was already in Unicode; its definition was just updated, so your font already includes it. The I is a new symbol, too recent to be in your font, and is thus not shown.
Yeah, the circle represents a zero - but it isn't a zero, and the line represents a one - but it isn't a one. They're graphics that seem removed from the meanings they represent. Semantically they're removed too, since turning a device on seems to have complex, multi-byte significance.
And as graphics, the zero seems to represent "circuit connected" and the one seems to represent "circuit disconnected". Yet they have the exact opposite meanings!
I did an image search on a few search engines to make sure "on off switch" still returned the same ⏽/⭘ switches I've seen forever. Pretty sure the back of your printer probably has one.
Where are these switches uncommon? Where are they flipped to the opposite meaning as you described? Where has the circle ever represented a connected circuit? Most circuit diagrams are squared off, so which one was circular? I'm so confused.
It's clearly supposed to be a metaphor for genitalia - with the line suggesting 'presence', and the circle suggesting absence. (Note: the Latin 'vagina' literally means 'sheath'.) It works better in the case of the numeral 1, which even has a glans.
Whoever downvoted this was apparently offended by the association. However, there is insight in the point you raised, and you stated it intellectually, not in a juvenile sense; any juvenile reading was in the eye of the beholder.
In anthropology and psychology, primitive symbols like a vertical line or a circle are often found to represent the male and female dichotomy. In fact, it's quite prevalent cross-culturally as a basic element of visual language. [citation needed]
Obviously, your source doesn't support your argument. Just because something once had one shape doesn't mean a new shape can't have new connotations. You're jumping from 'the zero was not always a hole', to 'the zero can never symbolize a hole'. Which is pretty strange.
This 1/0 man/woman white/black kind of binary is so ubiquitous in our culture it's kind of redundant to go and find examples - but it is in itself interesting that when you mention stuff like this, some people will always find a way to claim it is nonsense.
I think it's a desire for security amongst political turbulence - you say, this stuff is abstract and clean and without cultural baggage, so I can hide in it from the world, which is ugly and ambiguous and provokes uncomfortable reactions.
It's a kind of desire that's really common around engineering, mathsy people - I mean, part of the attraction of these subjects is you don't have to navigate anything sticky. So that's why it usually provokes a pretty extreme reaction if you profane the temple by bringing cultural stuff in.
A word or symbol does not represent every single thing that can be free-associated with it. If you want to show that "1" represents the Washington Monument, it is not sufficient (or even necessary) to point out that they're the same shape, you have to show some evidence that there once was a culture that used the symbol "1" whenever they wanted to say "2 15th St NW, Washington, DC."
Using '1' to symbolise the Washington Monument would be reasonable, if the Washington Monument were something people very commonly referred to. It would also be reasonable to describe that as phallic. It wouldn't be reasonable to use 1 to refer to an address in DC, because you're jumping from three obviously formally similar things to the street address of one of those things.
It's a pretty good example of what people mean by free association - and why what I'm(1) doing isn't.
1- I mean, obviously I'm about the millionth person to make this particular observation.
You got a point there (pun intended :) about the symbol for zero.
I guess I was being generous with interpreting the parent comment, by mentioning the (possibly common and cross-cultural) visual association of a circle with womanhood, a straight vertical line with manhood.
I didn't mean to imply that the symbols for zero and one have direct historical origins in those ideas - like the parent comment might have suggested - but I do think there is a philosophical or artistic merit in drawing the analogy.
Interesting, I've seen the symbols for Mars and Venus being used to represent genders, but not a square or triangle for male.
As for zero/circle/woman and one/vertical line/man.. I guess in my mind, they couldn't be more obvious, universal and simple - to put it crudely: the hole and the stick.
But also: off and on, dark and light, absence and presence.
(So far, I have found no historical evidence that supports the above theory, no basis in logic or fact. I'll leave it as idle speculation on a possible primordial/mythological way of thinking in pictures, with perhaps two of the most primitive symbols.)
I have always seen it this way too: a broken or completed circuit. But I've accepted that my initial understanding of the visual metaphor doesn't match its intent. I think misunderstandings like these result when we abandon written language to point and grunt at tiny pictures.
I've always looked at the line as a connection and the open circle as no connection. It's just what I came up with when I was young as a way to make sense of the markings.
Interesting. I always took ⭘ to be an open circuit and ⏽ to be a closed circuit, using the symbol for a switch as the rough guide. Perhaps they're poor symbols if people can look at them and come up with rational reasons for each interpretation to be correct.
To be extremely pedantic, I think they’re correct at an implementation level: for most chips, as long as they’re receiving power at all from a power supply, they’ll run, unless a reset line is being held high. In most consumer electronics, that reset line is held high by default (by power coming off the wall or from the battery), unless power is supplied to a NAND gate coming from (a flip-flop in front of) the power switch on the front of the computer.
To “turn on” a computer (using the push-button at the front, rather than by flipping a power-supply toggle-switch) is, then, actually to SET that flip-flop, feeding a high input to the NAND gate, which in turn will turn off the reset line to the CPU.
(And, vice-versa, if your computer has a “reset button” on the front like some old computers do, that one throws the flip-flop back to its RESET state, which puts the NAND gate back low, which brings the reset line to the CPU back high. Wiring for push-button toggles is weird!)
Hmm, in most embedded processors I know (e.g., the ESP8266/32 [0], the RPi's BCM2835 [1], or TI's MSP430 [2]), you pull the line low for reset and hold it high for running (either via an internal pull-up or by actually driving the line).
I was looking at the dials on a refrigerator, and realized I never really knew which way is which - does turning the dial towards "cold" make it colder, or does the opposite, revealing more of the indicator from the "cold" end make it colder?
There is a tapered blue line with numbers, so is the wider end colder, because blue is cold and more blue is colder...or is the other end colder, because it has smaller numbers, and lower temperatures are smaller numbers?
Having the combining character version of that would be fantastic. Eventually, it would be amazing if all the details of math rendering could make it into Unicode.
> Eventually, it would be amazing if all the details of math rendering could make it into Unicode.
I'm not sure if I agree with you or not... Generally I'd say I do, but we're going to have a hard time "finding the line". Meaning what counts as "math"? Surely 1 + 1 is, as is ∇×𝐇, and we can start to do things like x⁰. However, what about a graph with nodes and edges (just as an example)? Is that "math"?
One thing strikes me about strings of characters... you can select and copy/paste them (at least in my native alphabet of Latin) very reliably. This property is not present with Unicode in general.
"All details of math rendering" probably fall squarely into "markup" territory, thus outside Unicode. At least so far any efforts trying to standardize this led to this. Math is generally nothing I'd call plain text.
Should Unicode be able to represent Egyptian Hieroglyphs? The lack of similar facilities for Egyptian is why Unicode is useless for representing hieroglyphs, despite having a goodly number of signs encoded.
This would be great. I need a way to cleanly document code implementing equations that contain letters in the superscripts and subscripts. x_i^j, for example, might be the position of Body i in Frame of Reference j. I can mentally pack it back together when it's just the one, but it would be nice if it could more closely resemble the original equation.
Alas, I was told that was "exactly what Unicode doesn't want to endorse."
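For what it's worth, Unicode already has a patchy set of super- and subscript letters, which is part of the problem: some indices can be faked and others can't. A small sketch (the code points are real; whether they render depends on your font):

    # U+1D62 (subscript i) and U+02B2 (superscript j) happen to exist,
    # but many other letters have no such form:
    print("x" + "\u1d62" + "\u02b2")  # xᵢʲ
    print("x_i^j")                    # the ASCII fallback most comments use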
I love how this was apparently accepted into the standard in 2016, and yet Chrome still displays an empty square instead. Unicode is such an unbelievable mess when it comes to support, it's crazy. Windows displays it correctly for me in Word, but when pasted into Teams it comes out as a semicolon instead. Brilliant.
But it’s not a mess, there is a clear chain of responsibility: * consortium -> font developers -> app developers. If anything is broken, bitch to the first level (typically app devs) and wait for your grievance to percolate upwards as necessary; if it doesn’t happen, you know who to blame.
* most modern operating systems have hopefully sorted out their issues a long time ago.
If you have to complain, it's broken. As a consumer I don't care what the issue is and who the responsible party is.
Unicode is a clusterfuck exactly because the chain is too long and the implementation errors are too easy to make and the world is rife with incomplete implementations.
It has nothing to do with the update speed and everything to do with the standard itself.
Even if you were freezing Unicode now, you wouldn't encounter complete, correct implementations in the wild more than 50% of the time in the next 5 years.
I love how iOS doesn't understand this URL, and still doesn't have these characters. Too busy removing the Taiwanese flag, and implementing "Animojis".
I still think unicode should not have added emojis. The big guys are adding animated emojis now, and that's clearly out of scope for character sets. If they continue down that path unicode will eventually become SVG with animation. IMHO they should have stopped short of emojis.
I'm glad they decided to be inclusive. 100-200 million people were using emoji before they got added to Unicode; without adding them, all of those people's conversations and communication patterns would have been pissed on, and the countries with large populations of emoji users would likely never have switched to Unicode, because their customers would have demanded to keep them. That would have meant the nightmare of multiple encodings all over again. No thanks.
I love emoji and have since I started using them in 1999 (weren't added to unicode til 2010). Most of my friends and family love them, it makes their communication richer and more easily nuanced so I'm really glad they were added.
Whether you like the emojis or not, they've certainly helped to drive Unicode adoption. Good luck making the business case for why you're building support for some obscure international character set into your product, but tell somebody you're adding support for the latest emojis and all of a sudden having Unicode 10 support is a big deal.
IMO Unicode shouldn't have ever added characters with color. Let color be the property of the markup. The original emoji were all monochrome, and mainly displayed on monochromatic flip-phone screens. The fact that there now exist emoji that are otherwise identical but have different color is absolutely idiotic.
Emoji were in color before they were added to Unicode. Even the original iPhone 3G Emoji release pre-dated their addition to Unicode (Apple used the SoftBank encoding). They were even in color on flip phones as far back as 1999 https://emojipedia.org/softbank/1999/
That doesn't add colour to, say, the smiley face, it just changes the colour from the default, usually yellow.
As well as adding colour browsers display emoji at a larger point size. It is really annoying when some pre-emoji symbol (like the phases of the moon in TFA) gets emojified and where your document had some discreet use of symbols it now has huge colourful icons drawing the eye.
And afaict there is no way of turning off this behaviour globally.
There are at least 7 variations of the heart symbol differing only in color (e.g., ❤️ and 🧡), not including the heart (playing card), which also has multiple variations.
Edit: apparently HN limits the number of emoji per post; I originally included several.
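For reference, here are seven that differ only in color, by code point (my own enumeration, with names from the standard):

    # Seven hearts distinguished only by color:
    hearts = [
        (0x2764,  "HEAVY BLACK HEART (red, with VS16)"),
        (0x1F9E1, "ORANGE HEART"),
        (0x1F49B, "YELLOW HEART"),
        (0x1F49A, "GREEN HEART"),
        (0x1F499, "BLUE HEART"),
        (0x1F49C, "PURPLE HEART"),
        (0x1F5A4, "BLACK HEART"),
    ]
    for cp, name in hearts:
        print(f"U+{cp:04X} {chr(cp)} {name}")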
Redundancy has never been a problem with Unicode. After the decision was made to add the symbols of the original encodings to the list of Unicode characters, they had to add all the heart variations; otherwise the conversion wouldn't have been backward compatible.
Or alternatively add a color modifier? I'm not sure that would have been a better solution.
> apparently HN limits the number of emoji per post, I originally included several.
I am surprised HN allows any emoji at all. Pretty sure they have been totally banned in the past. It seems like the orange heart is now (or has always been) whitelisted. 🧡🧡🧡
I didn't take it the GP was talking about Animoji, as those are motion-captured, rendered in 3D and sent as pre-rendered video files (GP mentioned something that could be represented as animated SVG). I don't think anyone is proposing a way to encode facial motion-capture into Unicode.
A graphical or iconographic concept that cannot be unambiguously represented by a single character in any natural language and which does not have a single standardized rendering (e.g. the Power symbol does have a single standard rendering and is therefore not an emoji).
I'm no character or Unicode expert - that's just a wild stab in the dark, largely based on why I dislike emoji being in Unicode.
My sister pointed out that I'm a little random about mixing them. I use both, based solely on what I feel like typing at the time. And I'm acutely aware that the emojis she sees are not the ones I do, but the ASCII ones are pretty much the same, since I use Android and she's an Apple-head.
Besides, I love the slobbering ASCII guy ;-p~~~~ None of the emojis let you amplify slobber according to need like that.
I've changed my emoji writing style from the canonical ;) to ; ) just so that editors won't auto "correct" my proper ASCII emoji into an unholy yellow winking face.
For 99% of all development, Unicode is a non-issue. Perhaps you have to consider Right-To-Left support and that is about it. Even then good RTL support is more about page layout than the actual mechanics of displaying the text.
If you have a better idea to enable people around the world to use computers in any language that doesn't look something like Unicode, I'm sure everybody will adopt it.
I get that I'm in a minority here, but I'm not even sure such a thing is necessary or a good idea. Generally speaking simple things are better than complex things, and from a computing perspective human written languages are ludicrously complicated. I'm not convinced that complicating all our software to support poop emoji is better than allowing human language to adapt to the medium in the manner it has done since language was first written down.
> For 99% of all development, Unicode is a non-issue.
Until it suddenly is an issue. Like, dealing with unpaired surrogates in Windows file paths.
I mean, we're talking about an encoding where you need a rather large database and quite a bit of code just to tell how many characters are actually in a string.
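To make that concrete, a minimal sketch (assuming the third-party regex module, which implements the Unicode grapheme-cluster rules):

    import regex  # pip install regex

    s = "👩\u200d👧"                     # woman + ZWJ + girl: one glyph on screen
    print(len(s))                        # 3 code points
    print(len(s.encode("utf-8")))        # 11 bytes
    print(len(regex.findall(r"\X", s)))  # 1 user-perceived character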
> Once a character is encoded, it will not be moved or removed.
> This policy ensures that implementers can always depend on each version of the Unicode Standard being a superset of the previous version. The Unicode Standard may deprecate the character (that is, formally discourage its use), but it will not reallocate, remove, or reassign the character.
Apparently it's semantically fac·sim·i·le. Which means (according to Google) "an exact copy, especially of written or printed material."
My argument is something like, why not just write FAX. Or is the counter that some fonts will specialize this character to something closer to the native language? That seems unlikely, and instead people will probably learn that FAX means "to make alike", from Latin. Or is it that we need to make it just a little bit above the baseline to indicate that it's special. Surely "FAX" isn't the only thing that should be allowed to be special, right? But then that's a whole can of worms. Anyway, I'm rambling...
This was the best I could do in the limited time I had (I really should be asleep by now).
Well, fax machines are still in use here (most notably in medical offices, it seems), but superscript abbreviations? Never seen them in the last 40 years. I mean, yes, I have seen the letters FAX or TEL in front of the number, but never really explicitly as a distinct character (as far as I can tell).
This would also fit best with the experience that people crafting cards or letterheads never really know all the intricacies of Word, or Unicode, or whatever they use, and just "make it look good" - use tabstops instead of tables, simply type FAX and mark it as superscript, etc...
> This would also fit best with the experience that people crafting cards or letterheads never really know all the intricacies of Word, or Unicode, or whatever they use, and just "make it look good" - use tabstops instead of tables, simply type FAX and mark it as superscript, etc...
I mean, that’s widely known anyway. Look at most letterhead templates online, pretty much all of them are broken and quite painful in the way they’re built. Broken tables, tabstops, all combined painfully.
Often enough, proper tables would simplify a lot, and combined with columns one can create amazing things. If one adds automated hide/unhide elements (e.g., page count, an automated Internetmarke, or hiding it when unused), one can create stuff that'd save hours of work every day.
Actually, nowadays PDFs in email are also accepted instead of fax! (or emails signed with the ePerso S/MIME function).
So far a court, a health insurance company, the national pension fund, and some municipal administrations I've had contact with all have accepted PDFs attached to emails just fine :)
And if you ever need actual fax functionality, the most common home router (Fritz!Box) has a virtual fax machine integration, so you can send and receive faxes with an app or an email gateway from your own VoIP SIP "landline" connection.
I assume it was intended to indicate after a telephone number that a telefax was connected to it. For example when writing contact details on a business card.
I imagine the only way that will ever happen is a situation like the gun to water pistol transition, where enough implementers agreed to change the representation.
> There was some discussion around ⏾ as several “moon” characters already existed. None of them [...] convey the semantic meaning of “Sleep” – so ⏾ was accepted.
I'm not convinced ⏾ conveys that meaning either, unless it's explicitly used alongside the other new symbols. And if it is used alongside those symbols, a couple of the existing moons could also work just fine.
(⏾ is appearing as tofu on HN for me, but I'm just going to roll with it.)
It's disappointing that OS-included font support for these has apparently lagged. iOS 13.2 supports emoji from Unicode 12[1], but doesn't ship with a font that includes these symbols. Anyone know if any OSes do support them?
By the way, if you're viewing them on the unicodepowersymbol.com website, they show up because of a web font included on the page. So here they are on a site without the font:
⏻⏼⭘⏽⏾
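In case they're tofu for you too, here they are by code point (names per the standard):

    for cp, name in [
        (0x23FB, "POWER SYMBOL"),
        (0x23FC, "POWER ON-OFF SYMBOL"),
        (0x2B58, "HEAVY CIRCLE, used for power off"),
        (0x23FD, "POWER ON SYMBOL"),
        (0x23FE, "POWER SLEEP SYMBOL"),
    ]:
        print(f"U+{cp:04X} {chr(cp)} {name}")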
Does anyone know how to add these to Ubuntu 19.10? When I search for Power in Characters, it has the 'Power On' character but you can't see the symbol [1]
Maybe only allow them if you have a certain reputation/points like downvoting and flagging has? Then if it’s abused, remove that privilege for that person (IIUC, flag rights can be revoked)
I believe those are all trademarked. Unicode tries to avoid those.
> A spray can to represent spray paint, insect spray, or spray lubricant.
Interesting idea! I wonder if there'd be any interest in getting wider coverage of other bitmap paint tools: rectangular selection, lasso, paint bucket...
> There's a power plug but no power socket.
That's problematic from a localization perspective, as electrical sockets vary widely from country to country, and some of them may be difficult to recognize as a socket at a small size. For example, some European countries use a socket which is made of three circular pins in a straight line -- it'd just look like an ellipsis.
Besides, there isn't a lot of symbolic meaning that's conveyed by a socket that couldn't be expressed just as well with a plug.
> There's no staples or stapler.
Maybe. There isn't a lot of symbolic meaning to these either, though.
> There are no appliances—no oven, microwave, toaster, mixer, washer, or vacuum cleaner.
A lot of those will just look like white boxes at text size, and -- again -- they don't have a lot of symbolic meaning.
You might be able to make a case for an upright vacuum cleaner, though, since that's visually distinctive and is associated with cleaning.
> There's no traffic cone.
Oh, I like that idea. It's got some symbolic meanings, too, like "warning" and "under construction". There is already a construction sign (U+1F6A7), though.
To be fair, a lot of existing Unicode code points don't really have a semantic meaning either. The eggplant only got its semantic meaning after becoming a part of Unicode.
I guess the power socket might prove contentious, since they look very different across countries... power plugs, although variable, can be identified more easily since they look more alike and have a power cord attached.
The traffic cone would be good for VLC! They could rename their app so the name matches the icon :-) That got me thinking: are there any common apps that use emojis in their name?
I miss 7-segment displays. It could be 8 Unicode characters, for instance (empty, plus one per segment), or some other scheme.
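A toy sketch of how such a per-segment scheme could compose (the masks and names here are my own invention, purely illustrative):

    # One bit per segment a..g; a digit is then just an OR of segments.
    SEGMENTS = "abcdefg"
    DIGITS = {"0": 0b1111110, "1": 0b0110000, "8": 0b1111111}

    def lit_segments(digit):
        mask = DIGITS[digit]
        return [s for i, s in enumerate(SEGMENTS) if mask & (1 << (6 - i))]

    print(lit_segments("0"))  # ['a', 'b', 'c', 'd', 'e', 'f'] -- g stays dark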
A hiking/backpack emoji would be useful at times, but then a lot of other activities could make it in: rollerblades, musical instruments, etc.
One pet peeve of mine is the "person shrugging" [1] emoji looking identical to "woman shrugging" [2] with most fonts. I prefer to use genderless emoji, but that does NOT look genderless. I also dislike skin color selectors, but again, that's personal preference.
That post inspired me to submit some Unicode characters as well! I found 11 Hakka & Taiwanese characters when trying to scrape & parse the Bible as plain-text, and wrote a blog post about my experience of the submission process.
Unicode has 17 planes. Each plane has 65,536 code points, so the total capacity is 1,114,112 code points. In practice it's a bit less, thanks to surrogates, private areas, and a bunch of "non-character" code points. That still leaves close to a million code points.
Last time I checked, just over 13% of the available public space was allocated. Most of the planes remain unused.
And combinations are used, so e.g. a new emoji only takes zero to one new code points, not one code point per variation. (Zero because, if I remember right, the emojis for families are just something like "woman + boy + girl + man", all existing characters, joined by Zero-Width Joiners.)
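A sketch of that composition, using only pre-existing code points:

    ZWJ = "\u200d"                           # ZERO WIDTH JOINER
    family = "👨" + ZWJ + "👩" + ZWJ + "👧"  # man + woman + girl
    print(family)                            # one family glyph on capable fonts
    print(len(family))                       # 5 code points, zero new allocations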
You can blame UTF-16 for this mess. Unicode was originally meant to be able to encode two billion (2^31) characters. It bent over backwards to accommodate the limits of the bastard child that is UTF-16.
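Concretely, UTF-16's surrogate mechanism tops out at U+10FFFF, which is exactly where the 17-plane ceiling comes from. A quick sketch of the encoding:

    # Encode a supplementary-plane character as a UTF-16 surrogate pair.
    cp = 0x1F600                # 😀, well above the Basic Multilingual Plane
    v = cp - 0x10000            # 20 payload bits: 16 extra planes + the BMP = 17
    high = 0xD800 + (v >> 10)   # high surrogate
    low = 0xDC00 + (v & 0x3FF)  # low surrogate
    print(hex(high), hex(low))  # 0xd83d 0xde00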
Maybe they did it because Windows was backwards and used UCS-2, and later, UTF-16? If somehow, Windows managed to switch to UTF-8, I’m sure they (Microsoft) would mess it up and keep the 4 byte limit (imposed by Unicode) there even if it’s later removed (for backwards compatibility). What Microsoft really needs to do, IMO, is rewrite the Windows API to use UTF-8 or UTF-32. Make a `wwchar` type or something...
... so yes, as per convention it should have a (2016). The best way to get the message to the mods is via email (using the Contact link at the bottom of the page).
The symbol was already present, but it was given new documented semantic content ("meaning"). Previously it was just a circle, now it's also the symbol for "power off".
I know a guy who fought hard with the Unicode consortium to add two characters meaning "beginning of blob" and "end of blob". Imagine how much simpler coding would be without the need to escape anything. Unfortunately he didn't succeed; they were too busy adding smileys.
Not sure why /s. Also, you only need escaped-end-blob. start-blob can safely be used inside a blob. So it's only 3 new codepoints, which is genuinely an interesting proposal.
Because that doesn't round trip. You need to be able to distinguish whether the blob originally had "end-blob" or "escaped-end-blob". So now you need another character for double-escaping, and so on, and so on.
To avoid that issue you're back to either adding a backslash, or doubling up the character inside the blob... but if you're doing that you could have just used " all along. No need for new characters!
If you really don't want to change the contents of the blob, and can't length-prefix, then you could also use a new UUID as your delimiter each time you embed a blob.
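A hedged sketch of that fresh-delimiter idea (all names mine, purely illustrative):

    import uuid

    def embed(blob):
        delim = uuid.uuid4().hex  # vanishingly unlikely to appear in the blob
        assert delim not in blob  # and cheap to verify outright
        return "<<" + delim + "\n" + blob + "\n" + delim + ">>"

    print(embed('anything goes, including "quotes" and fake terminators'))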
If you're telling developers not to embed that character anywhere then you only need one end of blob character. So not 'no', I'm still right in saying that having two end characters is a non-solution.
And you still didn't explain why the existing ASCII control codes don't solve your problem. The suitable ones are also not supposed to appear inside text.
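Presumably the codes meant here are the four ASCII information separators:

    # ASCII already reserves four separators that are not supposed to
    # appear inside ordinary text:
    SEPARATORS = {
        0x1C: "FS, file separator",
        0x1D: "GS, group separator",
        0x1E: "RS, record separator",
        0x1F: "US, unit separator",
    }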
It's an interesting idea, but you still wouldn't be able to encode the byte sequence for "end of blob" as part of the blob data... without escaping it.
They shouldn't occur in a blob, as they would be designated just for this. All other characters can occur in the real world; those two could get there only by mistake.
What if I want to put them in a comment like this, to discuss them? Well, you might say I shouldn't do it and should just refer to their code points, but basically that's a form of escaping as well.
What counts as a blob here? If it is binary, then there is a chance of a clash, no matter how much you dare those bytes not to. If it is always-valid UTF-8, then use 11111111 and 11111110 (or any other out-of-band markers available in the source encoding; those two byte values never occur in well-formed UTF-8) to open and close your message. Not sure what the point would be, though.
Are we talking putting a sequence of 8-bit bytes (octets) into Unicode characters by mapping all 256 possible byte values to the first 256 Unicode code points?
If so, if I'm not mistaken, this is actually less efficient than just using Base64 encoding.
When you put those Unicode characters into UTF-8, the first 128 code points are going to require one byte (with a leading 0 bit). The other 128 of them are going to require two bytes. So that's 50% overhead (assuming the blob's bytes are evenly distributed) because half of the values have 0% overhead and the other half have 100% overhead.
Meanwhile, Base64 sticks 6 bits in each encoded character. In 4 characters, you can fit 3 bytes of your raw info. So that's only 33% overhead.
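A quick sketch to sanity-check that arithmetic on random data:

    import base64, os

    blob = os.urandom(30000)                        # evenly distributed bytes
    latin1 = "".join(chr(b) for b in blob)          # bytes as U+0000..U+00FF
    print(len(latin1.encode("utf-8")) / len(blob))  # ~1.5  (50% overhead)
    print(len(base64.b64encode(blob)) / len(blob))  # ~1.33 (33% overhead)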
https://www.unicode.org/L2/L2019/19025-terminals-prop.pdf
I'm quite happy that my modest question led to some real progress.