Possibly useful: Midjourney offers image creation via Discord, 200 images for $10 or a subscription plan for more. It's great fun and possibly an easier way to test out the tech.
AI art notes in general:
* this will permanently raise the standard of "programmer art" in prototype UIs, for example.
* I expect large-scale AI image creation social games will take off very soon
* utilities that listen to your conversations and illustrate people's points with poetic interpretations on screens next to you, with captions.
* this will/should be integrated with emoji / emoji kitchen-like systems - imagine iterating on a custom emoji-like response in snapchat!
* trademark/copyright wars over input data may ruin this
Including the GPU memory, I'd say it's insane :) -- Also, you need more than 32GB of RAM to follow the instructions verbatim in the repository. People were saying the ScaNN indexing fails even with 64GB of RAM.
Yeah, but that's outside the question of whether multiple AM4 boards and their RAM caps can replace a GPU or be cheaper than one.
Yeah, they're technically cheaper than a GPU, but your costs are still gonna add up, and they can't replace a GPU because you'll need an Nvidia card regardless, unless there's a workaround for the CUDA requirements.
Computing neural nets isn't something a CPU is optimized for: it's mostly matrix calculations. This tech is built on GPU-optimized Transformer architectures (the successors to LSTMs), which run on CUDA to cut compute time; GPUs perform those calculations fast for huge matrices.
Doing it on a CPU (they're no good at fat matrix calculations) takes forever.
And yes, essentially those neural nets are extremely large matrices with some predictive math in between.
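To make that point concrete, here's a minimal sketch (plain NumPy, with made-up layer sizes) of why a dense neural-net layer is "just" a big matrix multiply - exactly the operation a GPU parallelizes and a CPU grinds through:

```python
import numpy as np

# A single dense layer is a matrix multiply plus a bias. With a batch of
# 64 inputs of size 1024 feeding 4096 output units, each forward pass is
# a (64 x 1024) @ (1024 x 4096) product - millions of multiply-adds.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 1024))    # batch of input activations
W = rng.standard_normal((1024, 4096))  # layer weights
b = np.zeros(4096)                     # layer bias

y = x @ W + b  # this matmul is the work a GPU does in parallel
print(y.shape)  # (64, 4096)
```

Stack hundreds of layers like this and you get the "extremely large matrices with some predictive math in between" described above.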
I can only dream. At first I was excited by the Mini and Mega versions of DALL-E that people were hosting, even if they were blurry and often a little dumb. By the time I got my DALL-E 2 invite, I had ideas ... most of which were swiftly rejected. Not only that, it warned that I would lose my access if I kept it up with the weird prompts.
Sigh, when will I have a nightmare generator of my own?
My first request was a Captain Beefheart lyric ("making love to a vampire with a monkey on my knee") because I wanted to see how it would take something concrete ("monkey on my knee") and integrate it with a metaphor, and one whose meaning had changed over time ("making love" used to refer to making out, a.k.a. prolonged kissing, rather than sex). Additionally, I wanted to see if it had hoovered up a rendition of that exact lyric from Exploding Dog (an artist who makes odd drawings from submitted phrases and prompts) some twenty-two years prior.
Of all of the celebrities, it seemed reticent to draw Fred Rogers as, well, Fred Rogers, despite me being able to make some fairly appalling melds of others, like Rod Serling and Marilyn Monroe.
Additionally, someone has put in an overly-touchy anti-vore filter.
It's really easy to find a false positive prompt so you have to be really careful.
It puzzled me for a while, but then it made sense:
"A hairy ball with lights at the end is manipulated by a human"
Not a great prompt, but it seemed like a good start. I imagined fiber optic strings attached to a ball and a human doing some stuff around it like a wizard or something. But then I realized that "hairy ball" might lead to something completely different :)
Okay, no problem, I changed it to "tumbleweed" but that triggered the drug filter so I gave up and tried something else :)
There's something hilarious about having all of this "such wow, much AI" machine-learning safeguarded by primitive wordfilters that get Scunthorped like chumps.
And having spent the last 24 hours in their beta testing phase, I can only say: it blows DALL-E and MidJourney out of the water.
They give you the seeds used to generate the images, so you can fine-tune an image by adjusting slight details in a prompt. It is really awesome.
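A rough illustration of what seed control buys you (plain NumPy standing in here for the actual diffusion code; the latent shape is a typical Stable Diffusion value, not something stated above):

```python
import numpy as np

def initial_latents(seed, shape=(4, 64, 64)):
    # Diffusion sampling starts from random noise and denoises it into an
    # image. Fixing the seed fixes that starting noise, so the same seed
    # plus the same prompt reproduces the same image - and a slight prompt
    # tweak changes details without throwing away the composition.
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)

a = initial_latents(42)
b = initial_latents(42)
c = initial_latents(43)
print(np.array_equal(a, b))  # True: same seed, same starting noise
print(np.array_equal(a, c))  # False: new seed, entirely new noise
```

That's why publishing the seed alongside the prompt makes iterating on an image practical.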
What I hate about the current ML space is that half the time is spent trying to get dependencies installed and set up correctly to match whatever the project you're trying to run expects. Everything is just so brittle. Following tutorials like these never works out of the box.
I recently went through this and conda handled the python deps pretty well, but the nvidia cuda stuff was a massive pain to get the right things installed. I think it’s safe to blame nvidia.
It seems I may have stepped on a few toes here. For a lot of ML engineers I've talked to, conda is a living nightmare if one tries to go beyond one project (or maybe multienvironment) on one machine.
Sometimes even one is a problem.
I don't think I've talked to a career ML engineer yet that has had a positive work experience with conda, that I can recall. Usually the question has been a good way to trauma bond, which is always good.
It seems like this is more an academic kind of thing?
What I can’t stand is the incessant paternalism that seems to surround AI from big tech.
I got access to DALL-E a few weeks back (after a waitlist… come on) and tried it out. They want all this personal info to even access it, and then they have this oppressive “content policy” that removes anything remotely fun.
For example I tried “Trump riding a velociraptor on Mars fighting aliens” because why not? Sounds hilarious. Turns out any query with the word Trump is banned, and I got a warning that “repeated violations might remove my access”. It’s not just Trump; anything remotely non-corporate-friendly is heavily filtered. Don’t you want to just make images of a cute teddy bear made of pizza instead?! I’m just so tired of it all. It’s like the puritans of the 1990s won, but they’re corporations now.
Personally I think they can shove that nonsense up their ass.
Don't worry, we'll get much better models at much lower cost (Midjourney, Stable Diffusion) that you can run on your own hardware, avoiding the moral censorship that the paternalizing, non-productive members of these companies want to subject the world to.
Pepe is banned. Why? It doesn't matter. Someone will cut around, and they'll get my compute instead. And I'll get my autogenerated pepes, regardless of whether some non productive AI "ethicist" thinks a cartoon frog meme is offensive.
I think they have a blanket no celebrities or real living people policy for their engine. I think that’s fine since they don’t want to be called out for helping produce defamatory content.
They just want to avoid bad PR. That's a sound, safe call.
>Any prompt referencing politics, violence, sexuality, or even the concept of "health" is banned
>This does occasionally include the very same LGBT themes their "hate" rule ostensibly protects, although not consistently, so there's no way to know whether an LGBT prompt will cause an account strike
>In practice the list of bannable offenses is much longer than this, and includes everything from the concept of death to anything violence or politics-adjacent (e.g. nothing about war, conflict, or any kind of weapon)
>OpenAI refuses to share this list because it's part of a "contextual" filter, even though it demonstrably bans words
They advertise DALL-E 2 as artistic, but it can't make art with restrictions like this. The best it can do is corporate content farming, and even then there's no way I'd make part of my marketing pipeline depend on a service this fickle.
For one thing, Python's dependency management is insane in itself. When you additionally have to install what are basically GPU drivers, compile loads of native code, etc., it becomes a hot mess. And half the ML things you want to try are academic experiments not really made for distribution; they were made to work once, and that's it. So if you have a slightly different computer setup, a minor version mismatch in a dependency, etc., it will just break. Datasets or models exist at some URL that only works for a month.
It's a shame, I think the field is one where reproduction of results should be really welcome and feasible.
Yeah, a docker image would be nice. Even a Dockerfile, even though that in itself may not guarantee reproducibility if you try to build it later. (And may have issues with gpu drivers etc). But at least it documents all assumptions about the setup.
That's the cool thing about publishing the Dockerfile and an image, one is an example that may or may not break, and the other is a functional snapshot of a working config at that point in time.
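A hedged sketch of what such a Dockerfile might look like for one of these projects - every base-image tag, package, and version pin here is an example, not a known-good combination:

```dockerfile
# Snapshot of one working setup: pins document the assumptions,
# even if this exact combination rots over time.
FROM nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04

RUN apt-get update && apt-get install -y python3 python3-pip

WORKDIR /app
COPY requirements.txt .
# requirements.txt should pin exact versions, e.g. torch==1.12.0
RUN pip3 install -r requirements.txt

COPY . .
CMD ["python3", "sample.py"]
```

Even if the build breaks a year later, the file records which CUDA, OS, and dependency versions the results assumed - which is exactly the reproducibility documentation the comment above is asking for.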
No. This would take a very long post to explain why, but in short, it depends heavily on which NVIDIA Cards / CUDA drivers / versions of Linux you're using. Or you're running on AWS GPUs or TPUs and paying a lot more either in the form of straight up dollars or optionality.
It's pretty good but certainly not flawless.
Admittedly I probably don't have high standards. Cocoapods is one of the few I'd consider objectively "bad", though apparently among iOS developers it's considered one of the better ones (!).
You’re implying it isn’t like that for every language/framework. I’d love to know what you’re working in where the dependencies aren’t always a problem. Even Carmack has complained about this.
Never had an issue running a random Java project, for instance, either because people package all dependencies or because Maven, while not beloved, mostly just works.
Of course, ML is a bit different in that it often needs more drivers/GPU setup. But there are loads of ecosystems that don't assume you have package X installed on your system the way Python's does.
I definitely have the same experience trying to get C++ game dev libraries running. Trying to wrangle a bunch of open-source library dependencies into mutually acceptable versions whilst also keeping up with (intentional, or unintentional) operating system breaking changes is a never-ending steeplechase.
I'm not so sure about this...I guess maybe there's some project specific stuff, but the good projects are semi stable and oftentimes it's just PyTorch code that you can pull and use elsewhere if needed.
Other people's infrastructure always seems bad to me, as I'm sure mine does to others. The cycle and circle of life! :D
That's the reason I avoid using Python as much as possible (while I really like the language). Another reason is the massive stack traces everywhere instead of meaningful user-facing error messages, but that's a smaller cultural issue than the dependency hell.
Though I suppose it's not really Node's fault that developers are importing modules like "leftpad".
What's really fucking insane though is that there are modules like "trim-newlines" [0] that exist merely to trim \r and \n from the beginning and end of a string...and that this is such a hard task to get right that it's in version 4.0.2...and that a previous version had a security vulnerability [1].
As much as I hate to say it (and I hate how these kinds of packages attach themselves to large projects to inflate download numbers for a résumé), I think most of us here would probably have created the same CVE doing it naively. It was a regex DoS due to exponential runtime, not something obtuse like extra bloat or being poorly made.
It would be lovely if anyone other than NVidia made GPUs usable for GPGPU. I started with Radeon for the open-source factor, but it turned out to be useless in virtually every respect. I bought an NVidia card.
I'm now tied to CUDA, but I didn't need to be. I was starting from scratch.
The longer before AMD can ship something which actually works, the more entrenched NVidia+CUDA become.
It's not that they aren't trying; it's that when you reinvent the wheel, you have to do more work. Microsoft is introducing `tensorflow-directml` to avoid this problem by implementing a CUDA equivalent on DirectX. AMD has ROCm, but it's not well supported because it's not integrated upstream in `tensorflow`.
- I found out I could only use it for compute headless. WTF?!?!?! (https://www.phoronix.com/news/Radeon-ROCm-Non-GUI). If it was driving a monitor, my machine would crash hard. There wasn't even an error message.
- A lot of other stuff didn't work and just resulted in odd crashes, or worse performance than CPU. I don't know why.
- Within 9 months, AMD discontinued support for my card. I raised this as a warranty issue (suitability for advertised purpose), but that obviously would go nowhere without a lawsuit. I had a very expensive brick.
- AMD support channels were non-existent. There literally was no way to reach anyone.
I bought an NVidia card, and it's been working well ever since.
ROCm is not well supported because it's absolute garbage. You have *less* work reinventing a wheel, since it's already been invented once; you have *more* work getting community support and network effects, since you're starting out behind. Fundamentally, though, none of that can start to happen if your system doesn't work at all.
I agree with you they're trying, but they're trying incompetently.
If ROCm was half the speed of CUDA, and wasn't integrating into the latest-greatest frameworks, but it was stable and working, I'd make it work. It wasn't anywhere close to stable and working.
You blame CUDA because you have never played with AMD + ROCm. Furthermore, OpenCL is a dead-end street. With standard programming languages natively supporting CUDA, it makes no sense to add extra layers.
With the explosion of image gen AI out there recently, I feel confident that within a year anyone will have access to free or nearly free totally unchained image generation on par with or better than current DALLE-2.
Running it is one thing, but where are we going to get the model?
Who is going to give away their trained model (that presumably cost millions) to the public? And getting the blame for any nefarious usage that will eventually follow?
Have they said they are going to publish the trained model? So far they are going the exact route as OpenAI (closed beta access with similar prompt content policy).
I don't see the content policies[0][1] as much different, but perhaps it's a matter of opinion regarding the individual points. I don't know how strictly the policies are enforced, so perhaps there is a difference there as well.
What I find strange is that StabilityAI doesn't outright ban using actual living people in prompts (where I can imagine wide use cases for abuse). At the same time, both OpenAI and StabilityAI block sexual content/nudity/violence. I don't see why such content is supposed to be harmful.
Anyway looking forward to the model, thanks for informing me about this project.
For OpenAI, "violence" means you cannot generate weapons (even knives); for StabilityAI, "violence" means anti-Semitism, misogyny, etc. No one will mute you for generating nudes if it wasn't your intention, which means the model is able to do this while the OpenAI model cannot. And anyway, we're only talking about the limitations on the Discord server; when the model is free and open source, you'll be able to generate whatever you want.
I've got the LD repo installed and use it alongside VQGAN-CLIP and ... VQGAN is pretty much always the better result (albeit much slower.) I -think- this is because the (tagging?) that the LD is using isn't as comprehensive. Like it doesn't seem to know what "brogue" means and generates nonsense with attempts at the word "brogue".
(Caveat: I have updated neither repo since about November because it took a faff to get them working and I do not want to touch them again.)
CLIP is my favorite of all of them. But it's not especially friendly - like riding a horse bareback. I've made some ridiculous, very bizarre and interesting stuff with CLIP that just isn't possible with DALL-E.
Hi, sorry if this is extremely ignorant, but I was wondering the other day... are we 100% sure they aren't feeding images from a search engine index into the ML data for DALL-E?
It just looks surprisingly like it's mixing and matching the top returned image searches from an index.
I'm only saying it looks like that - not that it is, of course.
I don't want to undermine anyone's work here, I was just wondering.
You can read the whitepapers for Flamingo, Dall-E 2, Imagen, and Parti to see how diffusion networks and GANs work to create these images. I wrote up two huge paragraphs trying to explain it simply, but then I realized that I don't understand exactly how they work, either. Best source is the published research.
Most networks have to do with large language models, text embeddings, and image embeddings.
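A toy sketch of the embedding idea mentioned above (the 4-dimensional vectors here are made up; real CLIP-style embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    # Models like CLIP map a caption and an image into the same vector
    # space. "How well does this image match this text" then reduces to
    # the cosine of the angle between the two embedding vectors.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical embeddings for one caption and two candidate images:
text_emb  = np.array([0.9, 0.1, 0.0, 0.4])
img_match = np.array([0.8, 0.2, 0.1, 0.5])   # points the same way
img_other = np.array([-0.7, 0.6, 0.9, -0.2]) # points elsewhere

print(cosine_similarity(text_emb, img_match))  # high: good match
print(cosine_similarity(text_emb, img_other))  # low: poor match
```

The diffusion model is then trained to produce images whose embeddings land close to the prompt's embedding, rather than stitching together indexed search results.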
Seems like a decent amount of training data does end up making it into the output - some of those images have a barely legible "shutterstock" white footer in them or something
TBH ... I absolutely hate these strange AI-generated images. Most folks celebrate this technology, but I don't know what it's really for, other than confusing brains.
I always ask myself "is this real?" ... I don't like having to ask myself whether something is real, or being confused.
Is there anyone else out there who feels the same?
I'm working on an indie game and I get surprisingly decent results if I ask for a basic object (think 300 mutations of a key or book) and downsample it to the 48x48 sprite sheet size. They look weird as hell at normal resolution but as pixel art with specific resize sampling methods it's a lot less weird, and good for rapid prototyping.
The originals look like malformed mutants at 256x256 with nonsensical key bits and strange twisted handles.
Note that this does not replace a good pixel art artist either - for things like walls, anything that needs to connect together or tile like a dungeon wall or castle, or a cohesive art style, this will not do. But for rapidly identifiable different quest item drops it's not terrible.
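A minimal sketch of that downsampling step, assuming nearest-neighbour sampling (plain NumPy with hypothetical sizes - not the poster's actual pipeline):

```python
import numpy as np

def downsample_nearest(img, size=48):
    # Nearest-neighbour downsampling: pick exactly one source pixel per
    # target pixel. This keeps the hard edges that read as pixel art,
    # unlike bilinear/Lanczos resampling, which averages and blurs.
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]

# Stand-in for a 256x256 RGB generation (random noise here):
art = np.random.default_rng(0).integers(0, 256, (256, 256, 3), dtype=np.uint8)
sprite = downsample_nearest(art)
print(sprite.shape)  # (48, 48, 3)
```

Throwing away 96% of the pixels is what hides the malformed key bits and twisted handles: at sprite scale, only the broad silhouette and palette survive.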
I'm not with you, at all, but I did have a knot in my stomach the other day reading a quote from one of the founders of MidJourney, David Holz, who apparently said
"Within the next year or two, you’ll be able to make content in real time: 30 frames a second, high resolution. It’ll be expensive, but it’ll be possible. Then, in 10 years, you’ll be able to buy an Xbox with a giant AI processor, and all the games are dreams.".
As someone who used to take quite a lot of psychedelics, there's something quite terrifying about the promise of this premise - it takes me back to the wrong sort of trips, where the ever-unfolding strata of reality became too much to bear and I'd end up mentally cowering beneath the unrelenting bigness of it all.
The infinite dreamspace is unholy-big and not somewhere I'd much choose to get lost.
Or maybe I totally will...
- ed - Notwithstanding the obvious realisation that I could just take the helmet off, or remove the contact lenses, or whatever we have in a couple of decades.
At least for me, the intensity of a psychedelic trip was little about the visuals and a lot about that indescribable consciousness change. I don't think AIs are going to collapse people's innate sense of reality (the way that psychedelics do, at least), though they may leave people questioning what media is and isn't real.
When computers started beating humans at chess, there was a similar moment of anxiety.
One good quote from that era: "You should be no more concerned that a computer can beat you in chess than that a car can beat you in a race."
Technology changes the experience of being human. Chess used to be the marker of human intelligence; that passed. Now some forms of creativity will too.
I'm not anxious about it. It's just not pleasant to see. There are some painters, like Dalí (I assume the name DALL-E is no accident) or Bosch, who created kinda similar "products", but to me theirs are interesting and fun to look at. The AI products are kinda useless creep: mashups without sense from dumb machines, and that gives me this weird feeling which makes me hate it.
Yes, I agree. I believe that technology is like pollution, irrespective of its net utility to society: once it's been invented, it's practically impossible to go back to the state of history where it wasn't. Pretty soon, technologies that appear interesting on the surface may turn out to have unexpected societal effects when they become mainstream, and it will be too late once the data and knowledge have spread to millions of individual hard drives.
In the case of DALL-E and GPT-3, I believe they will undermine human creativity and usher in a new era where fewer people care about slowly crafted skills like painting that require years or decades of practice and patience to master if the barrier to just have the art in front of you in seconds becomes so low.
It might get to the point that people will divide themselves on ideological grounds that AI art is "impure" and attack each other if they cannot prove its origin. I'm not saying that I'm one of those people that would join the pushback, given that a coming explosion in AI art is all but inevitable, but I'm describing what I think the new technology is going to cause the masses to believe - the population that aren't enthusiastic tech evangelists. When the expectations of millions of people are set by DALL-E and the like, I don't think the prospects will be universally positive.
Look at the number of people on HN asking if certain comments were written by GPT-3; they seem to appear weekly. I don't think implying that you didn't actually write the comment you posted will be taken very well by some people outside of an insular circle like HN once the general public becomes fully aware of AI, and could very well grow into a well-known insult if there ever comes to be a rift in opinions around AI art.
I assume most images and videos on the internet are doctored and have been since the invention of photoshop.
I don't feel the need to check for reality. I've seen screenshots from games that look more realistic than some pictures I've taken (Forza on max settings at the right angle might as well be a photograph) and this trend will only continue.
This tool can be considered more of an automated version of meme communities, where dedicated members will spend an hour photoshopping muppets into historic events and making the pictures look absolutely believable. The only novelty I see is that the computer now does a lot (but not all) work for you.
There are nice opportunities here. If you need a stock photo of something very specific, you'll soon be able to generate one with the right query and the right AI. Small companies can generate fancy brands without paying professional designer's fees, especially if all they need is a billboard and not a whole suite of office supplies. You can generate your own posters and decorations featuring interesting landscapes and scenes in any style you want.
The current iterations of these algorithms are quite limited in many aspects and sometimes uncanny or even horrifying, but I look forward to a future where I can imagine something, describe it, and have it rendered into digital art just like I pictured, without having to spend decades on honing my skills as an artist.
It's not about fake images trying to show reality; it's about images crafted from reality that show something unreal. The brain wants to interpret it and fails.
That failure drives me nuts.
I think AGI, when it inevitably arrives, will be as disruptive to our brains as sugar and sedentary living. We've not evolved to question whether what we see, hear, and feel is real.
There is no need to worry about the threat of AI. While it is true that AI has the potential to cause great harm, it is also true that AI has the potential to do a great deal of good. As long as we are careful and responsible in our development and use of AI, there is no reason to believe that it will be anything other than a positive force in the world. -- generated by OpenAI from a prompt
Here's the code:
This took 5m25s to run on my laptop (no GPU configured) and produced a recognisable image of a raccoon reading a book. I tweeted the resulting image here: https://twitter.com/simonw/status/1550143524179288064