Bringing D&D/AD&D campaign settings to life with Stable Diffusion

Non-Edgy Gamer · Sep 23, 2022

Bigfass said:
But yeah, Increasingly Nervous Man is saying exactly that, because intelligence requires a cognitive model of the world, or at least the given task.

You don't need intelligence to draw images any more than you need it to generate text. (Reddit proves that every day.)

You can give all sorts of examples of the AI failing to generate something ridiculous - like, say, a dog wearing VR goggles - but it doesn't change the fact that it already can and does generate usable or near-usable images and has been doing so for months.

You can say "AI won't work" until you're blue in the face, but the fact remains that it is working. Just like GPT is already being used by writers, social media companies and gamers, and just like game devs are already beginning to use AI voice actors.

Bigfass said:
A household robot that does the laundry but never puts the dog in the dryer is a great thing to aspire to but it does require intelligence.

Funny, but the Roomba not having intelligence hasn't stopped people from buying it. Almost like there are other ways to work around problems besides truly intelligent AIs.

The Roomba has no "mental model" or understanding about anything in your house. It just knows the boundaries it's supposed to draw in and stops when it hits an object. No dogs vacuumed up yet.

These AIs can similarly be wrangled into doing what you want them to, whether with sketches, training or clever prompts, all without anything capable of passing a Turing Test.

This thread really isn't the best place for this discussion though.

Bigfass · Sep 23, 2022

Non-Edgy Gamer said:
The Roomba has no "mental model". It just knows the boundaries it's supposed to draw in. No dogs vacuumed up yet.

But the Roomba does have a mental model of its task. It's very limited, but it understands things like remaining battery, the location of the charger, the amount of trash it has inside, the time it's supposed to go to work, etc. It's been explicitly programmed to understand these things, and it has nothing to do with AI.

Non-Edgy Gamer said:
You can say "AI won't work" until you're blue in the face, but the fact remains that it is working.

Sure, there are products that are based on neural networks, some of them actually useful, increasingly so. That doesn't mean that the current approach is not a dead end as far as something that could be called Artificial Intelligence is concerned. Neural networks latch onto statistical characteristics in the training data, and have no concept of meaning.

There are people much smarter than me arguing both sides of this, with decades of experience in the field. There's no way for either of us to know for sure, one way or the other. I just find the sceptic point of view a lot more persuasive.

Non-Edgy Gamer · Sep 23, 2022

Bigfass said:
But the Roomba does have a mental model of its task.

No, I explained that it didn't have a model the way you were saying SD needed and why. It only "knows" what area to stay in. It doesn't know why. It doesn't really know anything.

You're the one who came up with the criteria for a model that requires intelligence. The Roomba doesn't have this. It has no more of a model than Stable Diffusion has. SD's model, by this new definition, is just vastly more complex.

Bigfass said:
It's very limited, but it understands

It does not. It understands nothing. There is none of the intelligence you said was needed. There are just several programs running on a tiny hard drive. It's no more intelligent than whatever you're typing on right now. Far less, in fact.

You're balking at one of the more advanced AIs to become available to the public in the past few years one minute, but now you think a Roomba has more intelligence just because it doesn't bump into walls.

Bigfass said:
Sure, there are products that are based on neural networks, some of them actually useful, increasingly so. That doesn't mean that the current approach is not a dead end as far as something that could be called Artificial Intelligence is concerned.

You keep trying to shift the goalpost to true AI, when that's never been what SD was trying to achieve or what any of its users want.

Sure, it'd be more useful, but generated art is the goal, not a toaster with feelings.

Bigfass said:
There's no way for either of us to know for sure, one way or the other.

Pretty sure the AI images I've generated show me how possible it is to generate images with AI. I think I know that for sure. But maybe I just haven't read enough articles by over-educated concern trolls yet.

Bigfass · Sep 23, 2022

Non-Edgy Gamer said:
You keep trying to shift the goalpost to true AI, when that's never been what SD was trying to achieve or what any of its users want.

My contention has been that neural networks are incapable of understanding meaning, so they will not replace artists. My first post in this thread was in response to "artists on suicide watch".

I don't know what SD is trying to achieve, other than hype and $100m in VC money. And you don't either. Maybe they do:

Non-Edgy Gamer said:
You're balking at one of the more advanced AIs to become available to the public in the past few years one minute, but now you think a Roomba has more intelligence just because it doesn't bump into walls.

You're being intentionally obtuse. You understand exactly what I meant by saying that the Roomba is more aware than any neural network. There exists an abstraction for a room inside a Roomba. Nothing of the sort has ever been demonstrated for a neural network.

rusty_shackleford · Sep 23, 2022

"muh human factor"

reminder the average artist can barely communicate at a third grade level

Non-Edgy Gamer · Sep 23, 2022

Bigfass said:
My contention has been that neural networks are incapable of understanding meaning, so they will not replace artists. My first post in this thread was in response to "artists on suicide watch".

You don't need to replace artists in every field. Just in enough of them. It's the danger of outsourcing and automation: replace enough jobs and you lower the labor demand, resulting in layoffs or a wage reduction.

Bigfass said:
I don't know what SD is trying to achieve, other than hype and $100m in VC money. And you don't either. Maybe they do:

I would say they're trying to achieve an advancement in open-source AI no longer dependent on companies like OpenAI who purposely limit access to the public, while letting select individuals and organizations use it freely.

Emad has long been a member of several AI communities working to that effect. And not just for art, but for GPT as well. If he wants to make a profit while doing so, good for him. But it wouldn't be the first time someone got bought out.

As to "hype" though, the hype comes from the quality of the work itself. Notice how no one is valuing Craiyon at $1 billion or writing panicked articles about it.

Bigfass said:
You're being intentionally obtuse. You understand exactly what I meant by saying that the Roomba is more aware than any neural network.

I understand that you're wrong. A neural network goes through a similar process to a Roomba in a way. Both "learn" what they're supposed to "draw". The Roomba just has a much simpler task and much more easily defined limitations.

Bigfass said:
There exists an abstraction for a room inside a Roomba. Nothing of the sort has ever been demonstrated for a neural network.

Lol. There is no abstraction of anything to the Roomba. There is a set of coordinates and a program that says "don't go here". That's it. It doesn't understand it as a map, or know what any of the objects it bumps into are. "Don't go beyond this point." That's all.

If you're equating that to an abstract model of reality, then Stable Diffusion has a much more complex abstraction that it was trained to learn. "Draw it like this, but based on these factors, don't draw it like this." There's literally a "model" file that's the result of that.

But of course, neither have intelligence, understanding or true abstraction. And yet both perform their functions in spite of that.

Reinhardt · Sep 25, 2022

Zed Duke of Banville said:
Beggar who engages in criminal activity on the side:

DAS RAYCIS!

Zed Duke of Banville · Sep 27, 2022

Lagole Gon said:
I'm trying to make Justin Sweet/Vance Kovacs style Icewind Dale portraits, but it seems the AI is mostly inspired by shitty fanart.

Dexter said:
They're probably not famous enough and not prominently included in the training data, can't find them listed here for instance: https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/artists.csv

Yes, before relying on an artist, I recommend testing that artist's impact as a prompt: run a series of prompts with only that artist, and compare against the same prompts without any artist and the same prompts with some artist known to have a strong effect. I hadn't been doing this myself at first, but later testing did reveal that Justin Sweet has only a minor impact on results, which also seems to be true of the D&D/AD&D artists even if they're present in one of the Stable Diffusion lists. Of course, it's best to include multiple artists who either have a similar style (as with many of the examples I've posted in this thread) or who otherwise complement each other (as with the popular Mucha/Rutkowski/Artgerm combination).
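A minimal sketch of that comparison, assuming you hand each job to whatever generator you use (WebUI, a script, etc.); the rendering step is deliberately left out. The important design choice is that all three conditions reuse identical prompts and seeds, so differences between them come from the artist term rather than from the starting noise. The ", by <artist>" phrasing, the example prompt, and Mucha as the control artist are illustrative assumptions, not a rule.

```python
from itertools import product


def artist_test_jobs(
    base_prompts: list[str],
    test_artist: str,
    control_artist: str,
    seeds: list[int],
) -> list[tuple[str, str, int]]:
    """Build (condition, prompt, seed) jobs for an artist-impact comparison.

    All three conditions share the same prompts and seeds, so differences
    between them come from the artist term, not from the starting noise.
    """
    conditions = [
        ("no_artist", None),
        ("test_artist", test_artist),
        ("control_artist", control_artist),
    ]
    jobs = []
    for (name, artist), prompt, seed in product(conditions, base_prompts, seeds):
        # Hypothetical phrasing: append the artist as ", by <name>".
        full_prompt = prompt if artist is None else f"{prompt}, by {artist}"
        jobs.append((name, full_prompt, seed))
    return jobs


jobs = artist_test_jobs(
    ["portrait of an elven wizard"], "Justin Sweet", "Alphonse Mucha", [1, 2, 3]
)
for job in jobs:
    print(job)
```

If the "test_artist" images look no closer to that artist's style than the "no_artist" ones, while the control clearly changes things, the artist probably isn't doing much.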

Reinhardt said:
DAS RAYCIS!

The training data on which Stable Diffusion relies must suffer from an implicit bias against Spaniards.

The Dawn of the Emperors box set from 1989 covered not only the Empire of Thyatis but also its rival the Empire of Alphatia, ruled by magic-users. Although there is some relation to the historical Persian Empire, Alphatia is more sui generis and perhaps not suited for any of the more historical art styles. However, it still has much in common with the pulp fantasy literature that inspired the Known World setting (and D&D), so for portraits I selected as source artists Virgil Finlay and Roy Krenkel (unfortunately, Krenkel had a relatively minor effect, and I wasn't able to find a third artist in a similar art style who works with Stable Diffusion). For urban landscapes, I reverted to the usual Mucha/Rutkowski/Artgerm combination.

Empress Eriadna, with mahogany-brown hair, green eyes, delicate and expressive features, wearing a golden gown:

Prince Zandor, heir to the throne, with brown hair, brown eyes, a white robe, sharp features:

Master Terari, with grey hair, a grey beard, dark-brown eyes, brown robes, sharp features, inquisitive but settled and relaxed:

Asteriela Torion, with gold-blonde hair, eyes that should be dark brown, a fair complexion, beautiful, energetic, and charming:

Galatia Allatrian, lady-in-waiting to Asteriela, with red hair, brown eyes, a stylish gown, clever and mischievous:

Alphatia's capital Sundsvall, the city built by magic:

The University of Sundsvall, largest magical academy in the world:

JamesDixon · Sep 27, 2022

Looking a lot better, especially the cityscapes.

Alex · Sep 27, 2022

rusty_shackleford said:
Everyone knows technology is completely stagnant and never moves forward.

Technology is limited by reality.

In reality, intelligence is not material. As such, a machine can never be really intelligent.

Catacombs · Sep 27, 2022

I agree with JD; the cityscapes are great. Can you share some of the prompts for them?

JamesDixon · Sep 27, 2022

Catacombs said:
I agree with JB; the cityscapes are great. Can you share some of the prompts for them?

My name is JamesBixon? :lol:

Dexter · Sep 27, 2022

Zed Duke of Banville said:
so for portraits I selected as source artists Virgil Finlay and Roy Krenkel (unfortunately, Krenkel had a relatively minor effect, and I wasn't able to find a third artist in a similar art style who works with Stable Diffusion).

I find that if it gets faces wrong more often than not, there's probably something incongruent with the artist combination or text description (for instance, try to group all face-describing keywords into one big blob, since keywords that are closer to one another will be linked together more). There are artist combinations where it gets the faces right 90%+ or close to 100% of the time, like the one you used for the cityscapes.

You can also try to fix a face with CodeFormer or GFPGAN. If you're using the WebUI, you can still do this after you've created an image, in the "Extras" tab, e.g.:

If you're trying to do emotions or face expressions, you also have to describe it like you're doing it to an autistic person, otherwise it'll default to a kind of "neutral" face or slight smile e.g.:

Non-Edgy Gamer · Sep 27, 2022

Dexter said:
If you're trying to do emotions or face expressions, you also have to describe it like you're doing it to an autistic person, otherwise it'll default to a kind of "neutral" face or slight smile e.g.:

Yeah, I've seen people post this sort of image elsewhere as if it shows how to get the AI to draw these things. It's nonsense.

The AI doesn't know what a "maniac smile" is. It knows "maniac" and "smile" at best, and will combine drawing a maniac with the smile expression. Just look at "forced smile", which just looks dumbstruck and isn't a smile at all. Or "demonic smile" which rather than giving her a devilish grin, instead redraws her face as a literal demon with glowing eyes and fangs.

You might as well say "smiling maniac" or "smiling demon", since that's exactly what the AI is drawing, not an expression alone.

You might get lucky with word association, and that's probably what you should try for, but the AI isn't intelligent. Don't expect it to handle anything regarding emotional expression that might puzzle Data from Star Trek. Google image search far outperforms it for expressions.

Dexter · Sep 27, 2022

Non-Edgy Gamer said:
The AI doesn't know what a "maniac smile" is. It knows "maniac" and "smile" at best, and will combine drawing a maniac with the smile expression. Just look at "forced smile", which just looks dumbstruck and isn't a smile at all. Or "demonic smile" which rather than giving her a devilish grin, instead redraws her face as a literal demon with glowing eyes and fangs.

It knows what it has been trained on, does keyword association on words clumped together, and tries to apply that to the resulting image. It does this the same way with, say, "cute smile" as it does with "Frank Frazetta":
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=duchenne+smile
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=cute+smile
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=maniac+smile
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=psychotic+smile
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=forced+smile

It doesn't have to be intelligent or "get" a concept like a human would to do this. And as you point out, overemphasizing/weighting some things can lead to bad results and some things barely work or don't work at all. Fact of the matter is, if you want certain facial expressions or emotions expressed by a character you have to specifically guide it towards said result by telling it. The comparison seems to be made using the same Sampling method/steps/CFG and Seed and simply inserting different keywords into the image, and it obviously "knows" what a bunch of these things are/recognizes them as facial expressions (crying, sleepy, yawning, angry) or various smiles it can seemingly differentiate between and apply them to the resulting pictures as can be seen.

And yes, maybe "devilish grin" might have been a better keyword association, although given some of the results I assume it would have likely transformed the face too, as would have likely been "maniacal laughter", and I think "aesthetic score" also plays a role in what it has been fed:
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=devilish+grin
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=maniacal+laughter

In that way Data is a bad depiction of "AI", since it would have been easy to just look up human emotions in the database or recognize a joke being told and present the appropriate reaction, just like the Enterprise computer did every time it started a Holodeck program, even though it was supposed to be a lot less sophisticated than Data.

Non-Edgy Gamer · Sep 27, 2022

Dexter said:
It knows what it has been trained on and does keyword association of words clumped together and tries to apply said to the resulting image, it will do this the same with say "cute smile" as it does with "Frank Frazetta":

I know that. So does google. But that's not going to help you if the association isn't strong enough, and many of the images in that collage are clearly not what the prompt says.

I mean look at this. Come on.

Dexter said:
And as you point out, overemphasizing/weighting some things can lead to bad results and some things barely work or don't work at all. Fact of the matter is, if you want certain facial expressions or emotions expressed by a character you have to specifically guide it towards said result by telling it.

But that's my point: you need to understand that it just won't have the associations there for it in most cases. You need to look at things objectively and not assume wishfully that your prompt is having an effect and the AI isn't just drawing random smiles.

Dexter said:
although given some of the results I assume it would have likely transformed the face too

I'm thinking the best way to handle such things for now is with img2img. Provided the colors don't change much, you can mask out only the area you want. I've done this with several images, adding smiles, fixing eyes, etc.
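As a rough illustration of why masking keeps the rest of the picture intact, here is a toy sketch of only the final compositing idea: masked pixels take the new generation, unmasked pixels keep the original. This is a simplification under stated assumptions (grayscale images as nested lists of floats); actual Stable Diffusion inpainting also feeds the masked image into the denoising process itself, which this does not model.

```python
def composite_inpaint(
    original: list[list[float]],
    generated: list[list[float]],
    mask: list[list[float]],
) -> list[list[float]]:
    """Blend two same-sized grayscale images through a mask.

    mask = 1.0 keeps the newly generated pixel, 0.0 keeps the original,
    and values in between give a soft (feathered) edge.
    """
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("images and mask must have the same number of rows")
    result = []
    for row_o, row_g, row_m in zip(original, generated, mask):
        if not (len(row_o) == len(row_g) == len(row_m)):
            raise ValueError("images and mask must have the same row length")
        result.append([m * g + (1.0 - m) * o for o, g, m in zip(row_o, row_g, row_m)])
    return result


original = [[0.2, 0.2], [0.2, 0.2]]
generated = [[0.9, 0.9], [0.9, 0.9]]
mask = [[0.0, 1.0], [0.0, 0.5]]  # repaint the right column, softer at the bottom
print(composite_inpaint(original, generated, mask))
```

The left column stays exactly as it was, which is the property that makes masked touch-ups (smiles, eyes) safe for the rest of the image.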

Dexter said:
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=forced+smile

Catacombs · Sep 27, 2022

JamesDixon said:
Catacombs said:

I agree with JB; the cityscapes are great. Can you share some of the prompts for them?

My name is JamesBixon?

A spate of dyslexia hit me. I fixed it.

JamesBixon is just you with a fancy mustache when you need to go into incognito mode.

rusty_shackleford · Sep 27, 2022

Non-Edgy Gamer said:

Could it be that it treats separate words as individual terms and you need to group them together with quotes to form a single term?

Zed Duke of Banville · Sep 28, 2022

Catacombs said:
I agree with JD; the cityscapes are great. Can you share some of the prompts for them?

The university pictures are 14 of 30 I created with exactly the same prompts, only changing the seed; the terms were broad enough to capture both exterior and interior scenes with considerable variation. Similarly, the city scenes are 10 of 25 created with exactly the same prompts, with only the seeds differing. The descriptions are simply keywords from Dawn of the Emperors, although not necessarily found in one place. Interestingly, Sundsvall is the name of an actual city in Sweden, and this had some impact on the architecture displayed, although the effect was limited compared to what the results would have been with a stronger city such as Novgorod, Chartres, Athens, or Kyoto:

(Comparison images: no city name, Sundsvall, Novgorod, Chartres, Athens, Kyoto)

Non-Edgy Gamer · Sep 28, 2022

rusty_shackleford said:
Could it be that it treats separate words as individual terms and you need to group them together with quotes to form a single term?

Not really. The () should do that, iirc, if it was going to, but it doesn't because it doesn't work like that.

Basically, every image got scraped with whatever words are associated with it. The words and phrases become tokens, and when there's a strong enough association for the AI on a particular token, it should draw it. If not, it won't. I also understand that how your words get tokenized is a factor. Supposedly DALL-E provides additional guidance in that process that SD doesn't, so its outputs end up closer to what people type in.

But basically, there should be a strong association between what you type in and what you want to see within the model, or it probably won't work. You can't depend on it to understand even common things. Some things it just hasn't received enough training on. It may understand what a happy face is, but probably not an ambivalent one.

That's my limited understanding of it. But just know that a lot of the reddit and 4chan meme prompt tips are junk. Especially their negative prompts.

If I had a nickel for every time I saw someone put "deformed hands" in their negative prompt, I'd have a lot of nickels. The AI just looks at it and says "Ok, no deformed and no hands!" And these people don't even notice that none of their pictures have hands.

:deathclaw:

Dexter · Sep 28, 2022

Non-Edgy Gamer said:
Not really. The () should do that, iirc, if it was going to, but it doesn't because it doesn't work like that.

Basically, every image got scraped with whatever words are associated with it. The words and phrases become tokens, and when there's a strong enough association for the AI on a particular token, it should draw it. If not, it won't.

But basically, there should be a strong association between what you type in and what you want to see within the model, or it probably won't work. You can't depend on it to understand even common things. Some things it just hasn't received enough training on. It may understand what a happy face is, but probably not an ambivalent one.

That's my limited understanding of it. But just know that a lot of the reddit and 4chan meme prompt tips are junk. Especially their negative prompts.

If I had a nickel for every time I saw someone put "deformed hands" in their negative prompt, I'd have a lot of nickels. The AI just looks at it and says "Ok, no deformed and no hands!" And these people don't even notice that none of their pictures have hands.

Pretty sure it can do keyword associations well enough, since it uses CLIP to resolve your prompt and try to come up with a fitting image. The closer two words are to one another in the prompt, the likelier it is to associate them. That's why you should group up things describing one specific element of a picture like a face or the background or whatever into a word blob instead of trying to spread it all over. Similarly, it parses a prompt from the start to the end and will give added weight to what's at the beginning. If you for instance say "fire breathing dragons attacking a castle" it'll concentrate on the fire breathing dragons. If you instead describe a castle with its moat and whatever in minute detail, add some artists or whatnot and throw in "fire breathing dragons" at the very end with like ~30-50 words before that it might outright ignore it or make it a very minor component of the composition. If it couldn't do context and basic association you'd have "fire", "breathing", "dragons", "attack" and "castle" as separate elements competing with one another and a mess of an incomprehensible picture.
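If you take that description at face value (earlier words carry more weight, nearby words get associated), a prompt can be assembled mechanically so the main subject comes first and each element's descriptors stay together. This is a minimal sketch under that assumption only; the priority numbers and example descriptors are hypothetical, and whether the ordering actually helps is something to test, not a given.

```python
def build_prompt(elements: list[tuple[int, list[str]]]) -> str:
    """Flatten descriptor groups into one comma-separated prompt.

    elements: (priority, descriptors) pairs; a lower priority number puts the
    group earlier in the prompt. Descriptors within a group stay contiguous,
    so words describing the same element sit next to each other.
    """
    ordered = sorted(elements, key=lambda element: element[0])  # stable sort
    return ", ".join(d for _, descriptors in ordered for d in descriptors)


prompt = build_prompt([
    (3, ["by Alphonse Mucha", "by Greg Rutkowski"]),
    (2, ["castle with a moat", "stone walls"]),
    (1, ["fire breathing dragons attacking"]),
])
print(prompt)
```

Here the dragons lead the prompt and the castle details stay in one block, rather than being scattered between artist names.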

CLIP has been trained on image-text pairs and can do a lot more than resolve a single word though, as it does Natural Language Processing. For instance, you can feed it an image and various descriptions, and based on its previous training it can determine which description is closest by comparing how near the image and each description end up in its shared embedding space: https://towardsdatascience.com/clip...el-from-openai-and-how-to-use-it-f8ee408958b1
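The "pick the closest description" step boils down to comparing vectors. A toy sketch, assuming made-up 3-D vectors in place of real CLIP embeddings (real ones come from CLIP's trained image and text encoders and have hundreds of dimensions): score each caption by cosine similarity to the image and sort.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_emb: list[float], caption_embs: list[list[float]]) -> list[int]:
    """Return caption indices ordered from most to least similar to the image."""
    scores = [cosine_similarity(image_emb, c) for c in caption_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy 3-D vectors standing in for real CLIP embeddings (hypothetical values).
image = [0.9, 0.1, 0.0]
captions = {
    "a castle at night": [0.0, 1.0, 0.0],
    "a red dragon": [1.0, 0.2, 0.0],
    "a bowl of fruit": [0.0, 0.0, 1.0],
}
order = rank_captions(image, list(captions.values()))
print([list(captions)[i] for i in order])
```

With these toy values the dragon caption ranks first, the castle second, and the fruit last.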

It uses a similar method to generate pictures from noise and find what is the likeliest result from your text prompt. Without CLIP and the correlation it does between text and image you wouldn't be able to type something like "photo of a woman with platinum blonde hair" and get mostly highly accurate results, and it would draw much more nonsense or random shit instead. It obviously understands what "platinum blonde hair" is in relation to "a woman" and doesn't just start painting chunks of platinum near a woman and random hair all over the picture.

I don't think () and [] are meaningful for CLIP, that's a feature that was implemented in the WebUI Repo to (((add))) or [[[detract]]] weight from a specific keyword or keyword combination. I don't think it does anything for grouping.
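To make the WebUI emphasis syntax concrete: my understanding from the AUTOMATIC1111 Features wiki is that each enclosing () multiplies attention by 1.1 and each enclosing [] divides it by 1.1. The sketch below is a simplified re-implementation of just that rule, not the WebUI's actual parser. It ignores the explicit (word:1.5) form and escaping, and it doesn't show how the weights are applied to CLIP's output. It also shows why the brackets never reach the tokenizer: they are stripped out and turned into numbers first.

```python
def attention_weights(prompt: str, step: float = 1.1) -> list[tuple[str, float]]:
    """Split a prompt into chunks and give each an emphasis multiplier.

    Each enclosing "(" multiplies a chunk's weight by `step`; each enclosing
    "[" divides it by `step`. Plain text gets weight 1.0. Simplified sketch:
    no (text:1.5) syntax, no escaping, no bracket-balance checks.
    """
    chunks: list[tuple[str, float]] = []
    depth = 0  # net emphasis level: +1 inside "(", -1 inside "["
    current = ""
    for ch in prompt:
        if ch in "()[]":
            if current.strip():
                chunks.append((current.strip(), step ** depth))
            current = ""
            depth += {"(": 1, ")": -1, "[": -1, "]": 1}[ch]
        else:
            current += ch
    if current.strip():
        chunks.append((current.strip(), step ** depth))
    return chunks


print(attention_weights("(((add))) or [[[detract]]]"))
```

Three layers of () give 1.1 cubed (about 1.33), and three layers of [] give its reciprocal (about 0.75).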

Also pretty sure it knows well enough what "deformed hands" are, since this is what comes up if you just input that term, same for "poorly drawn hands":

And this is what comes up when you search LAION-5B for them. All this search does is use CLIP to find images matching your text in the database of 5 billion images, a subset of which SD was trained on. I believe they limited it to a higher aesthetic score (you can change that on the left-hand side):
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=deformed+hands
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=poorly+drawn+hands

As for their usefulness as negative keywords, I'm not sure and there is definitely a lot of placebo going around. If not copying something to try for a specific result, I generally use them very sparingly mostly if I want to get rid of a certain element or predominant color or whatever. But it certainly doesn't just "remove hands" and you can easily test that.

Non-Edgy Gamer · Sep 28, 2022

Dexter said:
CLIP has been trained on image-text pairs and can do a lot more than resolve a single word though as it does Natural Language Processing

I know. That's why I said it uses tokens.

If there's enough association in the training data, it will be able to associate these tokens, or group multiple words into a single token. If not, it won't.

Dexter said:
I don't think () and [] are meaningful for CLIP

Again, I know. It's for weights in SD. Assuming there is enough association, emphasizing the token or tokens should work to emphasize the association in the result. But as I said, it doesn't because it doesn't work like that. The association is either there or it isn't. Yelling at the model won't make more training data appear.

Dexter said:
Also pretty sure it knows well enough what "deformed hands" are, since this is what comes up if you just input that term, same for "poorly drawn hands":

Dude, the model can't draw hands at all, unless by a fluke. Look at what comes up for "perfect hands":

See, this is the kind of thinking I'm referring to that leads to so many incorrect conclusions. You assume the model knows things based on what you read into the images vs actual testing and data.

Again, this is not a forced smile or a cruel smile. They're both almost the same expression, and not even smiles at all. It's classic bias in testing.

Dexter said:
As for their usefulness as negative keywords, I'm not sure

You can get sure by testing it and paying attention objectively instead of hoping to see what you want to see.

Dexter said:
https://rom1504.github.io/clip-retr...x=laion5B&useMclip=false&query=deformed+hands

Half of these are just hands.

Dexter · Sep 28, 2022

Non-Edgy Gamer said:
If there's enough association in the training data, it will be able to associate these tokens, or group multiple words into a single token. If not, it won't.

Yes it does, based on closeness and statistical relationship. And if something isn't in the model, it might just add noise or try to infer something else. It won't "group multiple words into a single token" though, since a token is at most a word: a long word can even be 2-3 tokens, and commas and other separators are usually a token each. You can see how this works here based on GPT-3, just hit "show example" or paste something: https://beta.openai.com/tokenizer It's not 1:1 applicable to Stable Diffusion, but it's the same principle. One of the recent WebUI updates even introduced a token counter so people don't exceed the model's maximum of 75, which some people did, since everything after that just gets cut off.
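A crude way to see the cut-off effect without installing anything is to approximate tokens and split at the limit. This is an approximation, not CLIP's tokenizer: it treats every word and punctuation mark as one token, whereas the real BPE tokenizer can split long or rare words into several pieces, so real counts run higher. For real counts, the `CLIPTokenizer` class in Hugging Face `transformers` loads the actual vocabulary. The 75 limit is the figure mentioned in this thread.

```python
import re

MAX_TOKENS = 75  # limit mentioned in this thread for the SD 1.x text encoder


def rough_tokens(prompt: str) -> list[str]:
    """Crude stand-in for a real tokenizer: each word and each punctuation
    mark becomes one token. Real BPE tokenizers (like CLIP's) may split
    long or rare words into several pieces, so real counts run higher."""
    return re.findall(r"\w+|[^\w\s]", prompt.lower())


def split_at_limit(prompt: str, limit: int = MAX_TOKENS) -> tuple[list[str], list[str]]:
    """Return (tokens kept, tokens that would be cut off)."""
    tokens = rough_tokens(prompt)
    return tokens[:limit], tokens[limit:]


print(rough_tokens("Fire breathing dragons, attacking a castle!"))
kept, dropped = split_at_limit(" ".join(["castle"] * 80))
print(len(kept), len(dropped))
```

Note that the comma and exclamation mark each count as a token here, which is why long comma-separated keyword lists hit the limit sooner than their word count suggests.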

Non-Edgy Gamer said:
Again, I know. It's for weights in SD. Assuming there is enough association, emphasizing the token or tokens should work to emphasize the association in the result.

I was just clarifying that () or [] aren't tokens and have no meaningful effect on the tokenizer or CLIP; it's just a specific implementation in one (probably the most widely used) WebUI: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features

Using other WebGUIs, Desktop UIs, Plugins or the SaaS implementations of various Websites, these characters have no meaningful effect.

Non-Edgy Gamer said:
Again, this is not a forced smile or a cruel smile. They're both almost the same expression, and not even smiles at all. It's classic bias in testing.

Dexter said:
And as you point out, overemphasizing/weighting some things can lead to bad results and some things barely work or don't work at all. Fact of the matter is, if you want certain facial expressions or emotions expressed by a character you have to specifically guide it towards said result by telling it.

I don't even think the Original post claimed it was, just testing what works and posting a helpful comparison. I'm not searching where that was first posted though. I'm still not sure why you're throwing a fit over me pointing out that if you want a facial expression in a portrait or any picture depicting characters to be anything other than "neutral" you have to specifically point that out by including something like ((laughing)) or (cute smile) or whatever else, many of which do work.

Non-Edgy Gamer said:
You can get sure by testing it and paying attention objectively instead of hoping to see what you want to see.

Non-Edgy Gamer said:
And these people don't even notice that none of their pictures have hands.

That I did test. Your claim that "deformed hands" in the Negatives removes hands is wrong, there were still plenty of hands in the results. Beyond that I have no idea what it exactly does and have no interest in lengthy testing, the results seemed a bit worse than without it, but that doesn't mean the model doesn't have a vague idea what "deformed hands" means and that if it could perfectly draw hands that wouldn't be useful. As you pointed out it still has problems drawing hands in the best of cases though and seems to be more based on luck if it gets it kind of right once or twice.

Non-Edgy Gamer · Sep 28, 2022

Dexter said:
It won't "group multiple words into a single token" though, since a token is at most a word

No, it depends on the model. It is possible to do multi-word tokenization or to combine punctuation into a single token with the word, and I've used language models that have done this, but I have no idea if SD does that or not.

Dexter said:
I was just clarifying, that () or [] aren't tokens and have no meaningful effect on the Tokenizer or CLIP Model, it's just a specific implementation in one/probably the widest used WebUI's

Yeah, I know.

Dexter said:
That I did test. Your claim that "deformed hands" in the Negatives removes hands is wrong

In most cases, it does. The negative prompt isn't 100%. The hands will usually be out of frame, a single hand will be shown, the hands will be incomplete etc. You can say it's wrong, but it's not. I've seen it, and anyone who wants to can test it. You can insist that you have tested it, but I encourage people not to take your word for it.

Dexter said:
Beyond that I have no idea what it exactly does and have no interest in lengthy testing

Then you admit you don't actually know.

It's the lack of interest in "lengthy testing" that's your problem in general. Just like you typed "deformed hands" into the model, got deformed hands and assumed it understood you even though it drew hands how it always draws them. A handful of tests or a single test means very little when dealing with such randomness.
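One standard way to put numbers on "a handful of tests means very little" is a confidence interval for a success rate, such as the Wilson score interval. A sketch, assuming you count, say, how many of N images with a given prompt came out acceptable; the counts below are hypothetical. The same 60% success rate is nearly meaningless after 5 images and much tighter after 100.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval (Wilson score) for a success rate,
    e.g. the share of generated images where the hands come out acceptable."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: same 60% success rate, very different certainty.
print(wilson_interval(3, 5))     # roughly (0.23, 0.88)
print(wilson_interval(60, 100))  # roughly (0.50, 0.69)
```

As a rough rule of thumb, when comparing two prompt variants over the same seeds, only trust a difference once their intervals stop overlapping much.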

Dexter said:
the results seemed a bit worse than without it, but that doesn't mean the model doesn't have a vague idea what "deformed hands" means and that if it could perfectly draw hands that wouldn't be useful.

Yes, IF it could. But it can't. That's my point. You can't use a prompt to force ideas that aren't already trained enough into the model. I wish you could. I wish I could just type "bad drawing" into the negative prompt and avoid all the bad drawings, but I can't. I'm probably just telling it to avoid the tokens "bad" and "drawing" more than anything. Which, btw, avoiding "drawing" is useful if you want to force photorealism.
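For context on what a negative prompt does mechanically: my understanding is that in common Stable Diffusion front ends (the WebUI, Hugging Face diffusers) the negative prompt takes the place of the empty "unconditional" prompt in classifier-free guidance. At each denoising step the model predicts noise once for your prompt and once for the negative prompt, and the step is pushed along the difference, scaled by the CFG setting. A toy version with plain lists standing in for the noise tensors:

```python
def guided_noise(eps_prompt: list[float], eps_negative: list[float], scale: float) -> list[float]:
    """Classifier-free guidance combination for one denoising step.

    eps_prompt:   noise predicted when conditioning on the positive prompt
    eps_negative: noise predicted when conditioning on the negative prompt
                  (an empty prompt if no negative prompt is given)
    scale:        the CFG scale setting
    """
    return [n + scale * (p - n) for p, n in zip(eps_prompt, eps_negative)]


# Toy numbers: where both predictions agree, the negative prompt changes nothing;
# where they differ, the result is pushed past the positive prediction.
print(guided_noise([0.5, 1.0], [0.5, 0.0], 7.5))  # [0.5, 7.5]
```

So a negative prompt steers away from whatever the model associates with its tokens as a whole, whether that is "deformed", "hands", or "drawing". This doesn't by itself settle how strong any particular effect is; that still comes down to testing.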

Dexter said:
As you pointed out it still has problems drawing hands in the best of cases though and seems to be more based on luck if it gets it kind of right once or twice.

Glad we agree.

Zed Duke of Banville · Sep 30, 2022

Turning to the Planescape campaign setting, starting with Mechanus, the plane of Law(ful Neutral).

Clockwork mechanism:

Great Orrery:

Modron Cathedral:

Jade Palace of Shang-Ti the Celestial Emperor:

Mycelia, domain of Psilofyr god of the Myconids:
