# Q&A with Emad Mostaque
Lightly edited by Bryce Drennan, [imaginAIry](https://github.com/brycedrennan/imaginAIry)
### Table of Contents
- [Summarized Version](#summarized-version)
- [Full Version](#full-version)
- When will 1.5 be released?
- What are SD's views on artistic freedom versus censorship in models? (3:24)
- Any update on the updated credit pricing model that was mentioned a couple of days ago, as in, is it getting much cheaper? (5:11)
- Can we get an official statement on why [Automatic](https://github.com/AUTOMATIC1111/stable-diffusion-webui) was banned (from the discord) and why NovelAI used his code? (5:53)
- Will Stability provide, fund, a model to create new medicines? (12:31)
- Do you think the new AI models push us closer to a post-copyright world? (14:04)
- Prompt engineering may well become an elective class in schools over the next decade. With extremely fast paced development, what do you foresee as the biggest barriers to entries? Some talking points might include a reluctance to adoption, death of the concept artist and the dangers outweighing the benefits. (15:27)
- How long does it usually take to train? (16:53)
- How close do you feel you might be able to show a full motion video model like Google or Meta showed off recently? (18:26)
- When do you think we will talk to an AI about the image? (18:35)
- How realistic do you think dynamically creating realistic 3d content with enough fidelity in a VR setting would be? what do you say the timeline on something like that is? (20:44)
- Any plans for stability to tackle open source alternatives to AI code generators, like copilot and alpha code? (25:01)
- Will support be added for inference at sizes other than 512 by default?
- Do you have any plans to improve the model in terms of face, limbs, and hand generation? Is it possible to improve on specifics on this checkpoint?
- I saw your partnership with AI Grant with Nat and Daniel. If you guys would support startups in case they aren't selected by them, any way startups can connect with you folks to get mentorship or guidance?
- Is Stability AI considering working on climate crisis via models in some way? (28:23)
- Which jobs do you think are most in danger of being taken by AI?
- What work is being done to improve the attention mechanism of stable diffusion to better handle and interpret composition while preserving artistic style? There are natural language limitations when it comes to interpreting physics from simple statements. Artistic style further deforms and challenges this kind of interpretation. Is stability AI working on high-level compositional language for use of generative models?
- What are the technical limitations around recreating SD with a 1024 dataset rather than 512, and why not have varying resolutions for the dataset? Is the new model going to be a ton bigger?
- Any plans for creating a worthy open source alternative, something like AI Dungeon or Character AI?
- When we'll be able to create full on movies with AI?
- Did you read the distillation of guided diffusion models paper? Do you have any thoughts on it? Like if it will improve things on consumer level hardware or just the high VRAM data centers?
- Who do you think should own the copyright of an image or video made by an AI, or do you think there shouldn't be an owner?
- Update on adding more payment methods for dream studio?
- Are there any areas of the industry that are currently overlooked that you'd be excited to see the effects of diffusion-based AI being used in?
- Do you have any plans to release a speech model, since this model likes scripted, overdone voices?
- Do you have any thoughts on increasing the awareness of generative models? Is this something you see as important? How long do you think until the mass global population becomes aware of these models?
- Will we always be limited by the hardware cost to run AI or do you expect something to change?
- I'm unsure how to release licensed images based on SD output. Some suggest creative commons zero is fine.
- Is stability AI going to give commissions to artists?
- A text-to-speech model too?
- Is it possible to get vector images like an SVG file from stable diffusion or related systems?
- Is there a place to find all stable AI-made models in one place?
- Where do you see stability AI in five years?
- My question was about whether I have to pass down the RAIL license limitations when licensing SD-based images or whether I can release them as I wish. (45:06)
- As a composer and audio engineer myself, I cannot imagine AI will approach the emotional intricacies and depths of complexity found in music by world class musicians, at least not anytime soon. That said, I'm interested in AI as a tool, would love to explore how it can be used to help in this production process. Is stability AI involved in this?
- Are you guys working on LMs as well, something to compete with OpenAI GPT-3?
- In the future for other models, we are building an opt-in and opt-out system for artists and others that will lead to use in partnerships leading organizations. This model has some principles, the outputs are not direct for any single piece or initiatives of motion with regards to this.
- When will stability and EleutherAI be able to translate geese to speech in real time?
- Will dream studio front-end be open source so it can be used on local GPUs? (50:25)
- What do you think of the situation where a Google engineer believed the AI chatbot achieved sentience?
- Thoughts on getting seamless equirectangular 360 degree and 180 degree and HDR outputs in one shot for image to text and text to image.
- Any plans for text-to-3d diffusion models?
- With some of the recent backlash from artists, is there anything you wish that SD did differently in the earliest stages that would have changed the framing around image synthesis?
- Are you looking to decentralize GPU AI compute?
- Are we going to do nerf type models?
- Will AI lead to UBI? (56:09)
- When will we be able to generate games with AI?
- How's your faith influence your mission?
- How do you weigh training speed and cost on TPUs versus A100s, or the cost of switching from PyTorch to TensorFlow?
- Does StabilityAI have plans to take on investors at any point or have they already?
- How much of an impact do you think AI will have on neural implant cybernetics? It appears one of the limiting factors of cybernetics is the input method, not necessarily the hardware. (1:00:24)
- Can you make cyberpunk 2077 not broken?
- Are you guys planning on creating any hardware devices? A more consumer-oriented one, which has AI as OS?
- Anything specific you'd like to see out of the community?
- How are you Emad?
- What's a good way to handle possible tribalism, extremism?
- Emad, real-life meetups for us members?
- Any collaboration in China yet? Can we use Chinese clip to guide the current one or do we need to retrain the model, embed the language clip into the model?
- Is there going to be a time when we have AI friends we create ourselves, personal companions speaking to us via our monitor, much of the same way a webcam call is done, high quality, et cetera?
- How early do you think we are in this AI wave that's emerging? With how fast it's changing it's hard not to feel FOMO. (1:10:28)
- Any comments on Harmony AI? How close do you think we are to having music sound AI with the same accessibility afforded by stable diffusion?
- When will you launch the full DreamStudio and will it be able to do animations? If so, do you think it'll be more cost effective than using Colab?
- Do you think an AI therapist could be something to address the lack of access to qualified mental health experts?
- What are your thoughts on Buckminster Fuller's work and his thoughts on how to build a world that doesn't destroy itself?
- How will generative models and unlimited custom tailored content to an audience of one impact how we value content? The paradox of choice is more options tend to make people more anxious and we get infinite choice right now. How do we get adapted to our new god-like powers in this hedonic treadmill? Is it a net positive for humanity? How much consideration are we given to potential bad outcomes?
- When do you think multi-models will emerge combining language, video and image?
- wrap up
## Summarized Version
(This took too much effort so I've only summarized the first few questions)
**When will 1.5 be released?**
The developers have asked for more time before releasing this particular class of model, given some
of the edge cases of danger here. The other part is the movement of the repository and the taking over from CompVis. This may seem like just hitting a fork button, but we've taken in legal counsel to make sure that we are doing the right thing and are fully protected.
I believe that that process is nearly complete
In the next couple of days we will be making three releases:
- the Discord bot will be open sourced
- there is a diffusion-based upscaler that is really quite snazzy that will be released as well
- a new decoder architecture for better human faces and other elements
Our CLIP guidance instructions will be released soon and will enable you to get Midjourney-level results.
This particular class of models (image generation) needs to be released properly and responsibly, otherwise it's going to get very messy.
Congresswoman Eshoo has come out and [directly attacked us, asking for us to be classified as dual-use technology and banned by the NSA,](https://eshoo.house.gov/sites/eshoo.house.gov/files/9.20.22LettertoNSCandOSTPonStabilityAI.pdf) and there are European Parliament actions and others, because they just think the technology is too powerful.
**What are SD's views on artistic freedom versus censorship in models? (3:24)**
My view is basically if it's legal, then it should be allowed.
The main thing that we want to try and do is that the model produces what you want it to produce, I think that's an important thing.
**Any update on the updated credit pricing model that was mentioned a couple of days ago, as in, is it getting much cheaper? (5:11)**
Yes, next week there'll be a credit pricing adjustment. You will be able to do a lot more with your credits, as opposed to the credits being changed in price.
**Can we get an official statement on why [Automatic](https://github.com/AUTOMATIC1111/stable-diffusion-webui) was banned (from the discord) and why NovelAI used his code? (5:53)**
**Will Stability provide, fund, a model to create new medicines? (12:31)**
- We're currently working on [DNA diffusion](https://github.com/pinellolab/DNA-Diffusion) that will be announced next week.
- LibreFold with Sergei Shrinikov's lab at Harvard and UCL, so that's probably going to be the most advanced protein folding model in the world, more advanced than AlphaFold.
**Do you think the new AI models push us closer to a post-copyright world? (14:04)**
I don't know, I think that's a very good question, it might.
To be honest, no one knows what the copyright is around some of these things, like at what point does fair use stop and start and derivative works?
It hasn't been tested, it will be tested,
**Prompt engineering may well become an elective class in schools over the next decade.
With extremely fast paced development, what do you foresee as the biggest barriers to entries?
Some talking points might include a reluctance to adoption, death of the concept artist and
the dangers outweighing the benefits. (15:27)**
[I think prompt engineering won't really be a thing. The models will get better.]
**How long does it usually take to train? (16:53)**
- Stable Diffusion: 150,000 A100 hours at $4/hour on Amazon. 24 days on 256 A100s.
- OpenCLIP: 1.2 million A100 hours
**How close do you feel you might be able to show a full motion video model like Google or Meta showed off recently? (18:26)**
We'll have it by the end of the year. But better.
**When do you think we will talk to an AI about the image? Like can you fix his nose a little bit or make a hair longer and stuff like that? (18:35)**
To be honest, I'm kind of disappointed that the community has not built that yet. All you have to do is whack Whisper on the front end. Thank you, OpenAI.
**How do you feel about the use of generative technology being used by surveillance capitalists to further profit-aimed goals? What can Stability AI do about this? (19:33)**
The only thing we can really do is offer alternatives. Out-compete.
**How realistic do you think dynamically creating realistic 3d content with enough fidelity in a VR setting would be? what do you say the timeline on something like that is? (20:44)**
It's going to come within four to five years, fully high res, 2K per eye resolution or even 4K or 8K actually; it just needs an M2 chip with the specialist transformer architecture in there.
We have a ton of partnerships that we'll be announcing over the next few months, where we're converting closed source AI companies into open source AI companies.
**What guarantees does the community have that stability AI won't go down on the same path as OpenAI? That one day you won't develop a good enough model, you decide to close things after benefiting from all the work of the community and the visibility generated by it? (22:55)**
That's a good question. I mean, it kind of sucks what happened with OpenAI, right?
The R&D team and the developers have it in their contracts that they can release any model that they work on as open source. So legally, we can't stop them.
**Any plans for stability to tackle open source alternatives to AI code generators, like copilot and alpha code? (25:01)**
Yeah, you can go over to carper.ai, and see our code generation model that's training right now.
## Full Version
(lightly edited)
**When will 1.5 be released?**
1.5 isn't that big an improvement over 1.4, but it's still an improvement.
And as we go into version 3 and the Imagen models that are training away now, which is
like we have a 4.3 billion parameter one and others, we're considering what is the best
data for that. What's the best system for that to avoid extreme edge cases, because there's always people
who want to spoil the party? This has caused the developers themselves, and again, kind of I haven't done a big push
here, it has been from the developers, to ask for a bit more time to consult and come
up with a proper roadmap for releasing this particular class of model.
They will be released for research and other purposes, and again, I don't think the license
is going to change from the OpenRAIL-M license, it's just that they want to make
sure that all the boxes are ticked rather than rushing them out, given, you know, some
of these edge cases of danger here.
The other part is the movement of the repository and the taking over from CompVis, which is
an academic research lab, again, who had full independence, relatively speaking, over the
creation of decisions around the model, to StabilityAI itself.
Now this may seem like just hitting a fork button, but you know, we've taken in legal
counsel and a whole bunch of other things, just making sure that we are doing the right
thing and are fully protected around releasing some of these models in this way.
I believe that that process is nearly complete, it certainly cost us a lot of money, but you
know, it will either be ourselves or an independent charity maintaining that particular repository
and releasing more of these generative models.
Stability itself, and again, kind of our associated entities, have been releasing over half a
dozen models in the last weeks, so a model a week effectively, and in the next couple
of days we will be making three releases, so the Discord bot will be open sourced, there
is a diffusion-based upscaler that is really quite snazzy that will be released as well,
and then finally there will be a new decoder architecture that Rivers Have Wings has been
working on for better human faces and other elements trained on the aesthetic and humans
thing.
The core models themselves will take a little bit longer while we sort out some of these
edge cases, but once that's in place, hopefully we should be able to release them as fast
as our other models, such as for example the OpenCLIP model that we released, and there
will be our CLIP guidance instructions released soon that will enable you to have Midjourney-level
results utilising those two. OpenCLIP took 1.2 million A100 hours, so almost eight
times as much as Stable Diffusion itself.
Similarly, we released our language models and other things, and those are pretty straightforward,
they are MIT, it's just again, this particular class of models needs to be released properly
and responsibly, otherwise it's going to get very messy.
Some of you will have seen a kind of Congresswoman Eshoo coming out and [directly attacking us
and asking us to be classified as dual-use technology and be banned by the NSA,](https://eshoo.house.gov/sites/eshoo.house.gov/files/9.20.22LettertoNSCandOSTPonStabilityAI.pdf) there
are European Parliament actions and others, because they just think the technology is
too powerful. We are working hard to avoid that, and again, we'll continue from there.
**What are SD's views on artistic freedom versus censorship in models? (3:24)**
My view is basically if it's legal, then it should be allowed, if it's illegal, then we
should at least take some steps to try and adjust things around that, now that's obviously
a very complicated thing, as legal is different in a lot of different countries, but there
are certain things that you can look up the law, that's illegal to create anywhere.
I'm in favour of more permissiveness, and you know, leaving it up to localised ethics
and morality, because the reality is that that varies dramatically across many areas,
and I don't think it's our place to police that. Similarly, as you've seen with DreamBooth
and all these other extensions on Stable Diffusion, these models are actually quite
easy to train, so if something's not in the dataset, you can train it back in, if it doesn't
fit in with the legal area of where we ourselves release from.
So I think, you know, again, what's legal is legal, ethical varies, et cetera. The main
thing that we want to try and do is make sure the model produces what you want it to produce; I think
that's an important thing.
I think you guys saw at the start, before we had all the filters in place, that Stable
Diffusion trained on a snapshot of the internet as it was. It's just, when you typed in
'women', you got kind of toplessness for a lot of artistic prompts, because there are a lot
of topless women in art, even though art is less than, like, 4.5% of the dataset. You know,
that's not what people wanted, and again, we're trying to make it so that it produces
what you want, as long as it is legal. I think that's probably the core thing here.
**Any update on the updated credit pricing model that was mentioned a couple of days ago, as in, is it getting much cheaper? (5:11)**
Yes, next week, there'll be a credit pricing adjustment from our side.
There have been lots of innovations around inference and a whole bunch of other things,
and the team has been testing it in staging and hosting.
You've seen this as well in the diffusers library and other things, Facebook recently
came out with some really interesting fast attention kind of elements, and we'll be passing
on all of those savings.
The way that it'll probably be is that credits will remain as is, but you will be able to
do a lot more with your credits, as opposed to the credits being changed in price, because
I don't think that's fair to anyone if we change the price of the credits.
**Can we get an official statement on why [Automatic](https://github.com/AUTOMATIC1111/stable-diffusion-webui) was banned (from the discord) and why NovelAI used his code? (5:53)**
I don't particularly like discussing individual user bans and things like that, but this was
escalated to me because it's a very special case, and it comes at a time, again, of increased
notice on the community and a lot of these other things.
I've been working very hard around this.
Automatic created a wonderful web UI that increased the accessibility of stable diffusion
to a lot of different people.
You can see that by the styles and other things.
It's not open source, and I believe there is a copyright on it, but still, again, he worked
super hard on it.
A lot of people kind of helped out with that, and it was great to see.
However, we do have a very particular stance on community as to what's acceptable and what's
not.
I think it's important to kind of first take a step back and understand what stability
is and what stable diffusion is and what this community is, right?
Stability AI is a company that's trying to do good.
We don't have profit as our main thing.
We are completely independent.
It does come a lot from me and me trying to do my best as I try to figure out governance
structures to fit things, but I do listen to the devs.
I do listen to my team members and other things.
Obviously, we have a profit model and all of that, but to be honest, we don't really
care about making revenue at the moment because it's more about the deep tech that we do.
We don't just do image.
We do protein folding.
We release language models, code models, the whole gamut of things.
In fact, we are the only multimodal AI company other than OpenAI, and we release just about
everything with the exception of generative models until we figure out the processes for
doing that.
MIT open-sourced.
What does that mean?
It means that literally everything is open-sourced.
Against that, we come under attack.
So our model weights, when we released it for academia, were leaked.
We collaborate with a lot of entities, so NovelAI is one of them, and their engineers
have helped with various code-based things, and I think we've helped as well.
They are very talented engineers, and you'll see they've just released a list of all the
things that they did to improve stable diffusion because they were actually going to open-source
it very soon, I believe it was next week, before the code was stolen from their system.
We have a very strict no-support policy for stolen code because this is a very sensitive
area for us.
We do not have a commercial partnership with NovelAI.
We do not pay them.
They do not pay us.
They're just members of the community like any other, but when you see these things,
if someone stole our code and released it and it was dangerous, I wouldn't find that
right.
If someone stole their code, if someone stole other codes, I don't believe that's right
either in terms of releasing.
Now in this particular case, what happened is that the community member in question was
contacted and there was a conversation.
He made some messages public.
Other messages were not made public.
I looked at all the facts.
I decided that this was a banable offense on the community.
I'm not a stupid person.
I am technical.
I do understand a lot of things, and I put all of this out there to kind of make a clear
point.
This Stable Diffusion community itself is one community of Stability AI, and it's one community
around Stable Diffusion.
Stable diffusion is a model that's available to the whole world, and you can build your
own communities and take this in a million different ways.
It is not healthy if stability AI is at the center of everything that we do, and that's
not what we're trying to create.
We're trying to create a multiplicity of different areas that you can discuss and take things
forward and communities that you feel you yourself are a stable part of.
Now, this particular one is regulated, and it is not a free-for-all.
It does have specific rules, and there are specific things within it.
Again, it doesn't mean that you can't go elsewhere to have these discussions.
We didn't take it down off GitHub or things like that.
We leave it up to them, but the manner in which this was done and there are other things
that aren't made public, I did not feel it was appropriate, and so I approved the banning
and the buck stops with me there.
If the individual in question wants to be unbanned and rejoin the community, there is
a process for repealing bans.
We have not received anything on that side, and I'd be willing to hear other stuff if
maybe I didn't have the full picture, but as it is, that's where it stands, and again,
like I said, we cannot support what we regard as direct theft there.
With regards to the specific code point, you can ask novel AI themselves what happened
there.
They said that there was AGPL code copied over, and then they rescinded it as soon as
it was notified, and they apologized.
That did not happen in this case, and again, we cannot support any leaked models, and we
cannot support that because, again, the safety issues around this and the fact that if you
start using leaked and stolen code, there are some very dangerous liability concerns
that we wish to protect the community from.
We cannot support that particular code base at the moment, and we can't support that individual
being a member of the community.
Also, I would like to say that a lot of insulting things were said, and we let it slide this
once.
Don't be mean, man.
Just talk responsibly.
Again, we're happy to have considered and thought-out discussions offline and online.
If you do start insulting other members, then please flag it to moderators, and there will
be timeouts and bans because, again, what is this community meant to be?
It's meant to be quite a broad but core and stable community that is our private community
as Stability AI, but, like I said, the beauty of open source is that if this is not a community
you're comfortable with, you can go to other communities.
You can set up your own communities.
You can set up your notebooks and others.
In fact, when you look at it, just about every single web UI has a member of Stability contributing.
From Pharma Psychotic at DeForum through to Dango on Majesty through to Gandamu at Disco,
we have been trying to push open source front-ends with no real expectations of our own because
we believe in the ability for people to remix and build their own communities around that.
Stability has no presence in these other communities because those are not our communities.
This one is.
So, again, like I said, if Automatic does want to have a discussion, my inbox is open,
and if anyone feels that they're unjustly timed out or banned, they can appeal them.
Again, there is a process for that.
That hasn't happened in this case, and, again, it's a call that I made looking at some publicly
available information and some non-publicly available information, and I wish them all
the best.
**Will Stability provide, fund, a model to create new medicines? (12:31)**
We're currently working on [DNA diffusion](https://github.com/pinellolab/DNA-Diffusion) that will be announced next week for some of the
DNA expression things in our openBioML community.
Feel free to join that.
It's about two and a half thousand members, and currently I believe it's been announced
LibreFold with Sergei Shrinikov's lab at Harvard and UCL, so that's probably going to be the
most advanced protein folding model in the world, more advanced than AlphaFold.
It's just currently undergoing ablations.
Repurposing of medicines and discovery of new medicines is something that's very close
to my heart.
Many of you may know that basically the origins of Stability were leading and architecting
and running the United Nations AI Initiative against COVID-19, so I was the lead architect
of that to try and get a lot of this knowledge coordinated around that.
We made all the COVID research in the world free and then helped organize it with the
backing of UNESCO, the World Bank and others, so that's one of the geneses, alongside education.
For myself as well, if you listen to some of my podcasts, I quit being a hedge fund
manager for five years to work on repurposing drugs for my son, doing AI-based lit review
and repurposing of drugs through neurotransmitter analysis.
So taking things like nazepam and others to treat the symptoms of ASD, the papers around
that will be published and we have several initiatives in that area, again, to try and
just catalyze it going forward, because that's all we are, we're a catalyst.
Communities should take up what we do and run forward with that.
**Do you think the new AI models push us closer to a post-copyright world? (14:04)**
I don't know, I think that's a very good question, it might.
To be honest, no one knows what the copyright is around some of these things, like at what
point does fair use stop and start and derivative works?
It hasn't been tested, it will be tested, I'm pretty sure there will be all sorts of
lawsuits and other things soon, again, that's something we're preparing for.
But I think one of the first AI pieces of art was recently granted a copyright.
I think the ability to create anything is an interesting one as well, because again,
it makes content more valuable, so in an abundance scarcity is there, but I'm not exactly sure
how this will play out.
I do think you'll be able to create anything you want for yourselves, it just becomes,
what happens when you put that into a social context and start selling that?
This comes down to the personal agency side of the models that we build as well, you know,
like you're responsible for the inputs and the outputs that result from that.
And so this is where I think copyright law will be tested the most, because people usually
did not have the means of creation, whereas now you have literally the means of creation.
**Prompt engineering may well become an elective class in schools over the next decade. With extremely fast paced development, what do you foresee as the biggest barriers to entries? Some talking points might include a reluctance to adoption, death of the concept artist and the dangers outweighing the benefits. (15:27)**
Well, you know, the interesting thing here is that a large part of life is the ability
to prompt.
So, you know, prompting humans is kind of the key thing, like my wife tries to prompt
me all the time, and she's not very successful, but she's been working on it for 16 years.
I think that a lot of the technologies that you're seeing right now from AI, because it
understands these latent spaces or hidden meanings, it also includes the hidden meanings
in prompts, and I think what you see is you have these generalized models like stable
diffusion and stable video diffusion and dance diffusion and all these other things.
It pushes intelligence to the edge, but what you've done is you compressed 100,000 gigabytes
of images into a two gigabyte file of knowledge that understands all those contextualities.
The next step is adapting that to your local context.
So that's what you guys do when you use Dreambooth, or when you do textual inversion, you're injecting
a bit yourself into that model so it understands your prompts better.
And I think a combination of multiple models doing that will mean that prompt engineering
isn't really the thing, it's just understanding how to chain these tools together, so more
kind of context specific stuff.
This is why we've partnered, for example, with Replit, so that people can build dynamic
systems, and we've got some very interesting things on the way there.
I think the barriers to entry will drop dramatically, like do you really need a class on that?
For the next few years, yeah, but then soon it will not require that.
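[Editor's note: as a concrete illustration of "injecting a bit of yourself into the model", here is a minimal sketch of loading a textual-inversion concept into a Stable Diffusion pipeline with the `diffusers` library. The concept repo and placeholder token are just examples, and `load_textual_inversion` assumes a reasonably recent `diffusers` release.]

```python
# Sketch: adapting a general model to a local concept via textual inversion.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Load a small learned embedding that teaches the model a new placeholder token.
# The repo name and token below are illustrative; use your own trained concept.
pipe.load_textual_inversion("sd-concepts-library/cat-toy", token="<cat-toy>")

image = pipe("a photo of <cat-toy> on a beach", num_inference_steps=50).images[0]
image.save("my_concept.png")
```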
**How long does it usually take to train? (16:53)**
Well, that's a piece of string.
It depends.
We have models, so Stable Diffusion took 150,000 A100 hours, and an A100 hour is about $4 on Amazon,
which you need for the interconnect.
Open clip was 1.2 million hours.
That's literally hours of compute.
So for stable diffusion, can someone in the chat do this?
It's 150,000 A100 hours across 256 A100s.
So divide one by the other.
What's the number?
Let me get it quick.
Quickest.
Ammonite?
Ammonite, you guys kind of calculate slow.
24 days, says Ninjaside.
There we go.
That's about how long it took to train the model.
To do the tests and other stuff, it took a lot longer.
And for the bigger models, again, it depends, because it doesn't scale linearly.
So it's not that you chuck in 512 GPUs and it's more efficient.
Really, a lot of the heavy lifting is done by the supercomputer.
So what happens is that we're doing all this work up front, and then we release the model
to everyone.
And then as Joe said, DreamBooth takes about 15 minutes on an A100 to then fine tune.
Because all the work of those years of knowledge, the thousands of gigabytes, are all done for
you.
And that's why you can take it and extend it and kind of do what you want with it.
That's the beauty of this model over the old school internet, which was always computing
all the time.
So you can push intelligence to the edges.
All right.
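[Editor's note: a quick back-of-the-envelope check of the figures quoted above (150,000 A100-hours, 256 A100s, roughly $4 per A100-hour):]

```python
# Sanity-check the quoted training figures for Stable Diffusion.
a100_hours = 150_000      # total A100-hours quoted above
num_gpus = 256            # A100s running concurrently
price_per_hour = 4.0      # rough on-demand $/A100-hour quoted above

wall_clock_days = a100_hours / num_gpus / 24
compute_cost = a100_hours * price_per_hour

print(f"~{wall_clock_days:.0f} days of wall-clock training")  # ~24 days
print(f"~${compute_cost:,.0f} of raw compute")                # ~$600,000
```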
**How close do you feel you might be able to show a full motion video model like Google or Meta showed off recently? (18:26)**
We'll have it by the end of the year. But better.
**When do you think we will talk to an AI about the image? (18:35)**
Like can you fix his nose a little bit or make a hair longer and stuff like that?
To be honest, I'm kind of disappointed that the community has not built that yet.
It's not complicated.
All you have to do is whack whisper on the front end.
Thank you, OpenAI.
You know, obviously that was a great benefit, and then have that input into
StyleCLIP or a kind of CLIP-based thing.
So if you look it up, Max Wolf has this wonderful thing on StyleCLIP where you can see how to
create various scary Zuckerbergs, as if he wasn't scary himself.
And so putting that into the pipeline basically allows you to do what it says there
with a bit of targeting.
So there's the StyleCLIP link right there in the stage chat.
And again, with the new CLIP models that we have and a bunch of the other models that
Google has released recently, you should be able to do that literally now when you
combine that with Whisper.
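[Editor's note: a minimal sketch of the "whack Whisper on the front end" idea: transcribe a spoken instruction with `openai-whisper`, then feed the text into a Stable Diffusion image-to-image pipeline via `diffusers`. File names, model choices, and the img2img pairing (instead of StyleCLIP) are illustrative assumptions, not anything Stability ships.]

```python
# Sketch: voice-driven image editing by chaining Whisper into img2img.
# Assumes: pip install openai-whisper diffusers transformers torch pillow
import torch
import whisper
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# 1. Speech -> text with Whisper (model size and audio path are placeholders).
speech_model = whisper.load_model("base")
instruction = speech_model.transcribe("make_the_hair_longer.wav")["text"]

# 2. Text + existing image -> edited image with img2img.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("portrait.png").convert("RGB").resize((512, 512))
edited = pipe(
    prompt=instruction,   # the spoken request, e.g. "make the hair longer"
    image=init_image,     # older diffusers versions call this `init_image`
    strength=0.6,         # how far the output may drift from the original
    guidance_scale=7.5,
).images[0]
edited.save("portrait_edited.png")
```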
**How do you feel about the use of generative technology being used by surveillance capitalists to further profit aimed goals?
What can Stability AI do about this? (19:33)**
The only thing we can really do is offer alternatives like
do you really want to be in Meta's, what do they call it, Horizon Worlds, where you've got
no legs or genitals? Not really, you know; legs are good, genitals are good.
And so by providing open alternatives, we can basically out compete the rest like look
at the amount of innovation that's happened on the back of stable diffusion.
And again, you know, acknowledge our place in that we don't police it, we don't control
it, you know, like people can take it and extend it.
If you want to use our services, great.
If you don't, it's fine.
We're creating a brand new ecosystem that will out compete the legacy guys, because
thousands millions of people will be building and developing on this.
Like, we are sponsoring the fast.ai course on Stable Diffusion, so that anyone who's
a developer can rapidly learn to be a Stable Diffusion developer.
And you know, this isn't just kind of interfaces and things like that.
It's actually you'll be able to build your own models.
And how crazy is that?
Let's make it accessible to everyone, and again, that's why we're working with Gradio and
others on that.
**How realistic do you think dynamically creating realistic 3d content with enough fidelity in a VR setting would be? what do you say the timeline on something like that is? (20:44)**
You know, unless you're Elon Musk, self-driving cars have always been five years away.
Always, always. You know, $100 billion has been spent on self-driving cars and the research,
and to me, it's not that much closer.
The dream of photorealistic VR though is very different with generative AI.
Like again, look at the 24 frames per second Imagen Video, look at the
long Phenaki videos as well, and then consider Unreal Engine 5. What's Unreal Engine 6 going
to look like?
Well, it'll be photorealistic, right, and it'll be powered by NeRF technology,
the same as Apple is pioneering for use on the neural engine chips that make up 16.8%
of your MacBook M1 GPU.
It's going to come within four to five years, fully high res, 2K per eye resolution
or even 4K or 8K actually; it just needs an M2 chip with the specialist transformer
architecture in there.
And that will be available to a lot of people.
But then like I said, Unreal Engine 6 will also be out in about four or five years.
And so that will also up the ante.
There's a lot of amazing compression and customized stuff you can do around this.
And so I think it's just gonna be insane when you can create entire worlds.
And hopefully, it'll be built on the type of architectures that we help catalyze, whether
it's built by ourselves or others.
So we have a metric shit ton, I believe is the appropriate term of partnerships that
we'll be announcing over the next few months, where we're converting closed source AI companies
into open source AI companies, because, you know, it's better to work together.
And again, we shouldn't be at the center of all this with everything laying on our shoulders.
But it should be a teamwork initiative, because this is cool technology that will help a lot
of people.
**What guarantees does the community have that stability AI won't go down on the same path as OpenAI?
That one day you won't develop a good enough model, you decide to close things after benefiting from all the work of the community and the visibility generated by it? (22:55)**
That's a good question.
I mean, it kind of sucks what happened with OpenAI, right?
You can say it's safety, you can say it's commercials, like whatever.
The R&D team and the developers have it in their contracts (except for one person that we still need
to send it to) that they can release any model that they work on as open source.
So legally, we can't stop them.
Well, I think that's a pretty good kind of thing.
I don't think there's any company in the world that does that.
And again, if you look at it, the only thing that we haven't instantly released is this
particular class of generative models, because it's not straightforward.
And because you have a frickin' Congresswoman petitioning for us to be banned by the NSA.
And a lot more stuff behind that.
Look, you know, we're gonna get B Corp status soon, which puts in our official documents
that we are mission focused, not profit focused.
At the same time, I'm going to build a $100 billion company that helps a billion people.
We have some other things around governance that we'll be introducing as well.
But currently, the governance structure is simple, yet not ideal, which is that I personally
have control of the board, ordinary, common, everything.
And so a lot is resting on my shoulders, which is not sustainable.
As soon as we figure that out, and how to maintain the independence and how to maintain
it so that we are dedicated to open, which I think is a superior business model that a lot
of people agree with, we will implement that posthaste. Any suggestions, please do send
them our way.
But like I said, one core thing is, if we stop being open source and go down the OpenAI
route, there's nothing we can do to stop the developers from releasing the code.
And without developers, what are we? You know, a nice front-end company that does a bit of
model deployment. It'd be killing ourselves.
**Any plans for stability to tackle open source alternatives to AI code generators, like copilot and alpha code? (25:01)**
Yeah, you can go over to carper.ai, and see our code generation model that's training right now.
We released one of the FIM-based language models that will be core to that, plus our
instruct framework, so that you can have the ideal complement to that.
So I think by Q1 of next year, we will have better code models than copilot.
And there's some very interesting things in the works there, you just look at our partners
and other things.
And again, there'll be open source available to everyone.
**Will support be added for inference at sizes other than 512 by default?**
Yeah, I mean, there are kind of things like that already.
So like, if you look at the recently released NovelAI improvements to Stable Diffusion,
you'll see that there are details there as to how to implement arbitrary resolutions,
similar to something like Midjourney; I'll just post it there.
The model itself, like I said, enables that; it's just that the code wasn't there.
It was part of our expected upgrades.
And again, like, different models have been trained at different sizes.
So we have a 768 model, a 512 model, a 1024 model, et cetera, coming in the pipeline.
I mean, like, again, I think that not many people have actually tried to train models
yet.
You can get to grips with it, but you can train and extend this. Again, view it as a
base of knowledge onto which you can adjust a bunch of other stuff.
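[Editor's note: for what already works today, a minimal sketch: `diffusers` will sample at non-default resolutions if you pass `height`/`width` as multiples of 64, though 1.x checkpoints were trained at 512, so quality drifts the further you stray.]

```python
# Sketch: sampling at a non-default resolution with an existing 1.x checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Height and width must be multiples of 64. The model was trained at 512x512,
# so very wide or tall outputs may show repeated subjects until the
# resolution-specific checkpoints (768, 1024, ...) mentioned above land.
image = pipe(
    "a wide matte painting of a mountain valley at sunrise",
    height=512,
    width=768,
).images[0]
image.save("wide_valley.png")
```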
**Do you have any plans to improve the model in terms of face, limbs, and hand generation? Is it possible to improve on specifics on this checkpoint?**
Yep, 100%.
So I think in the next day or so, we'll be releasing a new fine-tuned decoder that's
just a drop-in for any latent diffusion or stable diffusion model that is fine-tuned
on the LAION face dataset, and that makes better faces.
Then, as well, you can train it on, like, Hagrid, which is the hand dataset to create
better hands, et cetera.
Some of this architecture is known as a VAE architecture for doing that.
And again, that's discussed a bit in the novel AI thing, because they do have better hands.
And again, this knowledge will proliferate around that.
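[Editor's note: a hedged sketch of what a drop-in decoder looks like in practice with `diffusers`: swap the pipeline's VAE for a fine-tuned one. The `stabilityai/sd-vae-ft-mse` ID is an assumption about which published checkpoint corresponds to the release described above; substitute whichever decoder you actually use.]

```python
# Sketch: dropping a fine-tuned VAE decoder into an existing pipeline.
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Assumed model ID for the fine-tuned decoder.
vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
)

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", vae=vae, torch_dtype=torch.float16
).to("cuda")

# Same prompt, same U-Net; only the decoder changed, which mainly sharpens
# faces, eyes, and other fine detail in the final RGB image.
image = pipe("portrait photo of a smiling woman, 85mm").images[0]
image.save("better_faces.png")
```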
**I saw your partnership with AI Grant with Nat and Daniel. If you guys would support startups in case they aren't selected by them, any way startups can connect with you folks to get mentorship or guidance?**
We are building a grant program and more.
It's just that we're currently hiring people to come and run it.
That's the same as Bruce.Codes' question.
In the next couple of weeks, there will be competitions and all sorts of grants announced
to kind of stimulate the growth of some essential parts of infrastructure in the community.
And we're going to try and get more community involvement in that, so people who do great
things for the community are appropriately awarded.
There's a lot of work being done there.
**Is Stability AI considering working on climate crisis via models in some way? (28:23)**
Yes, and this will be announced in November.
I can't announce it just yet.
They want to do a big, grand thing, but you know.
We're doing that.
We're supporting several entities that are doing climate forecasting functions and working
with a few governments on weather patterns using transformer-based technologies as well.
**Which jobs do you think are most in danger of being taken by AI?**
I don't know, man.
It's a complex one.
I think that probably the most dangerous ones are call center workers and anything that
involves human-to-human interaction.
I don't know if you guys have tried character.ai.
I don't know if they've stopped it because you could create some questionable entities.
The...
It's very good.
And it will just get better because I think you look at some of the voice models we have
coming up, you can basically do emotionally accurate voices and all sorts of stuff and
voice-to-voice, so you won't notice a call center worker.
But that goes to a lot of different things.
I think that's probably the first for disruption before anything else.
I don't think that artists get disrupted that much, to be honest, by what's going on here.
Unless you're a bad artist, in which case you can use this technology to become a great
artist, and the great artist will become even greater.
So I think that's probably my take on that.
**What work is being done to improve the attention mechanism of stable diffusion to better handle and interpret composition while preserving artistic style? There are natural language limitations when it comes to interpreting physics from simple statements. Artistic style further deforms and challenges this kind of interpretation. Is stability AI working on high-level compositional language for use of generative models?**
The answer is yes.
This is why we spent millions of dollars releasing the new CLIP.
CLIP is at the core of these models.
There's a generative component and there is a guidance component, and when you infuse
the two together, you get models like they are right now.
For the guidance component, we used CLIP-L, which was CLIP-Large, the largest one that
OpenAI released.
They had two more, H and G, which I believe are huge and gigantic.
We released H and the first version of G, which took like a million A100 hours to do,
and that improves compositional quality, so that as that gets integrated into a new
version of Stable Diffusion, it will be at the level of DALL-E 2, even with a small model
size.
There are some problems around this in that the model learns from both things.
It learns from the stuff the generative thing is fine-tuned on and from the CLIP models,
and so we've been spending a lot of time over the last few weeks, and there's another reason
for the delay, seeing what exactly does this thing know, because even if an artist isn't
in our training dataset, it somehow knows about it, and it turns out it was CLIP all
along.
So we really wanted to output what we think it outputs and not output what it shouldn't
output, so we've been doing a lot of work around that.
Similarly, what we found is that embedding pure language models like T5-XXL (and we
tried UL2 and some of these other models; these are pure language models like GPT-3)
improves the understanding of these models, which is kind of crazy.
And so there's some work being done around that for compositional accuracy, and again,
you can look at the blog by Novel.ai where they extended the context window so that it
can accept three times the amount of input from this.
So your prompts get longer from I think like 74 to 225 or something like that, and there
are various things you can do once you do proper latent space exploration, which I
think is probably another month away, to really hone in on this.
I think again, a lot of these other interfaces from the ones that we support to others have
already introduced negative prompting and all sorts of other stuff.
You should have kind of some vector-based initialization, et cetera, coming soon.
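[Editor's note: the released OpenCLIP ViT-H checkpoint he mentions can be loaded with the `open_clip` library; here is a minimal sketch of scoring image-text similarity with it. The pretrained tag is the LAION-2B one published with that release; a recent `open_clip` version is assumed.]

```python
# Sketch: loading OpenCLIP ViT-H and scoring how well an image matches prompts.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-H-14")

image = preprocess(Image.open("sample.png")).unsqueeze(0)
texts = tokenizer(["a cat on a sofa", "a wide shot of a city at night"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    # Normalise, then cosine similarity: higher means a better match.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    similarity = (image_features @ text_features.T).squeeze(0)

print(similarity.tolist())
```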
**What are the technical limitations around recreating SD with a 1024 dataset rather than 512, and why not have varying resolutions for the dataset? Is the new model going to be a ton bigger?**
So version 3 right now has 1.4 billion parameters.
We've got a 4.3 billion parameter Imagen in training and a 900 million parameter Imagen in
training.
We've got a lot of models training.
We're just waiting to get these things right before we just start releasing them one after
the other.
The main limitation is the lack of 1024 images in the training dataset.
Like, LAION doesn't have a lot of high resolution images, and this is one of the things why
what we've been working on the last few weeks is to basically negotiate and license amazing
datasets that we can then put out to the world so that you can have much better models.
And we're going to pay a crap load for that, but again, release it for free and open source
to everyone.
And I think that should do well.
This is also why the upscaler that you're going to see is a two times upscaler.
That's good.
Four times upscaling is a bit difficult for us to do.
Like it's still decent because we're just waiting on the licensing of those images.
**Any plans for creating a worthy open source alternative, something like AI Dungeon or Character AI?**
Well, a lot of the CarperAI team's work around instruct models and contrastive learning should
enable Character AI type systems and chatbots.
And you know, from narrative construction to other things, again, it will be ideal there.
The open source versions of NovelAI and AI Dungeon, I believe the leading one is KoboldAI.
So you might want to check that out.
I haven't seen what the case has been with that recently.
**When we'll be able to create full on movies with AI?**
I don't know, like five years again.
I'm just throwing that out there.
Okay, if I was Elon Musk, I'd say one year.
I mean, it depends what you mean by feature-length movies.
So like animated movies, when you combine stable diffusion with some of the language
models and some of the code models, you should be able to create those.
Maybe not in a ufotable or Studio Bones style within two years, I'd say, but I'd say a five
year time frame for being able to create those in high quality, like super high res is reasonable
because that's the time it will take to create these high res dynamic VR kind of things.
To create fully photorealistic proper people movies, I mean, you can look at EbSynth
or some of these other kinds of pathway analyses; it shouldn't be that long, to be
honest.
It depends on how much budget and how quick you want to do it.
Real time is difficult, but you're going to see some really amazing real time stuff in
the next year.
Touch wood.
We're lining it up.
It's going to blow everyone's socks away.
That's going to require a freaking supercomputer, but it's not movie length.
It's something a bit different.
**Did you read the distillation of guided diffusion models paper? Do you have any thoughts on it? Like if it will improve things on consumer level hardware or just the high VRAM data centers?**
I mean, distillation and instructing these models is awesome.
And the step counts they have for kind of reaching cohesion are kind of crazy.
Rivers Have Wings has done a lot of work on a kind of fast DPM solver, which already reduced
the number of steps required to get to those stages.
And again, like I keep telling everyone, once you start chaining these models together,
you're going to get down really sub one second and further, because I think you guys have
seen image to image work so much better if you just even give a basic sketch than text
to image.
So why don't you chain together different models, different modalities to kind of get
them?
And I think it'll be easier once we release our various model resolution sizes plus upscalers
so you can dynamically switch between models.
If you look at the dream studio kind of teaser that I posted six weeks ago, that's why we've
got model chaining integrated right in there.
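[Editor's note: to make the step-count point concrete, a hedged sketch using a fast multistep DPM solver in `diffusers` (the scheduler landed in later releases, so a recent version is assumed); around 20 steps is usually comparable to 50+ with the original sampler.]

```python
# Sketch: swapping in a fast solver to cut the number of sampling steps.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Replace the default scheduler with a fast multistep DPM solver.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# ~20 steps now gives results comparable to ~50 with the default sampler.
image = pipe(
    "a watercolor sketch of a lighthouse in a storm",
    num_inference_steps=20,
).images[0]
image.save("lighthouse_20_steps.png")
```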
**Who do you think should own the copyright of an image or video made by an AI, or do you think there shouldn't be an owner?**
I think that if it isn't based on copyrighted content, it should be owned by the prompter of the AI,
if the AI is a public model and not owned by someone else; otherwise it is almost like
a co-creation type of thing.
But I'm not a lawyer and I think this will be tested severely very soon.
**Update on adding more payment methods for dream studio?**
I think we'll be introducing some alternate ones soon, the one that we won't introduce
is PayPal.
No, no PayPal, because that's just crazy what's going on there.
**With stable diffusion having been publicly released for over a month now
and with the release of version 1.5 around the corner, what is the most impressive implementation
you've seen someone create out of the application so far?**
I really love the dream booth stuff.
I mean, come on, that shit's crazy.
You know, even though some of you fine tuned me into kind of weird poses.
I think it was pretty good.
I didn't think we would get that level of quality.
I thought it would be at textual inversion level quality.
Beyond that, I think that, you know, there's been this well of creativity, like you're
starting to see some of the 3D stuff come out and again, I didn't think we'd get quite
there even with the chaining.
I think that's pretty darn impressive.
**Are there any areas of the industry that are currently overlooked that you'd be excited to see the effects of diffusion-based AI being used in?**
Again, like I can't get away from this PowerPoint thing.
Like it's such a straightforward thing that causes so much real annoyance.
I think we could kind of get it out there.
I think it just requires kind of a few fine tuned models plus a code model plus a language
model to kind of kick it together.
I mean, diffusion is all about de-noising and information is about noise.
So our brains filter out noise and de-noise all the time.
So these models can be used in a ridiculous number of scenarios.
Like I said, we've got a DNA diffusion model going on in OpenBioML; all that's crazy,
right?
But I think right now I really want to see some of these practical high impact use cases
like the PowerPoint kind of thing.
**Do you have any plans to release a speech model, since this model likes scripted, overdone voices?**
Yes, we have a plan to release a speech to speech model soon and some other ones around that.
I think AudioLM by Google was super interesting recently.
For those who don't know, that's basically you give it a snippet of a voice or of music
or something and it just extends it.
It's kind of crazy.
But I think we get the arbitrary kind of length thing there and combined with some other models
that could be really interesting.
**Do you have any thoughts on increasing the awareness of generative models? Is this something you see as important? How long do you think until the mass global population becomes aware of these models?**
I think I can't keep up as it is and I don't want to die.
But more realistically, we have a B2B2C model.
So we're partnering with the leading brands in the world and content creators to both
get their content so we can build better open models and to get this technology out to just
everyone.
Similar on a country basis, we have country level models coming out very soon.
So on the language side of things, you can see we released Polyglot, which is the best
Korean language model, for example, via EleutherAI and our support of them recently.
So I think you will see a lot of models coming soon, a lot of different kind of elements
around that.
**Will we always be limited by the hardware cost to run AI or do you expect something to change?**
Yeah, I mean, like this will run on the edge, it'll run on your iPhone in a year.
Stable diffusion will run on an iPhone in probably seconds, that level of quality.
That's again, a bit crazy.
**I'm unsure how to release licensed images based on SD output. Some suggest creative commons zero is fine.**
Okay, so if someone takes a CC0 image and violates the license, then something can
be done around that.
I would suggest that if you're worried about some of this stuff: CC0 licensing, and
again, I am not a lawyer, please consult with a lawyer, does not preclude copyright.
And there's a transformational element that incorporates that.
If you look at artists like Necro 13 and Claire Selva and others, you will see that the outputs
usually aren't one-shot, they are multi-step.
And that means that this becomes one part of that, a CC0-licensed part that's part
of your process.
Like, even if you use GFPGAN or upscaling or something like that, again, I'm not a lawyer,
please consult with one.
I think that should be sufficiently transformative that you can assert full copyright over the
output of your work.
**Is stability AI going to give commissions to artists?**
We have some very exciting in-house artists coming online soon.
Some very interesting ones, I'm afraid that's all I can say right now.
But yeah, we will have more art programs and things like that as part of our community
engagement.
It's just that right now it's been a struggle even to keep Discord and other things going
and growing the team.
Like, we're just over a hundred people now, God knows how many we actually need.
I think we probably need to hire another hundred more.
**A text-to-speech model too?**
Yep.
I couldn't release it just yet as my sister-in-law was running Sonantic, but now that it's been absorbed by Spotify, we can release emotional text-to-speech.
Not soon though, I think that we want to do some extra work around that and build that
up.
**Is it possible to get vector images like an SVG file from stable diffusion or related systems?**
Not at the moment.
You can actually do that with a language model, as you'll find out probably in the next month.
But right now I would say just use a converter, and that's probably going to be the best way
to do that.
**Is there a place to find all stable AI-made models in one place?**
No, there is not, because we are disorganized.
We barely have a careers page up, and we're not really keeping a track of everything.
We are employing someone as an AI librarian to come and help coordinate the community
and some of these other things.
Again, that's just a one-stop shop there.
But yeah, also there's this collaborative thing where we're involved in a lot of stuff.
There's a blurring line between what we need and what we don't need.
We just want to be the catalyst for all of this.
I think the best models go viral anyway.
**Where do you see stability AI in five years?**
Hopefully with someone else leading the damn thing so I can finish Elden Ring.
No, I mean, our aim is basically to build AI subsidiaries in every single country so
that there's localized models for every country and race that are all open and to basically
be the biggest, best company in the world that's actually aligned with you rather than
trying to suck up your attention to serve you ads.
I really don't like ads, honestly, unless they're artistic, I like artistic ads.
So the aim is to build a big company to list and to give it back to the people so ultimately
it's all owned by the people.
For myself, my main aim is to ramp this up and spread as much profit as possible into
Imagine Worldwide, our education arm run by our co-founder, which currently is teaching
kids literacy and numeracy in refugee camps in 13 months on one hour a day.
We've just been given the remit to extend this and incorporate AI to teach tens of millions
of kids around the world; it will be open source, hosted at the UN.
One laptop per child, but really one AI per child.
That's one of my main focuses because I think I did a podcast about this.
A lot of people talk about human rights and ethics and morals and things like that.
One of the frames I found really interesting from Vinay Gupta, who's a bit of a crazy guy,
but a great thinker, was that we should think about human rights in terms of the rights
of children because they don't have any agency and they can't control things and what is
their right to have a climate, what is their right to food and education and other things.
We should really provide for them and I'm going to use this technology to provide for
them so there's literally no child left behind, they have access to all the tools and technology
they need.
That's why creativity was a core component of that and communication, education and healthcare.
Again, it's not just us, all we are is the catalyst and it's the community that comes
and helps and extends that.
If you'd like to learn more about our education initiative, they're at Imagine Worldwide.
Lots more on that soon as we scale up to tens of millions of kids.
**My question was about whether I have to pass down the RAIL license limitations when licensing SD-based images or I can release as is. (45:06)**
Ah yes, you don't have to do rail license, you can release as is.
It's only if you are running the model or distributing the model to other people that
you have to do that.
**As a composer and audio engineer myself, I cannot imagine AI will approach the emotional intricacies and depths of complexity found in music by world class musicians, at least not anytime soon. That said, I'm interested in AI as a tool, would love to explore how it can be used to help in this production process. Is stability AI involved in this?**
Yes we are, I think someone just linked to Harmonai, and we will be releasing
a whole suite of tools soon to extend the capability of musicians and make more people
into musicians.
And this is one of the interesting ones, like these models, they pay attention to the important
parts of any media.
So there's always this question about expressivity and humanity; I mean, they are trained on humanity
and so they resonate, and I think that's something that you kind of have to acknowledge.
And then aesthetics have been solved to a degree by this type of AI.
So something can be aesthetically pleasing, but aesthetics are not enough.
If you are an artist, a musician or otherwise, I'd say a coder, it's largely about narrative
and story.
And what does that look like around all of this?
Because things don't exist in a vacuum, it can be a beautiful thing or a piece of music,
but you remember it because you were driving a car when you were 18 with your best friends,
you know, or it was at your wedding or something like that.
That's when story matters, for music, for art, for other things as well like that.
**Are you guys working on LMs as well, something to compete with OpenAI GPT-3?**
Yes.
We recently released, from the CarperAI lab, the instruct framework, and we are training to
achieve chinchilla-optimal models, which outperform GPT-3 with a fraction of the parameters.
They will get better and better and better.
And then as we create localized data sets and the education data sets, those are ideal
for training foundation models at ridiculous power relative to the parameters.
So I think that it will be pretty great to say the least as we kind of focus on that.
EleutherAI was the first community that we properly supported, and a number of Stability
employees help lead that community.
The focus was GPT Neo and GPT-J, which were the open source implementations of GPT-3 but
on a smaller parameter scale, which had been downloaded 25 million times by developers,
which I think is a lot more use than GPT-3 has got.
But GPT-3, or InstructGPT, really is fantastic.
I think this instruct approach took the required parameter count down about a hundred times.
Again, if you're technical, you can look at the CarperAI community and you can see the framework
around that.
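For readers unfamiliar with the term, "chinchilla-optimal" refers to the compute-optimal scaling result from Hoffmann et al. (2022), often summarized as roughly 20 training tokens per parameter. The sketch below is just that rule of thumb with illustrative numbers, not Stability's actual training recipe.

```python
# Back-of-the-envelope sketch of the chinchilla-optimal rule of thumb
# (~20 training tokens per parameter, Hoffmann et al. 2022).
# Numbers are illustrative only, not Stability's training plan.
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    return n_params * tokens_per_param

for n_params in (1e9, 7e9, 70e9):
    tokens = chinchilla_optimal_tokens(n_params)
    print(f"{n_params / 1e9:>4.0f}B params -> ~{tokens / 1e9:,.0f}B training tokens")
```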
**For future models, you have said you are building an opt-in and opt-out system for artists and others, in partnership with leading organizations. Are there any principles or initiatives in motion with regards to this?**
There will be announcements next week about this and various entities that we're bringing in place for that.
That's all I can say, because I'm not allowed to spoil announcements, but we've been working
super hard on this.
I think there's two or maybe three announcements, it'll be 17th and 18th will be the dates of
those.
**When will stability and EleutherAI be able to translate geese to speech in real time?**
I think the kind of honking models are very complicated.
Actually, this is actually very interesting.
People have actually been using diffusion models to translate animal speech and understand it.
If you look at something like whisper, it might actually be in reach.
Whisper, by OpenAI, which they kindly open sourced (I wonder what caused them to do that), is a
fantastic speech-to-text model.
One of the interesting things about it is you can change the language you're speaking
in the middle of a sentence and it'll still pick that up.
So if you train it enough, then you'll be able to kind of do that.
So one of the entities we're talking with wants to train based on whale song to understand
whales.
Now this sounds a bit like Star Trek, but that's okay, I like Star Trek.
So we'll see how that goes.
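For anyone who wants to try the Whisper behavior he describes, here is a minimal sketch using OpenAI's open-sourced `openai-whisper` package; the checkpoint name and audio file are placeholders.

```python
# Minimal sketch of OpenAI's open-sourced Whisper (pip install openai-whisper).
# The audio file name is a placeholder.
import whisper

model = whisper.load_model("base")          # small multilingual checkpoint
result = model.transcribe("recording.mp3")  # language is auto-detected
print(result["language"])                   # dominant detected language
print(result["text"])                       # transcription
```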
**Will dream studio front-end be open source so it can be used on local GPUs? (50:25)**
I do not believe there's any plans for that at the moment because DreamStudio is kind
of our prosumer front end kind of thing, but you'll see more and more local GPU usage.
So, you know, you've got Visions of Chaos at the moment on Windows machines, by Softology,
which is fantastic, where you can run just about any of these notebooks like Deforum and others
or hlky's or whatever.
And so I think that's kind of a good step.
Similarly, if you look at the work being done on the Photoshop plugin, it will have local
inference in a week or two.
So you can use that directly from Photoshop and soon many other plugins.
**What do you think of the situation where a Google engineer believed the AI chatbot achieved sentience?**
It did not.
He was stupid.
Unless you have a very low bar for sentience, I suppose; I mean, some people are barely
sentient.
It must be said, especially when they're arguing on the internet; you never win an argument on
the internet.
That's another thing, facts don't really work on the internet.
A lot of people have preconceived notions.
Instead, you should try to just be like, you know, as open minded as possible and let people
agree to disagree.
**Thoughts on getting seamless equirectangular 360 degree and 180 degree and HDR outputs in one shot for image to text and text to image.**
I mean, you could use things like, I think it's called Stable DreamFusion, which is DreamFusion
and Stable Diffusion kind of combined.
There are a bunch of data sets that we're working on to enable this kind of thing, especially
from GoPro and others.
But I think it'd probably be a year or two away still.
**Any plans for text-to-3d diffusion models?**
Yes, there are.
And they are in the works.
**With some of the recent backlash from artists, is there anything you wish that SD did differently in the earliest stages that would have changed the framing around image synthesis?**
Not really.
I mean, like the point is that these things can be fine-tuned anyway.
So I think people have attacked fine tuning.
I mean, ultimately, I understand the fear; this is threatening to their jobs
because anyone can kind of do it, but it's not ethically correct for
them to say, actually, we don't want everyone to be artists.
So instead they focus on: it's taken my art and trained on my art, and it's impossible
for this to work without my art.
Not really.
So you train on ImageNet and it can still create just about any composition.
Part of the problem was having the clip model embedded in there because the
clip model knows a lot of stuff.
We don't know what's in the OpenAI dataset, as perhaps we should, and it's interesting.
I think that all we can do is learn from the feedback from the people that
aren't shouting at us. You know, members of the team have received death threats
and other things which are completely over the line.
This is a reason why I think caution is the better part of what we're doing right
now.
You know, we have put ourselves in harm's way, like my inbox does look a bit
ugly in certain places, to try and calm things down and really listen to the
calmer voices there and try and build systems so people can be represented appropriately.
It's not an easy question.
I think it's incumbent on us to try and help facilitate this conversation
because it's an important question.
**Are you looking to decentralize GPU AI compute?**
Yeah, we've got models that enable that; Hivemind, which you'll see
on the decentralized learning side, is an example whereby models are
actually trained on distributed GPUs.
I think the best version of that is on reinforcement learning models,
especially when considering things like community
models, et cetera, because as those proliferate and people create their own custom models via
DreamBooth or others, there's no way that centralized systems can keep up.
But I think decentralized compute is pretty cheap though.
**Are we going to do nerf type models?**
Yes. I think nerfs are going to be the big thing.
They are, um, going to be supported by Apple and Apple hardware.
So I think you'll see lots of nerf type models there.
**Will AI lead to UBI? (56:09)**
Maybe. It'll either lead to UBI and utopia, or a panopticon that we can never escape from, because the models that
were previously used to focus our attention and serve us ads will be used to control our brains instead.
And they're really good at that.
So, you know, no big deal, just two forks in the road.
That's the way it kind of goes.
**When will we be able to generate games with AI?**
You can already generate games with AI.
So the code models allow you to create basic games, but then we've had generative games
for many years already.
**How's your faith influence your mission?**
I mean, it's just like all faiths are the same.
Do unto others as you'd have done unto yourself, right?
The golden rule, for all the stuff around there.
I think people forget that we are just trying to do our best.
Like it can lead to bad things though.
So the former chief rabbi, Jonathan Sacks, who sadly passed away, a very smart guy, had this concept of altruistic
evil: people who try to do good can do the worst evil because they believe they're doing good.
No one wants to be an asshole or bad, even if we have our arguments and they make us forget
our humanity.
What I really want to focus on is this idea of public interest
and bring this technology to the masses because I don't want to have this world where I looked
at the future and there's this AI God that is controlled by a private enterprise.
Like that enterprise would be more powerful than any nation unelected and in control of
everything.
And that's not a future that I want from my children.
I think, because I would not want that done unto me and I think it should be
made available for people who have different viewpoints to me as well.
This is why, like I said, look, I know that there was a lot of tension over the weekend
and everything on the community, but we really shouldn't be the only community for this.
And we don't want to be the sole arbiter of everything here.
We're not open AI or deep mind or anyone like that.
We're really trying to just be the catalyst to build ecosystems where you can find your
own place, whether you agree with us or disagree with us.
So yeah, I think that also it'd be nice when people of other faiths or no faith can
actually talk together reasonably.
And that's one of the reasons that we accelerated AI and Faith (aiandfaith.org).
Again, you don't have to agree with it, but just realize these are some of the stories
that people subscribe to, and everyone's got their own faith in something or other, literally
or not.
**How are you going to train? Speed, cost, and TPUs versus A100s, or the cost of switching from PyTorch to TensorFlow?**
We have code that works on both.
And we have had great results on TPU V4s, the horizontal and vertical scaling works
really nicely.
And gosh, there is something called a V5 coming soon.
That'd be interesting.
Um, you will see models trained across a variety of different architectures and we're trying
just about all the top ones there.
**Does StabilityAI have plans to take on investors at any point or have they already?**
We have taken on investors.
There will be an announcement on that.
We have given up zero control and we will not give up any control.
I am very good at this.
As I mentioned previously, the original stable diffusion model was financed by some
of the leading AI artists in the world and collectors.
And so, you know, we've been kind of community focused.
I wish that we could do a token sale or an IPO or something and be community focused,
but it just doesn't fit with regulations right now.
So anything that I can say is that we will and will always be independent.
**How much of an impact do you think AI will have on neural implant cybernetics? It appears one of the limiting factors of cybernetics is the input method, not necessarily the hardware. (1:00:24)**
I don't know.
I guess I have no idea, I never thought too much about that.
I think that it's probably required for the interface layer.
The way that you should look at this technology is that you've got the highly structured world and
the unstructured world, right?
And this acts as a bridge between them.
So like with stable diffusion, you can communicate in images that you couldn't do otherwise.
Cybernetics is about the kind of interface layer between humans and computers.
And again, you're removing that in one direction and the cybernetics allow you to remove it
in the other direction.
So you're going to have much better information flow.
So I think it will have a massive impact from these foundation devices.
**Can you make cyberpunk 2077 not broken?**
I was the largest investor in CD Projekt at one point and it is a crying shame what happened
there.
Uh, I have a lot of viewpoints on that one.
But you know, we can create cyberpunk worlds of our own in, what did I say, five years?
Yeah.
No Elon Musk in there.
So that's going to be pretty exciting.
**Are you guys planning on creating any hardware devices? A more consumer-oriented one, which has AI as OS?**
We have been looking into customized ones, some of the kind of edge architecture, but it won't be for a few years on the AI
side.
Actually, it'll probably be towards next year, because we've got that on our tablets.
So we've got basically a fully integrated stack on tablets for education, healthcare, and others.
And again, we're trying to open source as much as possible.
So we're looking at RISC-V and alternative architectures there; probably an announcement there in
Q1.
**Anything specific you'd like to see out of the community?**
I just like people to be nice to each other, right?
Like communities are hard.
It's hard to scale community.
Like humans are designed for one to 150 and what happens is that as we scale communities
bigger than that, this dark monster of our being, Moloch, kind of comes out.
People get like really angsty and there's always going to be education, there's always
going to be drama.
How many communities do you know that aren't drama? Just consider what your aunts
do when they chat all the time.
It's all kind of drama.
I like to focus on being positive and constructive as much as possible, and acknowledging
that everyone here is just human.
But again, sometimes you make tough decisions.
I made a tough decision this weekend.
It might be right.
It might be wrong, but you know, it's what I thought was best for the community.
We wanted to have checks and balances and things, but it's a work in progress.
Like I don't know how many people we've got in the community right now, like 60,000
or something like that.
That's a lot of people and you know, I think it's, 78,000, that's a lot of fricking
people.
That's like a small town in the US or like a city in Finland or something like that.
I just like people to be excellent to each other.
**How are you Emad?**
I'm a bit tired.
Back in London for the first time in a long time, I was traveling, trying to get the education
thing set up.
There's a stability Africa set up as well.
There's some work that we're doing in Lebanon, which unfortunately is really bad.
Like I said, Stability does a lot more than image generation, and it's just been a bit of a stretch
even now with a hundred people.
The reason that we're doing everything so aggressively is cause you kind of have
to because there's just a lot of unfortunateness in the world.
And I think you'd feel worse about yourself if you didn't.
And there's an interesting piece I read recently. I know Sam Bankman-Fried of FTX,
you know, he's got this thing about effective altruism.
He talks about this thing of expected utility.
How much impact can you make on the world?
And you have to make big bets.
So I made some really big bets.
I put all my money into fricking GPUs.
I really pulled together a team.
I got government international backing and a lot of stuff because I think you, everyone
has agency and you have to figure out where you can add the most agency and accelerate
things up there.
Uh, we have to bring in the best systems and we've built this multivariate system with
multiple communities and now we're doing joint ventures in every single country because we
think that is a whole new world.
There's another great piece Sequoia did recently about generative AI being a whole
new world that will create trillions.
We're at this tipping point right now.
And so I think unfortunately you've got to work hard to do that because it's a once in
a lifetime opportunity.
Just like everyone in this community here has a once in a lifetime opportunity.
You know about this technology; how many of the people you know, know about it now?
Everyone in the world, everyone that you know will be using this in a few years and no one
knows the way it's going to go.
**What's a good way to handle possible tribalism, extremism?**
So if you Google my name, you'll see me writing in the Wall Street Journal
and Reuters and all sorts of places about counter-extremism.
It's one of my expert topics, and unfortunately it's difficult with the social media echo
chambers to kind of get out of that, and you find people going in loops because sometimes
things aren't fair.
Like, you know, again, let's take our community.
For example, this weekend actions were taken, you know, the banning, which some could consider isn't
fair.
And again, that's understandable because it's not a cut-and-dried, easy decision.
You had kind of the discussions going on loop.
You had people saying some really unpleasant things, you know, some of the stuff made me
kind of sad because I was exhausted and you know, people questioning my motivations and
things like that.
And again, it's your prerogative, but as a community member myself, it made me feel bad.
I think the only way that you can really fight extremism and some things like that is to
have checks and balances and processes in place.
The mod team have been working super hard on that.
I think this community has been really well-behaved, like, you know, it was super difficult
and some of the community members got really burned out during the beta because they had
to put up with a lot of shit, to put it quite simply.
But getting people on the same page, getting a common mission and kind of having a degree
of psychological safety where people can say what they want, which is really difficult
in a community where you don't know where everyone is.
That's the only way that you can get around some of this extremism and some of this hate
element.
Again, I think the common mission is the main thing.
I think everyone here is in a common mission to build cool shit, create cool shit.
And you know, like I said, the tagline kind of create, don't hate, right?
**Emad, real-life meetups for us members?**
Yeah, we're going to have little stability societies all over the place and hackathons.
We're just putting an events team together to really make sure they're well organized
and not our usual disorganized shambles.
But you know, feel free to do it yourselves, you know, like, we're happy to amplify it
when community members take that forward.
And the things we're trying to encourage are going to be like artistic oriented things,
get into the real world, go and see galleries, go and understand things, go and paint, get
good painting lessons, etc.
As well as hackathons and all this more techy kind of stuff.
You can be part of the events team by messaging careers at stability.ai.
Again, we will have a careers page up soon with all the roles, we'll probably go to like
250 people in the next few months.
And yeah, it's going very fast.
**Any collaboration in China yet? Can we use Chinese clip to guide the current one or do we need to retrain the model, embed the language clip into the model?**
I think you'll see a Chinese variant of stable diffusion coming out very soon.
Can't remember what the current status is.
We do have a lot of plans in China, we're talking to some of the coolest entities there.
As you know, it's difficult due to sanctions and the Chinese market, but it's been heartening
to see the community expand in China so quickly.
And again, as it's open source, it didn't need us to go in there to kind of do that.
I'd say that on the community side, we're going to try and accelerate a lot of the engagement
things.
I think that the Doctor Fusion one's ongoing, you know, shout out to Dreitweik for Nerf
Gun and Almost 80 for the really amazing output there.
I don't think we do enough to appreciate the things that you guys post up and amplify
them.
And I really hope we can do better in future.
The mod team are doing as much as they can right now.
And again, we will try to amplify the voices of the artistic members of our community as
well, more and more, and give support through grants, credits, events and other things as
we go forward.
**Is there going to be a time when we have AI friends we create ourselves, personal companions speaking to us via our monitor, much of the same way a webcam call is done, high quality, et cetera?**
Yes, you will have "her" from Joaquin Phoenix's movie, Her, with Scarlett Johansson whispering in your ear.
Hopefully she won't dump you at the end, but you can't guarantee that.
If you look at some of the text to speech being emotionally resonant, then, you know, it's kind of creepy, but it's very immersive.
So I think voice will definitely be there first. Again, try talking to a character.AI model and you'll see how good some of these chat
bots can be.
There are much better ones coming.
We've seen this already with Xiaoice in China, and Alice, which a lot of people use for mental
health support, and then Elisa in Iran.
So millions of people use these right now as their friends.
Again, it's good to have friends.
Again, we recommend 7cups.com if you want to have someone to talk to, but it's not the
same person each time or, you know, like just going out and making friends, but it's not
easy.
I think this will help a lot of people with their mental health, etcetera.
**How early do you think we are in this AI wave that's emerging? With how fast it's changing it's hard not to feel FOMO. (1:10:28)**
It is actually literally exponential.
So when you plot the number of AI papers that are coming out on a log scale,
it's a straight line.
So it's literally an exponential kind of curve.
Like I can't keep up with it.
No one can keep up with it.
We have no idea what's going on.
And the technology advances like there's that meme.
Like one hour here is seven years on earth.
Like from interstellar, that's how life kind of feels like I was on top of it for a few
years and now it's like, I didn't even know what's happening.
Here we go.
It's a doubling rate of 24 months.
It's a bit insane.
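To put that "doubling rate of 24 months" in concrete terms, here is a small illustrative calculation; the figures are arithmetic only, not actual publication statistics.

```python
# Illustrative arithmetic for a 24-month doubling rate (not real publication data).
DOUBLING_MONTHS = 24

def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    return 2 ** (months / doubling_months)

for years in (1, 2, 5, 10):
    print(f"{years:>2} years -> {growth_factor(12 * years):.1f}x today's output")
```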
**Any comments on Harmony AI? How close do you think we are to having music sound AI with the same accessibility afforded by stable diffusion?**
Now, Harmonai has done a slightly different model of releasing Dance Diffusion gradually.
We're putting it out there as we license more and more data sets, some of the O and X and
other work that's going on.
I mean, basically considering that you're at the VQGAN moment right now, if you guys
can remember that from all of a year ago or 18 months ago, it'll go exponential again
because the amount of stuff here is going to go crazy.
Like generative AI, look at that Sequoia link I posted is going to be the biggest investment
theme of the next few years and literally tens of billions of dollars are going to be
deployed like probably next year alone into this sector.
And most of it will go to stupid stuff, some will go to good stuff, most will go to stupid
stuff but a decent amount will go to forwarding music in particular because the interesting
thing about musicians is that they're already digitally intermediated versus artists who
are not.
So artists, some of them use Procreate and Photoshop, a lot of them don't.
But musicians use synthesizers and DSPs and software all the time.
So it's a lot easier to introduce some of these things to their workflow and then make
it accessible to the people.
Yeah, musicians just want more snares.
You see the drum bass guy there.
**When will you launch the full DreamStudio and will it be able to do animations? If so, do you think it'll be more cost effective than using Colab?**
Very soon, yes, and yes, there we go.
Keep an eye here.
Then the next announcements hopefully won't be quite so controversial, but instead very exciting.
**Do you think an AI therapist could be something to address the lack of access to qualified mental health experts?**
I would rather have volunteers augmented by that.
So again, with 7Cups.com, we have 480,000 volunteers, trained in active listening, helping 78 million people each
month; hopefully we will augment them with AI as we help them build their
models.
AI can only go so far, but the edge cases and the failure cases I think are too strong.
And I think again, a lot of care needs to be taken around that because people's mental
health is super important.
At the same time, we're trialing art therapy with stable diffusion as a mental health adjunct
in various settings from survivors of domestic violence to veterans and others.
And I think it will have amazing results because there's nothing quite like the magic of using
this technology.
And I think, again, magic is kind of the operative word here that we have.
That's how you know technology is cool.
**What are your thoughts on Buckminster Fuller's work and his thoughts on how to build a world that doesn't destroy itself?**
To be honest, I'm not familiar with it.
But I think the world is destroying itself at the moment and we've got to do everything we can to stop it.
Again, I mentioned earlier, one of the nice frames I've thought about this is really thinking
about the rights of children because they can't defend themselves.
And are we doing our big actions with a view to the rights of those children?
I think that children have a right to this technology and that's every child, not just
ones in the West.
And that's why I think we need to create personalized systems for them and infrastructure so they
can go up and kind of get out.
**How will generative models and unlimited custom tailored content to an audience of one impact how we value content? The paradox of choice is more options tend to make people more anxious and we get infinite choice right now. How do we get adapted to our new god-like powers in this hedonic treadmill? Is it a net positive for humanity? How much consideration are we given to potential bad outcomes?**
I think this is kind of one of those interesting things whereby, like I was talking to Alexander
Wang at scale about this and he posted something on everyone being in their own echo chambers
as you basically get hedonic to death, entertained to death.
Kind of like this WALL-E, you remember the fat guys with their VR headsets?
Yeah, kind of like that.
I don't think that's the case.
I think people will use this to create stories because we're prosocial narrative creatures
and the n equals one echo chambers are a result of the existing internet without intelligence
on the edge.
We want to communicate unless you have Asperger's like me and social communication disorder,
in which case communicating is actually quite hard, but we learned how to do it.
And I think, again, we're prosocial creatures that love seeing people listen to what we
do.
You've got likes and, you know, you've got this kind of hook model where you input something
you're triggered and then you wait for verification and validation.
So I think actually this will allow us to create our stories better and create a more
egalitarian internet because right now the internet itself is this intelligence amplifier
that means that some of the voices are more heard than others because some people know
how to use the internet and they drown out those who do not and a lot of people don't
even have access to this, so yeah.
**When do you think multi-models will emerge combining language, video and image?**
I think they'll be here by Q1 of next year and they'll be good.
I think that by 2024 they'll be truly excellent.
You can look at the DeepMind Gato paper on the autoregression of different modalities
on reinforcement learning to see some of the potential on this.
So Gato is just a 1.3 billion parameter model that is a generalist agent.
As we've kind of showed by merging image and others, these things can cross-learn just
like humans and I think that's fascinating and that's why we have to create models for
every culture, for every country, for every individual so we can learn from the diversity
and plurality of humanity to create models that are aligned for us instead of against
us.
And I think that's much better than stack more layers and build giant freaking supercomputers
to train models to serve ads or whatever.
**wrap up**
So with that, I bid you adieu.
It's been seven weeks, feels like seven years or seven minutes, I'm not
even sure anymore, like I think we made a time machine.
But hopefully we can start building stuff a lot more structured.
So thanks all and you know, stay cool, rock on, bye.