Multi-Object Editing in Personalized Text-To-Image Diffusion Model via Segmentation Guidance
Haruka Matsuda†, Ren Togo††, Keisuke Maeda††, Takahiro Ogawa††, Miki Haseyama††
† School of Engineering, Hokkaido University, Japan
†† Faculty of Information Science and Technology, Hokkaido University, Japan
E-mail: {matsuda, togo, maeda, ogawa, mhaseyama}@lmd.ist.hokudai.ac.jp
ABSTRACT
learning by introducing Elastic Weight Consolidation (EWC) [17], which contributes to maintaining the visual fidelity of the target objects. Finally, we summarize the contributions of this paper.

• By associating the segmentation guide with the personalization target, the proposed method succeeds in outputting the target from a specific segmented region and improves the editing ability. In particular, we can separate similar concepts by mapping a specific segmented region to each target when personalizing multiple targets.

• Since the continual learning method with EWC restricts parameter updates when learning multiple target concepts, the proposed method can maintain the visual fidelity of previously learned targets.

Consequently, the proposed method achieves the output of multiple objects with similar concepts, maintaining an appropriate balance between visual fidelity and editing ability, as shown in Fig. 1.
2. PROPOSED METHOD

2.1. Personalized Text-to-Image Diffusion Model

The proposed method performs fine-tuning based on the DreamBooth framework. In order to make the model learn the target object S, we use content image data x_S and its corresponding text y_S for personalization, and regularized image data x_S^{reg} and its corresponding text y_S^{reg} for regularization as inputs. Note that the roles of x_S^{reg} and y_S^{reg} are to address the overfitting and language-drift problems [3]. Then y_S consists of a unique identifier and a class descriptor of the object, such as "a [identifier] [class noun]", and y_S^{reg} includes only the class, such as "a [class noun]". Although x_S^{reg} are generated by the frozen diffusion model in the original DreamBooth, we alternatively use real images as x_S^{reg} because utilizing real images is effective in maintaining the visual fidelity [7]. The loss function L_S can be expressed as follows:

L_S = L_{LDM}(x_S, y_S) + \lambda_{reg} L_{LDM}(x_S^{reg}, y_S^{reg}),   (1)

where \lambda_{reg} indicates the importance of the class prior, and L_{LDM} is the loss of the Latent Diffusion Model (LDM) [2] expressed by the following equation:

L_{LDM}(x, y) = \mathbb{E}\big[ \| \epsilon - \epsilon_\theta(\sqrt{\bar{\alpha}_t}\, x + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\ t,\ \tau_\theta(y)) \|_2^2 \big],   (2)

where t is a timestep, \bar{\alpha}_t is a step size based on a variance schedule, \epsilon is random noise, \tau_\theta(\cdot) is the text encoder of Contrastive Language-Image Pre-Training (CLIP) [18], and \epsilon_\theta(\cdot, \cdot, \cdot) is the U-Net [19]. The proposed method adds an additional term to L_S to learn the correspondence with the segmentation image and to limit excessive updating of the parameters.
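For concreteness, the following PyTorch-style sketch shows how the losses in Eqs. (1) and (2) can be computed. The callables unet and text_encoder and the tensor alphas_cumprod are assumed stand-ins for a latent diffusion backbone and its noise schedule, not the authors' released implementation.

import torch
import torch.nn.functional as F

def ldm_loss(unet, text_encoder, alphas_cumprod, x, y_tokens):
    # L_LDM(x, y) of Eq. (2): predict the noise added to the (latent) input x.
    b = x.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x.device)  # timestep t
    eps = torch.randn_like(x)                            # random noise epsilon
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)           # \bar{\alpha}_t
    x_t = a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * eps  # noised input
    eps_hat = unet(x_t, t, text_encoder(y_tokens))       # epsilon_theta(., t, tau_theta(y))
    return F.mse_loss(eps_hat, eps)                      # mean squared error stands in for the squared L2 norm

def dreambooth_loss(unet, text_encoder, alphas_cumprod,
                    x_s, y_s, x_reg, y_reg, lambda_reg=1.0):
    # L_S of Eq. (1): personalization term plus the prior-preservation term.
    return (ldm_loss(unet, text_encoder, alphas_cumprod, x_s, y_s)
            + lambda_reg * ldm_loss(unet, text_encoder, alphas_cumprod, x_reg, y_reg))

In practice, x would be the VAE-encoded latent of a training image, and y_tokens the tokenized prompt: "a [identifier] [class noun]" for the personalization set and "a [class noun]" for the regularization set.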
2.2. Initial Training with ControlNet Loss

The purpose of the first training session is to form a correspondence between the specific target and the segmented image, not only the specific text. To perform this additional conditioning, the following loss L_{CLDM} is added to Eq. (1):

L_{CLDM}(x, y, i) = \mathbb{E}\big[ \| \epsilon - \epsilon_\theta(\sqrt{\bar{\alpha}_t}\, x + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\ t,\ \tau_\theta(y),\ i) \|_2^2 \big],   (3)

where i is the segmentation image data corresponding to the content image data x. The first term of Eq. (1) and L_{CLDM} have the same purpose in terms of mapping conditions to target objects. Here, simply adding these two losses promotes overfitting, which is a known problem of DreamBooth. Moreover, if the weight of L_{CLDM} is too large, the segmentation guide is ignored during inference and the training target is simply generated. To prevent these problems, we introduce a hyperparameter \gamma \in [0, 1] that changes as training progresses. As a result, the overall loss function for fine-tuning on the first object S_1 is expressed as follows:

L_{S_1} = (1 - \gamma)\, L_{LDM}(x_{S_1}, y_{S_1}) + \lambda_{reg}\, L_{LDM}(x_{S_1}^{reg}, y_{S_1}^{reg}) + \gamma\, L_{CLDM}(x_{S_1}, y_{S_1}, i_{S_1}).   (4)

By performing fine-tuning based on the proposed loss above, our method forms the correspondence between the specific target and the specific segmentation image.
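A minimal sketch of Eqs. (3) and (4) is given below. The predictor cond_unet(x_t, t, cond, seg) stands in for a ControlNet-style U-Net that accepts the segmentation image as an extra condition, and the linear schedule for gamma is an assumption for illustration; the paper only states that \gamma \in [0, 1] changes as training progresses. The L_LDM terms are computed as in the previous sketch.

import torch
import torch.nn.functional as F

def cldm_loss(cond_unet, text_encoder, alphas_cumprod, x, y_tokens, seg):
    # L_CLDM(x, y, i) of Eq. (3): noise prediction conditioned on the
    # text tokens y and the segmentation image i (here: seg).
    b = x.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x.device)
    eps = torch.randn_like(x)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * eps
    eps_hat = cond_unet(x_t, t, text_encoder(y_tokens), seg)
    return F.mse_loss(eps_hat, eps)

def gamma_schedule(step, total_steps):
    # Assumed schedule: gamma grows linearly from 0 to 1, gradually shifting
    # weight from the plain L_LDM term to the segmentation-guided L_CLDM term.
    return min(1.0, step / max(1, total_steps))

def loss_s1(l_ldm_target, l_ldm_reg, l_cldm, gamma, lambda_reg=1.0):
    # L_{S_1} of Eq. (4): gamma-weighted mix of the DreamBooth terms and L_CLDM.
    return (1.0 - gamma) * l_ldm_target + lambda_reg * l_ldm_reg + gamma * l_cldm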
2.3. Subsequent Training with EWC Loss

The objective of the subsequent training sessions is to prevent performance degradation on previous tasks, in addition to achieving the goals of the initial training. Following the ideas of Kirkpatrick et al. [17], we address this issue by updating the diffusion model parameters without deviating from the past distribution. Specifically, the following loss L_{EWC} is added to Eq. (4) in the second and later fine-tuning sessions for objects S_n (n = 2, 3, ..., N, with N being the number of target objects):

L_{EWC}(\theta_{S_n, i}) = \sum_{i} F_{S_{n-1}, i}\, (\theta_{S_n, i} - \theta^{*}_{S_{n-1}, i})^{2},   (5)

where \theta_{S_n} is a network parameter being updated, \theta^{*}_{S_{n-1}} is a network parameter obtained from the fine-tuned model of the previous session, F_{S_{n-1}} is the Fisher information matrix calculated in each fine-tuning session, and i is the index of the parameter corresponding to the layer to be updated. Note that both \theta^{*}_{S_{n-1}} and F_{S_{n-1}} are treated as constants in the calculation of the loss L_{EWC}. Here, the Fisher information matrix serves as an approximation of the posterior distribution [20], and the importance of the previous task is aggregated for each layer of the model [21]. The matrix value is defined as follows:

F = \mathbb{E}\left[ \left( \frac{\partial}{\partial \theta} \log L(\theta \mid x) \right)^{2} \right],   (6)

where L(\theta \mid x) is the likelihood function, which can be calculated by approximating the variational lower bound [22]. According to the theory of Denoising Diffusion Probabilistic Models (DDPM) [23], the definition of the diffusion model loss starts from optimizing a negative log-likelihood. This loss is then transformed into the base form of L_{LDM} by splitting the equation, dropping a constant term, and simplifying the coefficients. Since these operations are all linear transformations, Eq. (2) can be regarded as proportional to the log-likelihood. Therefore, we can calculate the Fisher matrix of the latent diffusion model as expressed by the following equation:

F_{S_n, i} \approx \mathbb{E}\left[ \left( \frac{\partial}{\partial \theta} L_{LDM}(x_{S_n}, y_{S_n}) \right)^{2} \right] \Bigg|_{\theta = \theta_{S_n, i}}.   (7)
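The EWC penalty of Eq. (5) and the diagonal Fisher estimate of Eq. (7) could be realized as in the sketch below. Here, model is any torch.nn.Module holding the diffusion parameters, and loss_fn(model, batch) is assumed to return the L_LDM value of Eq. (2) on one batch of the previous object's data; both names are placeholders rather than the authors' implementation.

import torch

def estimate_fisher(model, loss_fn, batches):
    # Diagonal Fisher of Eq. (7): average squared gradient of L_LDM with
    # respect to each parameter, evaluated at the current weights.
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    for batch in batches:                     # batches: a list of data batches
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(1, len(batches)) for n, f in fisher.items()}

def ewc_loss(model, fisher, old_params):
    # L_EWC of Eq. (5): Fisher-weighted squared drift from the parameters
    # theta* saved at the end of the previous fine-tuning session.
    loss = torch.zeros((), device=next(model.parameters()).device)
    for n, p in model.named_parameters():
        if n in fisher:
            loss = loss + (fisher[n] * (p - old_params[n].detach()) ** 2).sum()
    return loss

As in Eq. (5), fisher and old_params are held fixed while the new object is learned; only the live parameters receive gradients.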
U-Net. By repeating the denoising loop with the above-mentioned training flow, the proposed method performs training with segmentation image conditioning.
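To show how these pieces might be chained across objects, the sketch below runs the sequential fine-tuning sessions described in Sects. 2.2 and 2.3, reusing the ewc_loss and estimate_fisher helpers from the previous sketch. The session containers, the per-batch loss callback (returning Eq. (4) for the current object), and the optimizer are assumptions for illustration, not settings reported in the paper.

import torch

def fine_tune_sequentially(model, optimizer, sessions, session_loss_fn, fisher_loss_fn):
    # sessions: one entry per object S_1, ..., S_N, each providing training
    # batches and batches used for the Fisher estimate of Eq. (7).
    fisher, old_params = None, None
    for n, session in enumerate(sessions, start=1):
        for batch in session["batches"]:
            loss = session_loss_fn(model, batch)           # Eq. (4) for object S_n
            if n > 1:                                      # from S_2 on, add Eq. (5)
                loss = loss + ewc_loss(model, fisher, old_params)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Snapshot theta* and the Fisher matrix for the next session.
        old_params = {k: v.detach().clone() for k, v in model.named_parameters()}
        fisher = estimate_fisher(model, fisher_loss_fn, session["fisher_batches"])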
Fig. 3: Qualitative comparison results. The segmentation image is used for the proposed method.
5. REFERENCES

[1] M. Henderson, K. Bokor, M. Dookie, and M. Shirotori, "Creative economy outlook 2022," United Nations, pp. 70–75, 2022.
[2] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684–10695.
[3] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, "DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation," in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22500–22510.
[4] R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. Cohen-Or, "An image is worth one word: Personalizing text-to-image generation using textual inversion," in Proc. International Conference on Learning Representations, 2023, pp. 1–31.
[5] W. Chen, H. Hu, Y. Li, N. Rui, X. Jia, M. Chang, and W. W. Cohen, "Subject-driven text-to-image generation via apprenticeship learning," arXiv preprint arXiv:2304.00186, 2023.
[6] R. Gal, M. Arar, Y. Atzmon, A. H. Bermano, G. Chechik, and D. Cohen-Or, "Encoder-based domain tuning for fast personalization of text-to-image models," ACM Transactions on Graphics, vol. 42, no. 4, pp. 1–13, 2023.
[7] N. Kumari, B. Zhang, R. Zhang, E. Shechtman, and J. Zhu, "Multi-concept customization of text-to-image diffusion," in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1931–1941.
[8] Y. Tewel, R. Gal, G. Chechik, and Y. Atzmon, "Key-locked rank one editing for text-to-image personalization," in Proc. ACM Special Interest Group on Computer Graphics, 2023, pp. 1–11.
[9] Y. Gu, X. Wang, J. Z. Wu, Y. Shi, Y. Chen, Z. Fan, W. Xiao, R. Zhao, S. Chang, W. Wu, Y. Ge, Y. Shan, and M. Z. Shou, "Mix-of-show: Decentralized low-rank adaptation for multi-concept customization of diffusion models," Advances in Neural Information Processing Systems, 2023.
[10] L. Han, Y. Li, H. Zhang, P. Milanfar, D. Metaxas, and F. Yang, "SVDiff: Compact parameter space for diffusion fine-tuning," in Proc. IEEE International Conference on Computer Vision, 2023, pp. 7323–7334.
[11] Z. Liu, R. Feng, K. Zhu, Y. Zhang, K. Zheng, Y. Liu, D. Zhao, J. Zhou, and Y. Cao, "Cones: Concept neurons in diffusion models for customized generation," in Proc. International Conference on Machine Learning, 2023, pp. 21548–21566.
[12] Y. Wei, Y. Zhang, Z. Ji, J. Bai, L. Zhang, and W. Zuo, "ELITE: Encoding visual concepts into textual embeddings for customized text-to-image generation," in Proc. IEEE International Conference on Computer Vision, 2023, pp. 15943–15953.
[13] R. Zhang, Z. Jiang, Z. Guo, S. Yan, J. Pan, H. Dong, P. Gao, and H. Li, "Personalize segment anything model with one shot," arXiv preprint arXiv:2305.03048, 2023.
[14] Y. Watanabe, R. Togo, K. Maeda, T. Ogawa, and M. Haseyama, "Text-guided facial image manipulation for wild images via manipulation direction-based loss," in Proc. IEEE International Conference on Image Processing, 2023, pp. 361–365.
[15] O. Avrahami, K. Aberman, O. Fried, D. Cohen-Or, and D. Lischinski, "Break-a-scene: Extracting multiple concepts from a single image," in Proc. ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia, 2023, pp. 1–12.
[16] L. Zhang, A. Rao, and M. Agrawala, "Adding conditional control to text-to-image diffusion models," in Proc. IEEE International Conference on Computer Vision, 2023, pp. 3836–3847.
[17] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al., "Overcoming catastrophic forgetting in neural networks," Proceedings of the National Academy of Sciences, vol. 114, no. 13, pp. 3521–3526, 2017.
[18] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., "Learning transferable visual models from natural language supervision," in Proc. International Conference on Machine Learning, 2021, pp. 8748–8763.
[19] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 234–241.
[20] A. Ly, M. Marsman, J. Verhagen, R. P. P. P. Grasman, and E. Wagenmakers, "A tutorial on Fisher information," Journal of Mathematical Psychology, vol. 80, pp. 40–55, 2017.
[21] Y. Li, R. Zhang, J. C. Lu, and E. Shechtman, "Few-shot image generation with elastic weight consolidation," Advances in Neural Information Processing Systems, vol. 33, pp. 15885–15896, 2020.
[22] C. V. Nguyen, Y. Li, T. D. Bui, and R. E. Turner, "Variational continual learning," in Proc. International Conference on Learning Representations, 2018, pp. 1–18.
[23] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.
[24] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in Proc. European Conference on Computer Vision, 2014, pp. 740–755.
[25] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier, "From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions," Transactions of the Association for Computational Linguistics, vol. 2, pp. 67–78, 2014.
[26] F. Li, H. Zhang, H. Xu, S. Liu, L. Zhang, L. M. Ni, and H. Y. Shum, "Mask DINO: Towards a unified transformer-based framework for object detection and segmentation," in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 3041–3050.