Enabling Multi-Modal Search for Inspirational Design Stimuli Using Deep Learning
Introduction
During idea generation, designers are known to benefit from external inspirational stimuli toward achieving desirable design outcomes such as greater novelty, feasibility, or innovativeness (Chan et al., 2011; Fu et al., 2013b; Goucher-Lambert et al., 2020). The efficacy of a stimulus in providing inspiration during the design process can depend on a variety of features. For example, the modality of stimulus representation, analogical distance of the example to the design problem, and the timing of example delivery have all been shown to impact the way in which designers utilize stimuli (Linsey et al., 2008; Tseng et al., 2008; Chan et al., 2011). Inspirational stimuli may vary with respect to modality of presentation. Different uses of visual stimuli to support design ideation have been explored, such as when combined with text (Borgianni et al., 2017), other images (Hua et al., 2019), or in contrast to interactions with physical products (Toh and Miller, 2014). Representing stimuli visually compared to physically, or when combined with textual examples, has been shown to increase idea novelty (Linsey et al., 2008; Toh and Miller, 2014). The impact of analogical distance of stimuli from the design problem is also important to consider. Relative to the designer's approach to a design problem, far-field examples have been found to contribute to idea novelty (Chan et al., 2011; Goucher-Lambert and Cagan, 2019). However, near-field examples can also lead to design creativity as well as greater feasibility, relevance, and quantity of ideas (Chan et al., 2015; Goucher-Lambert et al., 2019, 2020). A given stimulus may be more useful
depending on when it is accessed during the design process. Inspirational stimuli provided after ideation on a design task has begun have been found to be more effective than when provided before ideation (Tseng et al., 2008). During ideation, designers who receive stimuli when stuck produce more ideas than those who receive them at predefined intervals, indicating the importance of timing of example delivery (Siangliulue et al., 2015). The level of abstraction of inspirational examples can also impact their influence on the design process. Design stimuli at the concept level may provide more rapid inspiration but miss the richer design details available in more comprehensive documents like patents (Luo et al., 2021). Examples can differ further by being provided with descriptions that are more general versus domain-specific (Linsey et al., 2008) or constitute concrete design examples versus abstract system properties (Vasconcelos et al., 2017).

© The Author(s), 2022. Published by Cambridge University Press. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
While extensive prior research, as highlighted above, has uncovered the characteristics of inspirational stimuli that contribute to their usefulness to designers, less is known regarding how designers naturally discover them. Prior researchers have mostly provided carefully curated examples to designers in controlled studies to examine how specific independent variables of inspirational stimuli affect design outcomes. To better understand the process of searching organically for inspiration during design, a creativity-support platform is developed that allows designers to search flexibly in realistic contexts and researchers to collect data through custom instrumentation. Toward these research goals, the core contribution of this work is two-fold:

(1) The development of a platform enabling search for inspirational stimuli. This platform provides designers with the ability to search with multi-modal inputs and control the degree of similarity between retrieved results and input queries.
(2) An investigation of search processes in design employed when using this platform during a cognitive study. This study compares the impact of using the afforded modalities on overall search outcomes and behaviors.

The platform developed in this work involves the computational extraction of features of inspirational stimuli, and the subsequent ability to search based on these features using multiple modalities. Semantic, visual, and function-based features are specifically explored in this work, following past studies on design by analogy, as introduced in the section "Computational methods for inspirational stimuli retrieval". By providing this creativity-support platform (described in the section "Platform development") to designers during a cognitive study (described in the section "Cognitive study design"), important insights regarding designers' processes of searching for inspirational stimuli can be uncovered. The findings of the cognitive study reveal how designers search by different modalities and the effect of using these modalities on the inspirational stimuli designers engage with and discover. These insights into designers' search behaviors and strategies, and how they are differently influenced by search modality, can be helpful to future researchers when further developing computational retrieval-based systems best fitted to the engineering design process.

Related works

The first objective of this work is to leverage computational methods to develop a platform that enables the retrieval of inspirational stimuli based on a given search input. This section thus first presents a brief review of computational techniques used to derive relationships between design ideas and inspirational stimuli. The proposed platform also aims to support flexible search for inspirational stimuli. Therefore, existing work on design-support tools that specifically allow multi-modal interactions, for example, visual sketch-based inputs, is also surveyed.

A second objective of this work is to conduct a cognitive study to investigate how designers use the developed platform to search for inspiration. Search processes from a cognitive perspective are thus discussed to gain insight into designers' search behavior.

Computational methods for inspirational stimuli retrieval

In order to extract meaningful stimuli relevant to a given design problem, design idea, or search query, computational methods and tools are needed to derive similarity relationships between inspirational stimuli in a given dataset and a designer's input. Research in design by analogy offers insight into different techniques used to establish relationships between examples in the design space, which may be based on, for example, semantic, functional, or visual information.

A variety of sources from which these stimuli may be derived have been explored. Information-rich repositories such as patent databases or biology textbooks are expansive sources of examples that are commonly used to provide relevant design information in both textual and pictorial representations (Chan et al., 2011; Cheong et al., 2011; Fu et al., 2013b). Examples from these and other sources are often used as functionally related inspirational stimuli to support design by analogy. Using patent databases as sources of design examples, function-based relationships between these designs can be defined. Murphy et al. built a functional vector space model to quantify functional similarity of a design problem to designs described in patents (2014). This approach forms a functional vocabulary through text-based processing of patent documents, resulting in a vector representation of the patent database. Latent semantic analysis (LSA) is another method for defining text-based contextual similarity between patents, used by Fu et al. (2013a, 2013b). VISION is an exploration-based design-by-analogy tool developed by Song and Fu that uses nonnegative matrix factorization to assign topics to patents based on different concepts, including function (Song and Fu, 2022). Patent data has also been used to train semantic network databases to support engineering design activities. While some semantic networks such as WordNet or ConceptNet contain common words (Han et al., 2022), the Technology Semantic Network (TechNet) was developed using patent data to formulate a semantic network database specialized in technology-based knowledge (Sarica et al., 2020, 2021). Beyond patents, crowd-sourced design solutions and ratings have also been provided to designers as sources of inspiration (Goucher-Lambert and Cagan, 2019; Kittur et al., 2019). Goucher-Lambert and Cagan used natural language processing approaches to categorize near and far inspirational stimuli based on the frequency of terms appearing in crowd-sourced responses (2019).

Another category of approaches utilizing text-based functional relationships includes those that facilitate search for analogies in biologically inspired design. Goel et al. used a structure–behavior–function knowledge representation (2009) to represent biological models and provide biological inspiration in multiple modalities, for example, text and visually represented behavior and structure models (Vattam et al., 2011; Goel et al., 2012). Representations of and relationships between biological analogies have been differently approached by Chakrabarti et al. to emphasize behavior of natural systems (e.g., motion) (2005). To implement keyword search for relevant biological analogies, Cheong et al. extracted a set of biologically meaningful keywords corresponding to functional terms in engineering (2011). Nagel and Stone further contributed a computational method that presents relevant biological concepts based on desired functionality, as searched for by the designer (2012). Object functionality can be differently defined based on the interaction context in which an object is used, which Hu et al. explored with a functional similarity network, a generative network, and a segmentation network (2018).

Less frequently explored in prior research are computational methods to support visual analogy. Setchi and Bouchard devised a method to index images based on semantic information from image labels and textual descriptions as one method of providing
images as design inspiration (2010). To establish relationships between examples based on non-textual information, emerging methods using visual analogy in design have considered image-based search. Recent work by Zhang and Jin has demonstrated how visual analogy can be supported by sketch-based retrieval of visually similar examples (2020, 2021). Specifically, they used a deep-learning model to construct a latent space for a dataset of sketches and computationally determined visual similarities within this space (Zhang and Jin, 2020). Short- and long-distanced visual analogies can then be identified based on the level of visual similarity shared between sketches (Zhang and Jin, 2021). Kwon et al. explored the use of image-based search to find visually similar examples to aid alternative-use concept generation (2019). Visual information, along with topic-level international patent classification (IPC) labels, has also been used in the retrieval of images from patent documents (Jiang et al., 2020, 2021). Jiang et al. used a convolutional neural network-based method to perform image-based search using visual similarity and shared domain knowledge.

The methods and systems explored here are used to define text- and visual-based relationships within various design stimuli repositories (e.g., patent images, design concepts, sketches, etc.). In prior research, these derived relationships have been used to identify stimuli related to a specified input, such as a design problem or search term. The current work also relies upon computational methods to extract semantic, functional, and visual information from potentially inspirational examples as well as designers' search inputs, as expressed through multiple modalities.

Motivating multi-modal search for inspirational stimuli

A second consideration for the platform developed in this work is the interface through which designers explore and discover inspirational stimuli. The role of non-text-based modalities in providing flexible modes of interaction and expressing search is examined. In general, expressing design ideas with visual attributes importantly supports cognitive processes of emergence and reinterpretation. Shape emergence is a process where designers perceive emergent patterns not initially intended in a visual stimulus (Soufi and Edmonds, 1996). Reinterpretation of visual stimuli is a process that leads to the formation of alternate interpretations and restructuring of design problems (Gross, 2001). During design exploration, these processes can importantly trigger new mental images and thus new ideas for design (Menezes and Lawson, 2006). Designers can benefit from interacting with a system through sketch-based inputs specifically, since in early-stage idea exploration, the act of sketching itself can assist with idea formation (Botella et al., 2018). The ability for a creativity-support tool to uncover meaning from a designer's developing sketch, intent, and task context can be valuable for activating appropriate computational aid at the right time (Do, 2005). As an example, Kazi et al. developed DreamSketch, a sketch-based user interface that uses generative design methods to provide designers with potential 3D-modeled design solutions based on early-stage 2D-sketch-based designs (2017). SketchSoup is another interface that inputs rough sketches and generates new sets of sketches, which may be explored and inspire further concept generation (Arora et al., 2017). Interfaces that capture these sketch-based inputs can therefore be useful for supporting search and exploration of the design space.

In addition to 2D sketches, design ideas can be expressed in a 3D representation, for which creativity support is also possible. Through the InspireMe interface, Chaudhuri and Koltun provided data-driven suggestions for new components to add to a designer's initial 3D model (Chaudhuri and Koltun, 2010). Retrieval of inspiring examples based on 3D-represented design ideas can facilitate emergence and reinterpretation processes important for the design process. Conventionally, 3D-modeling environments recognize the unambiguous selection and placement of different elements to build a model, and thus provide limited support for new ideas to emerge or old ideas to be reinterpreted (Gross, 2001). It is also important to note that while CAD modeling enhances visualization and communication of ideas by providing a form to early design ideas, it may also cause premature design fixation and limit ideation (Robertson et al., 2007). Systems capable of recognizing and reinterpreting conceptual or early-stage 3D design are valuable for overcoming limitations related to developing 3D models in a typical CAD environment.

Cognitive processes underlying search for inspiration

To implement useful features in the proposed search platform and link interactions with the platform to insights on search behavior, cognitive processes involved when searching for inspiration are reviewed. Early work on the role of search processes in design identified incidental experience and intentional learning as relevant sources of knowledge (Purcell and Gero, 1992). More recently, inspiration has been proposed as an iterative process that begins with an intention, is actualized by a search input, and ends when the problem has been solved (Goncalves et al., 2016). In this process, active approaches to find specific stimuli more intentionally or passive approaches to randomly encounter relevant stimuli may take place (Herring et al., 2009; Goncalves et al., 2016). Active search refers to the deliberate search for a particular stimulus with a specific goal in mind (Eckert and Stacey, 2003). Alternatively, when what designers are searching for is unclear, they typically depend on randomly finding relevant stimuli. Randomness of web-based search, for example, has been found to be beneficial for inspiration due to the sometimes unexpected nature of results, related to more passive search strategies (Herring et al., 2009). In information retrieval theory, search behavior has classically been categorized as exploratory versus specific (or lookup) (Sutcliffe and Ennis, 1998). Lookup search activities involve precise search goals whereas exploratory search is related to knowledge acquisition and evolving needs (Marchionini, 2006). Users have been found to examine more results in open-ended exploratory search tasks than during lookup tasks (Athukorala et al., 2016). For computational tools to successfully support search for inspiration, user studies suggest that they should provide control and flexibility over the level of abstraction versus literalness of search terms (Mougenot et al., 2008). To facilitate search for inspiration, it is important that active and passive search strategies are both supported. Designers should be able to express what they are looking for with a high level of agency and encounter inspirational stimuli more passively when what they are looking for is undetermined. Relevant to the current work, these insights into search for inspiration both guide the design of the search platform and provide a basis for interpreting the anticipated results of the cognitive study presented.

Platform development

To effectively support and subsequently study how inspirational stimuli are retrieved in the design process, a platform that enables
similarity-based, multi-modal search for stimuli in the form of 3D-model parts was developed. This section describes in detail (1) the process of defining similarity among stimuli using deep-neural networks and (2) the development and design of the multi-modal search interface that participants interacted with in a cognitive study.

Neural network development enabling inspirational stimulus retrieval

A major component of the platform is a system that supports the search for and retrieval of inspirational design stimuli in large datasets using multi-modal inputs. Relying solely on text-based search using semantic relationships may limit discovery of inspirational stimuli to concepts that are well enough defined to express using words. As described previously, search processes using more passive or exploratory processes are sometimes preferred and may not be as well supported by tools requiring such direct input (Goncalves et al., 2016). Introducing new modes of expressing search may be one approach to aid different search strategies when needed during the design process. As such, the proposed system is designed to support queries of 3D-model examples while also maintaining support for text-based queries. Beyond aiming to support additional query modalities, the platform provides a measure of similarity that allows users to control the similarity level between retrieved examples and their multi-modal queries. It is believed that having such agency in the system will allow researchers to better understand and analyze users' intentions in the retrieval process of inspirational examples.

Using 3D-model parts as inspirational stimuli

To support the research goals in the current work, a large-scale dataset of 3D models is used to train the deep-neural networks and provide users with relevant examples. Specifically, the PartNet dataset was used, which contains 26,671 unique 3D models (assemblies) in 24 object categories, each further split into trees of individually named parts within each assembly (e.g., cap as a child of bottle) (Mo et al., 2018). Names for each part are assigned in the dataset through expert-defined annotations. In total, the dataset contains 573,585 part instances across the 24 object categories. Each object category contains varying numbers of part instances. For example, bags, hats, bowls, mugs, and scissors contain on the order of ∼1 K parts, whereas vases, trash cans, and lamps contain ∼10 K parts, and chairs, storage furniture, and tables contain >100 K parts. The 24 object categories include everyday objects at various scales (e.g., microwave, scissors, tables). Since these categories represent only a small subset of possible objects that mechanical engineers might design, part-based data within these objects are instead used and presented in the proposed system. These parts (e.g., legs, cover, lid) may be present in object categories beyond those in the dataset. This allows the system to cover diverse design cases and to potentially provide inspiration between distant design goals. While the PartNet dataset was used in this work, alternative datasets could be leveraged that similarly contain large-scale, hierarchical, fine-grained annotations of data.

The use of such a large-scale 3D-model dataset also allows the system to leverage data-driven, deep-learning-based methods. These methods are used to extract computationally derived similarities between stimuli within the platform, based on their semantic, visual, and functional features. The platform uses deep-neural networks to contrastively model similarities between design examples (3D-model parts in this application) and natural-language-model keywords. This deep-learning approach directly consumes 2D snapshots of 3D-modeling parts and utilizes knowledge from large text corpora, which subsequently enables the efficient retrieval of relevant examples in the large dataset used. Deep-neural networks are suitable candidates for this task because they are highly effective in understanding complex patterns in high-dimensional data, such as multi-perspective image snapshots of 3D models in the platform. In their review of data-driven methods to support design-by-analogy, Jiang et al. identify deep-learning models as an effective technique for learning complex features from datasets (Jiang et al., 2022).

Computationally deriving similarity between 3D-model parts

Using the PartNet dataset, three neural networks were used with the intent to embed raw 3D-model data to high-level concepts and modeling parameters to be used in the platform. Each of these networks handles a unique modality or type of similarity. These networks are respectively (1) a text network that encodes similarity of design concepts in natural language; (2) an appearance network that encodes similarity of 3D models only by their appearance and geometric presence; and (3) a functionality network that extends beyond (2) to encode similarity of functions of 3D models based on their neighboring 3D parts.

The text network in the platform relies on the Universal Sentence Encoder (Cer et al., 2018) pre-trained on web text to find parts with names similar to the keyword queries provided by users. The Universal Sentence Encoder is trained on nontechnical text to solve general text-understanding tasks such as sentiment analysis and question classification. As a result, the model should be able to obtain a general semantic understanding of English words and thus be able to identify synonyms (e.g., "box" should be semantically similar to "container" in the embedding space). Alternative semantic networks exist beyond the Universal Sentence Encoder, such as TechNet (Sarica et al., 2020), which consists of technology-related terms. However, since the PartNet dataset contains everyday objects that are not highly technical, the use of the Universal Sentence Encoder to understand common words is sufficient for this work. The Universal Sentence Encoder is also effective for working with not only sentences but short phrases, which other semantic embedding methods, for example, BERT (Bidirectional Encoder Representations from Transformers), are not trained on (Devlin et al., 2019).

The appearance network was trained by embedding knowledge from 2D snapshots of 3D-model parts. This network is trained to consider snapshots of the same 3D model as "similar" to each other, and snapshots from different 3D models as "dissimilar" from each other in the embedding space. This leads to the model learning the general physical form and presence of the 3D model by visually analyzing it from different angles. More concretely, consider a training example (x) as a 3D-modeling part (e.g., a leg of a chair). Eight 2D snapshots (images) [S(x)_i, i ∈ {1, …, 8}] of the part are first taken by rendering the part in Blender. Snapshots are normalized to the size of the image, meaning that the whole part takes the size of the entire image and the relative scale of the part is not considered. After obtaining these screenshots, each snapshot is passed through a neural network f to get a single n-dimensional real-valued embedding f(S(x)_i) ∈ R^n. These embeddings for other examples in the dataset were similarly gathered.
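The contrastive idea just described, in which embeddings of snapshots of the same part are pulled close while snapshots of different parts are pushed apart under an L1 distance, can be illustrated with a small numerical sketch. This is not the authors' implementation: the random projection standing in for the trained network f, the snapshot size, and the margin value are all illustrative assumptions.

```python
import numpy as np

# Numerical sketch of contrastive snapshot similarity. A random linear
# projection stands in for the trained CNN f; sizes (64 x 64 snapshots,
# n = 128 embedding dimensions) are illustrative assumptions.
rng = np.random.default_rng(0)

n_dims, n_pixels = 128, 64 * 64
W = rng.normal(size=(n_pixels, n_dims)) / np.sqrt(n_pixels)

def f(snapshot):
    """Stand-in embedding function: flatten a snapshot, project to R^n."""
    return snapshot.reshape(-1) @ W

def l1_distance(a, b):
    """L1 distance between two embeddings."""
    return np.abs(a - b).sum()

def contrastive_loss(d_pos, d_neg, margin=1.0):
    """Pull positive pairs together; push negative pairs out to the margin."""
    return d_pos ** 2 + max(margin - d_neg, 0.0) ** 2

x_view_1 = rng.random((64, 64))                    # snapshot S(x)_1 of part x
x_view_2 = x_view_1 + 0.01 * rng.random((64, 64))  # near-identical view S(x)_2
a_view = rng.random((64, 64))                      # snapshot of a different part a

d_pos = l1_distance(f(x_view_1), f(x_view_2))      # small: same part
d_neg = l1_distance(f(x_view_1), f(a_view))        # large: different parts
loss = contrastive_loss(d_pos, d_neg)
```

In the trained system, f would be the convolutional appearance network, and the same distance comparison between stored embeddings underlies nearest-neighbor retrieval of parts.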
To train this model toward the goal of considering snapshots of the same 3D model as similar, other examples are also needed to allow the model to contrast the dissimilar embeddings with the similar embeddings. As such, to obtain a loss function for each embedding in the dataset, real-valued embeddings of multiple other 3D models in the dataset were also gathered. Without loss of generality, another randomly sampled but different 3D model (a) and its embeddings f(S(a)_i) were considered in the following formulation. The model was then trained with sampled positive pairs that consist of snapshots that come from the same 3D model,

p+ = ( f(S(x)_i), f(S(x)_j) ), i ≠ j, (1)

and negative pairs:

p− = ( f(S(x)_i), f(S(a)_j) ). (2)

The following training objective L [in Eq. (3)] was used to minimize the distance [measured by the distance function (D)] of positive pairs and maximize the distance of negative pairs (up to the margin m):

L = D(p+)^2 + (max(m − D(p−), 0))^2, (3)

D(a, b) = ||a − b||_1. (4)

On a high level, this model considers these snapshots as similar among themselves and dissimilar to snapshots of other 3D models in the latent space. Such similarity is considered primarily by the overall appearance and geometric presence of the 3D-model parts.

The functionality network was built to learn a slightly different notion of similarity than the appearance network. While considering the exact functions of different 3D models could be difficult and greatly depend on context, as a first step toward this goal, 3D models are considered to be similar if they have similar neighboring parts within their respective assemblies. Hu et al. demonstrate the effectiveness of this approach in capturing the function of 3D models through the usage contexts of the models (2018). Using this method means that 3D-model parts that perform a certain function should have similar neighbors in their respective assemblies (e.g., different styles of chair legs, despite having different appearances, are considered similar since they share "chair seat" as a neighbor). The functionality network builds upon the appearance network such that it takes the appearance embeddings and transforms them into function-aware embeddings. The functionality network is trained with a very similar paradigm as the appearance network, with an almost identical loss function to Eq. (3). The only difference is that the functionality network (g) is now used to obtain a transformation of the appearance embeddings [g(f(S(x)_i))], and the group of similar parts extends beyond the snapshots of a single 3D-model part itself to neighboring parts. For instance, given a chair leg x and a chair seat z, and an irrelevant lamp cover b, positive pairs are formed

p+ = ( g(f(S(x)_i)), g(f(S(z)_j)) ), (5)

as well as negative pairs:

p− = ( g(f(S(x)_i)), g(f(S(b)_j)) ). (6)

These pairs are then trained using the same loss function [Eq. (3)]. Figure 1 displays how the functional embeddings are derived from appearance embeddings using the described networks.

The appearance network consists of five stacked groups of a convolution layer with kernel sizes of 5 × 5 or 3 × 3 followed by a 4 × 4 or 2 × 2 max pooling layer (see Fig. 1 for arrangement). This network also consists of a final 4 × 4 convolution layer that flattens the output to 128-dimensional appearance embeddings. The functional network then takes these appearance embeddings and passes them through its four stacked 128-dimensional fully connected layers and one 64-dimensional fully connected layer to produce 64-dimensional functional embeddings. On a high level, these embeddings encode the context of the usage of the 3D-model parts and consider 3D-model parts that are used along with other parts as similar by assuming that they have similar functions.

Fig. 1. Overview of process of transformation of embeddings from appearance network to functional embeddings. Appearance embeddings of input part (scissor blade) were used to generate a predicted functional transformation using the functional network. Functional network was then trained by considering this prediction as similar to the appearance embedding of a neighboring part (scissor handle) and dissimilar to the appearance embedding of an unrelated part (chair leg). Intermediate representation within the functional network was used as the functional embedding of each model part in the dataset.

Implementation and training of neural networks

The appearance and functionality models are implemented with TensorFlow Keras. An Adam optimizer with a learning rate of 0.001 was used to train each model until the validation loss plateaued. The appearance model took 26 h to train on a machine with two GPUs (an NVIDIA GeForce 1080 Ti and a Titan X Pascal), while the functional network took 10 h to train on the same machine. The text network did not involve any training as it is directly taken from the pre-trained Universal Sentence Encoder provided in TensorFlow Hub.

Front-end user interface for multi-modal search

Leveraging the underlying platform for inspirational stimuli retrieval described in the previous section, functionality for multi-modal search was subsequently enabled for use in a cognitive study. This was achieved by comparing the semantic, visual, and functional features of the participant's input to the parts in the dataset populating the platform. The modalities of input available and additional features of the search interface are discussed below.

Fig. 2. (a) Search results for a keyword search of the term "container"; (b) search results for a part search of a result from keyword search for "container".

Search modalities: keyword, part, and workspace-based

Using the search interface that relies on the neural networks described above, participants were able to search for parts in the dataset using three types of input. The first search type is keyword-based, where text input by the participant is embedded using the text network, as described in the section "Computationally deriving similarity between 3D-model parts". Embedding values are then compared against those of the dataset's part names and the nearest neighbors from the dataset are retrieved. The results from a keyword search for the term "container" are shown in Figure 2a. The second and third search types are part-based and workspace-based, where new parts are retrieved using visual snapshots taken of a selected 3D-model part or the participant's current workspace (composed of 3D-model parts), respectively. For workspace searches, snapshots of the whole workspace are taken, which may include multiple parts. These snapshots are passed through the appearance and functional networks and the resulting appearance and functional embedding values are compared with those of other parts in the dataset. The same computational approach used to derive similarities between 3D-model parts, as described in the section "Computationally deriving similarity between 3D-model parts",
is used to produce embedding values for search inputs from the design task. For any given search result, participants could per-
relevant neural networks. form none to all actions, in any order.
Part and workspace-based searches are made using two addi- These search modalities and part interactions are envisioned to
tional user-specified parameters, appearance similarity and func- enable search for inspiration during early-stage design. Keyword
tional similarity, which participants can specify in the platform and part searches may provide initial, rapid inspiration by retriev-
interface with sliders. The closest neighbors are retrieved for the ing results based on the designer’s text-based query or based on
participants according to the weighted sum of the distances spe- similarity to a previously discovered part. Workspace search,
cified by the appearance and functional sliders in the user inter- more similar to 2D or 3D sketch-based retrieval platforms intro-
face. Figure 2b shows the use of similarity sliders and the search duced in the section “Motivating multi-modal search for inspira-
results for a part search of the first keyword search result for “con- tional stimuli” (e.g., SketchSoup, InspireMe, DreamSketch), can
tainer”. Sliders controlling similarity in appearance and function further support the discovery of inspiration during later stages
allow participants to conduct multiple searches using the same of design, based on the designer’s developing 3D-model. In gen-
part or workspace input with increased agency. In the example eral, the various representations of inspiration provided by the
shown in Figure 2b, parts are searched for with low similarity platform (i.e., 2D representation, 3D representation, text label)
in appearance but high similarity in function to the selected con- make it suitable for aiding various stages and forms of early-stage
tainer. As represented in Figure 1, neighboring parts of visually design, such as in generating conceptual design ideas, 2D
similar parts to the input part are considered functionally similar sketches, or 3D sketches.
to the input. In this example, the shared visual characteristics
between the chair seats and container have caused chair legs to
be considered functionally similar to the container. Based on Cognitive study design
the results retrieved, participants are then able to modify these To understand the processes and behaviors associated with
inputs to continue to search for new results. searching for and exploring design examples, a cognitive study
was conducted using the platform. During the study, participants
searched with different modalities available in the platform to find
Interactions with parts retrieved from search and select relevant 3D parts that could help inspire solutions to a
After relevant 3D-model parts are retrieved from the dataset, the design challenge. The main approach taken in this work was to
model pushes the images of the 3D models, as well as their asso- analyze participants’ interactions in the platform and relate
ciated STL files, to the web front-end of the platform, which is these actions to strategies involved in searching for inspirational
based on the editor code of the open-source three.js library. examples. A 30-minute study was administered to understand
Participants can thus preview three of the retrieved 3D models how participants engaged with the three search types available
in the “Search Results” panel of the interface (Fig. 2). An example in the platform. Participants searched for parts using each search
in Figure 3a shows how parts can also be added to and modified modality in three separate subtasks and worked toward collecting
in the user’s 3D workspace using the “Add to Workspace” button. inspirational stimuli for a given design challenge.
Workspace-based searches are made with snapshots of the entire
workspace with parts added by the participant using this action.
Participants
Moreover, since all results are retrieved from the PartNet dataset,
which contains information on neighboring parts in the assembly Participants were recruited from announcement emails sent to
of the results, participants may view this information (Fig. 3b) undergraduate and graduate mechanical engineering students at
using the “View in Context” button. For a selected part, this the University of California, Berkeley. Twenty-three participants
action allows further understanding of the retrieved parts’ utility (15 males and 8 females) with varying levels of design experience,
in their original context. Finally, participants have the ability to ranging from less than 1 year to 9 years, volunteered for the study.
use the “Add to gallery” button to save a part to a gallery of col- Participants were offered $10 compensation for their participation
lected 3D parts (Fig. 3c). The gallery is accessible to the partici- in the 30-minute study. Due to data collection errors, data from
pant to access and select parts from at any point during the two participants were excluded from the analysis. All participants
Fig. 3. Interactions with selected part in Figure 2 – (a) adding part to the workspace; (b) viewing part in context by seeing related parts with text labels in the same
object assembly; and (c) adding part to a gallery of saved 3D parts.
completed the study while connected virtually with the experi- Table 1. Overview of search types and inputs specified for each subtask of the
menter over a Zoom meeting, where all participants consented cognitive study
to sharing their screens for the duration of the task. Any issues Subtask Search type Initial search input
completing the task or clarifications needed could thus be
addressed in real time. A Keyword “Container”
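The retrieval scheme described for part and workspace searches above — nearest neighbors under a slider-weighted sum of appearance and functional embedding distances — can be sketched as follows. The function name, array shapes, and weighting scheme here are illustrative assumptions, not the platform's implementation:

```python
import numpy as np

def retrieve_nearest(query_app, query_fun, app_embeds, fun_embeds,
                     w_app, w_fun, k=3):
    """Rank dataset parts by a weighted sum of appearance and functional
    embedding distances to a query part (illustrative sketch)."""
    d_app = np.linalg.norm(app_embeds - query_app, axis=1)  # appearance distances
    d_fun = np.linalg.norm(fun_embeds - query_fun, axis=1)  # functional distances
    combined = w_app * d_app + w_fun * d_fun                # slider weights
    return np.argsort(combined)[:k]                         # k closest parts

# Toy dataset: 4 parts with 128-D appearance and 64-D functional embeddings,
# matching the dimensionalities reported for the two networks.
rng = np.random.default_rng(0)
app = rng.normal(size=(4, 128))
fun = rng.normal(size=(4, 64))
# Low appearance weight, high functional weight (as in the Figure 2b example).
ranked = retrieve_nearest(app[0], fun[0], app, fun, w_app=0.2, w_fun=0.8)
print(ranked[0])  # the query part itself (index 0) ranks first
```

Moving either slider simply rescales one of the two distance terms before ranking, which is why the same input part can yield different neighbor sets across repeated searches.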
Study objective

The study objective presented to participants was to use the platform to search for, and save, 3D parts that inspired solutions to the following design challenge: “design a multi-compartment disposal unit for household waste”. Participants were told that parts inspiring solutions to the design challenge could include those they might want to directly incorporate into potential solutions. The design challenge presented to participants was developed to fit the context of the search platform, which is populated with parts related to household objects. Pilot testing revealed that this design prompt engaged several object categories in the PartNet dataset, including some that are highly relevant to the task (e.g., trash can, storage furniture).

Study overview

The study was divided into three subtasks (A, B, C), as summarized in Figure 4, where each task involved the use of a different search type (keyword, part, workspace) but worked toward the same design challenge. The study objective and task instructions were embedded in a Qualtrics survey link sent to participants at the start of the experiment. For each subtask, participants read the associated training and instructions, and then completed the task in an external link. At the end of the experiment, participants responded to a series of open-ended and multiple-choice questions about their experience using the search platform. Table 1 additionally summarizes the search types, inputs, and requirements of each subtask of the study.

Task A: In Task A, all participants were instructed to first search by keyword beginning with the term “container” (Fig. 2a). They were instructed to make a minimum of four additional keyword searches and to save a minimum of three parts to their galleries.

Task B: Participants then continued with their progress from Task A in Task B by conducting a part search with a part saved to their gallery during Task A. As before, the instructions were to conduct a minimum of four additional part searches and save a minimum of three more parts. Participants were also instructed not to make any additional keyword searches.

Task C: Finally, in Task C, participants conducted workspace searches and made their first search consisting of parts either previously added to the workspace or newly added from parts saved during Tasks A and B. A minimum of four additional workspace searches were made and a minimum of three parts were saved, without making any new keyword or part searches.

This study design, while constrained, ensured that participants used each search modality for a comparable portion of the design study, enabling an investigation into the use of the search platform’s modalities and features. Without prescribing these constraints, for example, a minimum number of searches, sufficient interaction with each search modality and feature may not have been observed, given participants’ lack of familiarity with the novel search inputs introduced.

The motivation for the ordering and division of tasks was to easily teach participants how to engage with the search platform. The order was selected since parts need to be discovered initially through keyword search to subsequently conduct part and workspace searches. Tasks were ordered to first use the most intuitive search mode (keyword) and to last introduce the least familiar and most difficult mode (workspace). Pilot testing revealed that learning about each search type at study onset overloaded participants with too much information to effectively engage with each search type; therefore, each search type was introduced and used in separate tasks.

After completing the study, participants were asked to provide open-ended descriptions of any strategies used when conducting each type of search. Participants also evaluated the intuitiveness and usefulness of different features in the platform on five-point Likert scales. These features included searching for new parts and gaining more information about parts. Finally, participants self-evaluated the broadness of their exploration of the part repository and of their final gallery of saved parts on five-point Likert scales.

Fig. 4. Overview of flow between subtasks: training on search types and features in the interface preceded presentation of instructions and completion of each subtask.
leads to the development of other definitions of similarity described in the remainder of this section.

Definition and results of concept-based similarity

Beyond the 3D models themselves, there are many other models that can be considered similar semantically in the dataset. For instance, there are many chair legs that could be similar. To account for this relaxed definition of relevance, the text annotations of 3D models available in the dataset used (PartNet) were utilized to consider similarity. Two 3D models were considered to be relevant if both consist of exactly the same text label. These text labels represent larger clusters of 3D models in the dataset. The procedure outlined in the section “Ranked-based accuracy computation” is similarly used to compute top-1 and top-k similarities using this definition of text-concept-based similarity. For the training/validation/test sets, the top-1 accuracy of the appearance network is 25.6%/26.8%/26.8% and the top-10 accuracy is 66.1%/69.1%/68.1%. For the functional network, the top-1 accuracy is 40.5%/39.9%/40.4% and the top-10 accuracy is 85.7%/85.7%/85.8%, with the further relaxation of the definition such that all text annotations of other parts that belong to the same model are considered similar.

Definition and results of physical form relevance

Besides calculating semantic relevance using text labels, relevance can similarly be computed from the physical forms of the models. The physical similarity of two models is computed by the three-dimensional intersection over union (IoU) of the models, such that the models are super-positioned to find the overlapping volume, which is then divided by the sum of the total volume of the models. To ensure the consistency of this measurement, six extra random orientations (in addition to the default position of the models) were taken between the models during super-positioning and the maximum value of the seven orientations was taken as the final IoU. Two models are considered similar if the IoU between them is within the top 5% out of all other pairs for a particular model. This criterion controls the difficulty of our retrieval task such that a completely random retriever would get 5% top-1 accuracy in such a task.

The above process requires the models to be closed for the volume computation to be correct. Therefore, the convex hulls of both models and the intersected volumes are further computed to ensure correctness. These computational steps of convex hull, IoU, and volume of models are done with Blender 2.8.2. Moreover, since this process is computationally expensive (and scales quadratically with the size of the candidates), 200 random examples were sampled from the test set of the PartNet dataset and then manually reviewed to be approximately convex before being used as candidates of this experiment. The final top-1 accuracy for this similarity criterion on these candidates for the appearance network is 65.9% and the top-10 accuracy is 95.5%. We did not report this measure for the functional network due to the high number of parts required to relax this definition of physical form relevance to include all parts belonging to the models containing the sampled parts.

Overall, the ranked-based accuracy measures computed in this subsection provide insights into the retrieval behavior of the neural networks underlying the search platform used in the cognitive study. Different perspectives of similarity are considered, including self, semantic (concept-based), and visual (physical form) similarity, which allows us to further understand these neural network-based methods’ retrieval characteristics. These measures are summarized in Table 2. The development and behavior of the platform are further discussed in the section “Discussion of multi-modal search platform development and behavior”.

Table 2. Summary of rank-based accuracies for similarity measures of test set data describing retrieval behavior of neural networks

Similarity measure | Appearance network, Top-1 (%) | Appearance network, Top-10 (%) | Functional network, Top-1 (%) | Functional network, Top-10 (%)
Concept-based (semantic) | 26.8 | 68.1 | 40.4 | 85.8
Physical form | 65.9 | 95.5 | — | —

Results of cognitive study

The developed platform, which uses the similarity relationships described in the previous section, allows users to search for and interact with retrieved parts. In this section, the results of a cognitive study are presented, which demonstrate how participants search for stimuli in the platform and the content of these retrieved parts. In the cognitive study, participants searched for 3D-model parts using keyword searches in Task A, part searches in Task B, and workspace searches in Task C. Throughout the study, the following actions could be taken on any search result: adding it to the workspace, viewing it in context, or saving it to a gallery. This work considers how these interactions reveal the ways different modalities of expressing search affect and support the search process. The focus of the present study is on investigating the use of the described search platform to search for inspirational stimuli, and not necessarily the impact of these stimuli on design outcomes. Specific objectives of this study are to identify the differences in search modality by how participants (1) search for new parts and (2) engage with the retrieved parts, as well as (3) what participants discovered. These results extend upon findings in prior work by Kwon et al. (2021).

Search for new parts

To understand how different search modalities are used, the search inputs defined by participants are first discussed. Specific inputs that participants tend to modify across searches are important to identify to support multi-modal search. Frequencies of each search modality used, and slider movements made, are analyzed to examine how participants used the platform to search for new parts. Differences between search types in the total number of searches made were compared with a Chi-square test. The number of searches made using each search type, including repeated searches, significantly differs [χ²(2, N = 677) = 9.8, p < 0.01], where the number of part searches (264) compared to keyword (207) and workspace (206) searches is the highest. The implications of the differences in search frequency are further discussed in the section “Discussion of cognitive study results”.
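The Chi-square comparisons of count data reported here can be reproduced with a goodness-of-fit statistic against a uniform expected distribution; the short helper below is an illustrative sketch, not the study's analysis code:

```python
def chi_square_uniform(observed):
    """Chi-square goodness-of-fit statistic against uniform expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Keyword, part, and workspace search counts reported above.
print(round(chi_square_uniform([207, 264, 206]), 1))  # -> 9.8 (df = 2)

# The same check on the modified part-search counts (functional/appearance/both).
print(round(chi_square_uniform([37, 60, 34]), 1))  # -> 9.3 (df = 2)
```

With two degrees of freedom, both statistics match the reported values χ²(2, N = 677) = 9.8 and χ²(2, N = 131) = 9.3.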
Total search counts include all keyword searches, and both new and modified part and workspace searches. New part searches are defined as those where a unique part is used as the search input. New workspace searches are made when the workspace input contains a newly added part. Participants could also make modified searches, where the same part or workspace from a previous search is selected and adjustments are made only to the sliders specifying appearance and functional similarity. The numbers of new and modified part searches made across participants are summarized in Table 3. Also included are the numbers of modified part searches that increase (+) or decrease (−) appearance and/or functional similarity from a previous search.

Table 4. Frequencies of new and modified workspace searches with changes in functional and/or appearance similarity (+: increasing similarity, −: decreasing similarity)

Search input | Search counts
New search (new part added to workspace) | 105
Modified search (no new parts added to workspace): change in functional similarity | 24 (14+, 10−)
Modified search: change in appearance similarity | 28 (19+, 9−)
Modified search: change in both similarity types | 24
Total modified searches | 76
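The new-versus-modified bookkeeping defined above can be illustrated as a small tally over an ordered search log; the event format and function below are hypothetical, not the study's logging code:

```python
def classify_searches(events):
    """Tally new vs. modified part/workspace searches from an ordered log.

    Each event is (search_type, input_id). A part or workspace search counts
    as "new" when its input differs from that search type's previous input,
    and as "modified" when the input is unchanged (only the similarity
    sliders were adjusted).
    """
    previous = {}
    counts = {"new": 0, "modified": 0}
    for search_type, input_id in events:
        if search_type == "keyword":
            continue  # keyword searches are tallied separately in the study
        if input_id == previous.get(search_type):
            counts["modified"] += 1
        else:
            counts["new"] += 1
        previous[search_type] = input_id
    return counts

log = [("keyword", "container"), ("part", "p1"), ("part", "p1"), ("part", "p2")]
print(classify_searches(log))  # -> {'new': 2, 'modified': 1}
```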
As shown in Table 3, a greater total number of searches were conducted with the same part (131) than with a different part (104) from the previous search. However, when examining the proportion of new and modified searches made by each participant, a repeated measures ANOVA did not reveal a significant difference [F(1,20) = 0.55, p = 0.5]. Modified search counts combined across participants vary significantly with respect to whether modifications are made in functional similarity (37), appearance similarity (60), or both (34) [χ²(2, N = 131) = 9.3, p < 0.01]. The proportion of these modified searches within participants does not differ across modification types [F(2,40) = 0.03, p = 0.97]. This result signifies that while some participants may have conducted many appearance-modified searches, this was not observed across all participants. Of the 21 participants, only 4 conducted more than 5 appearance-modified searches.

Table 3. Frequencies of new and modified part searches with changes in functional and/or appearance similarity (+: increasing similarity, −: decreasing similarity)

Search input | Search counts
New search (different part from previous) | 104
Modified search (same part as previous): change in functional similarity | 37 (17+, 20−)
Modified search: change in appearance similarity | 60 (30+, 30−)
Modified search: change in both similarity types | 34
Total modified searches | 131

The same analysis performed for part searches was done to identify how workspace searches were made, as summarized in Table 4. The number of workspace searches made with modifications to functional (24), appearance (28), or both types of similarity (24) did not significantly differ. Different from part searches, more workspace searches are made with new search inputs (i.e., with an added part to the workspace) than with the same workspace configuration (105 vs. 76). A significant difference was observed in the proportion of new and modified workspace searches made by each participant, as revealed by a repeated measures ANOVA test [F(1,20) = 7.43, p < 0.05]. These combined results demonstrate how search inputs and desired similarity are differently defined when engaging with part versus workspace searches.

Engagement with parts retrieved from search

The search platform, beyond supporting retrieval of parts, allows participants to further engage with the shown parts. Participants can engage with a part through the interactions enabled in the platform, as outlined in the section “Interactions with parts retrieved from search”, for example, by viewing it in context (to gain contextual information), adding it to the workspace (to see and manipulate its 3D representation), or saving it to their gallery. The number of times each interaction was made was counted to determine how results from each search type are engaged with differently. Frequencies of interactions with search results were compared across search modalities to assess differences in how participants engage with results retrieved from each search modality. There is a significant difference between search modalities in both the total number of search results that users engaged with [χ²(2, N = 106) = 18.6, p < 0.001] and did not engage with [χ²(2, N = 581) = 23.0, p < 0.001], as shown in Figure 5. This result suggests that, despite being instructed to conduct the same number of searches using each search modality, participants interacted differently with each search modality and the parts retrieved.

The differences in frequency between the expected and observed values for each set of results are plotted in Figure 5. The expected value is the total number of parts engaged with (106) or not (581), divided by 3 (the number of tasks). This value represents the number of parts expected to be engaged with or not in each task if no task differences exist. Parts that are engaged with include those viewed in context, added to the workspace, or saved to the gallery. Parts not engaged with are those retrieved from search and seen by the participant, with no further interaction made. The highest proportion of parts that were further engaged with were retrieved by keyword search, while results not engaged with were mostly those retrieved by part search. On average, participants spent 343 s, 195 s, and 451 s in subtasks A, B, and C, respectively. These results suggest that the increased engagement with keyword search results does not occur due to increased time spent at the beginning of the study. The reduced time spent on subtask B, despite the high frequency of part searches, further demonstrates participants’ lack of engagement with these search results.

To more closely consider how users engage with search results, the numbers of parts in each task that are viewed in context or added to the workspace are compared. The number of parts viewed in context significantly differs between tasks [χ²(2, N = 104) = 13.3, p < 0.01]. Displayed in Figure 6, results from keyword search are more frequently viewed in context than expected, and fewer results from workspace search are viewed than expected. As in Figure 5, expected values in Figure 6 refer to the total numbers of parts viewed in context (104) or added to the gallery (101),
divided evenly between tasks. Numbers of parts added to the workspace do not differ significantly between tasks. Combined, these results suggest that keyword search encourages increased engagement with individual results, while part-based search does not. A more detailed analysis of these results can be found in the section “Discussion of cognitive study results”.

Coverage of design stimuli space by retrieved parts

Regarding participants’ interactions with the interface, the role of search modality in the overall discovery of inspirational stimuli is investigated using a measure of coverage of the design stimuli space. By deriving a measure of coverage, the relative diversity of parts within the appearance and function-based embedding spaces discovered using each search modality can be compared. The parts retrieved by all participants throughout the study are first shown based on their representation within the appearance-based neural network in Figure 7. Parts are color coded based on the search type used when they were retrieved. The visualization represents the parts reduced from the 128-dimensional appearance-based neural network to a two-dimensional space using principal component analysis (PCA). The reduced space accounts for 72.8% of the total variance of the original space. As highlighted, examples of closely and distantly related parts in appearance are shown in the 2D projection of the embedding space. The pair of closely related parts displayed are also nearest neighbors in terms of Euclidean distance in the full 128-dimensional appearance embedding space.

It is important to note that “closeness” between parts in the full 128-dimensional embedding space may appear differently visually when projected into 2D. These differences provide insight into features learned by the neural network, which may be difficult to discern visually. As a representative example of this concept, Figure 8 displays a series of parts that are “close” to a reference part (in this case, a trash can lid). In Figure 8, parts
labeled 1–4 are the top 4 nearest neighbors in Euclidean distance in the full embedding space to the reference trash can lid, while Part * appears close in distance in the reduced embedding space. As shown, parts with high appearance similarity, as determined by the appearance-based neural network, may not have the same relative distance in the 2D projection of the embedding space. In this example, Part *, though not a nearest neighbor in the full embedding space, does appear to share high visual similarity to the trash can lid, by human inspection. This discrepancy between human and model evaluation of appearance-based similarity can be explored in future work.

As Figure 7 helps to visualize, the parts discovered by participants during the study appear to vary in their overall coverage of the two embedding spaces by the search modality (keyword vs. part vs. workspace) used to search for them. Quantitatively, this result can be demonstrated by comparing the total variation of each set of parts in the original embedding spaces (represented as each set of colored points in the 2D visualizations). In the approach taken, three 128 × 128 covariance–variance matrices are first computed for the keyword, part, and workspace search results based on their definitions within the 128-dimensional appearance-based neural network. Diagonal elements of each matrix represent variances in each dimension of the embedding space. A significant Levene’s test determined that the variances across the diagonal elements of the three matrices were not equal (F = 20.9, p < 0.001), signifying a difference between search types in the coverage by parts of the appearance embedding space.

Similarly, the parts retrieved in the study are also represented based on their embeddings in the function-based neural network, shown in Figure 9. The 2D visualization, reduced from the 64-dimensional functional embedding space using PCA, accounts for 89.7% of total variance. Figure 9 displays a cabinet door and its closest neighboring part by Euclidean distance in the full functional embedding space, a sink drawer face. Also shown is a set of chair legs, distantly related in function to the cabinet doors. These parts exemplify how functional relationships are represented in the neural network, as detailed in the section “Computationally deriving similarity between 3D-model parts”. As intended in the design of the functional network, two types of doors that are used in different contexts are functionally similar based on their shared relation to box structures. A difference was found between how parts retrieved using each search type covered the functional embedding space (F = 6.77, p < 0.01).

In addition to comparing the variances across the diagonal elements of each covariance–variance matrix using Levene’s tests, a more intuitive representation of this measure is the trace of the matrix, that is, the sum of the diagonal elements. The trace equals the sum of the variances of each dimension of the original neural networks and represents the total variation in the respective embedding spaces. Total variation provides a metric for comparing how parts accessed by each search type differently cover the search space. Table 5 summarizes the differences between parts retrieved using each search modality, with respect to the total variation and highest variance of a single variable in both embedding
spaces. The highest variance demonstrates the relative contribution of individual variables to the total variation. Based on these values, the use of workspace searches appears to lead to the retrieval of parts with the lowest overall coverage of both spaces. Since the dimensions of the appearance and functional embedding spaces differ, variances should be compared within (by search type) and not across (appearance vs. functional) the respective embedding spaces. Total variation is expected to be lower in the functional embedding space, since there are 64 parameters, compared to 128 in the appearance embedding space. At a high level, these results suggest that the search modality used impacted the breadth and diversity of inspirational stimuli discovered.

Table 5. Total variation and highest variance of a single variable in appearance (128-dimensional) and functional (64-dimensional) embedding spaces by search type

Search type | Appearance: total variation | Appearance: highest variance | Functional: total variation | Functional: highest variance
Keyword | 6.0 × 10⁻³ | 1.4 × 10⁻⁴ | 7.8 × 10⁻⁴ | 7.9 × 10⁻⁵
Part | 7.2 × 10⁻³ | 1.4 × 10⁻⁴ | 7.8 × 10⁻⁴ | 7.2 × 10⁻⁵
Workspace | 3.6 × 10⁻³ | 5.2 × 10⁻⁵ | 3.9 × 10⁻⁴ | 3.1 × 10⁻⁵

Discussion

In the present work, a multi-modal search platform was developed and used to study how designers search for inspirational stimuli. A cognitive study was conducted to investigate the impact of searching with different modalities to retrieve inspirational stimuli in the form of 3D-model parts. Findings related to the design of the search platform and results of the cognitive study are further discussed in this section with added insight from qualitative results.

Discussion of multi-modal search platform development and behavior

The design, development, and behavior of a multi-modal search platform are presented in this work. Deep neural networks were trained to model relationships between 3D-model parts from the PartNet dataset. By selecting a large dataset of 3D-model parts as inspirational stimuli, data-driven, deep-learning-based methods could be leveraged. 3D-model parts specifically contain rich information and allowed semantic, visual, and function-based similarities to be derived between stimuli. These computational methods were then also effectively used to develop a platform to retrieve examples based on these features. Therefore, beyond deriving multiple types of similarity, this work presents

estimation given the difficulty of this task, as noted in the section “Definition and results of self-similarity”. Alternative metrics are therefore explored, including a concept-based (i.e., semantic) definition of similarity and similarity of physical form. At the concept level, the platform’s appearance network reached a top-10 test set accuracy of ∼68% for identifying text labels of the corresponding 3D model. By comparison, Zhang and Jin’s (2020) deep-learning approach achieved up to ∼82% accuracy when labeling clusters of 2D sketches with one of five categories (e.g., “canoe” vs. “car”). Different from the present work, 2D images, and not 3D models, were used in that study. Limited instances of 3D-part-based retrieval, as it has been implemented in this work, exist in prior research to compare retrieval behavior in the context of physical-form-based similarity. In the application of the platform in the cognitive study, the task was designed such that the specific stimuli retrieved were less relevant than how search intent was expressed. Moreover, we would like to highlight that these definitions of similarity, while simplistic and intuitive, only provide very limited perspectives on the ability of the models to support design ideation. However, future work can explore further direct validation metrics and an evaluation of the accuracy of the retrieved examples from the user’s perspective when performing our targeted task.

Discussion of cognitive study results

This platform was used to complete a search task during a cognitive study, which was administered such that participants used the three available search modalities during three distinct subtasks. Participants were instructed to search for parts using keyword, part, and workspace searches in Tasks A, B, and C, respectively. The overall goal of the task was to save parts that served as inspirational for designing a multi-compartment disposal unit. A limitation of this study design is that the effects of learning with each task and the stage of the design process during each task may have influenced how search modalities were used. However, for the aims of this work, understanding how each search modality was used and interacted with was prioritized over capturing how designers may have naturally used them to achieve specific design outcomes. Each search modality was associated with distinct search behaviors and interactions with retrieved parts.

Affected outcomes include search frequency and how search inputs were specified. Most searches occurred in Task B, by part search. Prior work has shown that, when presented with random examples, high click frequency on examples occurred to examine them until something desirable was found (Lee et al., 2010). Increased searches made with new part selections or similarity slider positions may indicate this exploration for desirable stimuli. When conducting workspace searches in Task C, more searches were new (introducing a new part to the workspace input) than modified (with adjusted similarity sliders from the previous search). The same result was not observed with part
a platform that additionally provides the flexibility to search searches. One explanation for this finding is that the ability to
based on these characteristics of design stimuli. Various similarity make incremental modifications to the main search input by add-
definitions were considered to help us understand the retrieval ing parts to the workspace may encourage more new searches. An
behavior of the neural networks using a ranked-based measure, analogous incremental manipulation to visual features of the
as introduced in the section “Quantitative retrieval behavior of search input in part searches is absent. Observed differences in
neural networks”. The lowest accuracy measures were observed these inputs suggest that users value the ability to conduct
when relevance was defined in terms of self-similarity (i.e., the searches that vary individual parameters one at a time.
model retrieves the same part as the input). However, in the con- When interacting with the retrieved parts, most parts viewed
text of this platform’s use, self-similarity is a highly conservative in context were those retrieved from keyword searches. One
participant explicitly described their use of this function when commenting on their keyword search strategy: “I was inspired by some of the parts in the ‘view in context’ like the ‘lid’”. While participants could make part and workspace searches using a previously retrieved part, text-labeled images of parts from the view in context function may inspire subsequent keyword search inputs. Stimuli combining semantic elements and images were also found by Han et al. (2018) to help designers generate creative ideas. However, the provided stimuli may not directly inspire new ideas, but instead help divert designers onto a new train of thought that enables new ideas (Howard et al., 2011). A similar process involving indirect stimulation was also observed by Chen et al. (2022) during the use of a mind-mapping tool, where retrieved results prompted further querying.

The final outcome of the cognitive study relates to what participants searched for and discovered. The lowest coverage of the search space occurred when searching by workspace, as assessed using metrics of variance within the appearance and functional embedding spaces. Increased breadth of coverage may occur when new keyword and part searches are defined through inspiration by external concepts. For instance, parts discovered when utilizing view in context may inspire a new keyword search based on a shown text-labeled part, or a part search for functionally similar parts. He et al. (2019) observed that concept-space exploration using external information was common during interaction with a concept-space visualization tool. Future work may investigate these cognitive processes and motivations for conducting searches through think-aloud protocols or in-depth post-task interviews.

Implications for understanding and supporting how designers search

The insights gained from the cognitive study aim to advance the understanding of how designers search for inspirational stimuli, and how search modalities can differently support these cognitive processes. Distinct interactions within the platform, as discussed above, may reflect the different cognitive processes underlying search.

Active and passive search strategies

As introduced in the section “Cognitive processes underlying search for inspiration”, search behavior can be broadly divided into active versus passive strategies, which correspond to situations in which a specific goal exists versus those in which random encounters with inspirational stimuli occur. In general, participants can be assumed to be engaged in active search processes when defining a search query (i.e., there is intention underlying the search). Other interactions within the platform can also afford the ability to engage in search processes that passively inspire the next search. As mentioned, passive search can be supported by information gained from viewing parts in context. Previous work suggests that participants want to be struck by inspiration and to search more randomly (Herring et al., 2009; Goncalves et al., 2016). Increased engagement with parts may therefore be a strategy to randomly encounter inspiration and inspire subsequent searches. Parts retrieved from keyword searches were the most engaged with, specifically by being viewed in context, which may indicate that sources of inspiration not explicitly searched for are especially helpful when searching with a directly articulated input, such as text. Introducing additional means for passive search through random discovery of inspirational stimuli may formally achieve what participants found useful about viewing parts in context.

Exploratory search strategies

The platform enables part and workspace searches to specify the levels of appearance- and function-based similarity of results to the input. While the adjustment of sliders provides a method to specify desired search results, qualitative responses link the use of sliders to, counterintuitively, more exploratory behavior. When asked to “describe any strategies [used] when conducting part searches”, one participant noted the use of sliders as supporting search when a distinct goal was missing: “I would try both combinations of functionality and appearance because I didn’t really know what I was looking for and I wanted to see all my options”. The use of similarity sliders is also mentioned as a way to explore the limits of the design stimuli space, in one participant’s part search strategy: “I mainly used this as a way to look at possible new ideas I had not considered before by moving the functionality slider to max and the appearance slider to the lowest setting” and another participant’s use of workspace searches: “I was trying several factors that could play with changing the appearance and functionality levels while adjusting it from the opposite to all being very similar”. Previous work on searching with inputs specifying the desired similarity and variety of results has also shown that these parameters are helpful for finding relevant examples (Lee et al., 2010). These responses support providing mechanisms to conduct searches by adjusting parameters that assist with wider exploration. Search can be specified based on desired diversity or variety of stimuli, for example.

These contributions of our work encourage the further development of multi-modal search systems, as well as research on cognitive processes relevant to the search for inspirational examples to support design. Improved understanding is needed regarding when different approaches to search are more useful (e.g., direct and active vs. exploratory and passive), and how to identify and promote these processes through interactions with features of search interfaces.

Conclusion

The work presented in this paper provides insight into how search modality affects the processes designers use to search for and retrieve inspirational stimuli to support design ideation. We describe the development of a new multi-modal search platform and the results of a cognitive study investigating the role of modality in search. The first main outcome of this work is the design, development, and illustration of the behavior of a multi-modal search platform. A deep-learning approach was leveraged to construct deep neural networks based on semantic, visual, and functional relationships between design stimuli from a large dataset of 3D-model parts. The platform affords inputs based on text, 3D-model parts, and assemblies of 3D-model parts to search for additional parts. A variety of similarity metrics were used to quantitatively understand the platform’s retrieval behavior using rank-based accuracy measures. Secondly, the results of the cognitive study conducted using the search platform were presented. When engaging with the platform to search for parts to inspire a solution to a given design challenge, differences between the three modalities were observed in terms of the frequency of search, how search inputs were defined, interactions with retrieved results, and the resulting coverage of the search space. Behaviors such as increased search frequency and modified adjustments to search inputs are proposed to indicate random exploratory behavior, which can be enhanced in future creativity-support tools. Other interactions leading to random external stimuli discovery that
inspired new search inputs can be more formally implemented to assist designers during different stages of the search process. Overall, the results of this study contribute to recent work on new search modalities to retrieve inspirational stimuli to enhance design ideation. This study supports the need for further research both on the search process itself and on how modality affects and aids how designers search.

Data availability. The data that support the findings of this study are available from PartNet (https://partnet.cs.stanford.edu). Restrictions apply to the availability of these data, which were used under licence for this study.

Financial support. This work is supported in part by the National Science Foundation under grant 2145432 – CAREER. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Conflict of interest. The authors declare none.

References

Arora R, Darolia I, Namboodiri VP, Singh K and Bousseau A (2017) SketchSoup: exploratory ideation using design sketches. Computer Graphics Forum 36, 302–312.
Athukorala K, Głowacka D, Jacucci G, Oulasvirta A and Vreeken J (2016) Exploratory search: from finding to understanding. Journal of the Association for Information Science and Technology 67, 2645–2651.
Borgianni Y, Rotini F and Tomassini M (2017) Fostering ideation in the very early design phases: how textual, pictorial and combined stimuli affect creativity. Proceedings of the 21st International Conference on Engineering Design, ICED17. The Design Society.
Botella M, Zenasni F and Lubart T (2018) What are the stages of the creative process? What visual art students are saying. Frontiers in Psychology 9, 2266.
Cer D, Yang Y, Kong S-y, Hua N, Limtiaco N, St. John R, Constant N, Guajardo-Cespedes M, Yuan S, Tar C, Strope B and Kurzweil R (2018) Universal sentence encoder for English. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Brussels, Belgium: Association for Computational Linguistics.
Chakrabarti A, Sarkar P, Leelavathamma L and Nataraju B (2005) A functional representation for aiding biomimetic and artificial inspiration of new ideas. Artificial Intelligence for Engineering Design, Analysis and Manufacturing 19, 113–132.
Chan J, Fu K, Schunn C, Cagan J, Wood K and Kotovsky K (2011) On the benefits and pitfalls of analogies for innovative design: ideation performance based on analogical distance, commonness, and modality of examples. Journal of Mechanical Design 133, 081004.
Chan J, Dow SP and Schunn C (2015) Do the best design ideas (really) come from conceptually distant sources of inspiration? Design Studies 36, 31–58.
Chaudhuri S and Koltun V (2010) Data-driven suggestions for creativity support in 3D modeling. ACM Transactions on Graphics 29, 1–10.
Chen T, Mohanty RR and Krishnamurthy VR (2022) Queries and cues: textual stimuli for reflective thinking in digital mind-mapping. Journal of Mechanical Design 144, 021402.
Cheong H, Chiu I, Shu LH, Stone RB and McAdams DA (2011) Biologically meaningful keywords for functional terms of the functional basis. Journal of Mechanical Design 133, 021007.
Devlin J, Chang M-W, Lee K and Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). Minneapolis, MN, June 2–7, pp. 4171–4186.
Do EY-L (2005) Design sketches and sketch design tools. Knowledge-Based Systems 18, 383–405.
Eckert C and Stacey M (2003) Sources of inspiration in industrial practice: the case of knitwear design. Journal of Design Research 3, 16–44.
Fu K, Cagan J, Kotovsky K and Wood K (2013a) Discovering structure in design databases through functional and surface based mapping. Journal of Mechanical Design 135, 031006.
Fu K, Chan J, Cagan J, Kotovsky K, Schunn C and Wood K (2013b) The meaning of “near” and “far”: the impact of structuring design databases and the effect of distance of analogy on design output. Journal of Mechanical Design 135, 021007.
Goel AK, Rugaber S and Vattam S (2009) Structure, behavior, and function of complex systems: the structure, behavior, and function modeling language. Artificial Intelligence for Engineering Design, Analysis and Manufacturing 23, 23–35.
Goel AK, Vattam S, Wiltgen B and Helms M (2012) Cognitive, collaborative, conceptual and creative — four characteristics of the next generation of knowledge-based CAD systems: a study in biologically inspired design. Computer-Aided Design 44, 879–900.
Goncalves M, Cardoso C and Badke-Schaub P (2016) Inspiration choices that matter: the selection of external stimuli during ideation. Design Science 2, 1–31.
Goucher-Lambert K and Cagan J (2019) Crowdsourcing inspiration: using crowd generated inspirational stimuli to support designer ideation. Design Studies 61, 1–29.
Goucher-Lambert K, Moss J and Cagan J (2019) A neuroimaging investigation of design ideation with and without inspirational stimuli — understanding the meaning of near and far stimuli. Design Studies 60, 1–38.
Goucher-Lambert K, Gyory JT, Kotovsky K and Cagan J (2020) Adaptive inspirational design stimuli: using design output to computationally search for stimuli that impact concept generation. Journal of Mechanical Design 142, 091401.
Gross MD (2001) Emergence in a recognition based drawing interface. In Gero JS, Tversky B and Purcell T (eds), Visual and Spatial Reasoning in Design II. Sydney, Australia: Key Centre of Design Computing and Cognition, pp. 51–65.
Han J, Shi F, Chen L and Childs PR (2018) The Combinator – a computer-based tool for creative idea generation based on a simulation approach. Design Science 4, e11.
Han J, Sarica S, Shi F and Luo J (2022) Semantic networks for engineering design: state of the art and future directions. Journal of Mechanical Design 144, 020802.
He Y, Camburn B, Liu H, Luo J, Yang M and Wood K (2019) Mining and representing the concept space of existing ideas for directed ideation. Journal of Mechanical Design 141, 121101.
Herring SR, Chang CC, Krantzler J and Bailey BP (2009) Getting inspired! Understanding how and why examples are used in creative design practice. Proc. of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09. New York, NY, USA: Association for Computing Machinery.
Howard TJ, Culley S and Dekoninck EA (2011) Reuse of ideas and concepts for creative stimuli in engineering design. Journal of Engineering Design 22, 565–581.
Hu R, Yan Z, Zhan J, van Kaick O, Shamir A, Zhang H and Huang H (2018) Predictive and generative neural networks for object functionality. ACM Transactions on Graphics (Proc. SIGGRAPH) 37, 151:1–151:14.
Hua M, Han J, Ma X and Childs P (2019) Exploring the effect of combinational pictorial stimuli on creative design performance. Proc. of the 22nd International Conference on Engineering Design, ICED19. Cambridge University Press.
Jiang S, Luo J, Ruiz-Pava G, Hu J and Magee CL (2020) A convolutional neural network-based patent image retrieval method for design ideation. Proc. ASME 2020 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Paper No. DETC2020-22048, Virtual, Online, August 17–19.
Jiang S, Luo J, Ruiz-Pava G, Hu J and Magee CL (2021) Deriving design feature vectors for patent images using convolutional neural networks. Journal of Mechanical Design 143, 061405.
Jiang S, Hu J, Wood KL and Luo J (2022) Data-driven design-by-analogy: state-of-the-art and future directions. Journal of Mechanical Design 144, 020801.
Kazi RH, Grossman T, Cheong H, Hashemi A and Fitzmaurice G (2017) DreamSketch: early stage 3D design explorations with sketching and
generative design. Proc. of the 30th Annual ACM Symposium on User Interface Software and Technology, UIST ’17. New York, NY, USA: Association for Computing Machinery.
Kittur A, Yu L, Hope T, Chan J, Lifshitz-Assaf H, Gilon K, Ng F, Kraut RE and Shahaf D (2019) Scaling up analogical innovation with crowds and AI. Proceedings of the National Academy of Sciences 116, 1870–1877.
Kwon E, Pehlken A, Thoben KD, Bazylak A and Shu LH (2019) Visual similarity to aid alternative-use concept generation for retired wind-turbine blades. Journal of Mechanical Design 141, 031106.
Kwon E, Huang F and Goucher-Lambert K (2021) Multi-modal search for inspirational examples in design. Proc. ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Paper No. DETC2021-71825, Virtual, Online, August 17–19.
Lee B, Srivastava S, Kumar R, Brafman R and Klemmer SR (2010) Designing with interactive example galleries. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’10. New York, NY, USA: Association for Computing Machinery.
Linsey J, Wood K and Markman A (2008) Modality and representation in analogy. Artificial Intelligence for Engineering Design, Analysis and Manufacturing 22, 85–100.
Luo J, Sarica S and Wood KL (2021) Guiding data-driven design ideation by knowledge distance. Knowledge-Based Systems 218, 106873.
Marchionini G (2006) Exploratory search: from finding to understanding. Communications of the ACM 49, 41–46.
Menezes A and Lawson BR (2006) How designers perceive sketches. Design Studies 27, 571–585.
Mo K, Zhu S, Chang AX, Yi L, Tripathi S, Guibas LJ and Su H (2018) PartNet: a large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 909–918.
Mougenot C, Bouchard C, Aoussat A and Westerman S (2008) Inspiration, images and design: an investigation of designers’ information gathering strategies. Journal of Design Research 7, 331–351.
Murphy J, Fu K, Otto K, Yang M, Jensen D and Wood K (2014) Function based design-by-analogy: a functional vector approach to analogical search. Journal of Mechanical Design 136, 101102.
Nagel JK and Stone RB (2012) A computational approach to biologically inspired design. Artificial Intelligence for Engineering Design, Analysis and Manufacturing 26, 161–176.
Purcell AT and Gero JS (1992) Effects of examples on the results of a design activity. Knowledge-Based Systems 5, 82–91.
Robertson B, Walther J and Radcliffe DF (2007) Creativity and the use of CAD tools: lessons for engineering design education from industry. Journal of Mechanical Design 129, 753–760.
Sarica S, Luo J and Wood KL (2020) TechNet: technology semantic network based on patent data. Expert Systems with Applications 142, 112995.
Sarica S, Song B, Luo J and Wood KL (2021) Idea generation with technology semantic network. Artificial Intelligence for Engineering Design, Analysis and Manufacturing 35, 265–283.
Setchi R and Bouchard C (2010) In search of design inspiration: a semantic-based approach. Journal of Computing and Information Science in Engineering 10, 031006.
Siangliulue P, Chan J, Gajos KZ and Dow SP (2015) Providing timely examples improves the quantity and quality of generated ideas. Proceedings of the 2015 ACM SIGCHI Conference on Creativity and Cognition, C&C ’15. New York: Association for Computing Machinery.
Song H and Fu K (2022) Design-by-analogy: effects of exploration-based approach on analogical retrievals and design outcomes. Journal of Mechanical Design 144, 061401.
Soufi B and Edmonds E (1996) The cognitive basis of emergence: implications for design support. Design Studies 17, 451–563.
Sutcliffe A and Ennis M (1998) Towards a cognitive theory of information retrieval. Interacting with Computers 10, 321–351.
Toh CA and Miller SR (2014) The impact of example modality and physical interactions on design creativity. Journal of Mechanical Design 136, 091004.
Tseng I, Moss J, Cagan J and Kotovsky K (2008) The role of timing and analogical similarity in the stimulation of idea generation in design. Design Studies 29, 203–221.
Vasconcelos LA, Cardoso CC, Sääksjärvi M, Chen C-C and Crilly N (2017) Inspiration and fixation: the influences of example designs and system properties in idea generation. Journal of Mechanical Design 139, 031101.
Vattam S, Wiltgen B, Helms ME, Goel AK and Yen J (2011) DANE: fostering creativity in and through biologically inspired design. In Taura T and Nagai Y (eds), Design Creativity 2010. London: Springer, pp. 115–122.
Zhang Z and Jin Y (2020) An unsupervised deep learning model to discover visual similarity between sketches for visual analogy support. Proc. ASME 2020 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Paper No. DETC2020-22394, Virtual, Online, August 17–19.
Zhang Z and Jin Y (2021) Toward computer aided visual analogy support (CAVAS): augment designers through deep learning. Proc. ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Paper No. DETC2021-70961, Virtual, Online, August 17–19.

Elisa Kwon is a Ph.D. student at the University of California, Berkeley, advised by Dr. Kosa Goucher-Lambert. She received her B.A.Sc. in Engineering Science (2017) and her M.A.Sc. in Mechanical Engineering (2019) at the University of Toronto. Her main research interest is in investigating human cognition during the engineering design process through human-subject studies, drawing on methods from cognitive neuroscience and psychology.

Forrest Huang is a Ph.D. candidate at the University of California, Berkeley, advised by Prof. John F. Canny. He received a B.S. in Computer Science from the University of Illinois at Urbana-Champaign in 2017. His research focuses on developing deep-learning systems that support creative activities with sketch-based and natural-language-based user interaction. His research contributions also include large-scale novel UI design and sketch datasets, and interactive visualization and debugging tools for deep-learning workflows.

Kosa Goucher-Lambert is an Assistant Professor of Mechanical Engineering at the University of California, Berkeley, and an Affiliate Faculty member in the Jacobs Institute for Design Innovation and the Berkeley Institute of Design. Dr. Goucher-Lambert is an expert in the field of engineering design theory, methods, and automation, and conducts research merging computational analyses of human behavior in design with cognitive and neuroscience models of designer behavior. He is the recipient of an NSF CAREER Award and the 2019 Excellence in Design Science Award. He has received several best paper awards from the American Society of Mechanical Engineers and the Design Society.