NL2API - A Framework For Bootstrapping Service Recommendation Using Natural Language Queries
All content following this page was uploaded by Anup Kalia on 17 May 2018.
Chen Lin∗ , Anup K. Kalia† , Jin Xiao† , Maja Vukovic† and Nikos Anerousis†
∗ Department of Computer Science
North Carolina State University, Raleigh, NC
Email: [email protected]
† IBM T. J. Watson Research Center, Yorktown Heights, NY 10598
{anup.kalia, jinoaix, maja, nikos}@us.ibm.com
Abstract—Existing approaches to recommend services using natural language queries are supervised or unsupervised. Supervised approaches rely on a dataset with natural language queries annotated with categorizing labels. As the annotation process is manual and requires deep domain knowledge, these approaches are not readily applicable to new datasets. On the other hand, unsupervised approaches overcome this limitation. To date, unsupervised approaches are primarily based on matching keywords, entity relationships, topics, and clusters. Keywords and entity relationships ignore the semantic similarity between a query and services. Topics and clusters capture the semantic similarity, but rely on mashups that explicitly capture relationships between services. Again, for new services, this information is not readily available. We propose NL2API, a framework that relies solely on service descriptions for recommending services. NL2API has the benefit of being immediately applicable as a bootstrap recommender for new datasets. To capture relationships among services, NL2API provides different approaches to construct communities, where a community represents an abstraction over a group of services. Based on the communities and users’ queries, NL2API applies a query matching approach to recommend top-k services. We evaluate NL2API on datasets collected from Programmable Web and API Harmony. Our evaluation shows that for sizable datasets such as Programmable Web, NL2API outperforms baseline approaches.

Keywords-service discovery, topic modeling, deep learning, community detection, clustering, web services

I. INTRODUCTION
Web services help users to develop applications by offering them a wide range of functionalities in the domains of IT, e-commerce, marketing, location, communications, and so on. With the increasing number of services in different domains, the discovery process has become challenging.

Existing discovery techniques rely on two broad approaches, supervised and unsupervised. In supervised approaches, a training dataset for a domain is prepared where natural language queries are annotated with specific services by domain experts [9] or by the crowd [14]. There are a few challenges associated with the supervised approach. First, the annotation process is time consuming and laborious. Second, annotations can be misleading and biased. Unsupervised approaches overcome these limitations by considering techniques such as keyword matching, entity relationship extraction, and extraction of topics and clusters. The keyword matching approach [11] and the entity relationship extraction [17] technique match users’ natural language queries syntactically with service descriptions, thereby ignoring the semantic similarity, which may lead to false positives. For example, consider a user who wants to find services related to bus. The response returns a service Message Bus that has no relation to the user’s actual intent, i.e., transport services. In contrast, approaches based on the extraction of topics and clusters [13], [6], [5], [16] capture the semantic similarity between a user query and service descriptions. However, these approaches rely on mashups that explicitly capture the relationships between services, and mashups may not be available for all services.

In this paper, we propose a novel unsupervised framework, natural language to service APIs (NL2API), that recommends services based on users’ queries and service descriptions. The approach does not rely on mashups since they are not available for new services, especially if the services have not been used in any application. Thus, NL2API acts as a bootstrap service recommender for new services. NL2API identifies relationships between services by extracting communities based on the semantic similarities of their intents. Each community is represented as a tree-like structure where the internal nodes represent hierarchical intents and the leaf nodes represent topics. A topic refers to a common theme across a group of services, whereas an intent refers to an abstraction over topics or other intents. NL2API identifies the communities that are most closely related to the users’ queries and then recommends top-k services with a confidence score from these communities using the latent semantic indexing (LSI) technique.

The proposed framework was evaluated on two datasets: Programmable Web and API Harmony [8]. The results show that the proposed framework can retrieve relevant results without external information such as mashups. The results also show that the proposed framework outperforms baseline approaches, especially when it is applied to a large dataset such as Programmable Web.
II. RELATED WORK
Existing approaches to service recommendation are either supervised or unsupervised.

A. Supervised Approaches
Su et al. [14] propose a framework for web service discovery. Since the framework relies on a supervised approach, Su et al. create a training dataset based on inputs from the crowd. The training data consists of mappings between informal commands based on natural language and service calls. Su et al. provide an intermediate representation or a canonical command for each service in terms of HTTP verb, resource, return type, required parameters, and optional parameters. Then, the crowd is employed to paraphrase canonical commands. They use LSTM for training and evaluation of their annotated dataset. Their approach has limitations. One, collecting informal commands for a large repository of services can be challenging. Two, with the inclusion of new domains, the intermediate representation has to be refined.

Kalia et al. [9] provide service discovery in a service catalog based on natural language requests from users. Their approach relies on multiple labels associated with each service. For example, a service can be associated with a category, a task, and an action. To prepare the training data, Kalia et al. consider IT change requests and annotate them with multiple labels based on a service catalog. For building their model, they use classifier chain multilabel classification and improve the classification accuracy by providing feedback based on parameters extracted from users’ requests. Their approach is limited to a specific catalog of services. For new catalogs, the annotation process can be cumbersome and needs extensive domain knowledge.

B. Unsupervised Approaches
Xiong et al. [17] provide a framework that considers users’ personalized requirements and semantic graphs to generate recommendations for users. The semantic graph is constructed based on natural language API descriptions. A semantic representation is captured as triples in the form of (X, α, Y) where X and Y are entities and α is the string of words that intervenes between X and Y. Relationships are constructed by extracting typed dependency trees from descriptions. Using the relationships, they produce top-k recommendations. Their approach does not discover the semantic relationships between services based on their descriptions. Thus, the approach may fail to recognize the users’ intent from their queries.

Rahman et al. [11] provide an approach that uses crowdsourced knowledge for recommendation of APIs. Their approach exploits the relationships of keywords extracted from Stack Overflow questions and answers with APIs based on their descriptions. One important limitation of their approach is that it relies on an external dataset such as Stack Overflow. We assume that Stack Overflow might not provide data for a new domain. Another limitation of their approach is that they consider keyword-based matching, which ignores the semantic similarity of users’ queries with service descriptions and the relationships between services.

Xie et al. [16] provide an approach that takes mashups (related services) as input and clusters them into groups based on their textual descriptions. For the clustering they used the K-medoids algorithm. From the clusters, services are recommended based on the user’s requirements. The approach has several limitations. One, the approach relies on mashups, which may not be present for a new domain. Two, the approach clusters mashups rather than services. Thus, the identified clusters may not correctly depict the relationships between services.

Gao and Wu [5] provide a framework to recommend a set of services for mashup composition based on users’ text inputs. In addition to matching services with users’ requirements, they emphasize the quality of the services recommended. For recommending top-k services, rather than using traditional topic modeling techniques, they create a mapping between service descriptions and mashup composition. The supervised technique first learns the relevance between service descriptions and mashups. Then, they apply clustering using K-means to recommend top-k services. They used the learned model from the supervised approach to improve clustering. Clearly, the approach relies on mashup composition to cluster services, and mashups may not be present for new domains.

Samanta and Liu [13] provide an approach for service recommendation that applies the Hierarchical Dirichlet process on services to generate topics. The approach considers mashups as its input to generate topics. A cosine similarity between the topics generated from services and mashups is computed to generate a candidate list of services. Further, the usage history of services in mashups is used to produce top-k services from the candidate list. Hao et al. [6] propose an approach to refine service descriptions for a specific query. The approach relies on mashup descriptions that provide application scenarios for services; such descriptions provide additional information that can be used to reconstruct the original service descriptions. For reconstruction of service descriptions, Hao et al. use Latent Dirichlet Allocation to extract topics from mashup descriptions and the query given by the user. Both approaches rely on mashups that may not be present for all kinds of domains.

III. THE NL2API FRAMEWORK
In this section, we present an overview of our proposed framework NL2API. Our framework has three components: a preprocessing unit, a community extractor, and a query matcher. In the following sections, we present each of the components in greater detail.
A. The Preprocessing Unit
The preprocessing unit takes natural language service descriptions as input and preprocesses them using a pipeline that removes punctuation, applies a part-of-speech (POS) tagger, and removes frequent, infrequent, and corpus-specific stop words. In our framework, we removed infrequent words that appeared in three or fewer service descriptions and frequent words that appeared in more than 10% of the service descriptions. For corpus-specific stop words, we heuristically select the low-quality topics and remove the most frequent terms in those topics. Table I shows examples of such words.
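The frequency-based part of the preprocessing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`tokenize`, `filter_by_document_frequency`) and the toy descriptions are hypothetical, POS tagging and corpus-specific stop word removal are left out, and only the two thresholds (three or fewer descriptions, more than 10% of descriptions) come from the text.

```python
import string
from collections import Counter

# Map every punctuation character to a space so "e-commerce" splits cleanly.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def tokenize(text):
    """Lowercase a description, drop punctuation, and split on whitespace."""
    return text.lower().translate(_PUNCT_TO_SPACE).split()

def filter_by_document_frequency(docs, min_docs=3, max_ratio=0.10):
    """Remove tokens that occur in min_docs or fewer descriptions (infrequent)
    or in more than max_ratio of all descriptions (frequent)."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each token once per description
    upper = max_ratio * len(docs)
    keep = {tok for tok, n in doc_freq.items() if min_docs < n <= upper}
    return [[tok for tok in tokens if tok in keep] for tokens in docs]

# Hypothetical toy corpus; looser thresholds are used because the corpus is tiny.
toy_docs = [tokenize(d) for d in [
    "Send SMS messages via API",
    "Track bus routes via API",
    "Bus schedule API",
    "Weather forecast API",
]]
toy_filtered = filter_by_document_frequency(toy_docs, min_docs=1, max_ratio=0.5)
```

With the thresholds from the text (`min_docs=3`, `max_ratio=0.10`), a word can only survive in a corpus of at least 40 descriptions, which is why the toy call above relaxes both values.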
Table I
EXAMPLES OF MOST FREQUENT, INFREQUENT, AND CORPUS-SPECIFIC STOP WORDS EXTRACTED FROM SERVICE DESCRIPTIONS IN PROGRAMMABLE WEB.
Type                       Words
Frequent words             service, time, method, platform, developer, api, web, system, site, respone, functionality
Infrequent words           phyloinformatic, carregistrationapi, myhurricane, atomicreach, hotukdeal, polypeptide, cooladata, eventcategorie, changeavatar, zubhium, americascup
Corpus-specific stopword   management, feature, message, solution, integration, app, tool, function, client, request, xml, way, provider

B. The Community Extractor
A community represents a hierarchical structure that represents a set of topics related to a common intent. Within each community (Figure 1), the structure can be constructed as a tree where the internal nodes represent intents and leaf nodes represent topics. Each topic is inferred from a list of service descriptions. Note that the depth of the hierarchical tree indicates intents at different levels of abstraction. Intents

Figure 1. Examples of communities constructed from service descriptions

cluster approach. Finally, a topic modeling method is applied to services within each cluster. Detailed descriptions are provided in the following paragraphs.

1) Topic Modeling (TNMF): Existing topic modeling techniques consider well-known algorithms such as Latent Dirichlet Allocation (LDA) [1] and non-negative matrix factorization (NMF) [3]. However, empirical studies [12], [19] have shown that their effectiveness reduces greatly when the length of documents decreases. Considering that most service descriptions are short sentences, we consider the non-negative matrix factorization based on term correlation matrix (TNMF) [18], as the method has been shown to outperform LDA, NMF, probabilistic latent semantic analysis (PLSA), graph regularized NMF [3], and symmetric NMF [10] for multiple datasets containing short texts.

Table II
EXAMPLES OF TOPIC AND TOPIC WORDS EXTRACTED FROM SERVICE DESCRIPTIONS IN PROGRAMMABLE WEB.
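To make the idea of factorizing a term correlation matrix more concrete, the sketch below builds a term-term cosine similarity matrix from tokenized short texts and factorizes it with plain multiplicative-update NMF. It is a simplified stand-in and does not reproduce the exact TNMF formulation of [18]: the correlation weighting, the toy documents, and all function names are assumptions made for illustration, and only the standard library is used so the example stays self-contained.

```python
import math
import random

def term_correlation(docs):
    """Cosine similarity between binary term-document vectors (an assumed weighting)."""
    vocab = sorted({t for d in docs for t in d})
    rows = [[1.0 if t in d else 0.0 for d in docs] for t in vocab]
    norms = [math.sqrt(sum(x * x for x in r)) for r in rows]
    corr = [[sum(a * b for a, b in zip(ri, rj)) / (ni * nj)
             for rj, nj in zip(rows, norms)]
            for ri, ni in zip(rows, norms)]
    return vocab, corr

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def squared_error(s, w, h):
    approx = matmul(w, h)
    return sum((s[i][j] - approx[i][j]) ** 2
               for i in range(len(s)) for j in range(len(s[0])))

def nmf(s, k, iters=200, seed=0, eps=1e-9):
    """Factorize s ~ w * h with non-negative factors (multiplicative updates)."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in s]
    h = [[rng.random() + 0.1 for _ in s[0]] for _ in range(k)]
    start = squared_error(s, w, h)
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, s), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(len(h[0]))]
             for a in range(k)]
        ht = transpose(h)
        num, den = matmul(s, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(len(w))]
    return w, h, start, squared_error(s, w, h)

def top_terms(vocab, w, topic, n=3):
    """Terms with the largest weight in one column of w."""
    ranked = sorted(range(len(vocab)), key=lambda i: w[i][topic], reverse=True)
    return [vocab[i] for i in ranked[:n]]

# Hypothetical tokenized service descriptions.
toy_docs = [["bus", "route", "transit"], ["bus", "transit", "schedule"],
            ["sms", "message", "phone"], ["sms", "phone", "call"]]
vocab, corr = term_correlation(toy_docs)
w, h, err_start, err_end = nmf(corr, k=2)
topics = [top_terms(vocab, w, t) for t in range(2)]
```

Because the correlation matrix is built from term co-occurrence across the whole corpus, it stays informative even when each individual description contains only a handful of words, which is the motivation the text gives for preferring this family of methods on short descriptions.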
Figure 4. Boxplots comparing different approaches (TNMF, TNMF + LCD, LSTM + K-means) applied on datasets from Programmable Web (PW) and API Harmony (AH). Panels: (a) total coverage (PW), (b) keywords coverage (PW), (c) semantics coverage (PW), (d) total coverage (AH), (e) keywords coverage (AH), (f) semantics coverage (AH).
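The boxplots in Figure 4 are discussed in this section through medians, standard deviations (std), and variances (var) of coverage scores. As a small aid for reading those numbers, the sketch below shows how such summaries can be computed and checks that each std/var pair reported in this section is consistent with var ≈ std². The `toy_scores` values are hypothetical and not taken from the paper, and the use of the population formulas is an assumption, since the text does not say which estimator was used.

```python
import statistics

# Hypothetical per-query coverage scores in percent (illustration only).
toy_scores = [8.0, 10.0, 12.0, 9.0, 11.0]

def summarize(scores):
    """Median, population standard deviation, and population variance."""
    return {
        "median": statistics.median(scores),
        "std": statistics.pstdev(scores),
        "var": statistics.pvariance(scores),
    }

# (std, var) pairs exactly as reported in the text of this section.
reported = [(8.18, 66.93), (7.5, 56.56), (0.83, 0.69),
            (13.2, 174.7), (5.8, 33.7), (4.49, 20.17)]

def relative_gap(std, var):
    """Relative difference between std squared and the reported variance."""
    return abs(std ** 2 - var) / var

summary = summarize(toy_scores)
gaps = [relative_gap(std, var) for std, var in reported]
```

All reported pairs agree with std² to within rounding of the printed values, so the variance figures carry the same information as the standard deviations.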
semantics, and total coverage. Figure 4 shows the boxplots of coverage performance for each approach by averaging the coverage scores across 5 intents. The exercise is conducted for Programmable Web (PW) and for API Harmony (AH).

Total Coverage. For the Programmable Web dataset, we find that the median of the total coverage for the approaches TNMF + LCD (10.41%) and LSTM + K-means (9.12%) outperforms the baseline approach TNMF (9.02%). However, for the API Harmony dataset, we find that the median of the total coverage for the approach TNMF (28.35%) outperforms the approaches TNMF + LCD (10.75%) and LSTM + K-means (8.48%). The observations show that for a large dataset such as Programmable Web, extracting communities could be beneficial in discovering closely related services. For a small dataset such as API Harmony, a baseline approach is sufficient to discover relevant services. This suggests that with increasing size of datasets, our approach is beneficial to developers.

Keyword Coverage. For the Programmable Web dataset, we find that the median of the keyword coverage for the approaches TNMF (8.33%) and TNMF + LCD (8.33%) outperforms the LSTM + K-means (3.33%) approach. For the API Harmony dataset, we find that the median of the keyword coverage for the approach TNMF (30.79%) outperforms the approaches TNMF + LCD (6.32%) and LSTM + K-means (5.09%). From the observations we find that for searching services based on keyword queries, topic modeling approaches based on TNMF can provide relevant recommendations. Additionally, for the Programmable Web dataset, we find that the standard deviations (std) and variances (var) for the TNMF + LCD (std = 8.18%, var = 66.93%) and LSTM + K-means (std = 7.5%, var = 56.56%) approaches are significantly larger than those for the TNMF approach (std = 0.83%, var = 0.69%). The observation suggests that output based on the constructed communities could vary largely from one query to another, i.e., for some queries we can get a higher coverage than for others. However, this may not hold for the TNMF approach. For the API Harmony dataset, we find that the standard deviation (std) and variance (var) for the TNMF approach are larger than those for the TNMF + LCD and LSTM + K-means approaches. The observation suggests that a larger dataset can be crucial to improve the performance of the recommendations.

Semantic Coverage. For the Programmable Web dataset, we find that the median of the semantic coverage for the approach TNMF + LCD (12.5%) outperforms the approaches TNMF (9.7%) and LSTM + K-means (9.7%). For the API Harmony dataset, we find that the median of the semantic coverage for the TNMF approach (23.22%) outperforms the TNMF + LCD (13.02%) and LSTM + K-means (7.97%) approaches. From the observations we find that for a larger dataset such as Programmable Web, the communities extracted using LCD could be useful in discovering relevant services based on semantic queries. In addition, we observe that for the Programmable Web dataset, the standard deviations (std) and variances (var) for the TNMF + LCD (std = 13.2%, var = 174.7%) and LSTM + K-means (std = 5.8%, var = 33.7%) approaches are significantly larger than those for the TNMF approach (std = 4.49%, var = 20.17%). The observation suggests that based on the constructed communities we can get higher coverage for certain queries. This may not hold for the TNMF approach. For the API Harmony dataset,
we find that the standard deviation (std) and variance (var) for the TNMF + LCD approach are larger than those for the TNMF and LSTM + K-means approaches. The observation suggests that for semantic queries, generating communities from topics is beneficial for better recommendations irrespective of the data size.

VI. DISCUSSION
We present NL2API, a framework for recommending top-k services based on three different approaches. The approaches are agnostic of mashups or pre-specified mappings from natural language queries to services. Thus, NL2API can act as a quick bootstrapping recommender for datasets that do not have mashups or a prior history of services’ usage. We evaluate NL2API on two different datasets, Programmable Web and API Harmony. Our findings show that for larger datasets such as Programmable Web, the TNMF + LCD and LSTM + K-means approaches provide significantly higher coverage than the baseline TNMF approach. However, for smaller datasets such as API Harmony, the baseline approach can produce better results than the other approaches.

In the future, we plan to further expand our investigation. One, we plan to create a larger repository of services. Two, we plan to provide a more efficient query matching technique, such as word vectors, to discover relevant communities generated by the TNMF + LCD and the LSTM + K-means approaches. Three, we plan to perform an extensive user study to further evaluate the effectiveness of our approaches.

REFERENCES
[1] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.
[2] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
[3] D. Cai, X. He, J. Han, and T. S. Huang. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1548–1560, 2011.
[4] A. Clauset, M. E. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70(6):066111, 2004.
[5] W. Gao and J. Wu. A novel framework for service set recommendation in mashup creation. In Proceedings of IEEE International Conference on Web Services, pages 65–72, Honolulu, Jun 2017. IEEE.
[6] Y. Hao, Y. Fan, W. Tan, and J. Zhang. Service recommendation based on targeted reconstruction of service descriptions. In Proceedings of IEEE International Conference on Web Services, pages 285–292, Honolulu, Jun 2017. IEEE.
[7] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[8] IBM. API Harmony, 2017.
[9] A. K. Kalia, J. Xiao, M. F. Bulut, N. Anerousis, and M. Vukovic. Cataloger: Catalog recommendation service for IT change requests. In Proceedings of International Conference on Service Computing, pages 545–560, Malaga, Nov 2017. Springer.
[10] D. Kuang, C. Ding, and H. Park. Symmetric nonnegative matrix factorization for graph clustering. In Proceedings of the 2012 SIAM International Conference on Data Mining, pages 106–117. SIAM, 2012.
[11] M. M. Rahman, C. K. Roy, and D. Lo. RACK: Automatic API recommendation using crowdsourced knowledge. In Proceedings of 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering, pages 349–359, Klagenfurt, 2016. IEEE.
[12] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pages 487–494. AUAI Press, 2004.
[13] P. Samanta and X. Liu. Recommending services for new mashups through service factors and top-k neighbors. In Proceedings of IEEE International Conference on Web Services, pages 381–388, Honolulu, Jun 2017. IEEE.
[14] Y. Su, A. H. Awadallah, M. Khabsa, P. Pantel, and M. Gamon. Building natural language interfaces to web APIs. In Proceedings of the 26th ACM International Conference on Information and Knowledge Management, pages 1–10, Singapore, Nov 2017. ACM.
[15] Z. Wu and M. Palmer. Verbs semantics and lexical selection. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pages 133–138. Association for Computational Linguistics, 1994.
[16] F. Xie, J. Liu, M. Tang, D. Zhou, B. Cao, and M. Shi. Multi-relation based manifold ranking algorithm for API recommendation. In G. Wang, Y. Han, and G. M. Pérez, editors, Proceedings of 10th Asia-Pacific Services Computing Conference, pages 15–32, Zhangjiajie, 2016. Springer.
[17] W. Xiong, Z. Wu, B. Li, Q. Gu, L. Yuan, and B. Hang. Inferring service recommendation from natural language API descriptions. In Proceedings of IEEE International Conference on Web Services, pages 316–323, San Francisco, Jun 2016. IEEE.
[18] X. Yan, J. Guo, S. Liu, X. Cheng, and Y. Wang. Learning topics in short texts by non-negative matrix factorization on term correlation matrix. In Proceedings of the 2013 SIAM International Conference on Data Mining, pages 749–757, San Diego, 2013. SIAM.
[19] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li. Comparing Twitter and traditional media using topic models. In European Conference on Information Retrieval, pages 338–349. Springer, 2011.