RoboMaker User's Guide
Contents

INTRODUCTION
  What is RoboMaker?
  Organization
  Before You Read On
  Other Resources
ROBOMAKER BASICS
  The Robot
  Objects
  Robot Libraries and Robot Projects
  The Robot State
  Steps
  Connections and Execution Flow
  Error Handling
GETTING STARTED
  A Tour of the RoboMaker User Interface
    The Robot View
    The State View
    The Step View
    The Objects View
  A Tour of the RoboDebugger User Interface
  Robot Navigation and Editing
    Undoing and Redoing Changes
    Cutting, Copying, and Pasting
    Manipulating Steps and Connections
  Step Actions and Data Converters
  Patterns
  Expressions
  Working with Robot Projects and Robot Libraries
  Putting It All Together
TUTORIAL 1: ESSENTIALS
TUTORIAL 2: FORM SUBMISSION
HOW TO CONFIGURE A ROBOT
HOW TO CONFIGURE THE OBJECTS OF A ROBOT
HOW TO USE THE TAG FINDERS
  Understanding Tag Paths
  How the Tag Finder Works
  Configuring the Tag Finders of the Current Step
HOW TO SUBMIT A FORM
  Simple Form Submission
  Form Basics
  Which Step Action Should I Use?
  Using the Submit Form Action
  Using the Loop Form Action
  Uploading Files
  Using the Pop-up Menu in the Page View
HOW TO LOOP THROUGH PAGES
  Pages where First Page Links to All Other Pages
  Pages where Each Page Links to Next
HOW TO EXTRACT CONTENT
  Extracting Text
  Extracting Clips
  Extracting Binary Data
  Using the Pop-up Menu in the Page View
  Performing Common Tasks
    Extracting Only Part of a Text
    Converting Content
    Number and Date Extraction and Formatting
    Extracting Only a Subset of the Tags in the Found Tag
HOW TO EXTRACT CONTENT FROM A TABLE
  Content Irregularities
  Structure Irregularities
HOW TO CLIP
  What is Clipping?
  How Clipping Works
  The Structure of a Clipping Robot
    A Simple Clipping Robot
    A Robot with Multiple Clip Branches
    A Robot with Automatic Navigation Sequences
  Creating a Clipping Robot
  The Portlet View
  Working with Clip Branches
    Adding a New Clip Branch
    Editing a Clip Branch
    Using another Clip Branch for a Page
    Using Clip Conditions
  Modifying Clips
    Selecting the Tags to Clip
    Changing Layout and Styles
    Modifying the Pages before Clipping
  Working with Windows and Frames
    Selecting the Window to Show in the Portlet
    Blocking Popup Windows
  Handling Login and Single-Sign-On
    Performing Automatic Login
    Performing Automatic Logout
    Supporting other Types of Single-Sign-On
  Adding an Automatic Navigation Sequence
  Other Topics
    Restricting the Clipping
    Clipping Protected Resources
    Configuring the Clipped User Actions
    Passing Additional Information to a Clipping Robot
  Deploying a Clipping Robot
    Generating the Clipping Portlet
    Handling Clipping Sessions on RoboServer
HOW TO HANDLE ERRORS
  Handling a Step's Own Errors
  Handling a Step's Received Errors
    Using the Until Successful Branch Branching Mode
    More Examples of Using Until Successful Branch
  Viewing the Error Handling in the Robot View
HOW TO WRITE A ROBOT WITH INPUT OBJECTS
HOW TO MAKE ROBOTS MORE ROBUST
HOW TO REUSE SESSIONS
HOW TO DEBUG A ROBOT
  Basic Debugging
  Debugging from the Current Location in RoboMaker
  Making RoboMaker Go to a Location from RoboDebugger
  Using Breakpoints
  Single-Stepping
  Using Environments
HOW TO USE THE BROWSER TRACER
  Setting Up a Browser
  Tracing
  The Difference View
  JavaScript Trace
  HTTP Trace
  Saving and Loading Trace Sessions
INDEX
Introduction
What is RoboMaker?
RoboMaker is the application for creating and debugging robots. In RoboMaker, you can create and debug robots of any kind, including data collection robots that extract objects from a web site, and clipping robots that clip a part of an HTML page to be shown in another context, e.g. a portal. RoboMaker is an integrated development environment (IDE) for robots. This means that RoboMaker is all you need for programming robots in an easy-to-understand programming language with its own syntax (structure) and semantics (meaning). To support you in the construction of robots, RoboMaker provides you with powerful programming features including interactive visual programming, full debugging capabilities, an overview of the program state, and easy access to context-sensitive on-line help.
Organization
The User's Guide is structured as follows: First, you are introduced to the essential concepts of RoboMaker. Then you are taken on a tour of the user interface and provided with an overview of the core building blocks of any robot. With the basics firmly in place, we get to the tutorials that show you how to use RoboMaker to create robots that do something useful. The tutorials get gradually more advanced until, finally, you are ready to create robots that perform tasks that you decide. The tutorials are the meat and bone of this User's Guide and it is critical to its success that you master them before proceeding. The rest of the User's Guide is divided into various topics (How To...). You should skim through them to get an idea of what they cover. Then you can return later when you need more information or help on one of the topics.
Other Resources
Additional, mostly referential documentation on RoboMaker is available in the RoboMaker entry in RoboHelp, which is accessible from the Help menu in RoboMaker. You should also check out the support site at this URL: http://support.kapowtech.com/
RoboMaker Basics
RoboMaker is a programming environment for programming robots in a special-purpose programming language with its own syntax and semantics. Like other programming environments, RoboMaker uses several concepts that you, as a robot designer, must understand in order to fully comprehend the workings of RoboMaker. It is the purpose of this chapter to introduce the most important of these concepts. Don't worry if on your first reading of this chapter you don't understand all the RoboMaker concepts described below; they will become clearer to you as you explore RoboMaker and start to write robots. However, it is recommended that you refer back to this chapter whenever you encounter a concept whose meaning you do not understand.
The Robot
The most important concept in RoboMaker is the robot. A robot is a program designed to accomplish some task, usually involving a web site. Typically, one robot is written per task per web site. For example, you would create one robot for extracting news from cnn.com, and another robot for extracting product information from an online product catalogue. In RoboMaker, you create one robot at a time. Basically, a robot can be programmed to do (automatically) everything you can do in a browser, such as Internet Explorer.
Objects
A robot outputs objects. An object is a collection of attributes. An attribute has an attribute name and may contain a single attribute value. For example, a robot that extracts news from some web site will output news objects; each news object has attributes with attribute names such as headline, body text, date, author, etc., and each outputted news object will have different attribute values for each attribute (unless, of course, the same news object is outputted more than once!). An outputted object is called a returned object. Some robots accept input objects as input. The input objects are a collection of objects that the robot can use in performing its task. For example, a shopping robot that orders books at amazon.com might accept input objects containing a user information object and a book object.
[Figure: A robot and its output objects]
All objects are designed in the ModelMaker application and can be imported into RoboMaker. Objects are part of a domain model. See the ModelMaker User's Guide for more information on designing objects and domain models using ModelMaker.
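As an illustration only, the object concept described above can be sketched in Python. The class name RobotObject and the sample attribute values are assumptions made for this sketch; they are not part of RoboMaker or ModelMaker.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RobotObject:
    """Hypothetical model of an object: a named collection of attributes."""
    type_name: str
    # Attribute name -> attribute value; None means the attribute holds no value.
    attributes: Dict[str, Optional[str]] = field(default_factory=dict)


# A news object with the attribute names used in the text (values are made up).
news = RobotObject("News", {
    "headline": "Example headline",
    "body text": "Example body text",
    "date": "2005-01-31",
    "author": None,          # an attribute may be left without a value
})

# Objects that a robot outputs are called returned objects.
returned_objects = [news]
print(sorted(news.attributes))
```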
The Robot State

The windows element contains the currently open windows, each containing a web page or part of a web page. At least one window is always open, and one window is marked as the current window. The objects element contains the current values of the objects. The cookies and authentications elements are the HTTP cookies and authentications, respectively, received during communication with a web server.
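The four elements of the robot state can be pictured as a small data structure. The following Python sketch is only a mental model; the class and field names are assumptions chosen for illustration and do not reflect how RoboMaker stores its state.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Window:
    """Hypothetical window holding a web page or part of a web page."""
    page_html: str


@dataclass
class RobotState:
    """Hypothetical robot state with the four elements named in the text."""
    windows: List[Window]
    current_window: int = 0                      # index of the current window
    objects: Dict[str, Dict[str, str]] = field(default_factory=dict)
    cookies: Dict[str, str] = field(default_factory=dict)
    authentications: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # At least one window is always open, and one of them is current.
        if not self.windows:
            raise ValueError("a robot state needs at least one open window")
        if not 0 <= self.current_window < len(self.windows):
            raise ValueError("current_window must refer to an open window")


state = RobotState(windows=[Window("<html><body>Hello</body></html>")])
state.cookies["session"] = "abc123"              # made-up cookie for the sketch
print(len(state.windows), state.current_window)
```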
Steps
A robot is made up of steps. A step is a building block in a robot program.
Figure 2: A Step
A step accepts a robot state as input and, depending on the step configuration, outputs zero or more robot states. A step consists of several elements, including a step name, a list of tag finders and a step action.

The step name provides a symbolic name for the step, such as Extract Headline and Load Search Page. In Figure 2 above, the step name is MyStep.

The tag finders find the tags in the page that the step action (see below) should work on. Some step actions require a single tag, whereas others can handle more than one tag. Some step actions accept no tags at all.

The step action is the action that the step performs. For example, an Extract action might extract the text from a tag and store it in an object attribute. And a Click action might load the URL residing in an <a>-tag and replace the page of the current window in the robot state with the newly loaded web page. An action usually changes the robot state. For example, the Extract action changes the objects, and the Click action changes the pages/windows, the cookies and the authentications. The action is the heart and brain of the step, and it is the selection of the right step action that is the challenge of robot writing.

Some actions are termed loop actions. A loop action outputs zero or more robot states to the step that follows it. For example, a For Each Tag action looping through the <tr>-tags in a <tbody>-tag (inside a <table>-tag) will output one robot state for each <tr>-tag in the <tbody>-tag; if the <tbody>-tag has eight <tr>-tags, then the For Each Tag action will generate and output eight robot states. When a loop action is outputting its Nth robot state, its current iteration is said to be N. The step shown in Figure 2 contains a loop action and the current iteration is 3.

Some actions use data converters for converting data, e.g. converting text to a number or uppercasing it.

A step can be executed. A step that is executed accepts a robot state as input and, by applying the tag finders and step action in turn, produces zero or more robot states as output. A step is valid if it has been properly configured so that execution can be attempted. For example, if a step has no action, it is invalid since execution cannot be attempted.
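The idea of a step as something that turns one input robot state into zero or more output robot states can be sketched as follows. This is a simplified model under stated assumptions: RobotState, Step and for_each_tag are hypothetical names invented for the sketch, and tag finders are left out entirely.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class RobotState:
    """Simplified stand-in for a robot state (hypothetical)."""
    rows: Tuple[str, ...]                 # the <tr>-tags found in the page
    current_tag: Optional[str] = None     # tag marked by the current iteration


Action = Callable[[RobotState], List[RobotState]]


def for_each_tag(state: RobotState) -> List[RobotState]:
    """Loop action: output one new robot state per <tr>-tag."""
    return [replace(state, current_tag=row) for row in state.rows]


@dataclass
class Step:
    name: str
    action: Optional[Action] = None

    def is_valid(self) -> bool:
        # A step without an action cannot be executed, so it is invalid.
        return self.action is not None

    def execute(self, state: RobotState) -> List[RobotState]:
        if self.action is None:
            raise ValueError(f"step {self.name!r} is invalid: no action")
        return self.action(state)


page = RobotState(rows=tuple(f"<tr>row {n}</tr>" for n in range(1, 9)))
loop_step = Step("MyStep", for_each_tag)
outputs = loop_step.execute(page)
print(len(outputs))              # eight <tr>-tags give eight robot states
print(outputs[2].current_tag)    # the state output by iteration 3
```

Each output state is a new object, which mirrors the statement that every iteration of a loop action outputs its own robot state.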
All robots containing at least one step have a first step. The first step is the step that first gets executed when the robot is executed.
This robot consists of three steps named step A, step B, and step C. Assuming that no errors occur, and that each step generates exactly one output robot state, then this robot is executed as follows: An initial robot state will be generated and inputted to step A (being the first step). Step A will produce an output robot state. This output robot state will be the input robot state to step B. Similarly, step B will produce a robot state and this will be the input robot state to step C. Once step C has executed and outputted a robot state, execution completes. In short, the execution of steps can be described as follows: A, B, C. Sometimes, a step generates no output robot state when executed. This is quite normal for steps containing a conditional action, that is, an action that analyzes the input robot state and only outputs a robot state if the input robot state satisfies certain conditions. In the simple robot above, if step B outputs no robot state, then the execution of steps will be as follows: A, B. Note that step C will not get executed. The general rule is: If a step outputs no robot state, then execution will not proceed beyond that step. Other steps, namely those containing a loop action, might output more than one robot state. Consider the robot below where step B contains a loop action:
Assuming there are no errors, that step B outputs three robot states, and that all other steps output exactly one robot state, then the steps will be executed in the following order: A, B[1], C, D, B[2], C, D, B[3], C, D, where B[N] refers to the Nth iteration of the loop action contained in step B. Note that the robot states outputted by step B will be new robot states; that is, each iteration will output a new robot state. Hence, step C will receive a new input robot state each time it is executed.
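The execution orders described above can be reproduced with a short simulation. The function below is a sketch written for this guide, not RoboMaker code; the function name, its parameters, and the way robots are written as lists of step names are all assumptions.

```python
from typing import Dict, Iterable, List, Optional


def execution_order(chain: List[str],
                    loop_counts: Optional[Dict[str, int]] = None,
                    no_output: Iterable[str] = ()) -> List[str]:
    """Return the order in which the steps of a straight-line robot execute.

    chain       -- step names, in connection order
    loop_counts -- loop step name -> number of robot states it outputs
    no_output   -- steps (e.g. with a conditional action) that output nothing
    """
    loop_counts = loop_counts or {}
    silent = set(no_output)
    trace: List[str] = []

    def run(index: int) -> None:
        if index == len(chain):
            return
        name = chain[index]
        if name in loop_counts:
            if loop_counts[name] == 0:
                trace.append(name)          # executed, but nothing to pass on
            for n in range(1, loop_counts[name] + 1):
                trace.append(f"{name}[{n}]")
                run(index + 1)              # each iteration feeds the next step
        else:
            trace.append(name)
            if name not in silent:          # no output state: stop here
                run(index + 1)

    run(0)
    return trace


print(", ".join(execution_order(["A", "B", "C"])))
print(", ".join(execution_order(["A", "B", "C"], no_output=["B"])))
print(", ".join(execution_order(["A", "B", "C", "D"], {"B": 3})))
```

The three calls correspond to the three cases in the text: a plain sequence, a step B that outputs no robot state, and a step B whose loop action outputs three robot states.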
A step can connect to more than one step. This is called branching. Consider the robot below:
In this robot, step A has two branches, one consisting of step B and step C, and another consisting of step D and step E. How branches are executed depends on the branching mode that has been selected for the step that has the branches (in this case step A). With the default branching mode, which is called All Branches, all of the branches are executed, one after another. Therefore, assuming that no errors occur and that each step generates exactly one output robot state, then the robot above will be executed as follows: A, B, C, D, E. However, it is important to note that step B and step D will each receive a copy of the same robot state outputted by step A. Were it not for the fact that some steps might have external effects, branches could in principle be executed in parallel. Sometimes you want to select (i.e. execute) only one of several branches. One way to handle this is to let the first step in each branch contain a conditional action. In the robot above, step B and step D could each contain a conditional action, each configured so that they collectively ensure that either step C or step E gets executed, but not both. Branches can be, and often are, mutually intertwined. Consider the following robot:
This robot illustrates how connections are ordered. Unless otherwise noted, connections are executed top-down. In this robot, however, the branches of step D are executed in the order specified by the numbers, that is, step E is executed before step C. Assuming no errors occur and that each step generates exactly one output robot state, then the robot is executed as follows: A, B, C, D, E, C. The first time step C is executed it will receive the robot state outputted by step B; the second time step C is executed it will receive the robot state outputted by step D.
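The two branching examples above can be checked with a small simulation of the All Branches mode. This is a sketch under assumptions: the robot is written as a Python dictionary that maps each step name to the ordered list of steps it connects to, and the function name execute is made up for the illustration.

```python
from typing import Dict, List, Optional, Tuple


def execute(graph: Dict[str, List[str]],
            first: str) -> Tuple[List[str], List[Tuple[str, Optional[str]]]]:
    """Simulate the All Branches mode, with one output state per step.

    Returns the execution order and, for every execution, the step whose
    output robot state was received as input (None for the first step).
    """
    order: List[str] = []
    inputs: List[Tuple[str, Optional[str]]] = []

    def run(name: str, source: Optional[str]) -> None:
        order.append(name)
        inputs.append((name, source))
        # Branches are executed one after another, in connection order;
        # each branch starts from the state that this step outputs.
        for successor in graph.get(name, []):
            run(successor, name)

    run(first, None)
    return order, inputs


# Step A with two separate branches: B, C and D, E.
two_branches = {"A": ["B", "D"], "B": ["C"], "D": ["E"]}
print(", ".join(execute(two_branches, "A")[0]))

# Intertwined branches: step D connects to E first and then to C.
intertwined = {"A": ["B", "D"], "B": ["C"], "D": ["E", "C"]}
order, inputs = execute(intertwined, "A")
print(", ".join(order))
print([source for step, source in inputs if step == "C"])
```

In the intertwined robot, the last line shows that step C first receives the robot state output by step B and later the one output by step D.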
Error Handling
Steps generate an error if they fail to process the input state. For example, the tag finders might fail to find the tag due to a dramatic page layout change. A step that generates an error is said to fail. For the purposes of this section we will assume that a step that generates an error will report it immediately. Several other error handling possibilities exist; see the How to Handle Errors chapter for more information. Consider the simple robot below and assume that each step expects to output exactly one robot state:
If step A reports an error, then the execution of steps is as follows: A. If step B reports an error, then the execution of steps is as follows: A, B. If step C reports an error, then the execution of steps is as follows: A, B, C. The general rule is: If a step generates an error, then the steps beyond that step will not be executed. Error handling is affected by loop actions. Consider the robot below and assume that step B expects to output three robot states, and all other steps expect to output exactly one robot state:
If step C reports an error the second time it is executed, then the execution of steps is as follows: A, B[1], C, D, B[2], C, B[3], C, D. Note that the error causes the loop action to go to the next iteration. In another situation, if step B reports an error when generating its second output robot state, then the execution of steps is: A, B[1], C, D, B[2]. Note that the error causes step B to fail completely, and, hence, execution does not go to the next iteration. The general rule is: If a step containing a loop action fails, then it fails completely like any other step, i.e. execution does not proceed beyond the step. Error handling is also affected by branching. Consider the robot below and assume that each step expects to output exactly one robot state:
If step B reports an error, then the execution of steps is as follows: A, B, D, E. Note that execution will proceed to each branch regardless of whether a step on a previously executed branch has reported an error. When an error is reported, an error report is generated. An error report contains a message briefly describing the error, and a location and location code for the step that reported the error. The location of the step that reported the error is the list of steps (including iteration numbers) one needs to execute in order to reach that step from the first step. Consider the robot below:
If step C reports an error on the second iteration of step B, then the location is written as: step A - step B[2] - step C. Note that the location contains the step names and iteration numbers, separated by hyphens. The location code is similar to the location, but the name of each step is replaced by a unique identifier for that step, thereby avoiding name clashes. For the location example above, the location code may be: <0>.<1>[2].<2>. You can use the location code in RoboMaker to go directly to the step that reported the error.
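The error handling rules of this section, together with the location format, can be sketched as a simulation. Everything here is an assumption made for illustration: the function name run_robot, the dictionary representation of a robot, and the way failures are specified. It models only the case where a failing step reports its error immediately, and it does not model location codes.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def run_robot(graph: Dict[str, List[str]],
              first: str,
              loops: Optional[Dict[str, int]] = None,
              failures: Iterable[Tuple[str, int]] = ()) -> Tuple[List[str], List[str]]:
    """Return (execution order, locations of reported errors).

    graph    -- step name -> ordered list of the steps it connects to
    loops    -- loop step name -> number of robot states it tries to output
    failures -- (step name, n): the step reports an error on its nth
                execution (for a loop step: on its nth iteration)
    """
    loops = loops or {}
    failing = set(failures)
    trace: List[str] = []
    locations: List[str] = []
    runs: Dict[str, int] = {}

    def follow(name: str, path: List[str]) -> None:
        for successor in graph.get(name, []):
            run(successor, path)

    def run(name: str, path: List[str]) -> None:
        if name in loops:
            for n in range(1, loops[name] + 1):
                label = f"{name}[{n}]"
                trace.append(label)
                here = path + [f"step {label}"]
                if (name, n) in failing:
                    locations.append(" - ".join(here))
                    return              # a failing loop step fails completely
                follow(name, here)
        else:
            runs[name] = runs.get(name, 0) + 1
            trace.append(name)
            here = path + [f"step {name}"]
            if (name, runs[name]) in failing:
                locations.append(" - ".join(here))
                return                  # steps beyond a failing step are skipped
            follow(name, here)

    run(first, [])
    return trace, locations


chain = {"A": ["B"], "B": ["C"], "C": ["D"]}
trace, locations = run_robot(chain, "A", loops={"B": 3}, failures=[("C", 2)])
print(", ".join(trace))
print(locations)
```

With step C failing on its second execution, the loop simply moves on to its next iteration, and the reported location names step A, the second iteration of step B, and step C.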
Getting Started
This chapter gets you started with RoboMaker. It introduces you to the RoboMaker user interface, including selected menus and functions, and to the RoboDebugger sub-application. Then, the core building blocks of any robot, namely the step actions and data converters, are described. Finally, there are sections on patterns and expressions. When reading this chapter, it is recommended that you start up RoboMaker and explore the RoboMaker user interface as you follow the tour. Note, however, that the tour explores the RoboMaker user interface as it appears at startup if you do not create or load a robot. This means that the user interface will be quite empty, and many functions, such as debugging, will not make much sense. Don't worry about this; there will be plenty of opportunities to see many of these functions in action when you start on the tutorials following this chapter.
[Figure: The RoboMaker Main Window, showing the Robot View, State View, Step View, and Objects View]
At the top, you see the menu bar and the toolbar. Now, let us go through each of the views in the RoboMaker Main Window.
selected, you can apply actions to them. For example, you can insert a new connection by first selecting the step that the connection should start at, then the step that it should end at, and finally clicking the corresponding icon in the toolbar. You can also right-click on a step or connection to bring up a pop-up menu. To deselect the currently selected steps and connections without applying any action, click outside of the robot. Invalid steps are underlined in red, and if you move the mouse to an invalid step, an explanation of why the step is invalid is shown.
In the Tag Path View, you see the path from the root tag of the page to the selected tag. In the Browser View, you see the page as it appears in a browser. This view has two modes, selected by clicking the icons in the lower left of the Page View. In Normal Browser View mode, you see the page exactly as it appears in a browser. In Boxed Browser View mode, you see the page as it appears in a browser, but with colored boxes around specific tags, for example <table>- and <form>-tags. In the Source View, you see the HTML source of the page. The Source View has three modes that can be selected by clicking the icons to the right of the Browser View mode icons. In Normal Source View mode, you see the plain HTML. In Colored Source View mode, you see the HTML with color-highlighting. In JavaScript Source View mode, you see the HTML with JavaScript highlighted. In all three modes, you can choose whether to show line numbers in JavaScript using the Line Numbers checkbox in the lower right corner of the Page View.
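To make the notion of a path from the root tag to the selected tag concrete, here is a small Python sketch that computes such a path for a well-formed sample page. The sample page, the function name tag_path and the " > " separator are assumptions made for the illustration; they are not the notation RoboMaker uses in the Tag Path View.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed sample page (made up for this sketch).
PAGE = """<html><body><table><tbody>
<tr><td>first</td></tr>
<tr><td>second</td></tr>
</tbody></table></body></html>"""


def tag_path(root, selected):
    """Return the tag names from the root tag down to the selected tag."""
    # ElementTree elements do not know their parent, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    path = [selected.tag]
    node = selected
    while node in parent_of:
        node = parent_of[node]
        path.append(node.tag)
    return list(reversed(path))


root = ET.fromstring(PAGE)
second_cell = root.findall(".//td")[1]     # pretend this tag is selected
print(" > ".join(tag_path(root, second_cell)))
```

Dropping the last entry of the returned list gives the path of the enclosing tag.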
You can select a tag in the Page View by left-clicking in any of the views. The currently selected tag is shown with a green, dashed box in the Browser View and the Source View, and with a green background in the Tag Path View and the Tree View. You can hold down Alt while clicking inside the currently selected tag to move the selection one level out, i.e. select the tag that encloses the selected tag. You can also hold down Alt and Shift while clicking inside the selected tag. This will move the selection one level in towards the tag that you clicked.

You can also change the current selection using the buttons to the right of the Tag Path View. One pair of icons moves the selection one level out or in. One icon selects the root tag of the page, while another selects the innermost tag inside the selected tag. A further pair of icons selects the tag above or below the selected tag. You can also search for a tag by clicking the corresponding icon.

The Page View also shows the tags found by the Tag Finders of the current step. These tags are called found tags and will be shown with a red, dashed box in the Browser View and the Source View, and with a red background in the Tag Path View and the Tree View. If you edit the Tag Finders, you can click the corresponding icon to show the new tags found. You can also use the icons to configure the Tag Finders to use only the currently selected tag, to use the currently selected tag as well as any other tags found, to not use the currently selected tag, or to not use any tags at all.

Furthermore, the Page View shows the current tags. Current tags are marker tags that are used as reference when finding other tags. Current tags can be set by step actions; for example, some loop actions use a current tag to mark the result of the current iteration of the loop. You can also set current tags manually. Current tags are shown with a blue, dashed box in the Browser View and the Source View, and with a blue background in the Tag Path View and the Tree View.

In all views of the Page View, you can right-click a tag to open a pop-up menu that allows you to configure the current step. This is very useful and will probably become your preferred way of configuring the current step. From the menu, you can choose Use only this Tag or Use this Tag to configure the Tag Finders to find the tag that you clicked. You can also choose an action such as Enter Text from the menu. This will configure the current step to use the corresponding action, in this case Enter Text, on the tag that you clicked.

You can copy text from the Tag Path View or the Source View by holding down Ctrl while selecting the text with the mouse. You can also copy the HTML text of the selected tag by clicking the corresponding icon.

In the Cookies tab, you see the Cookies View. Here, you see the cookies in the current robot state. Cookies are added to this list as the robot loads web pages that use cookies.
When you have selected an action in the Step View, it will be displayed immediately below the action selection box. For a description of the actions available, see the Step Actions section below. In the Error Handling tab in the Step Window, you can see how the current step handles errors, both its own errors and received errors. You can also select the branching mode for the step. See the How to Handle Errors chapter for more information.
list of the particular objects. When you select an object in this list, that particular object is shown in the right part of the view. In the Input Objects tab, you find the list of the input objects of the robot. You can add, remove, or rearrange input objects by pressing the Add/Remove button. The view shows either the input values or the values at the step. The input values are used when writing and testing the robot. The input values can be edited and you can apply them by pressing the Apply button. When a robot is run on RoboServer, it will get the input values from the client. The values at the step are the values of the input objects at the current step, and these cannot be edited. In the Output Objects tab, you find the list of the output objects of the robot. You can add, remove, or rearrange output objects by pressing the Add/Remove button. The view shows either the initial values or the values at the step. The initial values are the values that the object attributes will have at the start of the robot, i.e. at the first step. The initial values can be edited and you can apply them by pressing the Apply button. The values at the step are the values of the output objects at the current step, and these cannot be edited.
Below the menu bar and toolbar is the Robot View, similar to the one in the RoboMaker Main Window. Note, however, that the Robot View in RoboDebugger has a current step only when you are actually debugging the robot. This current step is not the same as the current step in the Robot View in the RoboMaker Main Window. The current step in RoboDebugger is the step currently being debugged. Below the Robot View is a large panel divided into three sub-panels, the main panel and two panels named Summary and Stop When. In the main panel, you see the results of the debugging process divided into four tabs. In the Input/Output tab, you see the input objects, if any, and a list of all returned objects so far during the debugging process. In the Error Reports tab, you see a list of the error reports generated so far during the debugging process. In the Log tab, you can see whatever has been written to the log so far during the debugging process. (Some actions, particularly those that take a while to execute, such as the Loop Form action, write status information to this log.) Whenever the debugging process has been temporarily stopped, the State tab shows the robot state that is input to the current step. The State tab contains five sub-tabs. The Objects sub-tab shows the list of objects. The Windows, Cookies, and Authentications sub-tabs show the state, in much the same way as the State View in RoboMaker. The Error Report sub-tab contains the error report generated at
the current step, if any. For all error reports, you can click the Goto button to go directly to the step that generated the error; that is, the step that generated the error report will become the current step in RoboMaker. In the Summary panel, you see an overview of the number of returned objects and generated error reports so far during the debugging process. In the Stop When panel, you can specify the criteria for when the debugging process should temporarily stop (besides ending normally). For more on using RoboDebugger, see the How to Debug a Robot chapter.
Patterns
A pattern is a way of describing a text. For example, the text "32" can be described as a text containing two digits. However, other texts also contain two digits, e.g. "12" and "00". We say that these texts match the pattern. (RoboMaker patterns follow the Perl5 syntax.) A pattern is composed of normal characters and special symbols. Each special symbol carries its own special meaning. For example, the special symbol "." (dot) means any single character and matches all single characters, e.g. "a", "b", "1", "2", ... Figure 7 below provides an overview of the most commonly used special symbols. For a complete overview of all the special symbols available, see the RoboHelp entry on Patterns.
Special symbol   Meaning
.                Any single character, e.g. "a", "1", "/", "?", ".", etc.
\d               Any decimal digit, e.g. "0", "1", ..., "9".
\D               Any non-digit, i.e. same as ".", but excluding "0", "1", ..., "9".
\s               Any white space character, e.g. " " and line break.
\S               Any non-white space character, i.e. same as ".", but excluding white space (such as " " and line break).
\w               Any word (alphanumeric) character, e.g. "a", ..., "z", "A", ..., "Z", "0", ..., "9".
\W               Any non-word (alphanumeric) character, i.e. same as ".", but excluding "a", ..., "z", "A", ..., "Z", "0", ..., "9".
Figure 7: The Most Commonly Used Pattern Special Symbols
Example: The pattern ".an" matches all texts of length three ending with "an", e.g. "can" and "man" but not "mcan".
Example: The pattern "\d\d\s\d\d" matches all texts of length five starting with two digits followed by a white space and ending with two digits, e.g. "01 23" and "72 13" but not "01 2s".
If you want a special character, such as "." or "\", to act as a normal character, you can escape it by adding a "\" (backslash) in front of it. So, if you wish to match exactly the "." character, instead of any single character, you should write "\.".
Example: The pattern "m\.n\\o" only matches the text "m.n\o".
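The examples above can also be tried outside RoboMaker. The following is a minimal sketch in Python, whose `re` module treats the special symbols and escapes used in this section in the same way as Perl5-style patterns. The helper name `matches` is an assumption made for illustration and is not part of RoboMaker. It uses `re.fullmatch` because the patterns in this guide describe the entire text.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the WHOLE text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# ".an": any single character followed by "an"
print(matches(r".an", "can"))            # True
print(matches(r".an", "mcan"))           # False (four characters)

# "\d\d\s\d\d": two digits, one white space character, two digits
print(matches(r"\d\d\s\d\d", "01 23"))   # True
print(matches(r"\d\d\s\d\d", "01 2s"))   # False ("s" is not a digit)

# Escaped "." and "\" act as normal characters
print(matches(r"m\.n\\o", "m.n\\o"))     # True (the text is m.n\o)
print(matches(r"m\.n\\o", "mxn\\o"))     # False ("x" is not a literal dot)
```

The raw-string prefix `r"..."` only keeps Python from interpreting the backslashes itself; the pattern text is the same as the one you would type into RoboMaker.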
You can organize a pattern into subpatterns by the use of parentheses: "(" and ")".
Example: The pattern "abc" can be organized as "(a)(bc)".
Subpatterns are useful when applying pattern operators. Figure 8 below provides an overview of the pattern operators available.
Operator   Meaning
?          Matches the preceding subpattern, or the empty text.
*          Matches any number of repetitions of the preceding subpattern, or the empty text.
+          Matches one or more repetitions of the preceding subpattern.
{m}        Matches exactly m repetitions of the preceding subpattern.
{m,n}      Matches between m and n repetitions (inclusive) of the preceding subpattern.
{m,}       Matches m or more repetitions of the preceding subpattern.
a|b        Matches whatever the expression a would match, or whatever the expression b would match.
Figure 8: The Pattern Operators
Example: ".*" matches any text, e.g. "RoboMaker", "1213" and "" (the empty text).
Example: "(abc)*" matches any number of repetitions of the text "abc", e.g. "", "abc", "abcabc", and "abcabcabc", but not "abca".
Example: "(\d\d){1,2}" matches either two or four digits, e.g. "12" and "6789", but not "123".
Example: "(Robo)?Maker" matches "RoboMaker" and "Maker".
Example: "(Robo)|(Maker)" matches "Robo" and "Maker".
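The operator examples can be checked in the same way. This is a small sketch that uses Python's `re` module as a stand-in for RoboMaker's pattern matcher; the names `EXAMPLES` and `check` are made up for illustration. Each entry pairs a pattern and a text with the result stated in the examples of this section.

```python
import re

# (pattern, text, result stated in this section)
EXAMPLES = [
    (r".*", "RoboMaker", True),
    (r".*", "", True),
    (r"(abc)*", "", True),
    (r"(abc)*", "abcabc", True),
    (r"(abc)*", "abca", False),
    (r"(\d\d){1,2}", "12", True),
    (r"(\d\d){1,2}", "6789", True),
    (r"(\d\d){1,2}", "123", False),
    (r"(Robo)?Maker", "RoboMaker", True),
    (r"(Robo)?Maker", "Maker", True),
    (r"(Robo)|(Maker)", "Robo", True),
    (r"(Robo)|(Maker)", "Maker", True),
]

def check(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

for pattern, text, expected in EXAMPLES:
    print(f"{pattern!r} against {text!r}: {check(pattern, text)} (guide says {expected})")
```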
As with other special characters, you can escape the special characters that appear in pattern operators by adding a "\" (backslash) in front of the character.
Subpatterns are useful when you want to extract specific text pieces from a text. When you make a subpattern using parentheses, you can extract the part of the text that is matched by that subpattern. For example, consider the pattern "abc (.*) def (.*) ghi". This pattern has two subpatterns that are made by means of parentheses. If the pattern is matched against the text "abc 123 def 456 ghi", the first of those subpatterns will match the text "123", and the second subpattern will match the text "456". In an expression (see the section named Expressions), you can refer to these subpattern matches by writing "$1" and "$2". For example, the expression "X" + $1 + "Y" + $2 + "Z" will
produce the result "X123Y456Z". This is a very important extraction technique in RoboMaker. By default, the repetition pattern operators (*, +, {...}) will match as many repetitions of the preceding pattern as possible. You can put a "?" after the operator to turn it into an operator that matches as few repetitions as possible. For example, consider the pattern ".*(\d\d\d).*". If the pattern is matched against the text "abc 123 def 456 ghi", the subpattern "(\d\d\d)" will match the second number in the text ("456"), since the first *-operator will match as many repetitions as possible. If you put a "?" after the *-operator, so that the pattern becomes ".*?(\d\d\d).*", the subpattern "(\d\d\d)" will match the first number in the text ("123"), since the *?-operator will match as few repetitions as possible. It is recommended that you experiment with patterns on your own. The best way to do this is to launch RoboMaker and find a place where you can enter a pattern, such as in the Test Tag action. Then, click the Edit... button to the right of the pattern field, to open the Pattern Editor Window, shown in Figure 9 below.
In the Pattern Editor Window, you can enter a pattern and test whether it matches the test input text in the Input panel. When you open the window, RoboMaker will usually have set the test input text to the text that the pattern will be matched against if the given step is executed on the current input robot state. However, you can also edit the test input text yourself, to try the
pattern on other inputs. To test the pattern, click the Test button. The result of the matching will then be shown in the Output panel. The Symbol button is very useful when you want to enter a special symbol in the pattern. When you click it, a pop-up menu will be shown, from which you can choose the symbol to insert in the pattern. This way, you don't have to memorize all the special symbols and their meanings. For more on patterns, consult the RoboHelp entry on patterns.
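If you want to experiment with subpattern extraction and with the "?" suffix before opening the Pattern Editor Window, the two examples from this section can be reproduced with a short Python sketch. Python's `re` module is used here only as an approximation of RoboMaker's Perl5-style patterns; `group(1)` and `group(2)` play the role of $1 and $2.

```python
import re

text = "abc 123 def 456 ghi"

# Two subpatterns made with parentheses; group(n) corresponds to $n.
match = re.fullmatch(r"abc (.*) def (.*) ghi", text)
result = "X" + match.group(1) + "Y" + match.group(2) + "Z"
print(result)   # X123Y456Z

# The first ".*" matches as many repetitions as possible,
# so the subpattern ends up on the last number.
greedy = re.fullmatch(r".*(\d\d\d).*", text).group(1)
print(greedy)   # 456

# ".*?" matches as few repetitions as possible,
# so the subpattern ends up on the first number.
lazy = re.fullmatch(r".*?(\d\d\d).*", text).group(1)
print(lazy)     # 123
```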
Expressions
An expression evaluates to a text.
Example: The expression "a" + "b" evaluates to the text "ab".
An expression is composed of one or more sub-expressions, each separated by a "+" (plus). A sub-expression evaluates to a text. An expression is evaluated by adding together (concatenating) the sub-expressions, one by one from left to right. Figure 10 below provides an overview of the most commonly used sub-expression types. For a complete overview of all sub-expression types available, see the RoboHelp entry on expressions.
Sub-Expression Type or Notation   Meaning
Text Constant                     Evaluates to the specified text, e.g. "Stephen King", or >>Stephen King<<.
Attribute Value                   Evaluates to the value of the specified attribute, e.g. Book.author might evaluate to "Stephen King".
URL                               Evaluates to the URL of the current page.
$n                                Evaluates to the text matched by subpattern n in an associated pattern (if any). For example, this is used in the Advanced Extract data converter, as shown below. $0 evaluates to the text matched by the entire pattern.
Function, func(args)              Evaluates the specified function by passing it the specified arguments and converting its result to a text.
Figure 10: The Most Commonly Used Sub-Expression Types
Example: The expression "The author of the book " + Book.title + " is " + Book.author + "." evaluates to the text "The author of the book Pet Sematary is Stephen King.", if the attributes title and author in the Book object contain the texts "Pet Sematary" and "Stephen King", respectively.
Note that you can specify a text constant using either the quote notation or the >>text<< notation, for example "Stephen King" or >>Stephen King<<. If you use the quote notation, and you want a quote character to appear inside the text, you have to write it as two quote characters. For example, write "This is some ""quoted"" text" to get the text "This is some "quoted" text". If you use the >>text<< notation, anything can appear inside the text, except ">>" and "<<". Thus, you can write quotes directly, as in >>This is some "quoted" text<<. The >>text<< notation is useful for long texts that contain many quote characters, such as HTML. Figure 11 shows the most commonly used functions in expressions. For a complete overview of all functions available, see the RoboHelp entry on expressions.
Function     Meaning
eval(arg)    Evaluates to the numeric expression specified by the argument.
round(arg)   Evaluates to the nearest integer of the specified argument number.
Figure 11: The Most Commonly Used Sub-Expression Functions
Example: The expression "3+4 equals " + eval(3+4) + "." evaluates to the text "3+4 equals 7.".
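The two text constant notations and the eval example can be illustrated with a short Python sketch. It is only an analogy, not RoboMaker's expression evaluator: the helper names `decode_quoted` and `decode_angle` are hypothetical, and Python needs an explicit `str(...)` where a RoboMaker expression converts a function result to a text for you.

```python
def decode_quoted(constant: str) -> str:
    """Hypothetical helper: text denoted by a constant in quote notation.

    Inside the quotes, two quote characters stand for one.
    """
    if len(constant) < 2 or constant[0] != '"' or constant[-1] != '"':
        raise ValueError("not a text constant in quote notation")
    return constant[1:-1].replace('""', '"')

def decode_angle(constant: str) -> str:
    """Hypothetical helper: text denoted by a constant in >>text<< notation."""
    if len(constant) < 4 or not (constant.startswith(">>") and constant.endswith("<<")):
        raise ValueError("not a text constant in >>text<< notation")
    return constant[2:-2]

# Both notations denote the same text.
print(decode_quoted('"This is some ""quoted"" text"'))
print(decode_angle('>>This is some "quoted" text<<'))

# Analogue of the expression "3+4 equals " + eval(3+4) + "."
sentence = "3+4 equals " + str(3 + 4) + "."
print(sentence)   # 3+4 equals 7.
```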
It is recommended that you experiment with expressions on your own. The best way to experiment with expressions is to launch RoboMaker, select the Extract action for the current step, and then add an Advanced Extract data converter. Click the icon to configure the data converter. This opens the Advanced Extract Configuration Window shown in Figure 12 below.
In the example shown, note the use of the $n notation to extract parts of the input text. Try to type your own input text into the text area to the left of the Test button, your own pattern into the Pattern property, and your own expression into the Output Expression property. Then hit Test to view the text that the expression evaluates to in the text area to the right of the Test button. Also, try to click the Edit... button to the right of the expression field. This opens the Expression Editor Window shown in Figure 13 below.
In the Expression Editor Window, you can enter an expression and test what it evaluates to. If the expression is associated with a pattern, as in the Advanced Extract data converter, the result of matching the pattern against the current input text will be shown in the Input panel. You can see whether the pattern matches, and if so, what subpattern matches your expression can refer to using the $n notation. Note that the testing functionality is not available everywhere in RoboMaker. Click the Expression button to open a useful pop-up menu, from which you can choose among the available sub-expression types and functions. This way, you don't have to memorize all of these. For more on expressions, consult the RoboHelp entry on expressions.
example one robot project for each application in your company that uses robots. A robot project is a folder located anywhere in the file system. The project folder can have any name you want, but must contain the following subfolder:
Library: this folder is the robot library in the project
In the Library folder, you should place all robot files, domain model files (containing objects used by the robots), and other files used by the robots, such as files that are loaded from the robot library. You can organize the files in the Library folder in any way you want, using sub-folders as appropriate. Figure 14 shows an example of the contents of a project folder named NewsAndStocksProject for a project that develops a robot library for extracting news from news sites and stock quotes from stock sites.
NewsAndStocksProject/
    Library/
        News/
            CNN.robot
            Reuters.robot
            News.model
        Stocks/
            Nasdaq.robot
            NYSE.robot
            Stocks.model
Figure 14: An Example of a Project for News Extraction
As you can see, the project has a Library folder with robot and domain model files divided into News and Stocks sub-folders. In RoboMaker and the other development applications in Kapow Mashup Server, you are always working on a specific project, referred to as the current robot project. When you install Kapow Mashup Server, a default project will be created for you in the installation folder. To create a new project, you simply create a project folder located anywhere you want and containing a Library folder, as well as other sub-folders that you want. To switch to your new project, you must change the current robot project. This is done by opening the Settings application, specifying the path to your new project folder in the Current Project Folder property in the Project tab, and then clicking OK to close Settings. After this, you need to restart all applications that you have running to make them work on the new project. You can switch back and forth between projects any time you want, but remember to restart the applications each time. When you want to distribute and deploy your robot library in a runtime environment, such as RoboServer, you can pack the robot library into a single file called a robot library file. You can do this by choosing Create Robot Library File from the Tools menu in RoboMaker or ModelMaker. This will pack
together all files contained in the robot library of the current robot project and save the result as a single file having a name that you choose. Note that you should save all your open files, such as robots and domain models, before doing this, to get the latest changes into the robot library file. You can then make the robot library file available to RoboServer and execute robots from the robot library. See the RoboServer Users Guide for more information on how to do this. As mentioned, a robot library may contain files used by the robots. You can load a page from a file in the robot library that the robot belongs to. This is done using the special non-standard protocol named library. For example, if the file MyPage.html is located in the folder MyFolder in the robot library folder, you can load from that file using this URL:
library:/MyFolder/MyPage.html
This will work no matter whether the robot library is represented as a folder or has been packed into a robot library file.
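To make the mapping concrete for the case where the robot library is a plain folder, here is a minimal Python sketch. It is an illustration under assumptions, not how RoboMaker or RoboServer resolve such URLs internally: the function name `resolve_library_url` and the project name `MyProject` are hypothetical, and the packed robot library file case is not covered.

```python
from pathlib import PurePosixPath

def resolve_library_url(url: str, library_folder: str) -> PurePosixPath:
    """Hypothetical helper: map a library: URL to a path below the Library folder."""
    prefix = "library:/"
    if not url.startswith(prefix):
        raise ValueError(f"not a library URL: {url}")
    relative = PurePosixPath(url[len(prefix):])
    # Refuse paths that would leave the robot library folder.
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"path escapes the robot library: {url}")
    return PurePosixPath(library_folder) / relative

path = resolve_library_url("library:/MyFolder/MyPage.html", "MyProject/Library")
print(path)   # MyProject/Library/MyFolder/MyPage.html
```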
So, the typical robot starts with one or more steps, each containing a Click (or Load Page) action, in order to navigate to the interesting content on some web
site. It proceeds with one or more steps, each containing an Extract action, and ends with a step having a Return Object action that returns the extracted object. Note that in many robots the navigation and extraction parts overlap because the content to extract is located on several pages. Again, this is similar to when you look for content yourself; often, you have to visit several pages to get the content you want. Most robots include other actions than the ones mentioned above, e.g. a For Each Tag action for loading several similar looking pages or extracting attributes from several similar looking table rows. Because robots have different tasks, they have different needs. For this reason, we have included a considerable number of step actions and data converters in RoboMaker. Start with familiarizing yourself with the basic and most commonly used step actions and data converters, and then begin to explore. Experience shows that one can create most robots using only a handful of step actions and data converters. So, find your own favorite step actions and data converters and stick to them until you feel a need to explore others. You are now ready for the first tutorial, where you will learn how to write your first robot.
TUTORIAL 1: ESSENTIALS
Tutorial 1: Essentials
In this tutorial, you will write your first robot. You will learn how to:
navigate using the Click action,
loop through tags using the For Each Tag action,
extract attribute values using the Extract action,
test content using the Test Tag action,
return objects using the Return Object action,
debug your robot with RoboDebugger,
and plenty of other things to get you started using RoboMaker!
The robot you will create in this tutorial will navigate to a page containing a table, extract the person data contained in that table, and output several PersonOutput objects. Before proceeding, we recommend that you open your favorite browser, and navigate to http://www.kapowtech.com/tutorial/case1/index.html to take a look at the pages involved in this tutorial. Let us begin by starting up RoboMaker and selecting Create a new robot..., which starts the New Robot Wizard as shown below.
This wizard will assist you in configuring the robot. Choose Data collection robot as the robot type and continue to the next step of the wizard. As the URL to start from, enter "www.kapowtech.com/tutorial/case1/index.html".
In the next step of the wizard, add the object called PersonOutput. This object will be used to extract person data.
Click Finish to create the robot. The RoboMaker Main Window should now look like this:
As you can see, two steps have been inserted. The first step, called Load Page, loads the page using a Load Page action, and the second step, which is the current step, has not been configured yet. Now let's configure the second step of the robot, so make sure it is the current step. In the Browser View, we see the link Go to Table which leads to the page containing the table that we want to extract data from. To load this page, we choose Click as the action of the current step. In the Action tab in the Step View, select the Click action. (If you wish to read more about the Click action, click More.... This brings you to a help page in RoboHelp. All step actions and data converters in RoboMaker have such a help page.) To select the link to be clicked by the Click action, click the Go to Table link in the Browser View. This will select the <a>-tag that defines the link (in the Tag Path View, you can see that the "a" is selected). Then, click the icon to configure the tag finders to find only that tag. To load the page, click the icon. This causes two things to happen: First, it adds a new step after the current step. Then the new step becomes the current step. Changing the current step has some interesting effects: it always results in an update of the robot state shown in the State View, because the
State View always shows the input state to the current step. The input state to the current step is always the output state of the previous step. To update the robot state in the State View, RoboMaker will execute as much of the robot as is needed to get the updated robot state. In our example, the output state of the previous step contains the loaded page. An alternative and easier way to load the page would have been to simply right-click the link Go to Table and select Click in the pop-up menu. This would configure the current step to load the page referred to by the selected link, using the Click action, insert a new step after the current step and go to the new step. The RoboMaker Main Window should now look like this:
You have now reached the page containing the content that we want to extract. Hence, the navigation part of the robot is over. However, before starting on the extraction part, let us try to change which step is the current step without adding a new one. You can make any step in the Robot View the current step by simply clicking it. Try clicking the first step in the Robot View. In the Step View, we can see that the Load Page action has been selected as action, and that the URL from which the action loads is the URL that we entered in the New Robot Wizard. Now try making the second step the current step. Note how the State View updates itself to appear exactly as it did when you finished configuring that step a moment ago. The changes of current step
went pretty fast, didn't they? The reason for this is that RoboMaker caches (stores) the output robot states from selected steps in order to minimize the waiting time when the current step changes. The idea of caching is not unique to RoboMaker; your own browser also caches loaded pages so that you can quickly step back and forth between them. Like in a normal browser, you sometimes want to refresh the cache. You can refresh the cache in RoboMaker by clicking the icon. Normally, however, it is not necessary to refresh the cache. Let us return to the extraction part. Before continuing, make sure that the last (third) step is the current step. Taking a look at the table on the web page, we discover that the table contains three columns (PersonId, Name, and Age) and four rows (not counting the headline row). Furthermore, the trained eye will discover an irregularity in the table: Bill has no age! (As you will discover when you begin to write your own robots, these kinds of irregularities are quite common on real-world web sites.) How do we deal with this irregularity? First, we need to decide whether we wish to extract a PersonOutput object when there is no age available for that person. This is an important question that you will probably encounter many times: How much information should, as a minimum, be available in a returned object? Fortunately, we can see the right answer to that question by looking at the Output Objects tab in the Objects View. As you can see, the object attributes personId and name have a small red dot next to them. This means that these two attributes are mandatory and must be given a value before the PersonOutput object can be returned by the Return Object action. The age attribute has no red dot next to it. This means that the attribute is optional (i.e. not mandatory) and may, but need not, be given a value before the PersonOutput object is returned. So, we should extract four PersonOutput objects from the table. How do we do this?
There are several approaches, but let us settle on one that uses the For Each Tag action to loop through (i.e. do the same for) each row in the table. Select the For Each Tag action in the Action tab in the Step View. Input "tr" into the Tag property to tell the For Each Tag action that it should loop through the <tr>-tags (the table rows) contained in some tag. Next, type "1" in the First Tag Number property to skip the headline row. Finally, we need to identify which tag the For Each Tag action should look for the <tr>-tags in. Try clicking on the table in the Browser View. Then look at the Tag Path View and select the innermost <tbody>-tag. Finally, click the icon to configure the tag finders to find this <tbody>-tag. (Another, much simpler way to do all this would have been to right-click the table, and, in the pop-up menu, select Loops, then For Each Table Row, and finally Exclude First Row.)
Click the icon to add a new step and make it the current step. The input to the current step is the output of the first iteration of the For Each Tag action (iteration 1). You can change the iteration number of the For Each Tag action by clicking the icon that decreases the iteration number by one or the icon that increases the iteration number by one, or by directly typing the iteration number into the small text field in between and hitting return. You can also go to the first or last iteration by clicking the corresponding icons. Try changing the iteration number to 3.
Note that the current tag is now the third row in the table. Let us extract the content of the current row. Right-click the PersonId "2" in the Browser View, select Extraction in the pop-up menu, then Extract Number, and finally PersonOutput.personId. Because we are extracting a number, the Extract Number Configuration window pops up. Select the Convert to Integer option, and click OK. The current step will now be configured to use an Extract action, with the Attribute property set to PersonOutput.personId, and an Extract Number data converter added to the list of data converters. Do (more or less) the same for the Name "Jim": Right-click "Jim", select Extraction, then Extract Text, and finally PersonOutput.name. Do (more or less) the same for the Age "72": Right-click "72", select Extraction, then Extract Number, and finally PersonOutput.age. Select the Convert to Integer option in the Extract Number Configuration window that pops up, and then click OK to close the window. You have now extracted a PersonOutput object! Let us return it by selecting the Return Object action for the step. Remember to select PersonOutput in the drop-down box for the Object property in the Return Object action.
The robot now consists of seven steps: two steps concerned with navigation and five steps concerned with extraction. Let us have a closer look at how the objects change as the current step changes. As you can see in the Output Objects tab of the Objects View, you have extracted the personId, name, and age attributes of the PersonOutput object. Now, try to make the previous step (named "Extract Age") the current step by clicking on it. Note that this causes the value of the age attribute to become empty. The reason for this is that the Objects View shows the objects input to the current step when Show Values at Step is selected; and as the attribute value for age has not yet been extracted, it is empty. Try clicking the previous step (named "Extract Name") and note that the value of the name attribute becomes empty. Finally, if you make the step named "Extract Person Id" the current step, then the value of the personId attribute also becomes empty. Now, change the iteration number of the For Each Tag action to 1 by clicking the icon that decreases the iteration number twice (or by clicking the icon that goes to the first iteration). Then change the current step back to "Return Object" and note how the values of the attributes change to match those for the second row (containing info on "Bob") in the table, even though you created the extraction steps for the third row (containing info on "Jim") in the table. This is because the branch beyond the For Each Tag step is
applied on all robot states outputted by the For Each Tag action. This is a general principle for all loop actions and it is highly useful when you need to do the same thing more than once. Try clicking the icon that increases the iteration number and notice how the PersonOutput object changes. Also, notice that there is only one PersonOutput object at any time and not one per person data in the table; the same PersonOutput object is reused in different iterations. Keep clicking the icon until the following message appears:
This error occurs because there is no age in the table row for Bill. This causes the tag finders in the step named "Extract Age" to fail. When you click "OK", this step will be made the current step. How do we deal with this missing age attribute value? We will select this approach: Only extract an age attribute value if there is one. In other words, there are two cases: One in which there is an age value, and one in which there is not. We can represent these two cases by branching into two branches, each starting with a conditional step containing a Test Tag action. The first Test Tag action will only continue execution beyond the step if there is an age value, and the other Test Tag action only continues execution beyond the step if there is no age value. Click the icon (and not the icon!) to insert a new step between the "Extract Name" and "Extract Age" steps. Click "3" or "Bill" in the Browser View, select "tr[4]" in the Tag Path View and click the icon to configure the tag finders to use the <tr>-tag as input. Select the Test Tag action in the Action tab in the Step View. We want this Test Tag action to continue execution beyond this step only if the row contains an age value. We enter the pattern ".*\d+" (which matches all texts ending with one or more digits) into the Pattern property, set the Action property to Continue if Pattern Matches Found Tag, and select Only Text in the Match Against property. The Only Text option is selected because we want the pattern to be matched against just the text contents of the found tag, without the tags.
To verify that the Test Tag action works correctly, click the "Extract Age" step. This will cause the following message to appear:
The message says that the Test Tag action has stopped the execution. Click OK to dismiss the message. Change the iteration to 2 by clicking the icon that decreases the iteration number twice. Then click the "Extract Age" step again. This time the Test Tag action will not stop the execution because the pattern matches the text; that is, the row contains an age value. Now, let us create the branch for the case in which there is no age value. Make the "Extract Name" step the current step by clicking on it. Then, click the icon to add a new branch to the "Extract Name" step. The new branch contains a single step that becomes the current step. As before, click "1", "Ted" or "25" in the Browser View, select "tr[2]" in the Tag Path View and click
the icon to configure the tag finders. Select the Test Tag action in the Action tab in the Step View. This action should be configured so that it stops execution if the text contains an age value. Enter ".*\d+" in the Pattern property, set the Action property to Stop if Pattern Matches Found Tag, and select Only Text in the Match Against property. The RoboMaker Main Window should now look like this:
Now, let us create a connection to the "Return Object" step. To do this, place the mouse cursor just to the right of the current step until a white arrowhead appears, then drag this arrow to the "Return Object" step. An alternative way to connect the two steps is to hold down the Ctrl key, and then click first the current step and then the Return Object step, to select both steps. Then, right-click the Return Object step to bring up the pop-up menu for the step, and choose Add Connection from the pop-up menu. (To remove a connection between two steps, either hold down Ctrl and click the connection, then click the icon, or right-click the connection to bring up the pop-up menu and select Delete from the menu.)
Verify that the Test Tag action works correctly by changing the iteration using the icons that decrease and increase the iteration number and then clicking the "Return Object" step. You should only be allowed to execute beyond the current step (containing a Test Tag action) when the iteration is 4. And this is exactly what we want. We have now achieved the desired behavior: The first (top-most) branch only allows iterations 1, 2, and 3 to continue (those with an age value). The second branch only allows iteration 4 to continue (which has no age value). Have you noticed that the connections between steps are sometimes black and sometimes dark gray? This brings up the concept of the execution path. The execution path includes all steps from the first step to an end step (an end step is a step with no step after it) such that it includes the current step. As you change the current step repeatedly between the two Test Tag steps, notice how the execution path changes. You can use the execution path to see which of several branches was taken to reach a step. For example, if you make the Return Object step the current step, then the execution path will tell you which of the two branches was executed to reach that step. That's it! Congratulations, you have now created your first robot! Let us test the robot in RoboDebugger and verify that it extracts the PersonOutput objects we expect, namely four PersonOutput objects, one for each row in the table containing person data. Click the icon to open
44
RoboDebugger. Then click the icon in the RoboDebugger Main Window to start the debugging process. As the debugging process runs, objects are returned and displayed. When the debugging process completes, the RoboDebugger Main Window should look like this:
If your RoboDebugger returns the same objects as suggested by this screenshot, then your robot is working as expected. Return to the RoboMaker Main Window (by either closing the RoboDebugger Main Window or simply switching to the RoboMaker Main Window) and press the icon to save the robot for later use. It might seem that you had to do a lot of work to simply extract some person data from a table. Well, when you are trained in using RoboMaker, you can create a simple robot like the one in this tutorial in one or two minutes! Also, the robot is rather robust; for example, it will still work correctly if persons are added to, or removed from, the table, and if the age attribute value is missing for any person, not just "Bill". So what you have is a robot that can be reused as the table content grows or shrinks, and that can handle some table irregularities. And for many kinds of robot tasks, this flexibility is exactly what you want and need. For more on robot robustness, see the How to Make Robots More Robust chapter.
Before you proceed to the next tutorial, we recommend that you read the RoboHelp online entries for the step actions you have used so far. Also, you might want to experiment with the robot you have created:
Try to modify the robot so that it only extracts PersonOutput objects for table rows that contain both a name and an age. (Hint: The solution involves removing the two Test Tag steps and the connection between the "Extract Name" and "Return Object" steps. To remove the connection, right-click it and select Delete from the pop-up menu. After this, a good idea would be to add a Test Tag step immediately after the For Each Tag step.)
Try to modify the robot so that it only extracts a PersonOutput object from the first table row. (Hint: The solution involves removing the For Each Tag step.)
Try changing the robot so that it loads the table page directly, without loading the http://www.kapowtech.com/tutorial/case1/index.html page first. (Hint: The solution involves changing the URL in the first "Load Page" step, and removing the "Click" step.)
Try recreating the robot from scratch without referring to this tutorial. This time, try right-clicking in the Page View to insert the steps.
submits a form, uses input objects, and converts values using data converters.
The robot you create in this tutorial will navigate to a page containing a form, fill out the form with data from an input object, and submit it. The robot is representative of all tasks related to form submission, including logon, registration, and order submission. Before proceeding, we recommend that you open your favorite browser, and navigate to http://www.kapowtech.com/tutorial/case2/index.html to take a look at the pages involved in this tutorial.
Let us begin by starting up RoboMaker. If you have just completed the previous tutorial, then you can simply click the icon to start the New Robot Wizard. Choose Integration robot as the robot type and continue to the next step of the wizard. As the URL to start from, enter "www.kapowtech.com/tutorial/case2/index.html". In the next step of the wizard, add the object called PersonInput. This object will contain the data that we need to fill in the form.
Now go to the next step of the wizard. In this step, you should not add any objects. In practice, this is highly unusual, but in this tutorial we will not extract any objects. In the last step of the wizard, give the robot the id "2" and press Finish. The RoboMaker Main Window should now look like this:
Let us provide some test values to the PersonInput object. These values will only be used when we are developing and debugging the robot in RoboMaker. When the robot is run on RoboServer, the value of the PersonInput object will be provided by the client. However, while developing and testing the robot in RoboMaker, it is useful (and often necessary) to have some test values in the input objects. In the Objects View, you can see the PersonInput object in the Input Objects tab. Make sure that Edit Input Values is selected. Type "John" into the firstName attribute, "Doe" into the lastName attribute, select true for the isMale attribute, and select false for the isMarried attribute.
Press Apply to apply the input values. Always remember to do this after you have entered or edited values in the Objects View, otherwise your new values will be lost. Now, RoboMaker executes to the second step using the new input values. When writing a robot that uses input objects, you can easily test different combinations of input values this way. The robot should work correctly for all valid input objects (e.g. a PersonInput object where the value of the isMale attribute is false), so later in the tutorial we'll try out other test values of the PersonInput object. We will now load the page containing the form to submit. Right-click the "Go to Form" link, and select Click in the pop-up menu.
To submit a form, you first fill out the form and then click the submit button. In this tutorial, we need to put the values of the PersonInput object into the form. The text field for the first name should contain the value of the firstName attribute of the PersonInput object. Right-click on the "First Name" text field and select Enter Text from Attribute, and then select PersonInput.firstName.
Similarly, right-click the "Last Name" text field and select Enter Text from Attribute and then PersonInput.lastName.
Next, we handle the Male and Female radio buttons, which should be set according to the value of the isMale attribute of the PersonInput object. Therefore, we test this attribute value. In the Action tab of the Step View, click Select an Action and in the Conditions category, select Test Attributes. Add a condition by clicking the icon.
Select PersonInput.isMale as Attribute and enter true in the Value text field.
In the Test Attributes action, we set the Action property to Continue if All Conditions are Satisfied (which is the default value). Now, after the Test Attributes step, we need a step that selects the Male radio button (execution will only continue past the Test Attributes step if the value of the isMale attribute is true). First, insert a new step by clicking the icon. Then, right-click the Male radio button (not the text Male) and select Forms and then Select Radio Button. This selects the radio button and inserts a new step.
If the value of the isMale attribute is false, the Test Attributes step will stop the execution. There should be an alternative execution path for this case. Make the Enter Last Name step the current step by clicking it. Now, let's change the test values of the input object. In the Input Objects tab of the Objects View, make sure that Edit Input Values is selected, and change the value of the isMale attribute to false. It is not strictly necessary to change the other values, but let us change the firstName attribute to Joanna and the isMarried attribute to true. Press Apply to use the new input values. When RoboMaker has executed to the Enter Last Name step, add a new branch after this step by clicking the icon. The new branch contains a single step that becomes the current step. Select the Test Attributes action in the Action tab in the Step View. This action should be configured so that it stops execution if the value of the isMale attribute is true. As before, add a condition by clicking the icon, select PersonInput.isMale as Attribute, enter true in the Value text field and click OK to close the Attribute Condition Configuration window.
In the Test Attributes action, we set the Action property to Stop if All Conditions are Satisfied.
Insert a new step by clicking the icon and right-click the Female radio button (not the text Female), select Forms and then Select Radio Button. This selects the radio button and inserts a new step. We want to connect the two branches to get the same behavior regarding the Married checkbox, so we do not need this new step. Therefore, delete the last (unnamed) step of the second branch and create a connection between the Select Radio Button step of the second branch and the last (unnamed) step of the first branch.
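Taken together, the two Test Attributes branches behave like an if/else on the isMale attribute. The following Python sketch models that logic outside RoboMaker. The function names and the dictionary representation of the PersonInput object are hypothetical and are not part of any RoboMaker API; only the two Action property values are taken from the tutorial.

```python
CONTINUE_IF_ALL = "Continue if All Conditions are Satisfied"
STOP_IF_ALL = "Stop if All Conditions are Satisfied"


def passes_test_attributes(obj, conditions, action):
    """Return True if execution may continue past a Test Attributes step.

    obj and conditions are plain dicts here (an assumption for illustration).
    """
    satisfied = all(obj.get(name) == value for name, value in conditions.items())
    if action == CONTINUE_IF_ALL:
        return satisfied
    if action == STOP_IF_ALL:
        return not satisfied
    raise ValueError(f"unknown action: {action}")


def radio_button_for(person_input):
    """Model the two branches: the first selects Male, the second Female."""
    condition = {"isMale": True}
    # First branch: continues only when isMale is true.
    if passes_test_attributes(person_input, condition, CONTINUE_IF_ALL):
        return "Male"
    # Second branch: stops when isMale is true, so it continues otherwise.
    if passes_test_attributes(person_input, condition, STOP_IF_ALL):
        return "Female"
    return None


print(radio_button_for({"firstName": "John", "isMale": True}))
print(radio_button_for({"firstName": "Joanna", "isMale": False}))
```

Because both branches test the same condition with opposite Action settings, exactly one of them continues for any input object.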
Now, we need to set the Married checkbox. Go to the last (unnamed) step, right-click the Married checkbox (not the text Married) and select Forms and then Set Checkbox.
Now we need to specify whether the checkbox should be checked or unchecked, and we want this to depend on the isMarried attribute. The Set Checkbox window allows us to do this using a value selector. The value selector is a component that is used in many different places in RoboMaker. It allows you to specify a value in several different ways, depending on your needs. The value selector has a drop-down box to the right where you can choose how to specify the value. The value can be specified as a fixed value, a value from an attribute, a value from an expression, or a value from a list of data converters. Data converters are useful when you want to convert the value before using it. In the Set Checkbox window, we must specify one of the values checked, true, 1, or on if we want the checkbox to be checked, and one of the values unchecked, false, 0, or off if we want it to be unchecked. In this case, we want to use the value of the isMarried attribute. This value needs no conversion, since it is true when we want the checkbox to be checked and false when we want it to be unchecked. If we needed to convert the value, we could specify the value using a list of data converters that could do the conversion.
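The accepted checkbox values can be summarized in a small Python sketch. The helper name checkbox_state is hypothetical, and lower-casing the value before comparing it is an assumption made here for convenience; the text above does not say whether RoboMaker treats these values case-insensitively.

```python
# Values listed in the text for the Set Checkbox window.
CHECKED_VALUES = {"checked", "true", "1", "on"}
UNCHECKED_VALUES = {"unchecked", "false", "0", "off"}


def checkbox_state(value):
    """Map a value to True (checked) or False (unchecked).

    Raises ValueError for anything that is not one of the listed values.
    """
    text = str(value).strip().lower()  # assumption: compare case-insensitively
    if text in CHECKED_VALUES:
        return True
    if text in UNCHECKED_VALUES:
        return False
    raise ValueError(f"not a valid checkbox value: {value!r}")


# A boolean attribute such as isMarried needs no conversion:
print(checkbox_state(True))
print(checkbox_state("off"))
```

A value such as "yes" would need a conversion step first, which is the situation where a list of data converters becomes useful.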
So, in the right drop-down box, choose Attribute instead of Value. In the left drop-down box, choose the PersonInput.isMarried attribute. The Set Checkbox window should now look like this:
Click OK to close the Set Checkbox window. The RoboMaker Main Window should now look like this:
You have now filled out the form. Right-click the Send info button and select Click.
If you have done everything correctly, the RoboMaker Main Window should now look like this:
The tutorial ends here. Normally, after filling out and submitting a form, you would continue the navigation (perhaps by filling out and submitting more forms) to the page containing the information you want and then extract it. You might want to save your robot for future use as a template for how to fill out and submit forms in RoboMaker. We recommend that you consult the RoboHelp online entries on the step actions you have used in this tutorial. Also, you might want to experiment with the robot you have created:
Try changing the input values in the PersonInput object and submit the form again to verify that the robot fills out and submits the form correctly. You may need to shift between the two branches depending on the value of the isMale attribute. Remember to click Apply after changing the input values in the Objects View.
Try to modify the robot so that it no longer accepts an input object, but instead uses fixed values when submitting the form. (Hint: The solution involves deleting the PersonInput object from the Input Objects in the Objects View.)
Try recreating the robot from scratch without referring to this tutorial.
The available properties in the Robot Configuration Window depend on the robot type. In this section, we explain the properties for non-clipping robots. For information about the properties of clipping robots, see the chapter How to Clip, and the online documentation. In the Basic tab, you can configure default options which apply to all step actions of the robot. A step action can override these global options as needed. You can also enter a comment for the robot. This is useful if you want to document how the robot works, what should be taken into account when editing the robot, etc. In the Advanced tab, you can specify an optional robot id for the robot. If you do so, the id must be unique among all robots in the robot library. If you use RoboManager, then you should use that application to keep track of robot ids. You can register your robot in RoboManager and get an id for it by clicking Register.... If you do not use RoboManager, you will have to keep track of the robot ids yourself. A robot must have a robot id if you use the Database
Storage Environment or the Database Message Environment. See the RoboRunner Users Guide for more information about environments. Also in the Advanced tab, you can specify an optional proxy server to use for all page and data loading done by this particular robot. You should use this property only rarely. Normally, it is better to specify one or more proxy servers for the entire installation. This is most easily done in the Settings application. See the Installation Guide for further details on this. The proxy server specified for a particular robot will override proxy servers specified any other way.
The Input Objects tab shows the input objects that must be inputted to the robot when it is run on RoboServer. If these input objects are not inputted to the robot at runtime, then the robot run will fail. In the bottom of the view, you can select how input objects should be shown. If Edit Input Values is selected, the input values of the input objects are
shown, and these can be edited. The input values can be applied by pressing Apply. Note that these values are only used when you are working with the robot in RoboMaker. When the robot is run on RoboServer, then the input values will be overridden (i.e. replaced) by the values of the input objects. If Show Values at Step is selected, the values of the input objects at the current step are shown, and these cannot be edited. The Output Objects tab looks like this with a single output object added:
The Output Objects tab shows the objects that can be returned by the robot. This is an example of how the view looks. For each object, you can configure and apply the initial values for the object attributes when Edit Initial Values is selected. If Show Values at Step is selected, the values of the output objects at the current step are shown, and these cannot be edited.
You can use indexes to refer to specific tags among tags of the same type at that level. Consider this tag path: html.body.div[1].a[0] This tag path refers to the first <a>-tag in the second <div>-tag in a <body>-tag inside an <html>-tag. So, on the page above, this tag path would only match the Link 4 <a>-tag. Note that indexes in tag paths start from 0. If no index is specified for a given tag on a tag path, the path matches any tag of that type at that level, as we saw in the first tag path above. If the index is negative, the matching tags are counted backwards, i.e. starting with the last matching tag which corresponds to index -1. Consider this tag path: html.body.div[-1].a[-2] This tag path refers to the second-to-last <a>-tag in the last <div>-tag in a <body>-tag inside an <html>-tag. So, on the page above, this tag path would only match the Link 5 <a>-tag. You can use an asterisk (*) to mean any number of tags of any type. For example, the tag path html.*.table.*.a
refers to an <a>-tag located anywhere inside a <table>-tag, which itself can be located anywhere inside an <html>-tag. There is an implicit asterisk in front of any tag path, so you can simply write "table" instead of "*.table" to refer to any table tag on the page. The only exception is a tag path starting with a period (.), which means that there is no implicit asterisk in front of the tag path, so the tag path must match from the first (i.e. top-level) tag of the page. With asterisks, you can create tag paths that are more robust against changes in the page, since you can leave out insignificant tags that are liable to change over time, such as layout related tags. However, using asterisks also increases the risk of accidentally locating the wrong tag.
You can provide a list of possible tags by separating them with '|', as in this tag path: html.*.p|div|td.a This tag path refers to an <a>-tag inside a <p>-, <div>-, or <td>-tag located anywhere inside an <html>-tag.
In a tag path, text on a page is referred to just as any other tag, using the keyword "text". Although text is not technically a tag, it is treated and viewed as such in a tag path. For example, consider this HTML:
<html>
  <body>
    <a href="url...">Link 1</a>
    <a href="url...">Link 2</a>
  </body>
</html>
The tag path "html.body.a[1].text" would refer to the text "Link 2".
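To make the indexing rules concrete, here is a minimal Python sketch that resolves a simplified tag path against the small page above. It is not RoboMaker's implementation: the resolve function is hypothetical, it handles only plain tag names, optional (possibly negative) indexes, and the "text" keyword, and it deliberately ignores asterisks, '|' alternatives, and the implicit leading asterisk. It uses the standard xml.etree.ElementTree parser, so the markup must be well formed, and the href values are placeholders.

```python
import re
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<a href="url1">Link 1</a>
<a href="url2">Link 2</a>
</body></html>"""

# One path segment: a tag name with an optional index such as a[1] or a[-2].
SEGMENT = re.compile(r"^([A-Za-z0-9]+)(?:\[(-?\d+)\])?$")


def resolve(root, tag_path):
    """Return the elements (or text strings) matched by a simplified tag path."""
    segments = tag_path.split(".")
    first = SEGMENT.match(segments[0])
    if first is None or first.group(1) != root.tag:
        return []
    current = [root]
    for segment in segments[1:]:
        if segment == "text":
            # Text is treated like a tag: return the text of each match.
            return [element.text for element in current if element.text]
        match = SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unsupported segment: {segment}")
        name, index = match.group(1), match.group(2)
        matched = []
        for parent in current:
            # Indexes count tags of the same type at this level, from 0.
            same_type = [child for child in parent if child.tag == name]
            if index is None:
                matched.extend(same_type)  # no index: any tag of that type
            elif -len(same_type) <= int(index) < len(same_type):
                matched.append(same_type[int(index)])  # negative counts from the end
        current = matched
    return current


ROOT = ET.fromstring(PAGE)
print(resolve(ROOT, "html.body.a[1].text"))
print(resolve(ROOT, "html.body.a[-2].text"))
```

Python list indexing happens to follow the same convention as the tag path indexes described above (0 is the first match, -1 the last), which keeps the sketch short.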
Equals Text specifies that the attribute value must match a specified text. Note that the text must match the entire attribute value.
Containing Text specifies that the attribute value must contain the specified text.
Pattern specifies that the attribute value must match a pattern. Note that the pattern must match the entire attribute value.
Tag Pattern: In this property, you can specify a pattern that the HTML of the tag must match (including all tags inside it), for example ".*<b>.*Stock Quotes.*</b>.*". Note that the pattern must match the entire HTML of the tag.
Tag Depth: This property determines which tag to use if matching tags are contained inside each other. The default value is Any Depth which accepts all matching tags. If you select Outermost Tag, only the outermost tags are accepted, and similarly, if you select Innermost Tag, only the innermost tags are accepted.
Tag Number: This property determines which tag to use if more than one tag matches the tag path and the other criteria. You specify the number of the tag to use, either counting forwards from the first tag or counting backwards from the last tag that matches.
For example, if you set the tag path to "table", the Tag Attribute property to "align=center", and the Tag Pattern property to ".*Business News.*", then the Tag Finder would locate the first <table>-tag that is center aligned and that contains the text "Business News".
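The requirement that a pattern match the entire HTML of the tag is why the example patterns begin and end with ".*". The sketch below illustrates the difference using Python's re module as a stand-in; RoboMaker's own pattern syntax may differ in details, the sample HTML is made up, and letting the dot match line breaks (re.DOTALL) is an assumption.

```python
import re


def matches_entirely(pattern, text):
    """True if the pattern matches the whole text, not just a part of it."""
    return re.fullmatch(pattern, text, flags=re.DOTALL) is not None


# Hypothetical HTML of a found tag.
tag_html = "<td><b>Latest Stock Quotes</b> for today</td>"

# Without the surrounding .* the pattern covers only part of the HTML.
print(matches_entirely("Stock Quotes", tag_html))
# With .* on both sides the pattern can cover the entire HTML of the tag.
print(matches_entirely(".*<b>.*Stock Quotes.*</b>.*", tag_html))
```

The first call reports no match and the second reports a match, which mirrors the difference between Containing Text style matching and a whole-value pattern.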
and to submit the form, you can use the Click action. You can also loop through options or radio buttons by using the following actions:
For Each Option
For Each Radio Button
Form Basics
This section describes some basic properties of forms. Consider the following example of a book search form, shown in Figure 19 as HTML, and in Figure 20 as it appears in a browser.
<html>
  <body>
    <form action="http://www.books.com/search.asp" method="get">
      Author: <input type="text" name="book_author">
      <p>
      Title: <input type="text" name="book_title">
      <p>
      Language:
      <select name="book_language">
        <option value="lang_0" selected>English</option>
        <option value="lang_1">French</option>
        <option value="lang_2">German</option>
        <option value="lang_3">Spanish</option>
      </select>
      <p>
      Format:
      <input type="checkbox" name="book_format" value="format_pb">Paperback
      <input type="checkbox" name="book_format" value="format_hc">Hardcover
      <input type="checkbox" name="book_format" value="format_ab">Audiobook
      <p>
      Reader Age:
      <input type="radio" name="reader_age" value="age_inf">Infant
      <input type="radio" name="reader_age" value="age_teen">Teenager
      <input type="radio" name="reader_age" value="age_adult" checked>Adult
      <p>
      <input type="submit" value="Search">
    </form>
  </body>
</html>
A form contains a number of fields. For example, the first <input>-tag in the example form defines a field named book_author. Note that the name of a field is usually different from what the user sees in a browser. For example, the book_author field will appear to be named Author in the browser, not book_author. A field can be defined by more than one tag. For example, the book_format field is defined by three <input>-tags in the example form. Tags that use the same field name and are of the same field type (text field, radio button, checkbox, etc.) define the same field. A field can be assigned one or more values. For example, the book_format field can be assigned the value format_pb to select paperback format. Note that, like the field name, the value that is assigned to a field is usually different from what the user sees in a browser. For example, the user will see the text Paperback, not the value format_pb, when choosing the paperback format. Depending on the field type, some fields can be assigned more than one value at the same time. For example, since book_format is a checkbox field, we could assign both the value format_pb and the value format_hc to the book_format field to select both the paperback format and the hardcover format. Most fields have a default value. The default value is the value that is initially assigned to the field in the form. For example, the book_language field has the default value lang_0, because of the selected attribute. A form is submitted by sending the current values of the fields to the web site. Only fields that have one or more current values are sent. For example, if
none of the checkboxes of the book_format field in the example form are checked, no value is sent for that field. In a browser, the submission of a form usually happens when the user clicks a submit button.
There are two kinds of submit buttons: normal submit buttons and image submit buttons. Normal submit buttons are defined using a <button>-tag or an <input>-tag, in both cases with the type attribute set to submit. If a normal submit button has a field name and value, that field will be sent with the specified value when the button is clicked. Image submit buttons are defined using an <input>-tag with the type attribute set to image. An image submit button defines two fields, named button name.x and button name.y, where button name is the name contained in the name attribute of the <input>-tag. If the <input>-tag has no name attribute, the fields will be named x and y. When an image submit button is clicked, these two fields are assigned the x- and y-coordinates of the position in the image where the mouse was clicked. Some web sites use this for creating image maps with different behavior depending on where the user clicks.
Some forms use JavaScript. For example, the <form>-tag may have an onsubmit attribute that contains JavaScript to be executed before the form is submitted. Similarly, an <input>-tag may have an onclick attribute that contains JavaScript to be executed when the user clicks on the field. Most forms use JavaScript to simply validate that the user has filled out the form correctly. In this case, you can simply ignore the JavaScript when submitting the form. However, some advanced forms use JavaScript to change the form dynamically as the user enters values into it, or to change the form before it is submitted. RoboMaker can handle these situations, with help from the Execute JavaScript action.
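As a rough illustration of what a submission of the example form amounts to, the following Python sketch builds the URL of a get submission from the current field values, sending only fields that have at least one value. The function name and the example.com address are placeholders rather than anything taken from RoboMaker or the example site, and modelling the empty Title text field as a single blank value is an assumption.

```python
from urllib.parse import urlencode

ACTION = "http://www.example.com/search.asp"  # placeholder action URL


def submission_url(action, current_values):
    """Build the URL for a form submitted with method="get".

    current_values maps each field name to a list of current values.
    Fields with an empty list have no current value and are not sent.
    """
    pairs = [
        (name, value)
        for name, values in current_values.items()
        for value in values
    ]
    return action + "?" + urlencode(pairs)


current_values = {
    "book_author": ["John Doe"],
    "book_title": [""],             # assumption: empty text field, one blank value
    "book_language": ["lang_0"],    # default value from the selected option
    "book_format": [],              # no checkbox checked, so nothing is sent
    "reader_age": ["age_adult"],    # default value from the checked radio button
}

print(submission_url(ACTION, current_values))
```

Checking both the paperback and hardcover boxes would correspond to the list ["format_pb", "format_hc"], which produces two book_format entries in the URL.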
example form. If you want to search for books in all available languages and for all reader ages, you cannot do this in a single form submission, because the site will not allow such a general search. Instead, you have to loop through the languages and the reader ages, and make a form submission for each combination of language and age.
The Submit Form action requires the <form>-tag as the found tag. The basic principle of the Submit Form action is that you specify the values of all fields that you want to set to something other than their default values. You also choose which submit button to use. As the default, the Submit Form action performs the entire operation of submitting the form and loading the resulting page. However, you can also configure the Submit Form action to not submit the form, and instead either generate an <a>-tag containing the URL that represents the form submission, or to change the current values in the form.
If you want to set the values of one or more fields in the form to a value other than their default values, you have to look at the HTML of the form to find the names of the fields and the values to assign to them. For each field that you want to set, add a field value assignment in the Field Values property of the Submit Form action. A field value assignment assigns a specified value to a selected field.
The value to assign in a field value assignment can be specified in several different ways depending on your needs, using a value selector. Use the drop-down menu on the right side of the value selector to select one of the following ways to specify the value (not all of these ways are available everywhere where a value selector occurs):
Value: Here, you enter the value directly as text or select a fixed value. This is useful if you want to specify a fixed value, without any computations or conversions.
Attribute: Here, you select the value of an attribute in an object. For example, this is useful in a robot taking input objects if you have the value for the field ready in an object attribute and do not need to convert it in any way.
Expression: Here, you enter an expression. This is useful if you want to make simple computations to get the value.
Converters: Here, you select a list of data converters, whose output is used as the value. The first data converter is given an empty text as input. This way of specifying the value provides the greatest flexibility, since you can make almost any kind of conversion or computation to get the value. For example, in a robot taking input objects, this is useful if you have the value in an object attribute, but need to convert it to the values used in the form.
You can assign multiple values to the same field by adding more than one field value assignment for that field. After adding and configuring the field value assignments, you should select the submit button to be used. Normally, you can use the Submit Button property of the Submit Form action to do this. You can choose Default Submit Button to use the default button, which is the first submit button in the form. You can also choose a specific button in the form. The available buttons are shown together with the field names that correspond to the buttons, and with the values that will be assigned to the fields when the buttons are used.
The Loop Form action requires the <form>-tag as the found tag. To configure the Loop Form action, you must provide information about all fields that should be assigned values other than their default values. For each field, the Loop Form action needs to know whether to loop through the field, how to loop through the field, and which values to assign to the field. The Loop Form action also needs to know which submit button to use. Like the Submit Form action, the default behavior of the Loop Form action is to perform the entire operation of submitting the form and loading the resulting page, for each iteration. But you can perform different actions, just as with the Submit Form action, by either generating <a>-tags containing the URLs that represent each form submission, or changing the current values in the form in each iteration. Now, consider the book search example form from above. Assume that we want to search for all hardcover books by the author John Doe, in all
languages, and for all reader ages. Since the form does not allow us to choose all languages or all reader ages at the same time, we need to loop through the form using the Loop Form action. What we want is to set the book_author field to John Doe, the book_format field to format_hc, and then make a form submission for all possible combinations of values that can be assigned to the book_language field and the reader_age field. To do this, we must create a number of field groups. A field group is a group of one or more fields that must be looped through together. To create a field group, you first select the type of field group that you want. The type determines the number of fields in the group and how the fields are looped through. The two most common field group types are the following:
One field with one value: This is a field group containing one field which is assigned one value. Use this field group type if you want to assign a specific value to a field, without any looping through the field. A field group of this type is similar to a field value assignment in the Submit Form action.
One field with values to loop through: This is a field group containing one field with a list of values to loop through. Use this field group type when you want to loop through a list of values, assigning one value at a time to the field.
In most cases, these two field group types are sufficient. The two other available field group types, Multiple fields to loop through and Two fields that define a range, are only needed in rare cases. For more information on these field groups, look in the RoboHelp entry on the Loop Form action. In our example, we would create the following four field groups:
A field group containing the book_author field. This field should not be looped through, but simply assigned the value John Doe, so we use the One field with one value field group type.
A field group containing the book_format field. This field should not be looped through either, but assigned the value format_hc, so, again, we use the One field with one value field group type.
A field group containing the book_language field. This field should be looped through, using the available values in the form, i.e. lang_0, lang_1, etc. Therefore, we use the One field with values to loop through field group type.
A field group containing the reader_age field. This field should also be looped through, using the values from the form, i.e. age_inf, age_teen, and age_adult. So, again, we use the One field with values to loop through field group type.
Now, let us look at how to configure each field group. The One field with one value field groups are configured in exactly the same way as a field value assignment in the Submit Form action.
The One field with values to loop through field groups are configured by first choosing the field to loop through, and then specifying the values to loop through.
The values are specified by choosing and configuring a value list. There are four types of value lists:
List of values: This is a fixed list of values that you specify yourself.
Values from form: This list contains the available values for a selected field, as they appear in the form.
Number range: This list contains a range of numbers.
Values depending on other fields value: This list is similar to the List of values type, except that the values to use can depend on the current value of another field.
The Values from form type is the typical choice when you want to loop through the available values of a field. This value list has the advantage that it will adapt to changes in the available values in the form, such as if a language was added to the list of available languages in the book_language field. In our example, we would use this value list in the field groups for the book_language and reader_age fields. For more information about each type of value list, see the RoboHelp entry for that list.
Remember that you only have to create field groups for the fields that you want to assign values other than their default values. Every field that is not included in a field group will be assigned its default value from the form, in every iteration. If you want to assign more than one value to a field, you can create multiple field groups of type One field with one value containing that field. However, it is currently not possible to assign more than one value to a field when looping through it using the One field with values to loop through field group type.
When the Loop Form action loops through its field groups, it will make an iteration for each possible combination of field group iterations. So, in our example, it will make an iteration for each possible combination of language and reader age.
After adding and configuring the field groups, you should select the submit button to use. This is done in the same way as in the Submit Form action. As in the Submit Form action, you can select No Submit Button to control the assignments to the submit button fields yourself. For example, you may want to loop through multiple submit buttons.
Some web sites have an upper limit on the number of objects that they will show as the result of a form submission. For example, a book site may not show more than the first 200 matching books. If you want all matching books, you can use the Loop Form action in a special optimization mode.
In this mode, the Loop Form action can optimize the looping through the form such that all objects are obtained, but without exceeding the maximum number of objects in each form submission. See the RoboHelp entry on the Loop Form action for more on this.
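The "one iteration per combination" rule can be illustrated outside RoboMaker with a short Python sketch. This is not how the Loop Form action is implemented; the field values, the extra title field, and the function name are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical field groups: field name -> values to loop through.
field_groups = {
    "book_language": ["English", "French"],
    "reader_age": ["Child", "Adult"],
}

def form_iterations(groups, defaults):
    """Yield one complete set of field assignments per combination of values."""
    names = list(groups)
    for combo in product(*(groups[name] for name in names)):
        assignment = dict(defaults)           # fields outside any group keep their defaults
        assignment.update(zip(names, combo))  # looped fields get this iteration's values
        yield assignment

# "title" stands in for a field that is not part of any field group.
iterations = list(form_iterations(field_groups, {"title": ""}))
```

With two languages and two reader ages, the sketch produces four iterations, and the title field keeps its default value in each of them.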
Uploading Files
Some forms contain file fields that allow you to upload files. A file field is defined by an <input>-tag of type file, such as the following:
<INPUT type="file" name="attachedFile">
In the Select File action, there are two ways to upload a file using a file field like this:
The first way is to upload a file from the file system. To do this, select File in Local File System from the drop-down box and enter the file name. When the form is submitted, the specified file will be loaded from the file system and uploaded as part of the form submission. Note that the file name must be an absolute file name, including the drive name, if any, and the directory path to the file.
The second and most common way to upload a file is to specify the file contents to upload, instead of loading the file from the file system. To do this, select File Contained in Attribute from the drop-down box. Then, select the attribute that holds the file contents from the drop-down box named File Content. Typically, you will get the contents from either a binary attribute in which you have downloaded the file earlier using the Load Data action, or from an attribute containing text that you have extracted earlier.
Optionally, you can specify the content type and the file name of the file. The content type should be the MIME type of the contents, optionally followed by a charset. You may use one of the predefined content types, acquire it from an attribute, or specify a custom content type. For example, the content type could look like this for an image:
image/gif
and like this for a plain text:
text/plain; charset=iso-8859-1
Note that when downloading files using Load Data, you can store the content type and file name of the downloaded data as part of the download. You can then use this content type and file name when uploading the file with the Select File action.
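As a small illustration of the content type format described above, the following Python sketch builds such a string from a MIME type and an optional charset. The function name is hypothetical and is not part of RoboMaker.

```python
def content_type(mime_type, charset=None):
    """Build a content type: the MIME type, optionally followed by a charset."""
    if charset:
        return f"{mime_type}; charset={charset}"
    return mime_type

image_type = content_type("image/gif")
text_type = content_type("text/plain", "iso-8859-1")
```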
[Figure: page links 1 2 3 4]
Here, the first page contains direct links to all other pages. That is, you can get to any page directly from the first page by following the corresponding link. The first page sometimes also contains a link to itself. Such pages can be looped through quite easily using a For Each Tag step, as shown in this excerpt from a robot:
Here, we are looping through the result pages from a search request, symbolized by the step named (Submit Form). The first result page can be processed directly, so there is a connection from the form submission step directly to the step that processes a page, symbolized by the step named (Process Page). The remaining pages are looped through by the For Each Tag action in the second branch from the form submission step. First, the Test Tag step checks that there is in fact more than one page. If so, we simply loop through the tags containing the links to the pages, load each page using a Click action, and then continue to the processing of the page. If the first page has a link to itself, the For Each Tag action should be configured to skip this first link, so that the first page isn't processed twice.
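The control flow of this robot can be paraphrased in Python. This is only a sketch of the idea; the page identifiers and the function name are assumptions, and a real robot works on loaded pages rather than on strings.

```python
def pages_to_process(first_page, page_links):
    """Return the pages in processing order, visiting each page exactly once.

    first_page: identifier of the page returned by the form submission.
    page_links: targets of the page links found on the first page, in order.
    """
    order = [first_page]              # first branch: process the first page directly
    if page_links:                    # Test Tag: are there links to other pages at all?
        for target in page_links:     # For Each Tag: loop through the page links
            if target == first_page:  # skip the first page's link to itself
                continue
            order.append(target)      # Click, then process the loaded page
    return order
```

For example, pages_to_process("p1", ["p1", "p2", "p3", "p4"]) visits p1 only once, even though the first page links to itself.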
[Figure: pages connected in sequence by Next links]
Here, each page simply links to the next page, typically with a link or form button named something like Next. To loop through such pages, use the Repeat action. The Repeat action will loop through the pages that are supplied to it by another action named Next. The principle is as follows: The Repeat action must be given the first page as input. It will then loop through the pages, and in each iteration it will output a page. In each iteration, we can process the current page, and we must also give the Repeat action the next page, using the Next action. If we don't give the Repeat action a new page, it will not provide another iteration, i.e. the loop will end. This excerpt from a robot shows an example:
Here, as before, we are looping through the result pages from a search request, symbolized by the step named (Submit Form). The form submission step will output the first result page, which we give to the Repeat action. In the first branch from the Repeat action, we process the current page. In the second branch, we load the next page by clicking its link. The Next action will send the page back to the Repeat action, which will output it in its next iteration. When the last page is reached, the Click action will generate an error. Therefore, the Click step is configured to ignore errors and skip the rest of the branch. In the Click step, this is done in the Error Handling tab by setting the Own Errors property to Ignore and Skip Branch. Please see Handling a Step's Own Errors for more information on this.
An alternative way of handling the last page is shown in the robot excerpt below:
To detect when the last page has been reached, we use a Test Tag action in the second branch. The Test Tag action checks that the page contains a next-page link, for example by looking for an <a>-tag containing the text Next. If the page contains such a link, we load the next page and give it to the Next action. When the last page is reached, the Test Tag action will stop execution down the second branch, and no new page will be given to the Repeat action, causing the loop to end.
Note that finding the link to the next page can be tricky. A common mistake is to find the previous-page link on some pages instead of the next-page link, because the layout of the pages changes slightly between the first page, the subsequent pages, and the last page. Another common mistake is to not detect the last page reliably. You may have to configure the tag finders of the steps carefully to make things work (see the chapter How to Use the Tag Finders).
When you are working with a robot in RoboMaker, RoboMaker may not always be able to step correctly back and forth between iterations of a Repeat action. If you are not sure whether RoboMaker has got it right, click Refresh to update.
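The Repeat/Next principle resembles an ordinary loop that ends when no next page is supplied. The Python sketch below models this with a hypothetical in-memory site; the page names, the SITE dictionary, and the function name are assumptions for illustration.

```python
# Hypothetical site: each page has some items and an optional next-page link.
SITE = {
    "page1": {"items": ["a", "b"], "next": "page2"},
    "page2": {"items": ["c"], "next": "page3"},
    "page3": {"items": ["d"], "next": None},  # last page: no next-page link
}

def loop_through_pages(first):
    results = []
    current = first                    # Repeat is given the first page as input
    while current is not None:         # one iteration per page supplied to Repeat
        page = SITE[current]
        results.extend(page["items"])  # first branch: process the current page
        current = page["next"]         # second branch: test for the link, click it, Next
    return results
```

When the last page has no next-page link, nothing is handed back to the loop, so it ends, just as the Repeat action does when the Next action is never reached.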
The Extract action is used to extract text content from the tag, optionally including the HTML tags.
The Extract URL action is used to extract a URL from a tag attribute containing a URL, and to make that URL absolute.
The Extract Clip action is used to extract a stand-alone HTML clip from the tag, with support for preparing the HTML to appear on its own apart from its original HTML page.
The Extract Tag Attribute action is used to extract the value of a tag attribute.
The Load Data action is used to extract binary data such as images and PDF files, but it handles any kind of binary data.
Often you need to reformat (or normalize) the extracted content, and the Extract, Extract Clip, and Extract Tag Attribute actions allow you to do this by configuring a list of data converters.
Extracting Text
The Extract action is used for extracting text.
For short text, like a product name or a price, extract as Only Text. This will simply extract the text between the tags. If you want to extract a longer text with sections, headings etc. as plain text, but still want the text to appear close to how it appears in a browser, you should extract the text as Structured Text. If some sort of special markup is desired, e.g. brackets surrounding the headings, then Structured Text has rudimentary support for that. If the markup requirements cannot be fulfilled with Structured Text, then use Advanced Structured Text which allows you to set mappings from the HTML tags into your proprietary markup.
Extracting Clips
The Extract Clip action is used for extracting a stand-alone HTML clip from a page.
The Extract Clip action is useful when you want to extract parts of a page, or an entire page, and want to preserve the HTML formatting of what you extract. For example, you can use Extract Clip to extract web content with the original formatting preserved. Note that if you want to clip functionality, not just content, from a web site, you should create a clipping robot instead of using Extract Clip. See the How to Clip chapter for more on this. The Extract Clip action allows you to extract more than one tag from the same page and combine them into a single clip. You can choose between several ways of combining the individual clips. The Extract Clip action can also modify and adjust the clipped HTML in various ways that make it suitable for appearing on its own, separate from its original HTML page. This includes handling of layout, URLs, and JavaScript.
Only binary attribute types can be used to store the loaded data in. The binary attribute types are Binary, Image, PDF, and Session. They are all equivalent except that the Image, PDF, and Session types allow you to preview the data.
The principle is to configure the Pattern property to match the entire text, with the text to extract being matched by a subpattern, enclosed by parentheses. In this case, the pattern used is ".*by\s(.*)\.", which means that the text between by and the period will be matched by the subpattern. For more information on patterns, see the section Patterns in the chapter Getting Started.
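The same pattern can be tried out in Python, whose regular expression syntax accepts it unchanged. That RoboMaker patterns behave identically in every detail is an assumption, and the sample text and function name are made up for illustration.

```python
import re

# The pattern from the text: match the entire text, capture what lies
# between "by " and the final period.
PATTERN = re.compile(r".*by\s(.*)\.")

def extract_author(text):
    match = PATTERN.fullmatch(text)    # the pattern must match the entire text
    return match.group(1) if match else None

author = extract_author("Written by Jane Doe.")
```

Here author holds the text matched by the subpattern in parentheses. A text without "by" does not match the pattern at all, so nothing is extracted.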
Converting Content
Conversion is used whenever you want to normalize content, such as when one text should be replaced by another text. For example, you might want to normalize country codes to their natural language description, e.g. "US" should be normalized to "United States". For plain text conversions, you should use the Convert Using List data converter. For conversions based on patterns or expressions, you should use the If Then data converter.
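A plain text conversion of this kind amounts to a lookup in a list of replacements. The Python sketch below shows the idea; the second list entry and the choice to pass unknown values through unchanged are assumptions, not documented behavior of Convert Using List.

```python
# Hypothetical conversion list: text to find -> text to replace it with.
COUNTRY_NAMES = {
    "US": "United States",
    "DK": "Denmark",
}

def convert_using_list(value, conversions):
    # Assumption: a value with no entry in the list is left as it is.
    return conversions.get(value, value)

converted = convert_using_list("US", COUNTRY_NAMES)
```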
The first step contains a For Each Tag action that loops through the <tr>-tags in the <tbody>-tag of a <table>-tag. It is followed by several steps that each extract content from a cell (column-wise) in a table row.
Content Irregularities
Sometimes the content of cells in the same table column differs in format. For example, it might sometimes be empty, sometimes contain "Bob" (first name) and sometimes "Bob Smith" (first name and last name). One way, and probably the simplest way, to deal with content irregularities is to use the If Then data converter in the step doing the extraction of some attribute value. You configure its If and Else If properties so that they match each format variation. The corresponding Then properties then extract the matching subpattern. However, for the "Bob Smith" case, which contains two attribute values (first name and last name), you need to create two steps: one that extracts the first name and one that extracts the last name. This is because the Extract action only allows you to extract one attribute value. Each of the two steps would then contain an Extract action with an If Then data converter so that the first step extracts the first name (if any), and the second step extracts the last name (if any).
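The two extraction steps can be pictured as two small functions, each testing the format variations in turn. This Python sketch only mirrors the If / Else If logic; the patterns and function names are assumptions chosen for the "Bob Smith" example.

```python
import re

# Matches exactly two words separated by whitespace: first name and last name.
TWO_NAMES = re.compile(r"(\S+)\s+(\S+)")

def first_name(cell):
    text = cell.strip()
    if text == "":                     # If: the cell is empty
        return None
    match = TWO_NAMES.fullmatch(text)  # Else If: first name and last name
    if match:
        return match.group(1)
    return text                        # Else: the cell holds only a first name

def last_name(cell):
    match = TWO_NAMES.fullmatch(cell.strip())
    return match.group(2) if match else None  # only present in the two-name format
```

Each function extracts a single value, which corresponds to the restriction that one Extract action extracts one attribute value.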
Structure Irregularities
Sometimes the rows of a table vary in the number of cells they contain. A common way of dealing with such irregularities is to test the format of each table row. For example, you might want to consider only rows containing a certain number of cells, or only rows containing a specific text. To do this, you add branches after the For Each Tag step (that loops through the table rows), so that each branch starts with a conditional step (having a Test Tag action) that accepts all rows matching (or not matching) some format (written as a pattern). The conditional step is then followed by one or more extraction steps that assume the format accepted by the conditional action. The robot would look something like this:
When this robot is run, each conditional step will be executed in turn for each table row. Normally, we only want execution to proceed past at most one of the conditional steps. You can ensure this by writing the patterns of each conditional action in such a way that no text will match more than one pattern. Note that the branches beyond the conditional steps need not be kept separate. If two or more branches share extraction steps, you might want to merge the branches after the steps that are different.
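The requirement that at most one conditional step accepts a given row can be sketched in Python as a chain of mutually exclusive tests. The row formats, column meanings, and names below are hypothetical.

```python
def handle_row(cells):
    """Send a table row down at most one branch, depending on its format."""
    if len(cells) == 3:                # branch 1: rows with three cells
        name, price, stock = cells
        return {"name": name, "price": price, "stock": stock}
    if len(cells) == 2:                # branch 2: rows with two cells
        name, price = cells
        return {"name": name, "price": price, "stock": None}
    return None                        # no branch accepts the row, so it is skipped

rows = [["Book", "10", "yes"], ["Pen", "2"], ["Header"]]
extracted = [result for result in map(handle_row, rows) if result is not None]
```

Because the conditions test for different cell counts, no row can match more than one of them.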
How to Clip
This chapter describes how to use Kapow Mashup Server for clipping from web sites. This allows you to reuse existing web sites for new purposes, such as in portals. This chapter explains the new clipping functionality that was introduced in Kapow Mashup Server 6.0. The clipping functionality from Kapow Mashup Server 5.5 is still available, but you will need to refer to the Kapow Mashup Server 5.5 documentation for information about this. Also, note that if you just want to extract stand-alone clips, without any functionality and without using a clipping robot, refer to the Extracting Clips section in the How to Extract Content chapter.
What is Clipping?
Clipping means reusing existing web sites in a new context, typically as a portlet in a portal. Clipping allows you to reuse existing web sites without affecting the web sites or changing a single line of code in them. You can clip entire web sites, or selected parts of the web sites, such as selected pages or parts of pages. As part of the clipping process, you can also modify the clipped pages to suit your needs. For example, you can change the layout and styling of the pages to match the look-and-feel of your portal. You can also remove or change parts of the pages, such as removing advanced functionality that you do not want to expose in your portal. As part of the clipping, you can also do automatic login and logout on the web site that you are clipping from. This can be done as part of a single-sign-on solution that covers your portal. This way, the portal user only needs to log into the portal itself, after which he will be logged in automatically to all applications that he accesses from the portal. You can also do other types of automatic navigation as part of the clipping, such as automatic navigation to the first page to be shown in the portlet, and automatic pre-filling of forms.
Figure 32 shows the basic setup when a clipping robot is deployed at runtime.
[Figure 32: Browser, Portal Server (Clipping Portlet), RoboServer (Clipping Robot, Clipping Session), Web Site]
The generated clipping portlet is deployed on a portal server. When the portal user accesses the clipping portlet for the first time from his browser, the clipping portlet starts a new clipping session. It does this by executing the clipping robot on a RoboServer with a Begin Session command as input to the robot. On the RoboServer, the clipping robot will create a new clipping session that resides on that RoboServer. The robot will then perform the navigation necessary to reach the first page that should be clipped, clip from that page, and return the clip to the portlet. The portlet will then show the clip to the user. The robot state, e.g. windows, pages, JavaScript, cookies, etc., that results from the navigation on the web site will be kept on the RoboServer, as part of the clipping session.
The clip shown by the portlet is a specially modified version of the original page. Besides the modifications and layout changes that the robot may perform on the clip, the clipped page has also been specially instrumented for the clipping process. All original JavaScript has been removed from the page, since that JavaScript would not execute correctly in the new portal context that the clip appears in. Instead, the clipped page has been instrumented with special event handlers that capture selected user interactions with the page.
When the user interacts with the clip, e.g. clicks a button or presses a key, one of two things happens. For low-level user actions, such as pressing a key or moving the mouse, the user interaction is typically handled locally in the user's browser, since this does not normally require any page loading, JavaScript execution, or similar. For high-level user actions, such as clicking a button or submitting a form, the interaction is typically captured by an event handler and handled by the portlet.
The portlet will handle such an interaction by executing the robot with a command that specifies the type of user interaction and relevant additional information such as which button was clicked or which form was submitted. The robot will then perform the same interaction on the web site that is being clipped, using the robot state stored in the clipping session. If the interaction triggers JavaScript, that JavaScript will be executed by the robot in its original context of the web site. This way, the actual interaction with the web site is done by the robot, in the original context of the web site, using the full state necessary to do the interaction correctly. Thus, the actual interaction with the web site is unaffected by the clipping and the changes done to the clipped pages to make them suitable for viewing in the portlet.
When the user logs out of the portal, or his portal session times out, the clipping portlet will end the clipping session by executing the robot with an End Session command. This will cause the robot to do whatever is necessary to end the interaction with the web site, such as logging out from the web site, and then end that particular clipping session on the RoboServer.
The communication between the portlet and the clipping robot is done using Kapow Mashup Server objects, as for any other robot. The input to the clipping robot, which includes the command to perform, is represented by a ClipRequest input object. The output from the robot, which includes the resulting clip, is represented by a ClipResponse output object.
The robot starts with a step that uses the Begin Clip action. This step performs the command that the clipping portlet has requested:
For the Begin Session command, the Begin Clip step creates a new clipping session and loads the first page from the web site to clip from.
For a user interaction, the Begin Clip step performs the requested user action on the current robot state in the clipping session, e.g. clicks a button or submits a form.
For the End Session command, the Begin Clip step prepares for ending the clipping session.
After the Begin Clip step follows a default clip branch. The default clip branch handles the clipping from all pages that are not handled by any other clip branch in the robot. In this simple case, there are no other clip branches, so the default clip branch will handle the clipping from all pages. The default clip branch starts with a step named Default?, with a Test Default Clip action. The Test Default Clip action serves only to identify this branch as the default clip branch, and always lets execution continue along the branch.
After the Test Default Clip step is a step with a Clip action. This step performs the actual clipping and stores the resulting clip in the ClipResponse output object. The Clip step also stores the current robot state in the clipping session, for use in the next execution of the robot. The default clip branch ends with a step with an End Clip action. This step returns the ClipResponse object to the portlet and stops the robot execution.
There are two clip branches besides the default clip branch. The non-default clip branches have the same structure as the default clip branch, except that the first step in the clip branch has a Test Clip action instead of a Test Default Clip action. The Test Clip action checks whether this clip branch should handle the clipping of the current page (or pages). The clip branches will be tried in turn until the Test Clip step of a clip branch accepts the current page, in which case that clip branch is used. If none of the non-default clip branches match, the default clip branch is used. The Test Clip action is configured by specifying one or more clip conditions that the current windows must match or not match. A clip condition can check the URLs of the current windows, the contents of the pages in the windows, etc. This way, you can specify exactly which pages this particular clip branch should handle. When you have multiple clip branches, the default clip branch is optional.
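The selection among clip branches can be summarized as "first match wins, otherwise the default". The Python sketch below illustrates that rule only; the branch names and URL tests are hypothetical, and real clip conditions can also inspect page contents and multiple windows.

```python
def choose_clip_branch(url, branches, default="default"):
    """Try the non-default clip branches in turn; fall back to the default branch."""
    for name, accepts in branches:
        if accepts(url):               # Test Clip: does this branch handle the page?
            return name
    return default                     # no non-default branch matched

# Hypothetical non-default clip branches, each with a URL-based clip condition.
branches = [
    ("search", lambda url: "/search" in url),
    ("account", lambda url: "/account" in url),
]
```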
After the Begin Clip step, the robot has three branches that are executed depending on the current command to the robot. Each branch starts with a step with a Test Clip Command action that determines whether the branch should be executed for the current command:
If the command is Begin Session, the first branch is executed.
If the command is a normal clip command, i.e. one that represents a user action, the second branch is executed.
If the command is End Session, the third branch is executed.
The first branch performs the automatic login, in this case by entering the username and password in a login form and submitting the form. This branch then joins with the second branch. They join in the step named Logged In, whose only purpose is to serve as a joining point. After this join step, one or more clip branches follow, as in the simpler clipping robots described earlier. Thus, the clip branches are executed both for the Begin Session command and for the normal clip commands. The third branch from the Begin Clip step performs the logout when the session has ended. This is typically done by clicking on a logout button. Note that the steps named Begin Login, End Login, Begin Logout, and End Logout are just Do Nothing steps that serve only to mark the start and end of the login/logout sequences, and to make it easier to edit the sequences.
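The three-way branching on the command can be sketched as follows. This Python fragment only traces which parts of the robot run for each command; the function name, the session dictionary, and the step labels are assumptions for illustration.

```python
def run_clipping_robot(command, session):
    """Return the sequence of robot parts executed for one command."""
    steps = ["Begin Clip"]
    if command == "Begin Session":     # first branch: automatic login
        steps.append("login")
        session["logged_in"] = True
        steps.append("clip")           # joins the second branch at Logged In
    elif command == "End Session":     # third branch: logout
        steps.append("logout")
        session["logged_in"] = False
    else:                              # second branch: a normal clip command
        steps.append("clip")
    return steps
```

The clip branches are reached both for Begin Session and for normal clip commands, while End Session only performs the logout.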
On the second page, enter the URL that the robot should start from. This is the first page to clip from, or the page to start automatic navigation from. In this example, we enter www.google.com:
The next pages in the wizard help you to configure the robot for automatic login, if that is needed. In this example, we just click Finish after entering the URL. This will create a simple clipping robot with a default clip branch. By default, this robot will clip all pages that are reachable from the first page (directly or indirectly). The pages will be clipped in their entirety, without any modifications. You can subsequently configure the robot to behave differently, as explained in the rest of this chapter. When the robot has been created, RoboMaker will open up the Portlet View and show the first clip created by the robot.
Thus, the Portlet View allows you to navigate through the clips, refine your robot, and check whether everything looks and works as you want it to.
You should use the Portlet View as your primary way of moving around in a clipping robot. In a clipping robot, you usually cannot click back and forth between branches as you would do in a normal robot. The reason is that the branches in a clipping robot depend on the current command to the robot, i.e. the current ClipRequest input object, as well as the robot state in the current clipping session. Therefore, if you click on a step in a branch other than the current one, RoboMaker will typically tell you that the step cannot be reached. Instead, you need to use the Portlet View to navigate to the clip for which that branch applies. You can then click back and forth between the steps within the branch, in the usual way.
You can go back to earlier clips in the Portlet View by clicking the back icon in the toolbar of the Portlet View. You go forward again by clicking the forward icon. This is similar to the back and forward buttons in a browser, but slightly different, since you are moving back and forward between the clips made by the robot, not the loaded pages as in a browser.
You can start a new clipping session in the Portlet View by clicking the icon in the toolbar of the Portlet View, or the icon in the Main Window of RoboMaker. This is useful when you want to go back to the first clip that will be shown in the portlet. It is also useful if you get stuck trying to click back and forth between branches in the robot. In that case, click the icon to start over, and navigate to the appropriate clip using the Portlet View.
By default, the Portlet View opens and closes automatically. You can also open and close it yourself, by clicking the icon in the Main Window of RoboMaker. If you do not want the Portlet View to open and close automatically, uncheck the Open/Close Automatically checkbox in the toolbar of the Portlet View.
On the next page, enter the name of the new clip branch:
Use the name to distinguish the new clip branch from other clip branches, so that it is easy to remember which pages this clip branch handles. The Edit Clip Wizard will often suggest a name for you based on how you navigated to the page, and the page itself.
When you have entered a name, you can either click Finish to create the new clip branch now, or click Next to configure the clip condition for the new clip branch:
The clip condition will be used in the Test Clip step of the new clip branch to determine which pages the branch should handle. The Edit Clip Wizard will suggest a clip condition that will usually work well, but in some cases, you may need to adjust the suggestion or change the configuration entirely. You can also adjust the clip condition after creating the branch. When you have configured the clip condition, click Finish to create the new branch.
When the new branch has been created, you will be placed at the Clip step of the new branch, so that you can configure how the clipping is done in the branch. See the section Modifying Clips later in this chapter for how to do this. When you have configured the new branch, click the icon in the Main Window of RoboMaker to see the resulting clip in the Portlet View.
Instead of using the Edit Clip Wizard, you can also add a new clip branch by clicking the icon in the toolbar of the Portlet View.
Select Edit current clipping rules (or Edit default clipping rules if you are on the default clip branch), and then click Finish. You will then be placed at the Clip step of the current clip branch, allowing you to configure the clip branch. When you are done, click the icon in the Main Window of RoboMaker to see the clip in the Portlet View.
You can also edit the current clip branch by clicking the icon in the Portlet View, or the Main Window of RoboMaker. If you are on the default clip branch, you will be asked for confirmation that you want to edit the default clipping rules, since this will affect all pages clipped by the default clip branch. Note that you can also go to the Clip step yourself by clicking on the step in the Robot View.
On the next page, select the clip branch that you want to use instead:
Click Finish if you want to use the clip condition suggested by the Edit Clip Wizard, or click Next to configure the clip condition:
The clip condition will be used in the Test Clip steps of the affected clip branches, to ensure that the selected clip branch is the only one that matches the pages, besides the default clip branch. Click Finish when you are done. The Edit Clip Wizard will then reconfigure the clip branches accordingly, and show you the resulting clip, where the selected clip branch is used instead. Instead of using the Edit Clip Wizard, you can also click the icon in the Portlet View to clip using another branch.
At least one of the clip conditions in the first list must be satisfied, and none of the clip conditions in the second list may be satisfied. You can use the second list as a list of exceptions, when the clip conditions in the first list are broader than what you want to handle in the branch.
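The rule for combining the two lists can be written as a single Boolean expression. In the Python sketch below, each clip condition is modeled as a function of the page URL; the example conditions and names are hypothetical.

```python
def branch_accepts(url, must_match_one, must_match_none):
    """True if some condition in the first list holds and none in the second holds."""
    return (any(condition(url) for condition in must_match_one)
            and not any(condition(url) for condition in must_match_none))

# Hypothetical clip conditions: handle all product pages except printable versions.
first_list = [lambda url: "/products/" in url]
second_list = [lambda url: url.endswith("/print")]
```

Here the second list acts as the list of exceptions: a printable product page satisfies a condition in the first list but is still rejected.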
First, you select which window you want to check. Then, you select the condition that the window must satisfy. A number of conditions are available, such as conditions that check the URL of the window or the page contents of the window. If you select the Advanced condition, you can specify multiple conditions that must all be satisfied by the window.
In most cases, it is sufficient to check only one window. However, if you need to add conditions for other windows, you can do so in the Other Windows property:
Here, you can specify a list of additional conditions that must be satisfied. In each condition, you select a window and a condition, as in the clip condition itself. Using these additional conditions, you can create clip conditions that match only if a very specific set of windows is open, with specific URLs and page contents in each window. When you create a clip condition, it is often useful to enter a description for it. This makes it easier to distinguish the clip conditions, and remember what their purposes are.
Modifying Clips
By default, all pages are clipped in their entirety with the original sizes, layout, and styles preserved. In this section, we will explain how to modify the clips, such as clipping only parts of a page, changing layout and styles, and modifying the contents of the page. You modify clips by configuring the clip branch that handles the clips that you want to change. If you make the modifications in the default clip branch, the modifications will apply to all pages that are clipped by the default branch. If you want to make modifications only on some pages, add one or more clip branches for these specific pages, and configure those clip branches to do the modifications.
To modify the clip for the clip branch that you are currently on, use the Edit Clip Wizard, or click the icon in the Portlet View, as explained earlier in the section Editing a Clip Branch. When you are done with the modifications, click the icon in the Main Window of RoboMaker to see the resulting clip in the Portlet View.
For changes in layout and styles, you can also specify default changes in the Robot Configuration Window. These changes will apply to all clip branches that have not been individually configured to use other changes than the default ones.
You can also clip a range of tags instead. To do this, choose Tag Range in the Clip From property of the Clip action. The Clip action will then clip all tags between the two found tags. Note that you can select the tags to clip only in the current window. If you have multiple windows open, the other windows will always be clipped in their entirety. See the section Working with Windows and Frames later in this chapter for more on working with multiple windows.
You can specify default layout changes that will be used by all clip branches by default, and you can specify individual layout changes in each clip branch. To specify the default layout changes, click the icon to open the Robot Configuration Window. Select the Layout Changes tab:
In the Original Layout property, you can specify what to do with the original layout and styles of the page. For example, you can specify that all layout and styles should be removed from the clips. This is useful if you want to completely restyle the clips without regard to the original layout. In the Sizing property, you can specify what to do with the original size specifications in the clip, e.g. widths of tables. For example, you can specify that all absolute sizes should be removed. This is useful if you want to adapt the overall size of the clip to fit into the portal.
In the Add new Style Sheet Link property, you can specify a style sheet link to be added to the clip. This is useful when you want to restyle the clips to use another style sheet, such as the standard style sheet in your portal. In the Layout Change Rules property, you can specify layout change rules that you want to apply to the clips. For example, this layout change rule changes all usages of the original style class headingb to the style class hd2, which could be a style class in the portal style sheet that you want to restyle the clip to use:
The default layout changes that you specify in the Robot Configuration Window will be used by all clip branches, except the clip branches that you explicitly configure to not use the default layout changes. To configure a clip branch to not use the default layout changes, go to the Clip step of that branch. In the Layout Changes tab, choose Specify, and then specify the layout change settings that you want to use for this particular clip branch.
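To make the effect of a style class rule concrete, the following Python sketch renames one style class in the class attributes of an HTML fragment. It is not how RoboMaker applies layout change rules; it is a simplified illustration that assumes double-quoted class attributes, and the function name is hypothetical.

```python
import re

def rename_style_class(html, old, new):
    """Replace the style class `old` with `new` in double-quoted class attributes."""
    def fix(match):
        # Compare whole class names, so that "headingbx" is not touched.
        classes = [new if name == old else name for name in match.group(2).split()]
        return match.group(1) + " ".join(classes) + match.group(3)
    return re.sub(r'(class=")([^"]*)(")', fix, html)

restyled = rename_style_class('<h2 class="headingb">Title</h2>', "headingb", "hd2")
```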
This will insert a Hide Tag step before the Clip step. The Hide Tag step will use styling to hide the tag, so that it is invisible to the user. Click the icon to see the results. Note that we hide the button instead of actually removing it from the page. This is to avoid breaking the functionality of the page. If you remove tags from the page, or otherwise change it, you may break things like JavaScript that rely on the page having a particular structure and contents. Therefore, it is usually safer to just hide tags, using styles, instead of removing them. However, if you subsequently remove all styles from the clips as part of your layout changes, the hiding will be lost. In that case, you will need to remove the tags instead.
Another example of modifying the pages is to adjust the functionality of a form by modifying or inserting hidden <input>-tags. For example, on Google, you can insert a hidden <input>-tag in the search form to reduce the number of search results shown on each page in the search results:
When modifying the original pages like this, you need to take into account that the clip branch may be applied multiple times to the same pages. This will happen if the user can make interactions with the page that cause him to stay on that page without loading a new page. In this case, the clipping robot will be executed multiple times to clip from the same page. This will also happen in some cases where the portlet needs to obtain the current clip again, such as if the user moves to another page in the portal and back again to the page containing the portlet.
Because of this, if you are not careful, the clip branch will perform the same modifications multiple times on the same page. To avoid this, you may need to check whether the change has already been made on the page. For example, you may need to check whether your hidden <input>-tag has already been inserted on the page, and skip the inserting in that case. Here is how this would look for the example above:
Note that this is not an issue if you are just hiding tags, using Hide Tag, since Hide Tag does nothing if the tag is already hidden.
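The check-before-insert idea can be sketched in Python on a plain HTML string. This is only an illustration of the principle, not the robot steps themselves. The function name ensure_hidden_input, the parameter name num, and the sample form are assumptions made for the example.

```python
import re

def ensure_hidden_input(form_html, name, value):
    """Insert a hidden <input> into the form unless one with that name exists."""
    already_there = re.search(
        r'<input[^>]*type="hidden"[^>]*name="' + re.escape(name) + '"',
        form_html,
    )
    if already_there:
        return form_html  # the change was made on an earlier execution
    hidden = '<input type="hidden" name="%s" value="%s">' % (name, value)
    # Place the new tag just before the closing </form> tag.
    return form_html.replace("</form>", hidden + "</form>", 1)

# Hypothetical search form; "num" is an assumed parameter name.
form = '<form action="/search"><input type="text" name="q"></form>'
once = ensure_hidden_input(form, "num", "5")
twice = ensure_hidden_input(once, "num", "5")
print(once == twice)
```

Applying the function a second time returns the page unchanged, which is the property the clip branch needs when it runs repeatedly on the same page.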
You can select any window or frame as the window/frame to show in the portlet itself. If you are clipping from a page with frames, you can select a specific frame to show, instead of the entire page with all frames. This way, you can exclude the rest of the page from the clip.
You can block (i.e. exclude) all popups, or block selected popups based on the window names, i.e. the names shown in the window tabs in the Page View. These default settings will be used in all clip branches where you have not configured the popup window handling individually. You can configure the popup window handling individually for a clip branch using the Popup Windows property in the Clip step of the branch.
On the first page of the wizard, select the type of login to perform. The most common type is form login, where the username and password are entered into a form on the login page, such as this one:
The other login type is HTTP login, where the username and password are entered into a special prompt window opened by the browser, such as this one:
When you have selected the login type in the wizard, click Next. On the next page, enter the username and password to use while editing the robot in RoboMaker:
The username and password that you enter are for development purposes only. For example, enter the username and password for a test account on the web site. When you deploy the robot, you can configure where the username and password should be obtained from at runtime. See the section Deploying a Clipping Robot later in this chapter for more on deploying a clipping robot. When you have entered the username and password, click Finish.
If you selected the HTTP login type, no actual login sequence needs to be added to the robot. The Begin Clip step at the start of the robot will do the necessary login.
If you selected the form login type, the wizard will add a login sequence to the robot. The sequence will be executed when a new clipping session starts. See the section The Structure of a Clipping Robot earlier in this chapter for more on the structure of a clipping robot. When the login sequence has been added, you will be placed at the location in the sequence where you should insert the login steps. The login steps should enter the username and password into the login form and submit the form. The username and password should be obtained from the ClipRequest.username and ClipRequest.password attributes. At runtime, the clipping portlet will send the appropriate username and password to the robot in these attributes. An easy way to enter the username and password is to right-click on the appropriate fields in the form and choose Enter Username or Enter Password:
When you have inserted the steps, click the icon to see the clip in the Portlet View.
You can verify the login sequence by clicking the icon in the Portlet View, to start a new clipping session. If you want to edit the login sequence, choose Edit Login Sequence from the Login menu in the Portlet View. This will place you at the start of the login sequence. You can remove a login sequence by choosing Remove Login Sequence in the Login menu. You can edit the test username and password used when developing the robot in RoboMaker, by choosing Edit Test Username and Password from the Login menu. Note that you can also define the automatic login in the New Robot Wizard when you create your new clipping robot, by clicking Next after entering the start URL of the robot.
To configure your robot to log out automatically, choose Add Logout Sequence in the Login menu in the Portlet View. This will open the Add Logout Sequence Wizard. Simply click Finish in the wizard. The wizard will then add a logout sequence to the robot, and place you at the location in the sequence where the logout steps should be inserted. The logout sequence will be executed whenever a clipping session ends, i.e. when an End Session command is sent to the robot. The logout sequence will be executed with the current robot state in the clipping session, i.e. the logout sequence will continue from the point that the user has currently navigated to. In the logout sequence, insert the steps necessary to log out. For example, insert steps to navigate to a page where a logout button is present, and a step to click on the button. Remember that the logout sequence must work no matter which page the user is currently on. When you have inserted the logout steps, you can test the logout sequence by first clicking the icon to start a new clipping session, and then choosing End Session in the View menu of the Portlet View to end the session. You can edit the logout sequence by choosing Edit Logout Sequence in the Login menu of the Portlet View, and you can remove it by choosing Remove Logout Sequence.
4. Open your clipping robot.
5. In the ClipRequest input object, click the Paste button in the cookies attribute, to paste the cookies of the copied session into the cookies attribute.
6. Click Apply to apply the changes.
Now, the robot is configured to use the obtained cookie while running in RoboMaker. Note that you will have to repeat this process if the cookie that you obtained times out. If the single-sign-on solution uses HTTP headers, you probably need to ask your systems administrator to provide you with a valid header. You can then paste this header into the ClipRequest.headers attribute in the Objects View, to make the robot use the header while running in RoboMaker. Remember to click Apply in the Objects View after entering the header.
Other Topics
This section explains various other topics in relation to clipping.
In the Clipping Restrictions property, you can restrict the clipping by specifying which links can be followed. In the Excluded Links property, you can specify what should happen if the user tries to follow a link that has been excluded. Here is an example of a configuration:
Here, the clipping has been restricted to domains ending with app1.mycompany.com. Links to other domains will be disabled, and the user will see the message This link has been disabled if he tries to follow them. Instead of disabling the links, you can also specify that the links should be opened in another window and not be clipped. As an example, if you create a Google search portlet, you would probably configure all links away from google.com to open in a new window and not be clipped. Note that opening links in other windows without clipping works only if the links can be accessed directly without a session, i.e. are not protected by a firewall and do not require cookies, authentications, etc.
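The decision described above can be sketched as a small Python function. It is a simplified model, not RoboMaker's implementation: the function name handle_link, the flag name, and the return labels are invented for the example, and only the domain suffix comes from the configuration discussed here.

```python
from urllib.parse import urlparse

ALLOWED_DOMAIN_SUFFIX = "app1.mycompany.com"  # suffix from the example configuration

def handle_link(url, open_excluded_in_new_window=False):
    """Return how a link is treated: 'clip', 'new-window' or 'disabled'."""
    host = urlparse(url).hostname or ""
    if host.endswith(ALLOWED_DOMAIN_SUFFIX):
        return "clip"  # the link stays inside the clipping restrictions
    # Excluded link: either open it unclipped in another window or disable it.
    return "new-window" if open_excluded_in_new_window else "disabled"

print(handle_link("http://www.app1.mycompany.com/orders"))
print(handle_link("http://www.example.org/"))
```

The second call falls outside the allowed suffix, so with the default setting the link is treated as disabled.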
[Figure: Browser, Clipping Portlet, Resources]
This requires the resources to be directly accessible from the portal user's browser. However, in some cases, the resources are protected from direct access. For example, the resources may be protected by a firewall between the user and the web site, or the resources may be dependent on the user session. You can solve this problem using resource clipping. Resource clipping means that the resource loading from the portal user's browser is channeled through the clipping portlet and the clipping robot. This way, the resources will be loaded by the clipping robot itself, which does have access to the resources. Note that resource clipping costs more in performance than loading the resources directly from the portal user's browser, so you should only use resource clipping for protected resources. The default resource clipping settings can be found in the Resource Clipping tab of the Robot Configuration Window:
These default settings will be used by all clip branches that have not been configured to use individual resource clipping settings. To enable resource clipping for all resources, choose All. To enable resource clipping for selected resources only, choose Resources Matching these Rules, and adjust the default rules to cover the particular resources. To configure resource clipping individually for a specific clip branch, go to the Clip step of the branch. In the Resource Clipping tab, select Specify, and configure the resource clipping there.
These default settings will be used in all clip branches that have not been configured with individual settings. To configure a clip branch with individual settings, go to the Clip step of that branch. In the User Actions tab, choose Specify, and configure the settings there.
By default, all high-level user actions, such as clicking and submitting forms, will be captured and forwarded to the robot. Low-level user actions, such as moving the mouse or entering characters on the keyboard, will be handled locally in the portal user's browser. These default settings reflect that low-level user actions typically occur in rapid succession and require quick feedback to the user, so it is usually not desirable to trigger robot executions for such actions. On the other hand, handling these actions locally in the user's browser means that no page loading, JavaScript, etc. can be triggered by these actions, since this requires a robot execution. So, for example, if the user enters something in a text field, the text will be entered, but no JavaScript will be triggered for the individual key presses, even if there are JavaScript event handlers registered for the individual key presses in the text field.
If you are clipping from a site where it is important to trigger JavaScript for specific low-level user actions, try enabling robot execution for these actions. For example, if you are clipping from a site that has JavaScript-based menus, and these menus do not work correctly with the default settings, try enabling robot execution for some of the mouse actions, such as the Move Mouse To and Move Mouse From actions.
An alternative approach for such cases is to rewrite the JavaScript in the pages to not be dependent on low-level user actions. This can be done on-the-fly as part of the clipping, without affecting the original web site, using the principles described in the section Modifying Clips earlier in this chapter.
The Portlet View has a special view mode that is useful when you want to see which user actions will trigger a robot execution. To switch to this view mode, click the icon in the Portlet View toolbar. This will show green boxes around the elements on the page for which the user actions will trigger robot execution, and red boxes around the ones that will be handled locally in the user's browser. To switch back to the normal view mode, click the icon.
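The split between forwarded and locally handled user actions can be pictured as a lookup table with per-branch overrides. The Python sketch below is only a mental model. Apart from Move Mouse To and Move Mouse From, the action names are hypothetical, as are the function and variable names.

```python
from typing import Dict, Optional

# Assumed defaults: high-level actions go to the robot, low-level ones stay local.
# Only "Move Mouse To" and "Move Mouse From" are names taken from the text.
DEFAULT_ROBOT_EXECUTION = {
    "Click": True,
    "Submit Form": True,
    "Move Mouse To": False,
    "Move Mouse From": False,
    "Key Press": False,
}

def triggers_robot(action: str, overrides: Optional[Dict[str, bool]] = None) -> bool:
    """Return True if the user action would trigger a robot execution."""
    settings = dict(DEFAULT_ROBOT_EXECUTION)
    settings.update(overrides or {})  # individual settings for one clip branch
    return settings.get(action, False)

# A branch with JavaScript-based menus might enable the mouse actions.
menu_branch = {"Move Mouse To": True, "Move Mouse From": True}
print(triggers_robot("Move Mouse To"), triggers_robot("Move Mouse To", menu_branch))
```

Keeping the overrides separate from the defaults mirrors the way a clip branch either inherits the default settings or specifies its own.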
The additional information is passed to the robot as name-value-pair properties in the ClipRequest.properties attribute:
When you generate the clipping portlet from the robot, you can configure where to obtain the properties from in the clipping portlet, such as the user preferences configured for the portlet. See the online documentation for the portlet generation wizards. In the clipping robot, you can retrieve the properties from the ClipRequest.properties attribute using the Get Property data converter. Here is an example of an Enter Text step that retrieves a property and enters the value into a text field:
When you are working with the robot in RoboMaker, you can enter test values for the properties in the ClipRequest.properties attribute in the Input Objects tab of the Objects View. Remember to click Apply after editing the properties.
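As a rough picture of what retrieving a property amounts to, the following Python sketch looks up a name in a block of name-value pairs. The text format (one name=value pair per line), the function name get_property, and the sample property names are assumptions for illustration. The actual format of the ClipRequest.properties attribute is not described in this section.

```python
def get_property(properties, name, default=""):
    """Look up `name` in text holding one name=value pair per line (assumed format)."""
    for line in properties.splitlines():
        key, separator, value = line.partition("=")
        if separator and key.strip() == name:
            return value.strip()
    return default  # the property was not supplied

# Hypothetical test values, as one might enter them during development.
test_properties = "city=Copenhagen\nlanguage=en"
print(get_property(test_properties, "city"))
```

Returning a default for a missing property keeps a later step, such as entering text into a field, from failing merely because the portlet did not send that property.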
Follow the instructions in the wizard. Refer to the online documentation for the wizards, or the Code Generation Guide, for help on this. Note that you can choose where the clipping portlet should obtain the robot library containing the clipping robot from. This is done in the Robot Library property of the Deployment page of the wizard. If you want the robot library to be included in the clipping portlet, so that the clipping portlet is self-contained, select Embedded in Request. During development of the robot, it can be useful to select Default Robot Library instead. The default robot library is the library in the current project of the installation, i.e. the project that you are currently working on. With this selection, the clipping robot will be loaded from your current project before each robot execution. This means that any changes you make to the robot will take effect immediately in the portlet. Thus, you do not have to re-generate the portlet for changes in the robot to take effect. Note that this applies only if you are running against a RoboServer on your local machine.
In the Session Timeout property, you can specify the basic timeout of the clipping sessions created for this robot. When a clipping session has been inactive for the specified period of time (in minutes), the clipping session will be ended automatically by RoboServer. If the user interacts with the clipping portlet after this, a new clipping session will be created. In the Allow Session Termination after property, you can specify whether it is allowed for RoboServer to end a clipping session early when it needs to make space for a new clipping session. If you leave the property empty, RoboServer will never end a clipping session early, i.e. before the timeout specified in the Session Timeout property. If you specify a value in the property, this means that RoboServer is allowed to end a session early if the session has been inactive for at least the specified period, and RoboServer has reached its maximum allowed number of clipping sessions and needs to create a new clipping session.
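The interplay of the two properties can be written down as a single decision function. This Python sketch only restates the rules described above. The function and parameter names are invented, and None stands for leaving the Allow Session Termination after property empty.

```python
from typing import Optional

def may_end_session(inactive_min: float,
                    session_timeout_min: float,
                    allow_termination_after_min: Optional[float],
                    server_full_and_needs_slot: bool) -> bool:
    """Return True if the clipping session may be ended now."""
    if inactive_min >= session_timeout_min:
        return True  # regular timeout has been reached
    if allow_termination_after_min is None:
        return False  # property left empty: never end a session early
    # Early termination needs both enough inactivity and a full server
    # that has to make space for a new clipping session.
    return server_full_and_needs_slot and inactive_min >= allow_termination_after_min

print(may_end_session(10, 30, 5, True))
print(may_end_session(10, 30, None, True))
```

With a 30 minute timeout and early termination allowed after 5 minutes, a session that has been idle for 10 minutes is ended early only when the server is full and needs the slot.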
The first option, Report Here, is the default one. It causes the error to be reported immediately, and the execution of the steps beyond the given step to be aborted. For example, consider this robot:
[Figure: robot in which step B is marked "Generates Error"]
Assume that an error occurs in step B. Since it has the default Report Here option selected (as indicated by the absence of an icon in the step), an error report will be generated immediately, and steps C and D will not be executed. The error report will specify that it was generated at the location of step B, and it will contain a single error message describing the error and the location where it occurred (also at step B). The second option for the Own Errors property is Send Backwards. This option sends the error backwards to the preceding step in the robot, without executing the steps beyond the step that failed. What happens to the error in the preceding step depends on how that step has been configured to handle received errors. This will be explained in the next section. The third option for the Own Errors property is Ignore and Go to Next Step. This option causes the error to be ignored and the execution to proceed with the next steps after the one that failed. In other words, the step that failed is simply skipped. Take a look at the robot below:
[Figure: robot in which step B is marked "Generates Error"]
Here, again, step B generates an error. However, the Ignore and Go to Next Step option has been selected for the step, as indicated by the icon. This causes the error to be ignored, and the execution to continue with steps C and D. These steps will both be given the same input robot state as was given to step B. The Ignore and Go to Next Step option is useful if you have a step that will succeed only in some cases, and which should simply be skipped in the cases where it fails. Note that the Ignore and Go to Next Step option is not allowed if the step has a loop action.
The fourth option for the Own Errors property is Ignore and Skip Branch. This option causes the error to be ignored and the execution of the steps beyond the given step to be aborted. In other words, the step that failed and all steps following it are simply skipped. Please see the chapter How to Loop Through Pages for an example of how to use this option.
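Three of the Own Errors options can be compared side by side in a small Python model of a single branch. This is a simplification made for illustration, not RoboMaker's execution engine: steps are plain functions on a state value, the names run_branch, ok and fails are invented, and the Send Backwards option is left out because it depends on how the preceding steps handle received errors.

```python
def run_branch(steps, state):
    """Run (name, action, own_errors) steps in order; return (executed, reports)."""
    executed, reports = [], []
    for name, action, own_errors in steps:
        try:
            state = action(state)
            executed.append(name)
        except Exception as exc:
            if own_errors == "Report Here":
                reports.append("%s: %s" % (name, exc))
                break  # error reported, later steps are not executed
            if own_errors == "Ignore and Skip Branch":
                break  # no report, the failed step and all later steps are skipped
            if own_errors == "Ignore and Go to Next Step":
                continue  # the next step receives the same input state
            raise ValueError("option not modelled here: " + own_errors)
    return executed, reports

def ok(state):
    return state + 1

def fails(state):
    raise RuntimeError("tag not found")

steps = [("A", ok, "Report Here"),
         ("B", fails, "Ignore and Go to Next Step"),
         ("C", ok, "Report Here"),
         ("D", ok, "Report Here")]
print(run_branch(steps, 0))
```

Changing the option on step B to Report Here stops the run after step A with one error report, while Ignore and Skip Branch stops it after step A with no report at all.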
Here, step D generates an error and has been configured to send its own errors backwards, as indicated by the icon. Therefore, the error is sent backwards to step C, without executing step E. In step C, the Received Errors property has been set to Send Backwards, as indicated by the icon. This means that received errors are simply sent further backwards to the preceding step, in this case step B. The Received Errors property of step B has been set to Report Here (as indicated by the absence of an icon). This means that received errors are reported at this point. The generated error report will specify that it was created at the location of step B, and will contain an error message specifying that the error was generated at the location of step D. After reporting the error, execution will proceed to the next branch or iteration that is to be executed, i.e. no execution of step B or the subsequent steps is done. In the example shown here, sending back the error serves no real purpose. It simply changes the location where the error is reported. However, sending back errors becomes useful if you combine it with the branching mode called Until Successful Branch.
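The backward travel of an error can be traced with a few lines of Python. The sketch models only the two options discussed here, Report Here and Send Backwards, and treats the robot as a straight path of steps. The function name reporting_step and the tuple layout are assumptions for illustration. What happens if an error is sent backwards past the first step is not covered by this section, so the function returns None in that case.

```python
def reporting_step(path, failing):
    """path: list of (name, own_errors, received_errors). Return the name of the
    step where the error is reported, or None if no step on the path reports it."""
    names = [name for name, _own, _received in path]
    index = names.index(failing)
    if path[index][1] == "Report Here":
        return failing
    # Own Errors is assumed to be Send Backwards: walk towards the start.
    for name, _own, received in reversed(path[:index]):
        if received == "Report Here":
            return name
        # Otherwise Received Errors is assumed to be Send Backwards: keep going.
    return None

# The configuration described above: D sends its own error backwards,
# C passes received errors on, and B reports them.
path = [("A", "Report Here", "Report Here"),
        ("B", "Report Here", "Report Here"),
        ("C", "Report Here", "Send Backwards"),
        ("D", "Send Backwards", "Report Here"),
        ("E", "Report Here", "Report Here")]
print(reporting_step(path, "D"))
```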
In this robot, we have used the Until Successful Branch branching mode in step B. This is indicated by the dashed connections from the step. In this branching mode, the branches will be executed one at a time until one of them is successful. Successful means that the branch does not send any errors backwards. In this example, step D generates an error, and the steps have been configured to send this error backwards to step B. This means that the branch is considered to have failed, and the second branch is executed, according to the branching mode. In the second branch, step G generates an error, which is also sent backwards to step B. Since this branch was also unsuccessful, the third branch is executed. This branch sends no errors backwards, and is therefore considered successful. Because of the branching mode, no more branches are then executed, i.e. the fourth branch is not executed. If the third branch had sent back an error, too, the fourth branch would have been executed. If the fourth branch had also sent back an error, all branches would have been considered to have failed. In this case, the errors that would have been collected at step B would then have been handled according to the Received Errors property of step B. In this example, step B has been configured to report its received errors. So, an error report containing the four received errors would have been generated, and no more execution would have been done. If step B had instead been configured to send its received errors backwards, all four errors would have been sent backwards to step A. Thus, more than one error can be sent backwards at a time. As you may have guessed, the Until Successful Branch branching mode is useful if you want a robot to try more than one approach to achieving something. Add a branch for each approach. The robot will then try the approaches one at a time until one of them succeeds. If all approaches fail, you can report the errors from all approaches. 
You can then examine the error report to figure out why none of the approaches worked. In some cases, you want to ignore the errors when all approaches fail. This can be achieved by adding an extra branch as shown below.
The extra branch has a single step containing the action named Do Nothing. As the name suggests, this action does nothing, so, in the example above, the extra branch will always execute successfully without sending back any errors. Therefore, the errors that may occur in the preceding branches will be discarded. Instead of the Do Nothing action, you can use other actions that do not generate errors. For example, you can use the Write Log action if you want an entry in the log in case all the preceding branches generated errors. Using the Until Successful Branch branching mode can be rather complex. In many cases, it is easier to use the default All Branches branching mode, and then put a step with a conditional action in front of every branch, to determine when that branch should be executed. However, the Until Successful Branch mode is useful in the cases where it is difficult or impossible to use a conditional action to determine when a particular branch should be executed.
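The Until Successful Branch mode, including the extra Do Nothing branch, behaves much like trying a list of functions in order until one of them does not raise. The Python sketch below is an analogy under that assumption, not the robot engine. The names until_successful, failing, and do_nothing are invented for the example.

```python
def until_successful(branches):
    """Run branches in order until one sends no error backwards.

    Returns (index of the successful branch or None, collected errors)."""
    errors = []
    for index, branch in enumerate(branches):
        try:
            branch()
            return index, []  # success: errors from earlier branches are discarded
        except Exception as exc:
            errors.append(str(exc))  # error sent backwards to the branching step
    return None, errors  # all branches failed: errors go to Received Errors handling

def failing(message):
    def branch():
        raise RuntimeError(message)
    return branch

def do_nothing():
    """Stand-in for a branch holding only a Do Nothing step."""

approaches = [failing("approach 1 failed"), failing("approach 2 failed")]
print(until_successful(approaches))
print(until_successful(approaches + [do_nothing]))
```

Without the extra branch both errors are collected for reporting. With it, the last branch succeeds and the collected errors are dropped.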
In this case, we use the Until Successful Branch branching mode to try three different approaches to something. The interesting thing is that the three branches join together at the end, to share the steps that they all have in common. You can join branches as much as you like, and you can even send back errors from the common steps if you want to. Another common example is shown below:
Here, we use a branch to jump past three steps if one of them fails. This is useful if you want to skip more than one step in the case of an error. If you want to skip just one step, use the Ignore and Go to Next Step option in Own Errors for that step.
The Convert Attributes action for converting attribute values extracted from a web site, or converting input object attribute values before being inserted into a form.
The Test Attributes action for testing an attribute value, e.g. an input object attribute value, according to one or more conditions, such as "price < 5000".
The Get Attribute data converter for fetching an attribute value for subsequent processing or insertion into a form input field.
The Convert Using List data converter for converting content. This data converter is useful for normalizing input object attribute values for insertion into a web site form, or normalizing content extracted from a web site.
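Normalizing content with a conversion list can be pictured as a simple table lookup. The Python sketch below shows the general idea only. The exact matching rules of the Convert Using List data converter are not described in this section, so the case-insensitive matching, the function name, and the sample list are assumptions.

```python
def convert_using_list(value, conversions):
    """Map `value` to its normalized form using (from, to) pairs.

    Matching ignores case and surrounding whitespace (an assumption);
    values not in the list are returned unchanged."""
    cleaned = value.strip().lower()
    for source, target in conversions:
        if cleaned == source.lower():
            return target
    return value

# Hypothetical list for normalizing country names extracted from a web site.
countries = [("USA", "United States"), ("U.S.", "United States"), ("UK", "United Kingdom")]
print(convert_using_list(" usa ", countries))
```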
For more information on RoboMaker techniques that can be used to make robots more robust, you should consult the following chapters: How to Extract Content, How to Extract Content From a Table, How to Handle Errors, and How to Use the Tag Finders.
The Save Session action, which saves a session in the session pool or an attribute.
The Restore Session action, which restores a session from the session pool or an attribute.
In order to restore a session from the session pool, it is necessary to identify it. A session is identified by a site name, a username and a password. Let's look at the robot for a website that requires that you log in. We want to share the session of a logged-in user. The robot would look something like this:
When the robot is run, it will first ask for a session from the session pool, and if one exists with the given identification parameters, that session will be used. If no session with the given parameters exists, the step will fail, and the second branch will be executed, which does the logging in by actually going through the necessary web pages, and finally stores the obtained session so that other robot runs can make use of it. After a session has been obtained, a page should be loaded, and on that page some conditional action should be applied to see that the session is truly still active. The conditional action should be set to generate an error when stopping, and these errors should be sent back, so the second branch is used if a session obtained from the session pool has become inactive. The session pool optimization need not be used only for logins, but could also be used for long navigations that are necessary to obtain certain cookies, or other time-consuming tasks. A robot should normally never rely on a session being available, but always provide a fallback for obtaining a session.
In RoboMaker it is important to understand a little about the inner workings of the session pool if you want to utilize it. This is because the execution of a robot in RoboMaker is not controlled by the natural flow of a robot run, but by the user interaction. First a session should be stored by executing the step containing the Save Session action. Selecting the step following the Save Session step does this. After this the Restore Session action will be able to pick up the stored session. Saved sessions will remain in the session pool even if you refresh the cache by clicking the icon, or after loading different robots. There is currently no way to remove a session from the session pool besides restarting RoboMaker.
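The restore-or-log-in pattern described above can be summarized in Python. This is a conceptual sketch with an in-memory dictionary standing in for the session pool. The names session_pool, obtain_session, log_in, and is_active are invented, and the session contents are placeholders.

```python
session_pool = {}  # stand-in for the session pool, keyed by (site, username, password)

def obtain_session(site, username, password, log_in, is_active):
    """Reuse a pooled session if it is still active, otherwise log in and save it."""
    key = (site, username, password)
    session = session_pool.get(key)
    if session is not None and is_active(session):
        return session, "restored"  # first branch: Restore Session succeeded
    # Fallback branch: go through the login pages, then Save Session.
    session = log_in(username, password)
    session_pool[key] = session
    return session, "logged in"

def example_log_in(username, password):
    return {"user": username, "cookies": "placeholder"}

first = obtain_session("example-shop", "alice", "secret", example_log_in, lambda s: True)
second = obtain_session("example-shop", "alice", "secret", example_log_in, lambda s: True)
print(first[1], second[1])
```

The is_active check plays the role of the conditional action applied after restoring: if it fails, the fallback branch logs in again and replaces the stale session in the pool.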
Basic Debugging
To open RoboDebugger, click the icon in RoboMaker. This opens the RoboDebugger Main Window, which is shown below. RoboDebugger always works on the current robot in RoboMaker. To start debugging the robot, click the icon.
As the robot is being executed in RoboDebugger, you can watch the current location in the Robot View of RoboDebugger. You can also watch the results of the execution in the main panel. In the Input/Output tab, the Input panel shows the input objects, if any, and the Output panel shows all objects that have been returned so far during the execution. If the robot has no input objects, the Input panel is not shown. In the Error Reports tab, you can see all error reports that have been generated so far during the execution. In the Log tab, you can see what has been written to the log so far during execution. In the State tab, you can see the robot state, if any. Also, in the Summary panel to the right of the main panel, you can see a summary of the execution, containing the number of returned objects and the number of error reports generated.
It is important to understand that RoboDebugger performs its own execution of the robot, independently of the execution done in RoboMaker. Therefore, RoboDebugger has its own current step and its own current robot state, independent of the current step and current robot state in RoboMaker. In RoboDebugger, the current step is the step that is about to be executed, or is being executed, in the debugging process, and the current robot state is the input to that step.
You can stop the debugging at any time by clicking the icon. You can also make the debugging stop when certain events occur. This is done in the Stop When panel. Here, you can choose whether the debugging should stop when objects are returned, when errors are reported, and when breakpoints (see below) are reached. Of course, debugging will always stop when the execution of the robot has completed. When debugging has stopped, you can see the reason for the stop in the status bar at the bottom of the RoboDebugger window.
If the debugging has stopped before the execution of the robot is complete, you can watch the current robot state in the State tab. The Objects, Windows, Cookies, and Authentications sub-tabs show the robot state in the same way as in the State View in RoboMaker. The Global Variables sub-tab shows the global variables, if any. The Error sub-tab shows the error report, if the execution stopped because an error report was generated.
If debugging has stopped before the execution of the robot is complete, you can resume the debugging by clicking the icon. You can also restart the debugging by clicking the icon. This will abort the current debugging process and make RoboDebugger ready to start a new debug from the start of the robot. The debugging is also restarted automatically whenever the current robot is modified or replaced by another robot in RoboMaker.
If the robot has input objects, the input values of these can be edited in the Input panel, and when you press Enter, the debugging will be restarted with the new input values. The input values cannot be edited while a debug is running, so if you want to change the input values, you must first restart the debugging.
If you have a really big or long-running robot, you may want to uncheck the Show Location During Execution option in the Robot View Options submenu of the View menu. This will cause the current location not to be shown in the Robot View during the execution, which will speed up the execution.
Using Breakpoints
You can make RoboDebugger stop at a specific step in the robot by setting a breakpoint on that step. The easiest way to do this is to right-click on the step in the Robot View and select Toggle Breakpoint in the pop-up menu. The breakpoint will be indicated by a small icon in the step. When RoboDebugger reaches a breakpoint during debugging, it will stop, unless you have chosen not to stop at breakpoints in the Stop When panel. You can resume the debugging by clicking the icon. You can remove the breakpoint from a step by right-clicking on it and selecting Toggle Breakpoint. If you select one or more steps, you can remove all breakpoints of these steps by selecting Remove Breakpoints. You can also remove all breakpoints in the robot by clicking the icon.
Single-Stepping
You can make RoboDebugger execute one step at a time. This is called single-stepping. It is useful if you want to examine the execution very closely. You can single-step when RoboDebugger is ready to start a new debug, or when it has stopped during a debug. To execute the next step, click the icon. RoboDebugger will then execute that step and stop. You can then click the icon again to execute the next step, and so on. At any step, you can also resume normal execution by clicking the icon.
Using Environments
RoboDebugger includes a feature for running a robot using environments. Generally, the environments determine how returned objects are stored, how messages (including error reports) are processed and stored, etc. This section will only describe how you can use the environments to run a robot that generates a file containing the returned objects. The returned objects will be stored in either CSV or XML format. If you wish to learn more about environments, you should consult the RoboRunner User's Guide, and the RoboHelp online entry on environments.
To output the returned objects to a file, click the icon in RoboDebugger to enable the use of environments. (You can click it again to disable the use of environments.) Click the icon to configure the environments. This opens the Configure Environments window as shown below:
Select the "File Storage Environment", and click the icon to configure it. This opens the File Storage Environment Configuration window as shown below:
For the File Name property, enter the name of the file that the returned objects should be stored in. For the File Format property, select the format in
which to store the returned objects. When done, click "OK" to return to the RoboDebugger Main Window. To start the debugging process, click the icon. When the debugging process completes, the file you specified above will have been created. Note that you cannot change the current environment settings during a debug. To change the settings, you need to restart the debug first by clicking the icon.
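Once the file has been written, the returned objects can be inspected outside RoboDebugger. The following is a minimal Python sketch for the CSV case. It assumes that the first row of the file holds the attribute names, which you should verify against the file RoboDebugger actually produced; the file name `returned_objects.csv` and the function name `load_returned_objects` are hypothetical and not part of RoboMaker.

```python
import csv


def load_returned_objects(path, encoding="utf-8"):
    """Read a CSV file into a list of dicts, one dict per returned object.

    Assumption: the first row of the file holds the attribute names.
    Check the file RoboDebugger produced before relying on this.
    """
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # "returned_objects.csv" stands in for whatever you entered as the
    # File Name property; it is a hypothetical name.
    for returned_object in load_returned_objects("returned_objects.csv"):
        print(returned_object)
```

All values are read as strings; convert numeric attributes yourself if you need them as numbers.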
Setting Up a Browser
A browser, such as Internet Explorer, can be traced by setting it up to use a special proxy server that is built into RoboMaker and started when RoboMaker starts. This proxy server typically runs on port 9999, but if you start multiple instances of RoboMaker, additional instances will use different ports. You can see the exact port number in the Browser Tracer window. In Internet Explorer, set up the proxy server by opening Internet Options and choosing LAN Settings from the Connections tab. Enable "Use a proxy server for your LAN" and type "localhost" in the Address field and the port number (typically 9999) in the Port field. You should also clear the browser's cache, because cached JavaScript files cannot be traced.
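Before changing the browser settings, it can be useful to confirm that something is actually listening on the proxy port. The Python sketch below only tests whether a TCP connection to the given host and port succeeds; it does not verify that the listener is the RoboMaker proxy. The function name `proxy_is_listening` is hypothetical, and the default port 9999 should be replaced by the port shown in the Browser Tracer window if it differs.

```python
import socket


def proxy_is_listening(host="localhost", port=9999, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (e.g. connection refused,
        # timeout) when nothing accepts connections on that port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if proxy_is_listening():
        print("Something is listening on localhost:9999.")
    else:
        print("Nothing is listening on localhost:9999; is RoboMaker running?")
```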
Tracing
To record a trace for either RoboMaker or a browser connected through the Browser Tracer's proxy, click the icon for the source you want to trace. While recording, things may run much slower than normal, since vast amounts of data are collected. Thus, you should make sure to disable recording by clicking the icon again once you have traced what you wanted.
In a typical tracing scenario you would do the following:
1. Enable trace recording for RoboMaker.
2. Execute the step action in RoboMaker that you are interested in, say, a Load Page.
3. Disable trace recording for RoboMaker.
4. Enable trace recording for the proxy.
5. Perform the exact same actions in your browser, say, load a page.
6. Disable trace recording for the proxy.
Now you have produced two traces, which you can compare side-by-side in the difference view.
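The idea of comparing the two traces can be illustrated outside the tool. The Python sketch below is not how the difference view is implemented; it simply assumes that you have written down the requests from each trace as lists of text lines and uses the standard `difflib` module to show what one trace has that the other lacks. The function name `compare_traces` and the example URLs are hypothetical.

```python
import difflib


def compare_traces(robomaker_trace, browser_trace):
    """Return unified-diff lines between two lists of trace entries."""
    return list(
        difflib.unified_diff(
            robomaker_trace,
            browser_trace,
            fromfile="robomaker",
            tofile="browser",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical entries; real traces contain far more detail.
    robomaker = [
        "GET http://example.com/",
        "GET http://example.com/style.css",
    ]
    browser = [
        "GET http://example.com/",
        "GET http://example.com/style.css",
        "GET http://example.com/script.js",
    ]
    for line in compare_traces(robomaker, browser):
        print(line)
```

Lines prefixed with `+` appear only in the browser trace and lines prefixed with `-` appear only in the RoboMaker trace, which is usually where to start looking when a robot behaves differently from a browser.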
JavaScript Trace
Below each JavaScript trace, the JavaScript source code for the currently selected trace entry is shown. When a trace entry is selected, the corresponding source code line is highlighted in the source view. The trace entry is the runtime result of the execution of the highlighted source code line. Each source code line may, of course, be executed multiple times, in which case multiple trace entries are produced - all corresponding to the same source code line. Stepping through trace entries can help you understand how a piece of JavaScript code works.
HTTP Trace
The HTTP trace shows HTTP traffic. Selecting a trace entry shows the details about that HTTP event in the detail view below the trace. The detail view
shows the request and response headers, as well as the request and response data sent. Normally, only POST requests will contain request data.
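The difference between requests with and without request data can be seen in a small Python sketch using the standard `urllib.request` module. Nothing is sent over the network; the code only builds two request objects and inspects them. The URLs and form fields are hypothetical.

```python
from urllib.request import Request

# A request built without a body defaults to the GET method.
get_request = Request("http://example.com/search?q=robots")

# Supplying a body makes urllib default to the POST method.
post_request = Request(
    "http://example.com/login",
    data=b"user=alice&pass=secret",
)

if __name__ == "__main__":
    print(get_request.get_method(), get_request.data)
    print(post_request.get_method(), post_request.data)
```

The GET request carries its parameters in the URL and has no request data, whereas the POST request carries the form fields as request data, which is what the detail view would show for such an entry.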
INDEX
A
actions. See step actions
attributes, 3
authentications, 5
Authentications View, 15

B
branching, 7
    All Branches mode, 7, 141
    branching mode, 7
    Until Successful Branch mode, 140
breakpoints, 151
Browser Tracer, 154
    browser setup, 154
    difference view, 155
    http, 156
    javascript, 155
    saving and loading, 156
    tracing, 154
Browser View, 12

C
clipping, 95
    adding a clip branch, 104
    automatic login, 123
    automatic navigation, 99
    automatic navigation sequence, 128
    Begin Clip action, 97
    Begin Session command, 96
    Clip action, 98
    clip branches, 98, 104
    clip condition, 98, 110
    clipping session, 96, 103, 135
    ClipRequest object, 97
    ClipResponse object, 97
    creating a clipping robot, 100
    default clip branch, 98
    deployment, 134
    editing a clip branch, 107
    End Session command, 97
    form login, 124
    hiding tags, 119
    HTTP login, 124
    layout changes, 116
    login, 123
    logout, 126
    modifying clips, 113
    modifying pages, 119
    overview, 95
    passing additional information, 133
    popup windows, 121
    portlet, 95, 134
    Portlet View, 102
    protected resources, 130
    resource clipping, 130
    restricting links, 129
    RoboServer, 95, 135
    selecting tags to clip, 114
    single-sign-on, 123, 127
    Test Clip action, 98, 110
    Test Clip Command action, 99
    Test Default Clip action, 98
    user actions, 96, 131
    username and password, 125
    using another clip branch, 108
    windows and frames, 121
clipping robots, 95
    creating, 100
    deploying, 134
    moving around in, 102
    structure of, 97
conditional actions, 6, 7
    tutorial, 40
conditional actions, 29
connections, 6
    adding new, 12, 42
    removing, 42
converters. See data converters
cookies, 5
Cookies View, 15
current iteration, 5
current robot project, 28
current step, 12
    Step View, 15
current tags, 14
current window, 5, 12

D
data converters, 5, 20
    chaining, 20
debugging, 148
    breakpoints, 151
    environments, 151
    from current location in RoboMaker, 150
    making RoboMaker go to a location, 150
    RoboDebugger Main Window, 17, 148

E
environments, 151
errors, 8
    error handling, 8, 137
    error reports, 9
    own errors, 137
    received errors, 139
execution path, 43
expressions, 24
    Expression Editor Window, 26
extraction, 29, 87
    from tables, 93
    of binary data, 90
    of clips (stand-alone), 89
    of range, 92
    of text, 88, 91, 92
    using patterns, 23

F
fields, 73 (See also forms)
forms, 46, 71
    basics, 71
    choosing a step action, 74
    default values of fields, 73
    field groups, 79
    field value assignments, 76
    fields, 73
    looping through, 78
    simple submission, 71
    submit buttons, 74
    tutorial, 46
    uploading files, 82
    using pop-up menu, 83
    value lists, 81

I
initial values, 17, 66
input values, 17, 66

J
JavaScript
    JavaScript Source View, 13

L
libraries, 4, 27
location, 9
location code, 9
looping
    loop actions, 5
    through forms, 78
    through pages, 84

N
navigation, 29

O
objects, 3
    attributes, 3
    configuration, 65
    input objects, 3, 17, 143
    input values, 17, 65
    Objects View, 16
    output objects, 17
    initial values, 17, 66
    returned objects, 3, 30
Objects View, 16

P
page
    Page Views, 12
page loading
    looping through pages, 84
patterns, 20
    escaping, 21
    operators, 22
    Pattern Editor Window, 23
    special symbols, 20
    subpatterns, 21
Portlet View, 102
projects, 4, 27

R
returned objects, 3, 30
RoboDebugger Main Window, 17, 148
RoboMaker, 1, 3
RoboMaker Main Window, 11
robot id, 63
robot libraries, 4, 27
robot library files, 29
robot projects, 4, 27
    current robot project, 28
robot state, 4
    authentications, 5
    cookies, 5
    current robot state, 12
    objects, 3
    refreshing, 36
    Robot State View, 12
    windows, 4
Robot State View, 12
Robot View, 11
robots, 3
    clipping, 95
    configuration, 63
    editing, 19
    execution, 6
    navigation, 19
    Robot Configuration Window, 63
    robustness, 145
    structure, 29

S
Source View, 12
step actions, 5, 20
    conditional. See conditional actions
    selecting, 15
    Step Action Selection Guide, 15
steps, 5
    actions, 5
    actions, conditional. See conditional actions
    actions, selecting, 15
    connections between. See connections
    current iteration, 5
    current step. See current step
    invalid, 12
    name, 5

T
tables
    extracting from, 93
tag finders, 5, 67
tag path, 68
Tag Path View, 12
tags
    current, 14
    found, 14
Tree View, 12

U
uploading files, 82

V
value lists, 81
value selector, 59, 76

W
windows, 4
    current, 5, 12