Whitepaper
Best Practices for Using the OpenText Integration Center: A Technical Paper
Abstract

This paper is intended for IT professionals interested in understanding and learning about OpenText Integration Center. It presents an introduction to the solution, its architecture, and key features, as well as useful best practice information and use scenarios.
Contents
Introduction
  Integration for the Enterprise
  Transform Information for Corporate Intelligence
  Integration in the ECM World
Open Text Integration Center
  Open Text Integration Center Architecture
  Adaptable Architecture
  Integration Center Components
    The Integration Center Engine
    The Integration Center Repository
    Integration Center Designer
    Integration Center Scheduler
    Administration Tools
    Web Services Publisher
    Integration Center Connectors
      Web Services Connectors
      API Connectors
      Database Connectors
      Text Connectors
Integration Center Key Features
  ECM Objects Native Support and Graphical Interface for Information Integration Projects
  Environment Neutral
  Extraction
  Incremental Extraction/Change Data Capture
  Middleware and Standards Support
  Transformation Functions
  Data Mapping
  Transaction and Nested Transaction Support
  Validation against Metadata
  Data Loading
  Tracking Changes
  Dynamic Impact Analysis
  Auto-Documentation
  Versioning
  Metadata Management
  Flexible Scheduling
  Data Quality Management
  Error Handling
  Audit and Monitoring
Optimizing Performance and Throughput
  Failover Capabilities
  Performance Measurement
  Process Optimization and Tuning
  Parallelism and Process Slicing
Processing Methodology
  Transformation Performed Exclusively by the Engine
  Transformation Performed Partially by the Engine and Remote Databases
  Transformations Performed by the Engine and Remote Databases
About Open Text Connectivity: Your Trusted Link between People and Information
Introduction
The challenge of managing and leveraging all of the data within an enterprise grows increasingly complex. More and more applications, such as Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), and Supply Chain Management (SCM), have become embedded in the enterprise's daily business and, combined with Web applications and legacy systems, have created an elaborate and complicated IT environment. Many of these applications represent large investments by the company, yet the data contained in these systems is often isolated and not easily accessible.

In today's competitive and demanding business environment, organizations are recognizing the value of analyzing all enterprise data and content to gain a single version of the truth about customer relationships, business performance, and supplier capabilities. This analysis is increasingly taking place in real time, as businesses operate around the clock. The first step in this analysis process is data and content integration: accessing and consolidating disparate data and systems to feed data warehouses, operational data stores, and analytic applications, alongside corporate content repositories and newer Web 2.0 technologies. This integration is the basis for analysis of the entire enterprise. Moreover, to enable faster implementation of business processes, organizations need a solution that can exchange data between all systems in their IT environment.

This paper discusses Open Text Integration Center as an enterprise data integration tool, with a focus on its technical capabilities, key features, and best practices for configuration.
When migrating corporate content and metadata from a competitor's document management system to the Open Text Enterprise Content Server, only Integration Center ensures that all types of content make the transition, but its uses go beyond competitor migration scenarios. Open Text Integration Center can also decommission legacy systems by moving their data from outdated systems into Open Text ECM Suite. Open Text Enterprise Web Services provides the connection between Integration Center and Open Text Enterprise Content Server, and Content Link, a sample web services library, ensures that Integration Center is ready to connect from the start.
The benefit of a hub-and-spoke architecture with a centralized and open repository is that organizations can maintain full control of all data exchange processes, business rules, and metadata that make up any and all projects within the enterprise, instead of being locked into disparate closed systems. This enhances environment management and empowers knowledge workers to make better, more efficient use of business intelligence and analytical applications. Since its initial development, Integration Center has followed an open and extensible design concept in order to provide a solid platform for future development, simplifying the development of additional functionality and unifying the look and feel of different applications in the Center. This architecture has enabled Open Text to develop a procedural approach to data transformation and exchange processing that gives users unlimited capabilities to transform and process all types of data, whether it is traditional structured data or less structured information in content repositories, wikis, and blogs. With this approach, users are not limited to the functions provided by the tool. Instead, they are free to develop their own reusable transformation code to any degree of complexity.
Integration Center is built on a client/server architecture and incorporates a centralized and open metadata repository. It can be implemented within a distributed deployment model, allowing multiple developers to work on projects simultaneously with complete version control and customized access privileges.
Adaptable Architecture
You can install Open Text Integration Center on Windows, UNIX, and Linux platforms, and its repository can reside on a variety of RDBMSs, including Microsoft SQL Server and Oracle. For simple ETL-type scenarios in which you want to extract structured data from a source RDBMS, transform it, and then load it into a target RDBMS, simply install Open Text Integration Center on a server between the source and target systems. For high-volume, time-critical projects, you can install Open Text Integration Center on multiple servers and define multiple engines on each box, thereby deploying multiple Integration Center Engines to share and expedite the extract, transform, and load process. Integration Center Processes can be assigned to multiple engines in this case. For additional performance gains, you can install Integration Center on the same server as the target RDBMS to avoid network latency. As outlined previously, Open Text Integration Center can migrate semi-structured information, such as Microsoft Word, Excel, and PDF documents, to and from Open Text ECM Suite. When dealing with large volumes of documents, installing Integration Center on the same server as Open Text ECM Suite ensures that all web services communication happens locally, dramatically increasing the speed of document ingestion. You can use parallel processing to take advantage of multiple processors on the server.
Each component of a data transformation and exchange process is created as an object and stored in this repository. Relationships between objects are automatically maintained, with a comprehensive set of dependency management features. Integration Center's dependency management capabilities provide dynamic impact analysis whenever changes to metadata are identified. Every dependent object impacted by a change (internal or external) is automatically identified before the next data transformation and exchange process is executed. This ensures information quality and consistency, and reduces the time required to maintain data integration processes.
Designer is the developer's tool used to design and create data mappings, extracts, transformations, and exchange processes.
Administration Tools
Integration Center includes powerful administration tools: Administration Console, Real-Time Administrator, and Execution/Log Viewer. Administration Console is the central management tool for Integration Center, which you can use to perform essential administrative tasks, including:

- creating, initializing, and connecting to repositories
- importing and exporting repositories
- configuring Integration Center Services
- defining hosts and configuring Loaders used for bulk transfers between source and target Tables
- defining users and their rights
- defining and connecting projects
- importing and exporting projects to and from a repository
Real-Time Administrator is a real-time communication management application that provides an overview of Integration Center Services as well as administration and/or execution threads running on all host machines defined for a particular Repository. You can use Real-Time Administrator to:

- view the properties of host machines defined in the active repository
- view the status of Integration Center Services for host machines
- view the status of administration and execution threads
- view and stop Process executions
Execution Viewer provides you with real-time monitoring of Process or Module executions. It lets you view or interrupt the progress of any running execution, and is launched each time you manually execute a Process or Module in Designer or Scheduler, or view a Process execution in Real-Time Administrator. You can also launch it as a standalone application (GenRun.exe) from within Windows Explorer for log viewing purposes. In this case, the application is called Log Viewer.
Execution Viewer launches only when a Process or Module is executed manually. When a Process or Module is executed by a scheduled program, Execution Viewer does not run. For more information on managing Process/Module executions, see Designer, Scheduler, or Real-Time Administrator Help.

Log Viewer lets you view Process or Module execution logs. You can load and view the contents of the following types of execution logs:

- Process or Module execution log (.xml) files on your local machine. To view these files, launch Log Viewer as a standalone application (GenRun.exe) from Windows Explorer.
- Process logs listed in the Logs or History view in Scheduler. To view these logs, launch Log Viewer by double-clicking any Process log in Scheduler.
The Integration Center Repository must be installed on one of the following RDBMSs for use with Web Services Publisher: MySQL, Microsoft SQL Server, Oracle, DB2, Informix, or Sybase.
API Connectors
Integration Center includes several types of intelligent API connectors, which enable it to connect to applications/systems with very complex database schemas or lacking web services connectivity. They are pluggable metadata bridges embedded in Designer that enable the importation of data structures from ECM Repositories, CASE tools, ERP systems, XML Schema, or Web Services Description Language (WSDL) documents. Integration Center is certified by SAP for both CA-ALE and BW-STA interfaces. The solution also includes pre-built connectors for middleware and resources such as: FTP, MQ Series, Lotus Notes, MS Exchange and HTTP/HTTPS, as well as a framework for building connectors against additional APIs.
Mainframe Intelligent Connectors

Integration Center includes a set of Mainframe Connectors, which consist of two tiers: a dedicated piece of data access middleware (installed on the host) and an ODBC driver for the specific legacy system. This structure allows Integration Center to extract information from various systems on the mainframe, including VSAM, IMS/DB, Adabas, Image, Allbase, Eloquence, KSAM, and FDGen files.

Integration Center MetaLinks

The Integration Center MetaLinks are pluggable metadata bridges embedded in Designer that enable the importation of data structures from ECM Repositories, CASE tools, ERP systems, XML Schema, or Web Services Description Language documents.
Database Connectors
Integration Center Database Connectors connect to most relational databases, including Oracle, Microsoft SQL Server, Sybase, IBM DB2, Teradata, Essbase, and others. Native population of multi-dimensional databases, such as Essbase, lets users directly create all hierarchies and members, set all necessary attributes, and load or refresh cubes. With native access, users do not require an additional staging area or complex, multi-layer tools from multiple vendors. This approach has two advantages: much better performance, due to the elimination of any staging area, and programmatic control of multi-dimensional cubes within the transformation logic. The complete list of available connectivity options is included in the Installation and Administration Guide.
Text Connectors
Integration Center also natively accesses files such as Fixed Length Files (including mainframe flat files), Delimited Files (CSV), or XML files, and allows processing of any complex files (such as EDI, IDoc, or WebLogs).
Organizations can access information stored in SAP applications, combine it with data from other sources, and then share it with other systems throughout the enterprise. Integration Center delivers native connectivity to extract SAP data, supports bi-directional data interchange with SAP through the SAP IDoc format, and populates SAP BW with data from other systems.
Environment Neutral
Integration Center is completely platform and database neutral. This neutrality allows Integration Center users to develop generic business rules without binding them to any specific environment. Objects created in Designer are stored in the centralized metadata Repository. This centralized development model eliminates the need to re-code business rules, lookup tables, and custom functions for each new transformation project. At execution time, the Integration Center Engine reloads metadata-driven processes and generates the appropriate code for the target environment.
Currently, Integration Center natively supports, in either 32-bit or 64-bit mode, the major platforms Windows, Sun Solaris, IBM AIX, and SUSE and Red Hat Linux.
Extraction
Integration Center extracts data from source databases using native SQL grammar, making it possible to optimize the use of source database power and minimize network traffic. By accessing only the source rows that are pertinent to the transformation work, the Engine avoids loading all the data into a staging area. When working with text sources, Integration Center has a variety of tools to manage complex structures like hierarchical data dumps from mainframes or EDI files. Integration Center's behavior remains the same, regardless of whether the source or target is a text file or a database table.
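The pushdown idea described above, fetching only pertinent rows at the source rather than staging everything, can be sketched generically. This is an illustrative sketch using SQLite, not Integration Center's actual API; the `extract` helper and the `orders` table are hypothetical.

```python
# Illustrative sketch: push the filter down to the source database as native
# SQL so only pertinent rows cross the network (no staging area involved).
import sqlite3

def extract(conn, table, columns, predicate=None):
    """Build a SELECT that fetches only the needed rows and columns."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if predicate:
        sql += f" WHERE {predicate}"   # evaluated by the source engine
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 120.0, "EU"), (2, 80.0, "US"), (3, 200.0, "EU")])

# Only the EU rows are returned by the source database.
eu_orders = extract(conn, "orders", ["id", "amount"], "region = 'EU'")
```

The same principle applies whatever the source DBMS: the filter is rendered in that system's SQL dialect and executed there, rather than in the engine.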
Transformation Functions
Integration Center has a complete set of transformation functions, making it as capable as a programming language while providing a graphical, optimized user interface that makes the design of transformation routines more productive. Integration Center offers roughly 150 generic functions that can be used to build complex expressions or custom functions. These functions cover the entire spectrum of string, date, number, and Boolean manipulation. Complex clauses such as IF, THEN, ELSE, or CASE can also be written in expressions. These functions can be processed inside the Integration Center Engine on Windows, UNIX, or Linux, but they can also be automatically translated into native SQL functions in order to execute them on the database engine side. Using these standard functions, Integration Center users can create their own macros to describe business rules. For example, a Discount function can be calculated from a given sales amount and used everywhere across Integration Center transformations, whether processed inside the Integration Center Engine or on a remote database. For a full description of all available functions, see the Designer User's Guide PDF or Designer Help.

Support for Stored Procedures and SQL Functions

Integration Center can also invoke stored procedures or any piece of SQL code that can be executed on databases. These SQL scripts can be declared in the Integration Center Repository to guarantee the reusability of existing code defined within relational databases, whether source or target. These stored procedures and SQL functions can be used to retrieve data, either to extend Integration Center's transformation feature set or simply to trigger external processing on the database side. This also enables better distribution of processing by allowing the use of remote databases' transformation functions within data integration processes. For example, Oracle sequences can be reused this way.
Support for External Functions

To extend the processing capabilities of Integration Center, it is also possible to use any legacy function in a DLL written in C++ or any other language. These external functions are declared once in Designer and can then be used seamlessly in all Integration Center transformations, thereby preserving legacy investments. Integration Center can also call any Web Service, executable, external batch, or shell script for specialized transformation needs.

National Language and Unicode Support

Integration Center delivers comprehensive National Language Support and Unicode support. It allows simultaneous connections to multiple systems encoded in different character sets and exchanges data between these systems. Integration Center supports most single-byte, double-byte, and multi-byte character sets as well as Unicode. Whenever possible, Integration Center can convert data from one character set to another and simultaneously manipulate strings encoded in different code pages. Integration Center's user interface also fully supports Unicode. It allows manipulation of metadata encoded in different character sets, and supports international development teams.
Data Mapping
Integration Center provides different ways to define mappings. Whenever possible, the tools can automatically detect mappings based on field names, field order, or any custom algorithm. Simple graphical mappings from source to target can also be defined manually using drag-and-drop functionality, and more complex mappings can be built using Integration Center's graphical procedural language.

Aggregating, Filtering, Sorting, and Creating Joins

When multiple sources, heterogeneous or otherwise, are required, users can define datasets (logical views on the information system) to de-normalize, join, aggregate, sort, and distinguish data from the various source systems. These datasets can combine multiple objects from heterogeneous systems, and can also be used in other datasets.
These operations are defined graphically inside Integration Center, with no need to write SQL code; nevertheless, they cover the entire functional spectrum of relational database features. Users can define regular joins, left or right outer joins, full outer joins, calculated joins, or recursive joins involving the same table or view several times, through aliases that Integration Center manages transparently. Filters are translated into WHERE or HAVING clauses, and sorting becomes an ORDER BY clause. Integration Center recognizes each SQL dialect, adapting itself to the source or target DBMS.
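The translation described above, filters becoming WHERE clauses and sorting becoming ORDER BY, can be illustrated with a small SQL-rendering sketch. The `render_query` helper is hypothetical and far simpler than the product's generator, which also handles joins, aliases, and per-DBMS dialects.

```python
# Illustrative sketch: render a graphically defined dataset (columns, filters,
# sort order) as a native SQL statement.
def render_query(table, columns, filters=None, order_by=None):
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)   # filters become WHERE
    if order_by:
        sql += " ORDER BY " + ", ".join(order_by)  # sorting becomes ORDER BY
    return sql

query = render_query("customers", ["id", "name"],
                     filters=["country = 'DE'"], order_by=["name"])
```

A real generator would additionally choose between WHERE and HAVING depending on whether the filter applies before or after aggregation.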
Data Loading
Integration Center has multiple loading strategies: single row, packet, and bulk. In certain cases, loading can be done by the source database directly, when the developer has decided to bypass the engine altogether. For more details, refer to the topic Integration Center's Unique Processing Methodology.
Integration Center's design environment does not impose a pre-defined methodology to implement data-loading processes. It has been designed to be highly productive and generic in order to support different needs. Integration Center provides a comprehensive and flexible solution that supports both a full data refresh strategy and incremental data updates when needed. Delete, Insert, and Update strategies are all supported natively. Integration Center also provides high-level, user-friendly commands such as SmartInsert and SmartUpdate, designed to simplify row additions or updates in tables (a database merge).
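Conceptually, a SmartInsert/SmartUpdate-style command performs a merge: insert the row if its key is absent, otherwise update it in place. The sketch below shows the idea using SQLite's UPSERT syntax; the `smart_insert` helper and table are hypothetical, and the product generates the appropriate merge statement for each target DBMS.

```python
# Merge ("upsert") sketch: one call either inserts a new row or updates the
# existing row with the same key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT)")

def smart_insert(conn, row_id, name):
    conn.execute(
        "INSERT INTO dim_customer (id, name) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        (row_id, name),
    )

smart_insert(conn, 1, "Acme")      # key absent: row is inserted
smart_insert(conn, 1, "Acme Ltd")  # key present: row is updated in place
name = conn.execute("SELECT name FROM dim_customer WHERE id = 1").fetchone()[0]
```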
Tracking Changes
Integration Center provides the ability to track differences between an object definition stored in the Integration Center Repository and the state of the same object as it exists in the remote system (the physical object in the source or target). Using the Track Changes wizard, Integration Center users can automatically detect and import changes into the Integration Center Repository. Every change made to an object, whether it is located in a remote database or in Integration Center, is also stored and available for documentation purposes. This means that Integration Center is always consistent with data structures as they exist on remote sources and targets, ensuring data accuracy and consistency in every data transformation and exchange process.
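The comparison at the heart of change tracking, a stored object definition versus the live schema, can be sketched as follows. This is a generic illustration (the `detect_changes` helper and repository dictionary are invented), not the Track Changes wizard itself.

```python
# Illustrative schema-drift detection: compare the repository's stored
# definition of a table with the live schema and report the differences.
import sqlite3

def live_columns(conn, table):
    # PRAGMA table_info returns (cid, name, type, notnull, default, pk)
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}

def detect_changes(stored: dict, live: dict):
    added   = sorted(set(live) - set(stored))
    dropped = sorted(set(stored) - set(live))
    retyped = sorted(c for c in set(stored) & set(live) if stored[c] != live[c])
    return {"added": added, "dropped": dropped, "retyped": retyped}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER, total REAL, currency TEXT)")

stored_definition = {"id": "INTEGER", "total": "REAL"}   # repository copy
drift = detect_changes(stored_definition, live_columns(conn, "invoice"))
```

Once such drift is detected, impact analysis can flag every dependent object before the next execution, as described in the following section.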
This impact analysis is triggered either by changes made by the developer within the Integration Center environment or by the Track Changes feature. For example, if a source data structure changes, the Track Changes feature detects it, and the impact analysis identifies the effect of the change on all related interfaces. Integration Center's impact analysis eliminates the need for developers to spend time manually tracking down dependencies whenever a change is made. Integration Center provides a persistent list of invalid and undefined objects, allowing developers to know the exact state of their metadata and the immediate consequences of making a change to it. This dynamic impact analysis helps developers fix impacted objects by providing a thorough description of required changes, and even auto-correction mechanisms. This decreases the length of the maintenance cycle and increases developer productivity.
Auto-Documentation
Integration Center Designer automatically manages the documentation of projects, including dependencies between objects, modification history and comments. Integration Center users can automatically print or generate HTML documentation, dependency graphs, or dataflow graphs at any time. This significantly reduces documentation efforts, and ensures the accuracy of project documentation.
Versioning
Integration Center is built on a client/server architecture and leverages an open metadata repository. It can be implemented in a centralized or distributed deployment model, allowing multiple developers to work on projects simultaneously with complete version control and customized access privileges. Integration Center natively supports version management and status management (for example, Development, Test, and Production). All versions of data integration projects are independent from each other and can be used in parallel. All objects in these projects carry timestamps for creation and modification, as well as user information and comments. The history of data structure modifications is also maintained automatically by the tools.
Metadata Management
With the vast amount of information that organizations currently have at their disposal, there is an ever-increasing need to collect, manage, and reuse that information. Organizations want to know what information they possess, its location, its origin, and its size. This data about data is called metadata. It can describe any characteristic of the data, such as the content, its structure, its quality, or any attributes related to its processing or changes.
Quite simply, metadata is an important catalog of information from any number of sources: data warehouses, data exchange tools, business intelligence tools, ERP, CRM, SCM, business process modeling, workflow, data quality tools, ECM systems, or any other application dealing with data. Metadata secures the lineage of data, enabling knowledge workers to gain access to business rules and to understand where the data came from and how it has been handled to date. This makes the time they spend on query and analysis activity more productive. Metadata management provides critical access for both business users and technical users working with the data. Depending on the type of user, metadata can serve either as a blueprint to the inner technical workings of the warehouse, or as a roadmap to assist in navigating the warehouse and locating useful information. Metadata also delivers valuable help to organizations when it comes to compliance with regulatory rules. The Integration Center Repository contains all metadata used by data integration processes. This metadata is made available to users through Integration Center tools, either by querying Integration Center's open database repository or through XML datagrams.
Flexible Scheduling
Integration Center includes a complete scheduling facility, making it possible to schedule process execution at a fixed or recurring time, periodically (daily, weekly, monthly), triggered by outside events, or from the polling service (file-based events). Data integration processes can also be triggered by external events or by Message Oriented Middleware (MOM) such as IBM WebSphere MQ. Combining these functions, Integration Center developers can build scheduling rules as complex as necessary. Integration Center Scheduler is not always required: external applications can set Integration Center variables, launch processes, and receive process results. This makes it easy to integrate with system management applications such as IBM Tivoli or CA Unicenter. The substitution process is straightforward and can be implemented on UNIX, Linux, or Windows using standard API calls or command-line utilities.
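A file-based trigger of the kind the polling service provides can be sketched generically: a process fires when an expected marker file appears, and the marker is consumed so the event fires only once. The `poll_once` helper and marker name are illustrative, not the product's polling implementation.

```python
# Generic file-based event trigger sketch: run an action when a marker file
# appears, consuming the marker so the event fires exactly once.
import os
import tempfile

def poll_once(watch_dir, marker, action):
    """Run `action` if the marker file exists, then remove the marker."""
    path = os.path.join(watch_dir, marker)
    if os.path.exists(path):
        os.remove(path)        # consume the event
        return action()
    return None

with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "orders.ready"), "w").close()   # the event arrives
    result = poll_once(d, "orders.ready", lambda: "process launched")
    second = poll_once(d, "orders.ready", lambda: "process launched")
```

In a real deployment the poll loop runs on a timer, and the action launches the scheduled Process rather than returning a string.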
Error Handling
Integration Center, through its graphical procedural language, also delivers exceptional error handling capabilities. Integration Center automatically reports all errors and anomalies in its log. Technical exceptions (such as datatype issues or constraint violations) are automatically handled by the tool, while other exception types, such as business-rule-driven exceptions, can be handled through user-defined exceptions. Integration Center users can then implement various exception-handling strategies and decide whether execution should stop after a certain number of exceptions, or whether the offending data should be output to rejection files. All of this can be done using Integration Center's procedural language, which provides an easy and comprehensive mechanism for error handling. Users can define any logical test and implement virtually any type of processing according to their organization's business rules.
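The rejection strategy described above, route rows that fail a business rule to a reject file and stop the run past an exception threshold, can be sketched as follows. The validation rule, threshold, and `run_with_rejects` helper are illustrative assumptions, not product behavior.

```python
# Illustrative exception-handling strategy: failing rows go to a reject list
# (standing in for a rejection file), and execution stops if the number of
# exceptions exceeds a configured threshold.
def run_with_rejects(rows, validate, max_exceptions=2):
    loaded, rejects = [], []
    for row in rows:
        if validate(row):
            loaded.append(row)
        else:
            rejects.append(row)   # in practice, written to a rejection file
            if len(rejects) > max_exceptions:
                raise RuntimeError("too many exceptions; execution stopped")
    return loaded, rejects

rows = [{"qty": 5}, {"qty": -1}, {"qty": 3}, {"qty": -2}]
loaded, rejects = run_with_rejects(rows, lambda r: r["qty"] > 0)
```

The same skeleton accommodates any business rule in place of the quantity check, which is the point of expressing error handling in a procedural language.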
Failover Capabilities
Integration Center does not impose any methodology on failover functionality. The open architecture enables the developer to use any preferred technology for failover systems, including power-off restarts or complex continuity rules. To permit such implementations, Integration Center provides the key features required to implement a complex failover strategy. When triggering a process execution, Integration Center users can define a list of Integration Center Engines and a timeout for each one. If the process execution fails, Scheduler can automatically trigger a fail process that implements the desired failover strategy or restarts the same process. Within each process, users can define restart points and thereby automate process restarting. Open Text training and best practices routinely cover these different approaches, and can help Integration Center developers find the best approach for each specific situation.
Performance Measurement
Integration Center has a unique performance meter inside its logs. All the different tasks are timed, including module coherence tests, SQL statement performance, and the load processes. The volume of affected data on every single target system is also readily available in these logs.
As a result, data integration process administrators can easily spot potential performance problems. Integration Center can automatically email this report to the developer or the system administrator after each execution, and can also keep it in the Repository, making it possible to analyze performance over time. It is also possible to use performance measurement tools to detect and isolate network, machine, or other potential bottlenecks.
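Per-task timing of the kind these logs record can be sketched with a small instrumentation helper. The `timed` wrapper and log dictionary are illustrative; the product's log format and metrics are richer.

```python
# Generic per-task timing sketch: each task's elapsed time is recorded under
# its name, so slow steps stand out in the execution report.
import time

def timed(log, name, task):
    start = time.perf_counter()
    result = task()
    log[name] = time.perf_counter() - start   # seconds spent in this task
    return result

log = {}
total = timed(log, "transform", lambda: sum(i * i for i in range(1000)))
```

Comparing such per-task timings across executions is what makes trend analysis and bottleneck isolation possible.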
Processing Methodology
Integration Center offers a unique methodology that distributes transformation workload by offloading certain tasks to idle database engines during off-peak hours, maximizing efficiency and system performance. Integration Center offers three modes of transformation processing, described below:
Each data access mode is accessible through a common user interface and data integration process. These various modes are defined using the same graphical metaphor and programming methodology. By maximizing user control over data flow, Integration Center's data access architecture enables users to significantly improve the performance of their data exchange processes. Being able to select, manage, and summarize only relevant data, and to control the platform on which work is executed, vastly improves performance. Regardless of which data access model is chosen, Integration Center's impact analysis capabilities are maintained, ensuring that if changes are made to any element of the data exchange process, administrators are notified prior to the next scheduled execution.
About Open Text Connectivity: Your Trusted Link between People and Information
Open Text Connectivity connects people, data and applications in mission-critical environments with an award-winning suite of solutions. For over 20 years Open Text Connectivity has continued to combine the best of both worlds: the strength of one of the largest software companies and the spirit of a customer-focused business.
For more information about Open Text products and services, visit www.opentext.com. Open Text is a publicly traded company on both NASDAQ (OTEX) and the TSX (OTC). Copyright 2009 by Open Text Corporation. Open Text and The Content Experts are trademarks or registered trademarks of Open Text Corporation. This list is not exhaustive. All other trademarks or registered trademarks are the property of their respective owners. All rights reserved. SKU#_EN