This is a replicated copy of Deliver Data Where You Need It: A Replication Solution for Windows NT. It is provided for completeness and avoid "broken" link.

Bottom

Deliver data where you need it: A replication solution for Windows NT

by Richard Sinn

In today's heterogeneous systems and network environments, getting the data you want is not an easy task. Although there are a lot of tools available for processing and analyzing data on Windows NT, data are often distributed in distinct locations. The data might be stored in MVS mainframe databases, groupware Lotus Notes databases, UNIX-based LAN databases or any databases created within a Windows NT network.

Employing administrators to copy needed data or creating batch jobs to extract data might seem to be attractive solutions at first glance. But managing multiple operating systems, network environments and databases while requesting certain data at specific time frames might not be doable with simple batch jobs or the use of data administrators. As a result, such vendors as IBM and Lotus provide full packaged Replication Solutions for their customers.

Table of contents

Replication overview

Competitive advantages may be obtained by consolidating and analyzing data for better business decisions. Learning to uncover and leverage those advantages is what business intelligence is all about. Enabling business intelligence is made possible when the right applications and tools are available to analyze data, and, most importantly, when the data is prepared in a format suitable for analysis. Data Replication Solution can deliver enterprise data where and in the way it is needed. Replication really empowers the Windows NT network. Replication is needed if:

Table of contents

Why replication

Replication is not the same as a hot backup solution where the application is run in a master and slave relationship. Replication can be viewed as an extension to the databases regardless of what kind of databases are used and which locations or operating systems they run on. Replication should unite distributed relational databases into a cohesive and integrated database solution. Data changes should be automatically captured from the origin source database and propagated to any specified target databases, keeping the two or more databases consistent.

In planning to implement a replication solution, consider not using any vendor-specific protocol. Many database administrators will tell you SQL is the common standard among databases and the replication solution also should use SQL. The replication solution also should be able to meet the growing demands for data across the enterprise by delivering the right data in the right format into the right hands at the right time. Replication can help to:

Table of contents

How to implement a replication solution

One of the most popular replication solutions provided by IBM illustrates how to replicate IBM DB2 data from an MVS mainframe to Windows NT. IBM Data Propagator and DB2/UDB Control Center running on Windows NT are used in this example. Data Propagator is a product that will run on multiple platforms with DB2; there is actually no difference in the basic scenario when applying the same procedure for replication between two Windows NT systems. Please note that this example only demonstrates replication between DB2 to DB2 systems. How to replicate non-DB2 to non-DB2 systems is discussed later in the article.

Table of contents

System planning

Assume the MVS mainframe is the source system where the data is originally located. IBM Data Propagator for MVS with the Capture and Apply components must be purchased. Any mainframe hardware that supports MVS/ESA version 5 release 1 or higher, or MVS/ESA version 4 or higher will support replication. If replicating data from MVS to Windows NT, then the Windows NT system will be considered the target system. DB2/UDB on Windows NT using IBM DB2 Enterprise edition must be set up.

On both (MVS and Windows NT) systems, additional storage will be needed due to replication. The storage will be used by database management system log and journal data - logged to support the capturing and applying of data. In addition, new IBM replication control tables and temporary space file for the Apply program, called spill file, also will take up space.

Table of contents

Replication tools

Data Propagator provides asynchronous replication: the ability to copy changes from one table to another. The key technology is that Data Propagator can replicate "changes." There are three autonomous components: Capture, Apply and Administration.

Figure 1 and Figure 2 show the high level working of Capture and Apply. The Capture program captures changes from DB2 using database log entry. The Apply program copies these changes to the target tables. The Administration component sets up the replication environment. There are three logical servers in Data Propagator architecture. A server in this context can be defined as a logical system holding a specific set of information. The source server contains the source table; the target server contains the target table; the control server contains the replication control information. The source table is the database table being watched by Capture. All the changes of the source table will be replicated to the target table. Please note that all the servers discussed are logical because they all could be physically on the same system or on separate systems. The Capture component normally resides at the source server and the Apply component can reside anywhere in the network.

graphic: figure 1

Figure 1. DataPropagator - Capture.

graphic: figure 2

Figure 2. DataPropagator - Apply.

Capture

Capture is the database log reading program performing run-time change capture. Capture records changes to registered source tables and runs locally on the Data server. Source tables can be copied based on the changes to the tables (Differential Refresh) rather than always copying the entire tables (Full Refresh). Capture will read all the changes from DB2 log entries and automatically update Data Propagator internal staging tables and maintain transaction consistency.

Apply

Apply is the run-time replication manager that creates the target data copy. The Apply program runs locally on the target server. It connects to the source servers and copies source tables to the local database. Apply performs full or differential refresh of a target table based on source-to-target rules that are derived from subscription information.

Administration components

Administration components allow the administrator to setup replication. The main set of graphical user interfaces is the DB2/UDB Control Center (CC) as shown in Figure 3, which is integrated with IBM Universal Database Version 5. It is available on Windows NT, Windows 95 and OS/2.

graphic: figure 3

Figure 3. DB2 UDB for Windows NT control center.

Replication scenario

There are two major steps to be performed before starting Capture and Apply.

  1. Define tables that are replication sources. We call these replication sources "source table."
  2. Define "subscriptions" to those replication sources.

Define replication sources - registration

Use the Control Center to select the MVS system where the source tables are located. The developer might want to select the "before image" with the "after image" of the column for auditing purposes when defining a replication source. The process of defining a replication source is called "registration" in Data Propagator terminology. Users only register a source table once, regardless of how it is going to be used for distribution. The developer also can select only the required columns to replicate instead of the whole table as shown in Figure 4.

graphic: figure 4

Figure 4. Define a replication source.

Define subscription

After defining registration, a list of replication sources will be shown in the Control Center, accordingly. You can then use Control Center to define a subscription based on a replication source as shown in Figure 5.

graphic: figure 5

Figure 5. Define subscription.

Subscriptions can be viewed as the relationship between source tables and target tables. Many target tables can be subscribed to one source table. During this process, the developer can further sub-select the columns to refine the replication criteria as shown in Figure 6. Advanced features such as SQL Before/After and Timing also can be set with the subscription as shown in Figure 7 and Figure 8. After the subscription process, a list of replication sources will be available in the Replication Subscription session of the Control Center as shown in Figure 9.

graphic: figure 6

Figure 6. Advanced Subcription Definition.

graphic: figure 7

Figure 7. Adding SQL statement or SQL stored procedure call.

graphic: figure 8

Figure 8. Subscription timing.

graphic: figure 9

Figure 9. Listing subscription settings in control center.

Capture

There is one Capture program per system. When operations (such as SQL insert, update and delete) are applied to a database, the changes on the database objects are sent to a DB2 log entry. Capture will read from the database log and thus "captures" the changes and inserts them as rows in the internal staging table. All the control tables are created at install time or by the administration component.

Apply

The Apply program is an SQL application. Its main job is to copy data from source to target tables. Before the changed data can be copied to target tables, the target tables have to be synchronized with the source table; therefore, the Apply program does a "full refresh" of the target table. In other words, all the data in the source table will be copied to the target table at replication start time. After a full refresh is done, the Apply program copies the changes from internal staging tables managed by Capture so that only the committed data gets propagated. Apply process is bounded by Apply qualifier defined by the user during subscription.

Table of contents

What is missing -multiplatform replication

So far, only replication between two IBM DB2 systems has been considered. It is very common for customers to have heterogeneous data residing on different platforms. For example, you might have Oracle, Informix, MS Access and MS SQL Servers. To deliver data, "heterogeneous replication" is needed. Another main component in the replication solution is heterogeneous data access. IBM's DataJoiner running on Windows NT does just that. It provides transparent access for customers with multiple homogenous or heterogeneous data sources. It will also work with Data Propagator to replicate data among different heterogeneous data sources (DB2/400, DB2/UDB, Oracle, Informix, Microsoft SQL Server and Microsoft Jet).

Once server mappings and nicknames have been defined in DataJoiner, all the database SQL requests will look identical to access a remote DB2 server. In essence, DataJoiner can be viewed as a database middleware, which will transparently funnel all of your multiple databases into a database accessible in DB2 format. It not only provides heterogeneous data access but also allows join and updates to data from different sources.

Key features of the combination of Windows NT and DataJoiner include:

With DataJoiner and DataPropagator both running on Windows NT, the application can use its capabilities to transparently exchange data with non-DB2 data sources. A complex application that once connected through the network to different databases now can be changed locally with simple logic and no network access. This is because DataJoiner and DataPropagator do all the network connections and replication under the cover. Also, this integrated solution makes it easier to populate a Windows NT data warehouse with any business information in the organization, regardless of what database or server is storing the information. In other words, deliver any data, anywhere.

As shown in Figure 10 and Figure 11, putting DataPropagator with DataJoiner provides a Data Replication solution for small and large companies. DataPropagator with DataJoiner allows the replication of data from/to non-DB2 sources. As for administration, instead of using the Control Center, use the DataJoiner Replication Administrator (DJRA) for setup and administration tasks. DJRA is included with DataJoiner for Windows NT. The same steps are performed for homogenous replication: create control tables at the logical source server, control server and target server. When DJRA recognizes that DataJoiner is defined to go to a non-DB2 source (such as Oracle or Informix), it automatically creates all the internal tables, database nicknames and programming triggers. The setup of replication is done only once and data will be replicated to any platform.

graphic: figure 10

Figure 10. IBM's open and comprehensive data replication solution.

graphic: figure 11

Figure 11. DataJoiner Replication Administrator for Windows NT.

What happens if the data is not stored in database SQL format? As mentioned earlier in the article, Data Propagator can create a replication target type called Consistent Change Data (CCD) table. The format of CCD is published to the public and any application can use CCD as a means of communication. For example, Lotus NotesPumpn uses CCD to replicate data between CCD and Lotus Notes databases; IBM's DataRefresher also uses CCD to replicate data from IMS,X VSAM or flat files.

Table of contents

Conclusion

DataJoiner provides the gateway to heterogeneous data environments. Data Propagator architecture provides a DB2 replication solution. Putting IBM Data Propagator and DataJoiner together enables an advanced comprehensive replication solution to move data from anywhere to anywhere.

 

Photo of Richard Sinn Richard Sinn is a Staff Software Engineer in IBM Santa Teresa Laboratory, San Jose California. He also is a lecturer in San Jose State University and a freelance writer for different magazines and journals. He can be reached via e-mail at sinn@us.ibm.com or at his Web site at http://www.openloop.com/.

 

Top