Tuesday, January 7, 2014

Jitterbit – Column Names Are Not Included by Default in a Flat File Target

Jitterbit is an integration tool that delivers a quick and easy way to design and build integration solutions. You can read my blog post Introduction to Jitterbit for an overview of the tool.

Here I discuss an issue I encountered while using this tool, along with a workaround to fix it.

I created a simple integration project in which I exported data from Salesforce (source) and wrote it to a flat file (target) in a Jitterbit operation. Interestingly, the data records were captured in the target file, but the header row containing the column names was missing.

I was unable to find any setting in the flat file structure definition that controls whether a header row is included in the flat file.

After doing some further research, I found that I needed to use a Jitterbit plugin, PrependData, to get the column names into the file.


Here are the steps I followed to fix the issue:

  • Download the PrependData plugin (zip archive) from the Jitterbit Plugin page.
  • Extract the zip archive and move or copy the extracted root folder to JITTERBIT_HOME/Plugins/Pipeline/User/, then restart the Jitterbit Process Engine service.
  • Create a new Script object and add the following text to the script:
      • $DataToPrepend="Column_1, Column_2, Column_3\n";
$DataToPrepend is a standard Jitterbit global variable, and Column_1, Column_2, Column_3 stand for the exact column names.

  • In the Jitterbit operation, add the Script object created in the previous step just before the target object.
  • Now we need to assign the plugin to the target. Right-click the Target object and select the Plugin option; this lists all the installed plugins. Select the PrependData plugin and click the Assign button to complete the assignment.


  • Save the changes, deploy the project, and run the operation.
  • The column names now appear in the flat file.
This feels like a bit of overkill on Jitterbit's part: writing column names to a flat file target should be standard functionality, and needing a script and a plugin makes a simple integration more complex.
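For what it's worth, if installing a plugin feels too heavy for your scenario, a small post-processing step run after the operation can achieve the same result by prepending a header line to the generated file. Below is a minimal Python sketch of that alternative; the file path and column names are hypothetical and would need to match your actual target definition.

# prepend_header.py - minimal sketch: add a header row to a flat file that was
# written without one. The file path and column names are placeholders.

HEADER = "Column_1,Column_2,Column_3\n"   # must match the flat file structure
TARGET_FILE = r"C:\data\sf_export.csv"    # hypothetical target file path

def prepend_header(path, header):
    """Rewrite the file so the header line comes before the original rows."""
    with open(path, "r", encoding="utf-8") as f:
        body = f.read()
    if body.startswith(header):
        return  # header already present, e.g. the script ran twice
    with open(path, "w", encoding="utf-8") as f:
        f.write(header + body)

if __name__ == "__main__":
    prepend_header(TARGET_FILE, HEADER)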

I hope Jitterbit will address this issue in their future releases and make it part of their standard functionality.


Monday, January 6, 2014

Introduction to Jitterbit

Jitterbit is an integration tool that delivers a quick and simple way to design, configure, test, and deploy integration solutions. It is designed so that integrations can be created and managed not only by developers, but also by the analysts who understand the business processes.


Jitterbit provides an easy, effective, and inexpensive way of integrating data, applications, and even devices across:
  • On-premise databases, packaged enterprise applications (SAP, Microsoft, Oracle) and custom applications and services (XML, Web Services).
  • Cloud based applications (Salesforce.com, Workday).
  • Social media and mobile applications (Facebook, Twitter, LinkedIn).


Jitterbit’s wizard driven approach consists of three stages:
  • Design: The Jitterbit Studio utility is used to design the data integration processes in a graphical, wizard-driven interface. The wizard includes steps for creating endpoints to establish connectivity with the source and target systems, building mappings, applying transformations, and scheduling the integration process.
  • Deploy: Once a project is created in Jitterbit Studio, it needs to be deployed on site, to a server behind the firewall, or to the cloud.
  • Manage: Once deployed, the project is managed with real-time process monitoring tools.



Jitterbit allows companies of all sizes to solve the challenges of application, data, and business process integration between on-premise and Cloud systems. Jitterbit's graphical "No-Coding" approach accelerates and simplifies the configuration and management of on-premise and cloud integration projects.

Sunday, December 1, 2013

6 Steps to Connecting your Database with Salesforce.com using Informatica Cloud

Step - 1 Define the task and select the task operation.


Step - 2 Configure the Database Source and Select the table from which data needs to be read.


Step - 3 Configure the Salesforce Target and select the appropriate object in which data needs to be loaded.


Step - 4 Define data filters that will be applied on the Source.


Step - 5 Configure Mappings and Expressions. Drag a field from the Source and drop it onto the target.


Step - 6 Schedule the task.
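The wizard handles all of this without code, but conceptually the task boils down to reading rows from the database, mapping the columns to Salesforce fields, and loading the result into the target object. Purely as an illustration (this is not how Informatica Cloud works internally), here is a rough Python sketch using sqlite3 and the simple-salesforce library; the table, column names, field mapping, and credentials are all hypothetical.

# Conceptual sketch of a "database to Salesforce" sync task.
# Table, columns, field mapping, and credentials below are hypothetical.
import sqlite3
from simple_salesforce import Salesforce  # pip install simple-salesforce

# Step 2 equivalent: read the source rows from the database,
# with a WHERE clause standing in for the Step 4 data filter.
conn = sqlite3.connect("crm_staging.db")
rows = conn.execute(
    "SELECT account_name, phone, billing_city FROM accounts "
    "WHERE billing_country = 'US'"
).fetchall()

# Step 5 equivalent: map source columns to Salesforce target fields.
records = [
    {"Name": name, "Phone": phone, "BillingCity": city}
    for (name, phone, city) in rows
]

# Step 3 equivalent: load the mapped records into the Salesforce Account object.
sf = Salesforce(username="user@example.com",
                password="********",
                security_token="********")
for record in records:
    sf.Account.create(record)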



References:
https://community.informatica.com/servlet/JiveServlet/previewBody/2233-102-1-2479/6steps.pdf


Wednesday, April 10, 2013

How to use Salesforce to Salesforce Automation with Informatica Cloud

Salesforce to Salesforce is a native Force.com feature for sharing data records in real time between two Force.com environments (orgs). For example, two business partners may want to collaborate by sharing account and opportunity data across their orgs. It is very easy to share data using the Salesforce to Salesforce feature.
In most scenarios, data is shared manually using the standard Salesforce.com user interface: a user creates an account in one org, clicks the external sharing button, selects the appropriate org connection, and then shares the record. This involves some manual effort, and when sales reps are working on a lot of records, the manual sharing process can become a painful exercise that may cause user adoption issues in the long run.
Two objects control the Salesforce to Salesforce feature at the back end:
  • PartnerNetworkConnection – Represents a Salesforce to Salesforce connection between Salesforce organizations.
  • PartnerNetworkRecordConnection – Represents a record shared between two Salesforce orgs using Salesforce to Salesforce.

Whenever the user shares a record using Salesforce to Salesforce, a record gets created in the PartnerNetworkRecordConnection object.
Informatica Cloud can be used to make this integration between two Salesforce orgs seamless and automatic.  The process flow will be as follows: 
 
  • The user creates the records (which need to be shared) in his org.
  • As soon as a record is created, a workflow is kicked off which sends an outbound message.
  • This outbound message initiates an Informatica Cloud task.
  • The Informatica Cloud task inserts the record details (created in step 1) into the PartnerNetworkRecordConnection object (see the sketch below).
  • As soon as the record gets created in PartnerNetworkRecordConnection, it is shared with the Partner Org.
Thus, the records created in one org are automatically shared to the Partner Org in near real time without any manual intervention.
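To make the data flow concrete, the heart of what the Informatica Cloud task does is a simple insert into PartnerNetworkRecordConnection. The sketch below shows an equivalent insert using Python and the simple-salesforce library, purely for illustration; the connection lookup, record Id, and credentials are hypothetical, and in the actual solution this insert is performed by the configured Informatica Cloud task rather than hand-written code.

# Illustrative sketch: sharing a record via Salesforce to Salesforce by
# inserting into PartnerNetworkRecordConnection. Ids and credentials are
# placeholders; in the solution described above this insert is done by the
# Informatica Cloud task.
from simple_salesforce import Salesforce  # pip install simple-salesforce

sf = Salesforce(username="user@example.com",
                password="********",
                security_token="********")

# Find the accepted connection to the partner org.
connection = sf.query(
    "SELECT Id FROM PartnerNetworkConnection "
    "WHERE ConnectionStatus = 'Accepted' LIMIT 1"
)["records"][0]

# Share one locally created record (e.g. an Account) with the partner org.
sf.PartnerNetworkRecordConnection.create({
    "ConnectionId": connection["Id"],       # which partner connection to use
    "LocalRecordId": "001000000000001AAA",  # placeholder Id of the record to share
    "SendClosedTasks": False,
    "SendOpenTasks": False,
    "SendEmails": False,
})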

Overall, the benefits of using Informatica Cloud are:
 
  • Data synchronization is achieved through out-of-the-box functionality.
  • Integration logic is centralized, with no need to write custom Visualforce/Apex code to automate the Salesforce to Salesforce feature.
  • Record Selection criteria for sharing can be easily defined using filters.
  • Exception handling of failed records can be set up to monitor the record sharing status.
  • Scalable approach.

So if you are considering implementing the Salesforce to Salesforce feature to share records automatically, the Informatica Cloud approach described above is worth considering.
Feel free to comment with any clarifications or questions.

Thursday, February 7, 2013

Integration and Analytics in the Cloud (100% Cloud)

I work as an Integration Architect.
 
A few years ago, life was not easy for senior management. One of my senior managers had to run a BI report every morning at 7 AM for analysis and decision making. He used to reach the office before 7 AM, start his workstation, open the appropriate tools, and then run the report. This whole process used to take a good 30 minutes to an hour. Apart from that, he had concerns regarding the elasticity, availability, flexibility, and cost of the tools and technologies.
 
He used to envision that one day he would be able to do all of this at the click of a button, from anywhere, anytime. “No constraints whatsoever.”

We did an integration implementation for a large Canadian Telecom Giant. The name of the project was Marketing Data Mart. 
 
The Marketing Data Mart consisted of an integrated architecture of heterogeneous data stores and technologies to support the ultimate analysis of data. 
 
We needed to integrate the data from the following source systems:

  • Salesforce.com
  • Eloqua
  • Harte Hanks
  • Dun and Bradstreet Optimizer
  • Jigsaw Dun and Bradstreet Contacts

To make it happen, we used the following tools and technologies:

  • Informatica Cloud - cloud based integration tool (http://www.informaticacloud.com/)
  • Amazon EC2 (Elastic Cloud Compute) – Cloud based hosting (http://aws.amazon.com/ec2/)
  • Amazon RDS (Relational Database Service) – Cloud based database (http://aws.amazon.com/rds/)
  • GoodData – Cloud based reporting and analytics (http://www.gooddata.com/)

Data from all the source systems was loaded and transformed in Amazon RDS, and this data was fed into GoodData, which enabled the creation of complex analytical reports.
 
There were some initial challenges while configuring Informatica on Amazon EC2, setting up secure FTP on Amazon EC2, and configuring Amazon RDS and GoodData, because of our minimal exposure to these technologies. But we had the vision in front of us, and that enabled us to overcome all the hurdles and implement the entire integration on the cloud. By cloud, I mean 100% on the cloud.
 
Some of the salient features are:

  • The complete integration was implemented on 100% cloud-based technologies.
  • Informatica Cloud was configured successfully on an Amazon EC2 UNIX instance.
  • Data Volumes to the tune of 2-3 million records were integrated successfully.
  • 83 separate tables in Amazon RDS, containing data from 6 source systems, are part of the data mart solution.
  • Complex analytical reports and dashboards were generated using GoodData.
  • The client previously had to use 3+ separate systems to get reports which then had to be consolidated via spreadsheets & other tools. The reporting from GoodData is a one-stop shop for reporting across multiple systems, all accessible via a web browser.  For deeper dives into the data, using sophisticated SQL queries, the client can run reports on the Amazon RDS database.
  • There was no compromise on security, and the client's data was stored in a highly secure cloud platform.
  • Amazon EC2 and RDS are highly scalable, and there are no concerns with respect to availability and flexibility.

We successfully proved that cloud technologies can be used for complex integrations, and now senior managers can feel relieved as they can run their BI reports at the click of a button, anytime, anywhere.

Friday, January 18, 2013

Data Quality Myths – Understand and Save money

“The data stored in my systems has tremendous potential. But somehow, I am not able to unlock its true value”
 
Are you also facing the above problem?
 
Data quality issues are the major roadblocks that prevent enterprises from realizing the true potential of their data.
 
The 1-10-100 rule of total quality management is very much applicable to data quality.
 
It takes $1 per record to verify the quality of data up front (prevention), $10 per record to cleanse and de-dupe it later (correction), and $100 per record if nothing is done and the ramifications of the mistakes are felt over and over again (failure).
 
Now that you understand the 1-10-100 rule, let’s look at some of the data quality myths that many enterprises believe in.
 
Myth 1 - My data is accurate as I have been using it for years without any problems.
Most people believe that if there are no reported problems or issues, their data is accurate. But have they considered that they may have missed many business opportunities that never materialized because of bad data? The worst part is that they have no clue about those missed chances.

A recent report from Artemis Ventures indicated that poor data quality costs the United States economy roughly $3.1 trillion per year. To provide some perspective on this unimaginably large figure, that’s twice the size of the US Federal deficit. An estimate from the US Insurance Data Management Association puts the cost of poor quality data at 15% to 20% of corporations’ operating revenue.
 
Myth 2 - I am getting my data enriched regularly and paying per record for enrichment.
There are various vendors that provide data enrichment services and charge on a per-record basis. Let’s assume you are sending 100,000 records and the vendor is charging 30 cents per record; the total cost of data enrichment is then $30,000. At a later stage, you realize that 40,000 of the records were duplicates. Ideally, these records should not have been sent for enrichment, which would have saved $12,000.
Generally, clients send data to the vendor at periodic intervals, and they may be losing a huge amount of money every time.
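As a quick back-of-the-envelope check, using only the figures from the example above, the arithmetic looks like this:

# Enrichment cost arithmetic using the example figures above.
total_records = 100_000
duplicate_records = 40_000
cost_per_record = 0.30  # 30 cents charged per record by the vendor

total_cost = total_records * cost_per_record                                # $30,000
wasted_cost = duplicate_records * cost_per_record                           # $12,000 spent on duplicates
cost_after_dedupe = (total_records - duplicate_records) * cost_per_record   # $18,000

print(f"Total enrichment cost:       ${total_cost:,.0f}")
print(f"Wasted on duplicate records: ${wasted_cost:,.0f}")
print(f"Cost if de-duplicated first: ${cost_after_dedupe:,.0f}")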
 
Myth 3 - I have been using my data for regulatory compliance and there have been no issues lately.
Pharmaceutical and financial institutions need to provide data to regulators for regulatory compliance. It is a critical task, and the slightest non-compliance can result in serious financial and legal implications. In such high-stakes situations, one should not wait for issues to arise; one should proactively look for measures to prevent data quality issues.
 
A report recently issued by Aberdeen Research indicates that almost half of finance employees are “challenged by the fact that their organizations are leveraging risk and compliance data in different formats, making it difficult to compare data.”  According to the report, complying with regulations is a key concern for CFOs. And a distressing number of respondents indicated that the existing IT infrastructure is lacking in the advanced capabilities needed to support governance, risk and compliance (GRC) initiatives.
 
Data has always been king. The sooner you realize this, the more you stand to save.
The best there is, the best there was and the best there ever will be - and that is Data Quality.
It takes just a tiny bit of invalid or bad information to create monumental issues. Bad data multiplies at an exponential rate, corrupting not only the system in which it originates, but also the many other data sources it interacts with as it moves across the business. Thus, the longer a company waits to detect and correct a bad record, the more severe the damage it can do.
 
Thus, there is a need to establish a data governance framework: a combination of disciplines, enhanced processes, and the right mix of tools and technology that addresses the critical data issues that will drive the biggest returns, resulting in clean data that delivers results and information that is accurate.
 
References:
http://disastermapping.wordpress.com/2012/02/16/the-costs-of-data-quality-failure-2/
http://blog.match2lists.com/general-information/the-costs-of-data-quality-failure/
http://www.accountancysa.org.za/resources/ShowItemArticle.asp?Article=Data+and+Regulation%3A+Compliance&ArticleId=2398&Issue=1113