AWS Data Warehouse Migration


Overview: General guidelines and considerations for migrating your existing enterprise data warehouse to Amazon AWS services, and a story from www.Blueskymetrics.com.

When I was little, my older brother would buy me an ice cream cone and tell me to just eat the top part with the ice cream and caramel. He would relieve me of the cone, hidden away inside the paper wrapper, and happily finish the rest for me. I only got to eat half my ice cream because I didn't realize there was a cone, more ice cream and more caramel under the paper wrapper.

Enterprise data warehouse migrations are like that ice cream cone: you're going to get about half of all the "good stuff" right at the beginning, but you need to pull that wrapper off to find the rest. It's not going to be easy if your hands are covered in ice cream and caramel. [A metaphor for looking beyond the obvious and not being distracted by the finished product, that is, just getting the DB migrated.]

The following are some things to think about if you are contemplating, planning or implementing a data warehouse migration to Amazon AWS (or the cloud in general).

Database Size – Volume Metrics

  • Current database size on disk [including replication and/or RAID].

You need to know the actual disk size for data, indexes, programs, etc., and you need to know it both with and without replication and/or RAID. You may find that the huge EDW is actually smaller than you thought. Build a "1-pager": a one-page spreadsheet you can use as a visual aid when talking to a senior manager, a vendor or other team members. You can explain the space calculation in the sheet, but the top of the page should be a one-liner like "728TB of data on disk, 505TB of index on disk, 120GB of program space on disk = 1233.12TB total". You should be able to explain the space usage simply, without being overly technical.
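As a minimal sketch of that one-liner (the size figures and the helper below are illustrative assumptions, not numbers from any real system), the calculation is just a sum plus a unit conversion:

```python
# Minimal sketch of the "1-pager" one-liner: sum the raw space figures
# (illustrative numbers only) and express everything in TB.

def to_tb(size_gb: float) -> float:
    """Convert gigabytes to terabytes (decimal units, 1 TB = 1000 GB)."""
    return size_gb / 1000.0

# Hypothetical space figures pulled from the source system's catalog views.
space_on_disk_tb = {
    "data": 728.0,           # TB of table data on disk
    "index": 505.0,          # TB of index structures on disk
    "programs": to_tb(120),  # 120 GB of stored procedures, functions, etc.
}

total_tb = sum(space_on_disk_tb.values())
print(f"{space_on_disk_tb['data']:.0f}TB of data, "
      f"{space_on_disk_tb['index']:.0f}TB of index, "
      f"{space_on_disk_tb['programs'] * 1000:.0f}GB of program space "
      f"= {total_tb:.2f}TB total")
```

Running it prints "728TB of data, 505TB of index, 120GB of program space = 1233.12TB total", which is exactly the kind of line that belongs at the top of the 1-pager.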

Data, Data, Data

  • Pay attention to the data and the business reasons for migrating to the cloud.

Don't lose focus on the data. The whole point of the migration is to enable your business to make better, faster and more insightful decisions. How is the current EDW being used? What's in the data? How does the business get value from the data? How does the business act on the decisions made? What's the workflow? How is the data visualized, mined, reported on, etc.? What is your access methodology? The list goes on. Just asking a few simple questions at the beginning will go a long way in helping you and your team once the data is migrated.

Schema Metrics – Tables – Objects

Diving deeper under the wrapper: pay careful attention to the number of schemas, the number and complexity of tables (storage containers), indexing (the access strategy, like hashing, sort keys, storage keys, index organized tables, etc.) and program units (stored procedures, inline functions, user defined aggregations, triggers and more). Indexes and stored procedures on an EDW? Everyone has different reasons for implementing one feature or another. For example, I've used triggers to identify constraint violations during an ETL process or to identify a slowly changing dimension inconsistency. There are best practices for sure, but don't expect everyone to adhere to a standard. A rough object inventory, like the sketch below, is a good starting point.
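As a hedged illustration of that inventory (assuming a source database that exposes a standard information_schema and is reachable via the psycopg2 driver; the connection string is a placeholder, and your engine's catalog views may differ), a per-schema object count could look like this:

```python
# Rough schema/object inventory sketch. Assumes a PostgreSQL-compatible
# source reachable with psycopg2 and a standard information_schema;
# other engines (Oracle, Teradata, Netezza, ...) have their own catalogs.
import psycopg2

QUERIES = {
    "tables": "SELECT table_schema, count(*) FROM information_schema.tables "
              "WHERE table_type = 'BASE TABLE' GROUP BY table_schema",
    "views": "SELECT table_schema, count(*) FROM information_schema.views "
             "GROUP BY table_schema",
    "routines": "SELECT routine_schema, count(*) FROM information_schema.routines "
                "GROUP BY routine_schema",
    "triggers": "SELECT trigger_schema, count(*) FROM information_schema.triggers "
                "GROUP BY trigger_schema",
}

def inventory(dsn: str) -> None:
    """Print a per-schema count of each object type."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for object_type, sql in QUERIES.items():
            cur.execute(sql)
            for schema, count in cur.fetchall():
                print(f"{object_type:10s} {schema:30s} {count}")

if __name__ == "__main__":
    inventory("host=edw-host dbname=edw user=readonly")  # hypothetical DSN
```

Even a crude count like this tells you early whether you are modeling fifty tables or fifty thousand, and how much procedural code will need conversion.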

Most teams do a great job of building a schema report before the migration and using it as a checklist. It's important to consider that your migration will likely include modeling your tables (DB storage objects) into another technology. This is neither simple nor painless; it's just a fact of the process. Research, understand, and build a proof of concept that moves a table or two from the source to the destination technology, i.e. take a bit of time and measure as you go. How long does it take to export a table partition to a CSV using a source utility? How long to use a service or pipeline? How much time did it take to set up? How long did it take to load into an S3 bucket? How long do I need to keep the migrated data? You get the picture; a small timing harness like the one below is enough.
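Here is a minimal sketch of that kind of proof-of-concept measurement (the unload command, table name, bucket and key are all placeholders; it assumes boto3 is installed and AWS credentials are configured):

```python
# Sketch of a proof-of-concept timing harness: export one table partition
# to CSV, push it to S3, and record how long each step takes.
# The export command, table name, bucket and key names are hypothetical.
import subprocess
import time

import boto3

def timed(label: str, fn):
    """Run fn(), print its wall-clock duration, and return its result."""
    start = time.monotonic()
    result = fn()
    print(f"{label}: {time.monotonic() - start:.1f}s")
    return result

# 1. Export a single partition with whatever unload utility the source offers.
timed("export to csv", lambda: subprocess.run(
    ["source_unload_utility", "--table", "sales_2019_q1",
     "--out", "sales_2019_q1.csv"],
    check=True))

# 2. Load the CSV into a staging S3 bucket (bucket/key are assumptions).
s3 = boto3.client("s3")
timed("upload to s3", lambda: s3.upload_file(
    "sales_2019_q1.csv", "my-edw-staging-bucket", "poc/sales_2019_q1.csv"))
```

Multiply the measured times by the number of partitions and you have a first, defensible estimate of the overall migration window.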

There is a new service available to help you migrate your database: the AWS Database Migration Service. Yes, it does a lot of amazing things to help your process and data movement. There is another tool, the AWS Schema Conversion Tool, used to help identify problem areas and ease the modeling effort. This is not a review of the tools but a high level overview of things to consider. I would encourage you to explore these tools and others (there are many available) to help your migration effort.
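As a small, hedged example of working with DMS programmatically (the identifiers printed are whatever exists in your account; the calls shown are standard DMS API operations exposed by boto3, but check the current documentation before building on them), you can list your replication tasks and their status like this:

```python
# Minimal sketch: list AWS DMS replication tasks and their current status.
# Assumes AWS credentials and a region are already configured for boto3.
import boto3

dms = boto3.client("dms")

# describe_replication_tasks is paginated; walk every page.
paginator = dms.get_paginator("describe_replication_tasks")
for page in paginator.paginate():
    for task in page["ReplicationTasks"]:
        print(f"{task['ReplicationTaskIdentifier']:40s} "
              f"{task['MigrationType']:12s} "
              f"{task['Status']}")
```

A listing like this is handy during the migration itself, when you want a quick read on which full-load and CDC tasks are still running.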

Know that regardless of any process, service and/or tool, a successful and thorough data migration starts with a deep understanding of the source system: its size, scale, complexity, data usage, access strategy, usage patterns, number of connected users, expected compute, disk and memory requirements, and more. Planning is never wasted. Finally, don't be afraid to pull the wrapper off and just start. If you have questions or need help with your migration, please email me: support@blueskymetrics.com. www.Blueskymetrics.com