Product evaluation checklist (ETL)
Given below are some of the features to expect when shopping for a new enterprise ETL tool. Evaluate 2 or 3 tools under consideration against these features using a spreadsheet in matrix (tabular) format. Use color coding in spreadsheet cells to indicate which tool fares better on each feature. For example, green for good, red for bad. Decision making gets a little easier with this for all stakeholders involved.
Hardware costs and licensing costs have been avoided from this discussion. Decision makers usually know beforehand how much are they allowed to spend. I can’t provide accurate estimates either.
Administration
- LDAP/other features for user login/security administration
- Need for separate app server for user login or for other product features(additional cost vs better handling of services administration)
- Maintenance hassles (number of historical security updates due to vulnerabilities, estimated number of server restarts etc.)
- SLA and response times to tickets (after-sales support)
- OS compatibility (Linux vs Windows vs other)
Product Architecture/Integration Architecture
- Parallelism (general partitioning of data, process parallelism, how the tool handles parallel multi-node queries on databases etc.)
- Database connectivity for all current/future enterprise database products
- Native connectivity for other enterprise data sources (SAS datasets, S3 etc.)
- Cloud compatibility (AWS/Azure/GCP)
- Streaming or near-real time capability for SOA integration.
Development
- Support for custom ETL extensions in Java/general purpose languages
- ETL feature check (CDC, partition schemes for JOINs, XML/JSON compatibility, SCD etc.). Any specific feature related to your use case, for example CRC code generation for CDC or string aggregation features.
- Sorting, Aggregation features
- Exception handling features
Run-time features
- Visual impact analysis/performance monitor (what part of ETL is more time/resource consuming, how many records have been processed so far etc.)
- Run time log and descriptiveness of it.
- Job restart features.
Production control features
- A decent Scheduler
- A mature Source Control system for SDLC releases in different environments (dev, test, UAT, production, prodfix, DR)
Metadata management
- In built metadata framework (or compatibility with your enterprise frameworks). An example of inbuilt metadata framework is IBM Infosphere’s XMETA framework.
- Ability to import metadata from other sources (ERwin Data Models, XSD etc.)
- Audit trails if that’s a requirement (could be accomplished with a metadata framework)