Data Design Patterns — Lambda, Kappa & DDD
May 16, 2018
Lambda and Kappa are data pipeline patterns in which incoming data (batch or real-time) is pipelined to a serving system for analytics or querying (ML, BI, visualization, etc.).
Lambda architecture
- Lambda is a popular way of structuring data pipelines, even if people don't always call it by that name. It has components for a batch layer, a real-time layer, and an analytics/serving layer.
- The batch component can be built from MapReduce, an RDBMS, a data warehouse, a data lake, or a combination of these.
- The stream component uses a real-time streaming platform such as Apache Storm or AWS Kinesis.
- Analytics layer is where you have applications to analyze the data using ML, BI, or simple querying applications.
- The drawback is that data is duplicated between the batch and streaming layers, so the two must be kept in sync.
- However, if your business use case has the batch layer holding historical data for historical analytics, while the streaming layer handles social media or IoT data that is analyzed in real time and not always written to historical storage, the Lambda architecture is well suited.
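The layering above can be sketched in plain Python. This is an illustrative toy (class and method names are my own, not a real framework's API): a batch layer that periodically recomputes a view over the full historical dataset, a speed layer that incrementally updates a real-time view, and a serving layer that merges both at query time.

```python
# Minimal Lambda-architecture sketch (illustrative, not a real API):
# batch view + real-time view, merged by the serving layer at query time.

class BatchLayer:
    """Recomputes an aggregate view from the full historical dataset."""
    def __init__(self):
        self.master_data = []   # append-only historical store
        self.batch_view = {}    # precomputed aggregate (event counts per key)

    def ingest(self, records):
        self.master_data.extend(records)

    def recompute(self):
        # Periodic full recomputation, as a batch job would do.
        view = {}
        for key in self.master_data:
            view[key] = view.get(key, 0) + 1
        self.batch_view = view


class SpeedLayer:
    """Tracks events that have not yet been absorbed into the batch view."""
    def __init__(self):
        self.realtime_view = {}

    def on_event(self, key):
        self.realtime_view[key] = self.realtime_view.get(key, 0) + 1

    def reset(self):
        # Cleared once the batch layer has absorbed these events;
        # this hand-off is exactly the sync burden Lambda carries.
        self.realtime_view = {}


class ServingLayer:
    """Answers queries by merging the batch and real-time views."""
    def __init__(self, batch, speed):
        self.batch, self.speed = batch, speed

    def count(self, key):
        return (self.batch.batch_view.get(key, 0)
                + self.speed.realtime_view.get(key, 0))
```

The duplication drawback shows up directly here: the same event may live in both `master_data` and `realtime_view`, and correctness depends on `recompute()` and `reset()` being coordinated.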
Kappa Architecture
- Kappa was an idea brought about by the advent of new systems that can handle real-time streaming and are at the same time horizontally scalable. Apache Kafka and Spark motivated the Kappa architecture.
- There is no separate batch layer; there are only two layers: the streaming layer and the analytics/serving layer.
- Although Spark took off as a replacement for batch-based MapReduce, its Spark Streaming library can handle real-time streaming. Technically it is micro-batching, but it can accomplish most of what a full-fledged real-time streaming system can.
- This removes the need to implement, and keep in sync, a separate batch-based system. But if historical data is needed for future analytics, it eventually has to be dumped into a data lake, which is not part of the Kappa architecture.
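Kappa's key move can be sketched the same way (again with illustrative names): a single append-only, replayable event log stands in for a Kafka topic, and one stream-processing codebase serves both live updates and full "batch" recomputation, which is just a replay of the log from the beginning.

```python
# Minimal Kappa-architecture sketch (illustrative, not a real API):
# one event log, one processing codebase; "batch" = replaying the log.

class EventLog:
    """Stand-in for a durable, replayable log such as a Kafka topic."""
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

    def replay(self, from_offset=0):
        return iter(self.events[from_offset:])


class StreamProcessor:
    """The same code handles live events and full reprocessing."""
    def __init__(self):
        self.view = {}  # serving view: running totals per key

    def process(self, event):
        key = event["key"]
        self.view[key] = self.view.get(key, 0) + event.get("amount", 1)

    def reprocess(self, log):
        # In Kappa, changing the logic means replaying the whole log
        # through the new processor version, not maintaining a
        # separate batch pipeline alongside the stream.
        self.view = {}
        for event in log.replay():
            self.process(event)
```

Because there is only one code path, there is nothing to keep in sync; the trade-off, as noted above, is that the log itself is not a data lake, and long-term historical storage still has to live elsewhere.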
Domain Driven Design
- A domain is the “sphere of knowledge” surrounding a process or entity: for example, retail banking, video games, or a smaller universe like a product hierarchy.
- Software design in general is domain driven. The emphasis of the DDD pattern is on ignoring implementation-specific details and focusing solely on the business domain being modeled.
- The design should speak a common language that business and technology share, based on the domain (and not on implementation specifics).
- In relational data architecture, this means the logical data model (LDM) is purely a representation of business reality in third normal form (this doesn’t apply to dimensional models in a DWH), while the physical data model (PDM) might look quite different from it. That is because the LDM follows DDD; the PDM follows the LDM, but it has to face the realities of implementation on a database and of the application’s access patterns.
- When the LDM differs from the PDM this way, many entities in the LDM may never appear in the PDM, and that’s OK. The LDM serves as an artifact that gives business and technology a common language.
- Many organizations have the LDM and PDM looking the same except for cosmetics and physical features (for example, descriptive attribute names in the LDM, indexes introduced in the PDM), but the model itself is the same between the two.
- DDD also advocates that design should be centered around specific domains, not as an organization-wide canonical design. Each specific domain and its boundary is called a bounded context.