Here is the situation. There is a top-level module (see designs below) containing code that, as the name suggests, manages an ETL pipeline. A directory called etl_helpers organizes the several modules that make up the pipeline. The discussion concerns the Python language, which supports OOP as well as Structural/Functional approaches to programming.
I am interested in opinions on which design adheres best to standard architectural practices and the SOLID principles. I understand that this is one of those topics where people may have strong opinions one way or the other. I am interested in those opinions.
Allow me to give my thoughts. First, I don't think there would be much difference whether I used OOP for the functionality or a structural paradigm. In my opinion, along the lines of Rich Hickey's comments on simple versus complex, a structural paradigm would give a simpler implementation. In this case there is no reason to create a construct with state, so let's assume the code is structural and not OOP.
I would go with Design I. Succinctly stated, Design I supports readability and maintainability at least as well as, if not better than, the other designs. The goal of the SOLID principles is the creation of mid-level software structures that (Software Architecture: SA Martin):
---- Tolerate change,
---- Are easy to understand, and
---- Are the basis of components that can be used in many software systems.
I think Design I best adheres to these principles. I could point to the Single Responsibility Principle, which is defined as (SA Martin): a module should be responsible to one, and only one, actor. It should satisfy the Liskov Substitution Principle as well. Further, each module in the etl_helpers directory is at the same level of abstraction.
I could also mention that, as Dijkstra stressed, at every level, from the smallest function to the largest component, software is like a science and, therefore, is driven by falsifiability. Software architects strive to define modules, components, and services that are easily falsifiable (testable). To do so, they employ restrictive disciplines similar to structured programming, albeit at a much higher level (SA Martin).
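The following minimal sketch makes the falsifiability point concrete. It is not taken from the thread. It shows one transform step written as a pure function. The function name `normalize_records` and the `id`/`name` fields are hypothetical. The point is only that a step with no I/O and no hidden state can be confirmed or refuted with a single assertion.

```python
"""Hypothetical transform step: a pure function is easy to falsify (test)."""


def normalize_records(records: list[dict]) -> list[dict]:
    """Drop rows without an "id" and strip whitespace from "name".

    No I/O and no hidden state: the output depends only on the input,
    so a single assertion can confirm or refute the behaviour.
    """
    cleaned = []
    for row in records:
        if row.get("id") is None:
            continue  # rows that cannot be keyed downstream are rejected
        cleaned.append({"id": row["id"], "name": str(row.get("name", "")).strip()})
    return cleaned


def test_normalize_records() -> None:
    """A falsifiable claim about the step, checkable in milliseconds."""
    rows = [{"id": 1, "name": "  Ada "}, {"name": "no id"}]
    assert normalize_records(rows) == [{"id": 1, "name": "Ada"}]


if __name__ == "__main__":
    test_normalize_records()
    print("normalize_records behaves as specified")
```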
One can point to multiple reasons why Design I might be preferred, but what are the compelling reasons, if there are any, that would suggest another design is superior?
Finally, let me reference an interesting research paper I read recently that seems to support classifying the other designs as anti-patterns: Architecture_Anti-patterns_Automatically.pdf
---- (https://www.cs.drexel.edu/~yfcai/papers/2019/tse2019.pdf)
SEVERAL DESIGNS FOR COMPARISON
DESIGN I:
---- manage_the_etl_pipeline.py
---- etl_helpers
-------- extract.py
-------- transform.py
-------- load.py
Of course, one could also use:
DESIGN II:
---- manage_the_etl_pipeline.py
---- etl_helpers
-------- extract_transform_load.py
or probably even:
DESIGN III:
---- manage_the_etl_pipeline.py
---- extract_transform_load.py
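For comparison, here is a hypothetical sketch of Design I collapsed into one runnable file. Each section is marked with the module it would live in. The CSV input, the JSON-lines output, and the `amount` column are assumptions made only for illustration.

```python
"""Sketch of Design I collapsed into one runnable file (illustrative only).

In the real layout each section below would be its own module
(etl_helpers/extract.py, etl_helpers/transform.py, etl_helpers/load.py),
and manage_the_etl_pipeline.py would import and sequence them.
"""
import csv
import io
import json
from typing import TextIO


# --- etl_helpers/extract.py: knows only where data comes from -------------
def extract(source: TextIO) -> list[dict]:
    """Read CSV rows into dicts (the CSV source is an assumption)."""
    return list(csv.DictReader(source))


# --- etl_helpers/transform.py: knows only the business rules --------------
def transform(rows: list[dict]) -> list[dict]:
    """Convert the hypothetical 'amount' column to float; drop blank amounts."""
    return [
        {**row, "amount": float(row["amount"])}
        for row in rows
        if (row.get("amount") or "").strip()
    ]


# --- etl_helpers/load.py: knows only where data goes ----------------------
def load(rows: list[dict], sink: TextIO) -> int:
    """Write one JSON object per line; return the number of rows written."""
    for row in rows:
        sink.write(json.dumps(row) + "\n")
    return len(rows)


# --- manage_the_etl_pipeline.py: only sequences the three steps -----------
def main(source: TextIO, sink: TextIO) -> int:
    return load(transform(extract(source)), sink)


if __name__ == "__main__":
    src = io.StringIO("id,amount\n1,2.5\n2,\n3,4\n")
    out = io.StringIO()
    print(main(src, out), "rows loaded")
    print(out.getvalue(), end="")
```

Each section knows only its own concern. A change to the business rules would therefore touch only the transform section, which is the single-responsibility argument in miniature.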
On 2/3/2023 4:18 PM, transreductionist wrote:
Well, you have pretty well stacked the deck to make DESIGN 1 the
obviously preferred choice. I don't think it has much to do with Python
per se, or even with OO vs imperative style.
As a practical matter, once you got into working with extract_transform_load.py (for the other designs), I would expect that
you would start wanting to refactor it and eventually end up more like DESIGN 1. So you might as well start out that way.
The reasons are 1) what you said about separation of concerns, 2) a
desire to keep each module or file relatively coherent and easy to read,
and 3), as you also suggested, making each of them easier to test.
Decoupling is important too (one of the SOLID prescriptions), but you
can violate that with any architecture if you don't think carefully
about what you are doing.
On the subject of OO, I think it is a very good approach to think about architecture and design in object terms - meaning conceptual objects
from the users' point of view. For example, here you have a pipeline (a metaphorical or userland object). It will need functionality to load, transform, and output data, so logically it can be composed of a loader,
one or more transformers, and one or more output formatters (more
objects). You may also need a scheduler and a configuration manager
(more objects).
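Here is a minimal sketch of that composition, assuming plain functions stand in for the conceptual objects. A loader, a list of transformers, and a list of output formatters are passed into a hypothetical `run_pipeline` function. The scheduler and configuration manager are left out. All names and the record shape are illustrative, not from the thread.

```python
"""Hypothetical sketch: a pipeline composed of a loader, transformers, and formatters."""
from typing import Callable, Iterable

Record = dict
Loader = Callable[[], Iterable[Record]]
Transformer = Callable[[Record], Record]
Formatter = Callable[[list[Record]], str]


def run_pipeline(
    loader: Loader,
    transformers: list[Transformer],
    formatters: list[Formatter],
) -> list[str]:
    """Load once, apply each transformer in order, then render with every formatter."""
    records = list(loader())
    for step in transformers:
        records = [step(r) for r in records]
    return [fmt(records) for fmt in formatters]


# Hypothetical concrete parts; each could live in its own module.
def load_fixed() -> list[Record]:
    return [{"name": "ada", "score": 3}, {"name": "bob", "score": 5}]


def capitalize_name(r: Record) -> Record:
    return {**r, "name": r["name"].capitalize()}


def double_score(r: Record) -> Record:
    return {**r, "score": r["score"] * 2}


def as_csv(records: list[Record]) -> str:
    return "\n".join(f"{r['name']},{r['score']}" for r in records)


def as_total(records: list[Record]) -> str:
    return f"total={sum(r['score'] for r in records)}"


if __name__ == "__main__":
    for output in run_pipeline(load_fixed, [capitalize_name, double_score], [as_csv, as_total]):
        print(output)
```

`run_pipeline` receives its parts as arguments rather than importing them, so a test can pass in a fake loader. That is one way to get the decoupling mentioned above.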
(*Please* let's not have any quibbling about "class" vs "object". We
are at a conceptual level here!)
When it comes to implementation, you can choose to implement those
userland objects with either imperative, OO, or functional techniques,
or a mixture.
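As a hypothetical illustration, the sketch below implements the same conceptual "transformer" in two of those styles: a plain function and a small class with state. The caller depends only on a `typing.Protocol` that describes the call signature. The names `Transformer`, `to_upper_name`, `Rename`, and `apply_all` are invented for this example.

```python
"""One conceptual 'transformer', two implementation styles (hypothetical names)."""
from typing import Protocol


class Transformer(Protocol):
    """Anything callable as record -> record counts as a transformer."""

    def __call__(self, record: dict) -> dict: ...


# Functional style: stateless, behaviour fixed in the function body.
def to_upper_name(record: dict) -> dict:
    return {**record, "name": record["name"].upper()}


# OO style: configuration held as state on an instance.
class Rename:
    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def __call__(self, record: dict) -> dict:
        out = dict(record)  # copy so the caller's record is not mutated
        out[self.new] = out.pop(self.old)
        return out


def apply_all(record: dict, steps: list[Transformer]) -> dict:
    """The caller relies only on the protocol, not on which style was used."""
    for step in steps:
        record = step(record)
    return record


if __name__ == "__main__":
    print(apply_all({"name": "ada"}, [to_upper_name, Rename("name", "user")]))
```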
Keep It Simple: Put all four modules at the top level, and run with it
until you falsify it. Yes, I would give you that same advice no matter
what language you're using.
On 2/3/2023 5:14 PM, 2QdxY4RzWzUUiLuE@potatochowder.com wrote:
In my recent message I supported DESIGN 1. But I really don't care much about the directory organization. It's designing modules whose business
is to handle various kinds of operations that counts, not so much the
actual directory organization.
The transform is likely dictated by your client's specification. So,
another separation. Hence Design 1.
There is a strong argument for suggesting that we're going out of our
way to imagine problems or future-changes (which may never happen). If
this is (definitely?) a one-off, then why-bother? If permanence is
likely, (so many 'temporary' solutions end-up lasting years!) then
re-use can/should be considered.
This analogy came to me the other day. I would rather walk into a grocery store where the bananas, apples, and oranges are separated into their own bins, instead of one common crate.
Well, first of all, while there is no doubt as to Dijkstra’s contribution to computer science, I don’t think his description of scientific thought is correct. The acceptance of Einstein’s theory of relativity has nothing to do with internal consistency or how easy or difficult it is to explain, but rather with repeated experimental results validating it.