Forum: >>> Magnum BBS <<<

Re: Organizing modules and their code

From 2QdxY4RzWzUUiLuE@potatochowder.com@21:1/5 to transreductionist on Fri Feb 3 17:14:51 2023

On 2023-02-03 at 13:18:46 -0800,
transreductionist <transreductionist@gmail.com> wrote:

Here is the situation. There is a top-level module (see designs below) containing code, that as the name suggests, manages an ETL pipeline. A directory is created called etl_helpers that organizes several modules responsible for making up the pipeline. The discussion concerns the
Python language, which supports OOP as well as Structural/Functional approaches to programming.

I am interested in opinions on which design adheres best to standard architectural practices and the SOLID principles. I understand that
this is one of those topics where people may have strong opinions one
way or the other. I am interested in those opinions.

Okay, I'll start: unless one of extract, transform, or load is already,
or will certainly at some point become, complex/complicated enough to be
its own architectural module with its own architectural substructure; or
you're constructing specific ETL pipelines for specific ETL jobs at the
times the jobs are defined; then I think you're overthinking it.

Note that I say that speaking as a notorious overthinker. ;-)

Keep It Simple: Put all four modules at the top level, and run with it
until you falsify it. Yes, I would give you that same advice no matter
what language you're using.

FWIW, I'm not a big fan of OO, but based on what little I know about
your ETL pipelines, I agree with you that it probably doesn't make a big difference at this level. Define solid (in pretty much any/every sense
of the word, capitalized or not) interfaces between your modules, and
write your code against those interfaces, whether OO or any other
paradigm.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From transreductionist@21:1/5 to All on Fri Feb 3 13:18:46 2023

Here is the situation. There is a top-level module (see designs below) containing code, that as the name suggests, manages an ETL pipeline. A directory is created called etl_helpers that organizes several modules responsible for making up the pipeline.
The discussion concerns the Python language, which supports OOP as well as Structural/Functional approaches to programming.

I am interested in opinions on which design adheres best to standard architectural practices and the SOLID principles. I understand that this is one of those topics where people may have strong opinions one way or the other. I am interested in those
opinions.

Allow me to give my thoughts. First, I don't think there would be much difference if I was using OOP for the functionality, or using a structural paradigm. A structural paradigm in my opinion, along the lines of Rich Hickey's comments on simple versus
complex, would be a simpler implementation. In this case there is no reason to create a construct with state. So let's assume the code is structural and not OOP.

I would go with Design I. Succinctly stated, Design I supports readability and maintainability at least as well as, if not better than, the other designs. The goal of the SOLID principles is the creation of mid-level software structures that (Software Architecture: SA Martin):

---- Tolerate change,
---- Are easy to understand, and
---- Are the basis of components that can be used in many software systems.

I think Design I best adheres to these principles.

I could point to the Single Responsibility Principle which is defined as (SA Martin): a module should be responsible to one, and only one, actor. It should satisfy the Liskov Substitution Principle as well. Further, each module in the etl_helpers
directory is at the same level of abstraction.

I could also mention that as Dijkstra stressed, at every level, from the smallest function to the largest component, software is like a science and, therefore, is driven by falsifiability. Software architects strive to define modules, components, and
services that are easily falsifiable (testable). To do so, they employ restrictive disciplines similar to structured programming,
albeit at a much higher level (SA Martin).

One can point to multiple reasons why Design I might be preferred, but what are the compelling reasons, if there are any, that would suggest another design is superior?

Finally, let me reference an interesting research paper I read recently that seems to support the other designs as anti-patterns: Architecture_Anti-patterns_Automatically.pdf

---- (https://www.cs.drexel.edu/~yfcai/papers/2019/tse2019.pdf)

SEVERAL DESIGNS FOR COMPARISON

DESIGN I:

---- manage_the_etl_pipeline.py
---- etl_helpers
-------- extract.py
-------- transform.py
-------- load.py

Of course one could also use:

DESIGN II:

---- manage_the_etl_pipeline.py
---- etl_helpers
-------- extract_transform_load.py

or probably even:

DESIGN III:

---- manage_the_etl_pipeline.py
---- extract_transform_load.py
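
To make the comparison concrete, here is a minimal sketch of how the pieces of Design I might fit together. The function names (extract, transform, load, run_pipeline) and the toy CSV-like data are assumptions for illustration only; the post does not show any actual code. The three stages are written in one block so the sketch runs as-is, with comments marking which file each would live in.

```python
# etl_helpers/extract.py (hypothetical contents)
def extract(source):
    """Split comma-separated text into raw records."""
    return [line.split(",") for line in source.strip().splitlines()]


# etl_helpers/transform.py (hypothetical contents)
def transform(records):
    """Normalise each raw record into a (name, quantity) pair."""
    return [(name.strip(), int(qty)) for name, qty in records]


# etl_helpers/load.py (hypothetical contents)
def load(records, target):
    """Store the pairs in a dict that stands in for a real database."""
    target.update(records)
    return target


# manage_the_etl_pipeline.py: only orchestration lives here
def run_pipeline(source, target):
    return load(transform(extract(source)), target)


print(run_pipeline("apples, 3\npears, 5\n", {}))
```

Under Design II or III the same four functions would exist; only the files they live in would differ.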

From Thomas Passin@21:1/5 to transreductionist on Fri Feb 3 17:31:26 2023

On 2/3/2023 4:18 PM, transreductionist wrote:

Here is the situation. There is a top-level module (see designs below) containing code, that as the name suggests, manages an ETL pipeline. A directory is created called etl_helpers that organizes several modules responsible for making up the pipeline.

The discussion concerns the Python language, which supports OOP as well as Structural/Functional approaches to programming.

I am interested in opinions on which design adheres best to standard architectural practices and the SOLID principles. I understand that this is one of those topics where people may have strong opinions one way or the other. I am interested in those opinions.

Well, you have pretty well stacked the deck to make DESIGN 1 the
obviously preferred choice. I don't think it has much to do with Python
per se, or even with OO vs imperative style.

As a practical matter, once you got into working with
extract_transform_load.py (for the other designs), I would expect that
you would start wanting to refactor it and eventually end up more like
DESIGN 1. So you might as well start out that way.

The reasons are 1) what you said about separation of concerns, 2) a
desire to keep each module or file relatively coherent and easy to read,
and 3) as you also suggested, making each of them easier to test.
Decoupling is important too (one of the SOLID prescriptions), but you
can violate that with any architecture if you don't think carefully
about what you are doing.
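
As a rough illustration of the testing point, a stage that lives in its own module can be exercised without touching a data source or a database. The transform function and its rules below are hypothetical, not taken from the thread.

```python
def transform(rows):
    """Hypothetical transform stage: drop rows without an id, normalise names."""
    return [
        {"id": row["id"], "name": row["name"].strip().lower()}
        for row in rows
        if row.get("id") is not None
    ]


def test_transform_drops_rows_without_id():
    rows = [{"id": 1, "name": " Ada "}, {"id": None, "name": "ghost"}]
    assert transform(rows) == [{"id": 1, "name": "ada"}]


# Call the test directly so the sketch runs without a test runner.
test_transform_drops_rows_without_id()
```

The test needs nothing from the extract or load stages, which is the practical payoff of keeping them apart.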

On the subject of OO, I think it is a very good approach to think about architecture and design in object terms - meaning conceptual objects
from the users' point of view. For example, here you have a pipeline (a metaphorical or userland object). It will need functionality to load, transform, and output data so logically it can be composed of a loader,
one or more transformers, and one or more output formatters (more
objects). You may also need a scheduler and a configuration manager
(more objects).

(*Please* let's not have any quibbling about "class" vs "object". We
are at a conceptual level here!)

When it comes to implementation, you can choose to implement those
userland objects with either imperative, OO, or functional techniques,
or a mixture.

From Stefan Ram@21:1/5 to transreductionist on Fri Feb 3 23:49:49 2023

transreductionist <transreductionist@gmail.com> writes:

---- manage_the_etl_pipeline.py
---- etl_helpers
-------- extract.py
-------- transform.py
-------- load.py

I don't make such de,er,cisions upfront.

I start out with one file. That would be your "manage_the_etl_pipeline.py"
I guess. Then, I write everything into that file.
I would split out a module from this when I'd see need for it.

The module "__init__.py" of tkinter, for example, has more than
4500 lines.

From Stefan Ram@21:1/5 to Thomas Passin on Sat Feb 4 00:01:01 2023

Thomas Passin <list1@tompassin.net> writes:

As a practical matter, once you got into working with
extract_transform_load.py (for the other designs), I would expect that
you would start wanting to refactor it and eventually end up more like
DESIGN 1. So you might as well start out that way.

Upfront designs are more possible when someone already has
experience with similar projects. Then he can take some
"shortcuts" and look a bit into the future, as you suggest.

(*Please* let's not have any quibbling about "class" vs
"object". We are at a conceptual level here!)

Talking about classes vs. objects /is/ talking on a
conceptual level.

From transreductionist@21:1/5 to Thomas Passin on Fri Feb 3 16:08:36 2023

One point that I think is worth making, and I forgot to make it, is that namespaces are ubiquitous in Python: built-in, global, function (local), and enclosing namespaces, as well as user namespaces, e.g. dictionaries, SimpleNamespace, and dataclasses, to
list just a few. Modules ARE namespaces. Namespaces organize programming constructs like classes, functions, variables, etc. into coherent groups of "things". To have a namespace that complects extract constructs with transform constructs and load
constructs in one module seems unpythonic.
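
A small sketch of the namespace point, using only the standard library. The three SimpleNamespace objects stand in for the extract, transform, and load modules; the function names inside them are made up for illustration.

```python
import json
from types import SimpleNamespace

# A module is a namespace: its names live in an ordinary dict.
assert vars(json)["dumps"] is json.dumps


def read_rows(text):
    return text.splitlines()


def clean(rows):
    return [row.strip().upper() for row in rows if row.strip()]


def store(rows, sink):
    sink.extend(rows)


# One namespace per concern, mirroring one module per stage.
extract = SimpleNamespace(read_rows=read_rows)
transform = SimpleNamespace(clean=clean)
load = SimpleNamespace(store=store)


def run(text, sink):
    load.store(transform.clean(extract.read_rows(text)), sink)
    return sink


print(run(" a \n\nb\n", []))
```

At the call site, extract.read_rows and transform.clean say which group a name belongs to, which is the same information a separate module would carry.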

From Thomas Passin@21:1/5 to 2QdxY4RzWzUUiLuE@potatochowder.com on Fri Feb 3 22:24:03 2023

On 2/3/2023 5:14 PM, 2QdxY4RzWzUUiLuE@potatochowder.com wrote:

Keep It Simple: Put all four modules at the top level, and run with it
until you falsify it. Yes, I would give you that same advice no matter
what language you're using.

In my recent message I supported DESIGN 1. But I really don't care much
about the directory organization. It's designing modules whose business
is to handle various kinds of operations that counts, not so much the
actual directory organization.

From dn@21:1/5 to Thomas Passin on Sat Feb 4 18:24:15 2023

On 04/02/2023 16.24, Thomas Passin wrote:

On 2/3/2023 5:14 PM, 2QdxY4RzWzUUiLuE@potatochowder.com wrote:

Keep It Simple: Put all four modules at the top level, and run with it
until you falsify it. Yes, I would give you that same advice no matter
what language you're using.

In my recent message I supported DESIGN 1. But I really don't care much about the directory organization. It's designing modules whose business
is to handle various kinds of operations that counts, not so much the
actual directory organization.

+1 (and to comments made in preceding post)

With ETL the 'reasons to change' (SRP) come from different 'actors'. For example, the data-source may be altered either in format or by changing
the tool you'll utilise to access it. Hence the virtue of keeping it separate from the other parts. If you have multiple data-sources, then each
should be separate for the same reason.

The transform is likely dictated by your client's specification. So,
another separation. Hence Design 1.

There is a strong argument for suggesting that we're going out of our
way to imagine problems or future-changes (which may never happen). If
this is (definitely?) a one-off, then why-bother? If permanence is
likely, (so many 'temporary' solutions end-up lasting years!) then
re-use can?should be considered.

Thus, when it comes to loading the data into your own DB; perhaps this
should be separate, because it is highly likely that the mechanisms you
build for loading will be matched by at least one 'someone else' wanting
to access the same data for the desired end-purposes. Accordingly, a
shareable module and/or class for that.

We can't see the code-structure, so some of the other parts of your
question(s) are too broad. Here's hoping you and Liskov have a good time together...

My preference is for (what I term) the 'circles' diagram (see copy at https://mahu.rangi.cloud/CraftingSoftware/CleanArchitecture.jpg). This illustrates the 'rule' that code handling the inner functionality should not
know what happens at the more detailed/lower-level functional level of
the outer rings.

With ETL, there's precious little to embody various circles, but the
content of the outer ring is obvious. The "T" rules comprise the inner
"Use Case", even if you eschew "Entities" insofar as OOP-avoidance is concerned. This 'inversion', where the inner controls don't need to care
about the details of outer-ring implementation (is it an RDBMS, MySQL or Postgres; or is it some NoSQL system?) brings to life the "D" of SOLID,
ie Dependency Inversion.
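
One way the inversion might look in Python, as a sketch only: the inner "T" code depends on a small Loader protocol, and the outer-ring detail (a plain list or SQLite here) is passed in. All names (Loader, ListLoader, SqliteLoader, transform_and_load, the people table) are hypothetical.

```python
import sqlite3
from typing import Iterable, Protocol


class Loader(Protocol):
    """What the inner 'T' code expects from any outer-ring loader."""

    def load(self, rows: Iterable[dict]) -> None: ...


def transform_and_load(rows: Iterable[dict], loader: Loader) -> None:
    """Inner ring: business rules only, no knowledge of the storage engine."""
    cleaned = [{"name": row["name"].strip().title()} for row in rows]
    loader.load(cleaned)


class ListLoader:
    """Outer-ring detail: keeps rows in memory, handy for tests."""

    def __init__(self):
        self.rows = []

    def load(self, rows):
        self.rows.extend(rows)


class SqliteLoader:
    """Outer-ring detail: writes rows to a SQLite table."""

    def __init__(self, connection):
        self.connection = connection
        self.connection.execute("CREATE TABLE IF NOT EXISTS people (name TEXT)")

    def load(self, rows):
        self.connection.executemany(
            "INSERT INTO people (name) VALUES (?)",
            [(row["name"],) for row in rows],
        )
        self.connection.commit()


memory = ListLoader()
transform_and_load([{"name": "  ada lovelace "}], memory)
print(memory.rows)
```

Swapping SqliteLoader for a MySQL, Postgres, or NoSQL loader would not touch transform_and_load, which is the property the diagram describes.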

You may pick-up some ideas or reassurance from "Making a Simple Data
Pipeline Part 1: The ETL Pattern" (https://www.codeproject.com/Articles/5324207/Making-a-Simple-Data-Pipeline-Part-1-The-ETL-Patte).

Let us know how it turns-out...
--
Regards,
=dn

From Thomas Passin@21:1/5 to dn via Python-list on Sat Feb 4 01:01:45 2023

On 2/4/2023 12:24 AM, dn via Python-list wrote:

The transform is likely dictated by your client's specification. So,
another separation. Hence Design 1.

There is a strong argument for suggesting that we're going out of our
way to imagine problems or future-changes (which may never happen). If
this is (definitely?) a one-off, then why-bother? If permanence is
likely, (so many 'temporary' solutions end-up lasting years!) then
re-use can?should be considered.

With practice, it gets to be more automatic to set things up from the
beginning to more-or-less honor separation of concerns, decoupled
modules and APIs, and so forth. Doing this does not require a full, future-proof suite of alternative database adapters, for example, right
from the start. On top of everything else, you can't know the future
perfectly. And you can't know enough at the beginning to get every
design and architectural path optimal. You learn as you go.

I have a Tomcat application where I separated the output formatting from
the calculation of results. At the time I wrote only an XML formatter.
A decade later, here comes JSON, and customers are asking about it. I
was able to write a JSON formatter with the same API in about half an
hour, and now we have optional JSON output. Separating out the
formatting functionality with its own API was not an example of wasting
time with YAGNI (You Aren't Going To Need It), it was just plain good
practice that probably didn't even cost me any more development time -
since it simplified the calculation code.
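
The application described above was a Tomcat (Java) one and its code is not shown, so the following is only a Python sketch of the same idea: calculation returns plain data, and each formatter shares one signature. The names calculate, format_xml, format_json, and report are invented for illustration.

```python
import json
from xml.etree import ElementTree as ET


def calculate(values):
    """Calculation knows nothing about output formats."""
    return {"count": len(values), "total": sum(values)}


def format_xml(result):
    root = ET.Element("result")
    for key, value in result.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def format_json(result):
    # Added later with the same signature; calculate() is untouched.
    return json.dumps(result, sort_keys=True)


FORMATTERS = {"xml": format_xml, "json": format_json}


def report(values, fmt="xml"):
    return FORMATTERS[fmt](calculate(values))


print(report([1, 2, 3]))
print(report([1, 2, 3], fmt="json"))
```

Adding a format is then one new function plus one dictionary entry.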

OTOH, you may be just trying to learn how to do the bits and pieces. You
may be learning how to connect to the database. You may be learning how
to make a pipeline multithreaded. You have to experiment a thousand
ways, and in a hurry. Until you learn how to do the basic techniques,
sure, quick and dirty is fine. But it shouldn't be the way you design
your actual product, unless it's just for you and needs to be done
quickly, and will probably be ephemeral.

Why do I get the feeling that the OP was asking about a homework problem?

From Weatherby,Gerard@21:1/5 to All on Sat Feb 4 11:17:46 2023

You're overthinking it. It doesn't really matter. Having small chunks of code in separate files can be a hassle when trying to find out what the program does. Having one file with 2,000 lines in it can be a hassle. This is art / opinion, not science.

From transreductionist@21:1/5 to transreductionist on Sat Feb 4 14:18:35 2023

Thank you for all the helpful replies and consideration. I do hope for other opinions.

I would rather say it is more like engineering than art. Whether it is a matter of overthinking, or underthinking, is another matter. I enjoyed Dijkstra's letter to his colleagues on the role of scientific thought in computer programming. It is located at:

---- https://www.cs.utexas.edu/users/EWD/transcriptions/EWD04xx/EWD447.html

It is my academic training in physics that makes me enjoy picking up an idea and examining it from all sides, and sharing thoughts with friends. Just inquisitive curiosity, and not a homework problem. Thanks for the great link to the ETL site. That was a good read. A few years back I built a prod ETL application in Golang using gRPC with a multiprocessing pipeline throughout. It handled GB of data and was fast.

This analogy came to me the other day. For me, I would rather walk into a grocery store where the bananas, apples, and oranges are separated into their own bins, instead of one common crate.

From Greg Ewing@21:1/5 to transreductionist on Sun Feb 5 13:26:53 2023

On 5/02/23 11:18 am, transreductionist wrote:

This analogy came to me the other day. For me, I would rather walk into a grocery store where the bananas, apples, and oranges are separated into their own bins, instead of one common crate.

On the other hand, if the store has an entire aisle devoted to each
fruit, but only ever one crate of fruit in each aisle, one would think
they could make better use of their shelf space.

--
Greg

From Weatherby,Gerard@21:1/5 to transreductionist on Sun Feb 5 15:23:20 2023

Well, first of all, while there is no doubt as to Dijkstra's contribution to computer science, I don't think his description of scientific thought is correct. The acceptance of Einstein's theory of relativity has nothing to do with internal consistency
or how easy or difficult it is to explain, but rather with repeated experimental results validating it. Or, more precisely, not disproving it. See Feynman: https://www.youtube.com/watch?v=0KmimDq4cSU

Engineering is simply maximizing the ratio: benefit / cost. Highly recommend To Engineer is Human by Henry Petroski.

Regarding the initial question: none of the suggested designs would work because they lack an __init__.py file.

Once the __init__.py is added, the import statements within it will determine how the API looks. All three designs (I, II, and III) can be implemented with the same API. (I’m pretty sure that’s true. If it’s not, I’d be interested in a counterexample.)
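
As a rough check of that claim (an added sketch, not code from the thread), the script below builds the etl_helpers package twice in temporary directories, once laid out as in Design I and once as in Design II, and compares what each exposes. The function bodies, the __all__ list, and the try_design helper are hypothetical and exist only for the demonstration. Design III has no package directory, so the same three names would simply live in the top-level extract_transform_load module; it is not exercised here.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stage implementations, shared by both layouts.
EXTRACT = "def extract():\n    return [1, 2, 3]\n"
TRANSFORM = "def transform(rows):\n    return [r * 2 for r in rows]\n"
LOAD = "def load(rows):\n    return sum(rows)\n"
EXPORTS = "__all__ = ['extract', 'transform', 'load']\n"

# Design I: one module per stage; __init__.py re-exports each function.
DESIGN_I = {
    "extract.py": EXTRACT,
    "transform.py": TRANSFORM,
    "load.py": LOAD,
    "__init__.py": (
        "from .extract import extract\n"
        "from .transform import transform\n"
        "from .load import load\n" + EXPORTS
    ),
}

# Design II: one combined module; __init__.py re-exports from it.
DESIGN_II = {
    "extract_transform_load.py": EXTRACT + TRANSFORM + LOAD,
    "__init__.py": (
        "from .extract_transform_load import extract, transform, load\n" + EXPORTS
    ),
}


def try_design(files):
    """Build etl_helpers in a temp dir, import it, return (public names, result)."""
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp) / "etl_helpers"
        package_dir.mkdir()
        for filename, source in files.items():
            (package_dir / filename).write_text(source)

        # Forget any previously imported copy so each layout is loaded fresh.
        for name in list(sys.modules):
            if name == "etl_helpers" or name.startswith("etl_helpers."):
                del sys.modules[name]
        importlib.invalidate_caches()

        sys.path.insert(0, tmp)
        try:
            etl = importlib.import_module("etl_helpers")
            # Callers use the same three names regardless of the file layout.
            result = etl.load(etl.transform(etl.extract()))
            return sorted(etl.__all__), result
        finally:
            sys.path.remove(tmp)


api_i, result_i = try_design(DESIGN_I)
api_ii, result_ii = try_design(DESIGN_II)
print(api_i == api_ii, result_i == result_ii)
```

In both layouts the caller writes "from etl_helpers import extract, transform, load", so the choice between them can be changed later without touching client code.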

From: Python-list <python-list-bounces+gweatherby=uchc.edu@python.org> on behalf of transreductionist <transreductionist@gmail.com>
Date: Saturday, February 4, 2023 at 7:42 PM
To: python-list@python.org <python-list@python.org>
Subject: Re: Organizing modules and their code

Thank you for all the helpful replies and consideration. I do hope for other opinions.

I would rather say it is more like engineering than art. Whether it is a matter of overthinking, or underthinking, is another matter. I enjoyed Dijkstra's letter to his colleagues on the role of scientific thought in computer programming. It is located at:

---- https://www.cs.utexas.edu/users/EWD/transcriptions/EWD04xx/EWD447.html

It is my academic training in physics that makes me enjoy picking up an idea and examining it from all sides, and sharing thoughts with friends. Just inquisitive curiosity, and not a homework problem. Thanks for the great link to the ETL site. That was a good read. A few years back I built a prod ETL application in Golang using gRPC with a multiprocessing pipeline throughout. It handled GB of data and was fast.

This analogy came to me the other day. For me, I would rather walk into a grocery store where the bananas, apples, and oranges are separated into their own bins, instead of one common crate.



From Greg Ewing@21:1/5 to Gerard on Mon Feb 6 13:15:06 2023

On 6/02/23 4:23 am, Weatherby,Gerard wrote:

Well, first of all, while there is no doubt as to Dijkstra’s contribution to computer science, I don’t think his description of scientific thought is correct. The acceptance of Einstein’s theory of relativity has nothing to do with internal consistency or how easy or difficult to explain but rather repeated experimental results validating it.

I don't think Dijkstra was claiming that what he was talking about
was a *complete* description of scientific thought, only that the
ability to separate out independent concerns is an important part
of it, and that was something he saw his colleagues failing to do.

--
Greg

