• Programming language similarity

    From Derek Jones@21:1/5 to All on Mon Apr 25 00:00:40 2022
    All,

    There has been remarkably little work that tries to measure
    programming language similarity.

    Yes, there are many multi-language runtime benchmark comparisons, and
    people extract data from Wikipedia to make dubious claims.

    Does anybody know of other kinds of attempts at measuring language
    similarity?

    Here is one approach https://shape-of-code.com/2022/04/24/programming-language-similarity-based-on-their-traits/
    [That seems awfully simplistic. Fortran and PL/I both have FORMAT statements that look
    superficially similar but the semantics are very different. -John]

  • From Derek Jones@21:1/5 to All on Mon Apr 25 08:59:44 2022
    John,

    https://shape-of-code.com/2022/04/24/programming-language-similarity-based-on-their-traits/
    [That seems awfully simplistic.  Fortran and PL/I both have FORMAT statements that look
    superficially similar but the semantics are very different. -John]

    Many keywords have different meanings, e.g., the do keyword in Fortran/C.

    Even binary operators differ, e.g., binary plus is used for string
    concatenation in some languages.

    The blog post uses a token-based approach, which does not require
    much time to gather the data.
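
    For anyone who wants to experiment, here is a minimal sketch of what
    such a token-based comparison might look like (TypeScript; the keyword
    lists are partial and purely illustrative, not the data used in the
    blog post):

        // Jaccard similarity over keyword sets: intersection size
        // divided by union size. The keyword lists below are small,
        // illustrative samples only.
        const keywords: Record<string, Set<string>> = {
          c:       new Set(["if", "else", "for", "while", "do", "switch",
                            "case", "return", "struct", "typedef"]),
          java:    new Set(["if", "else", "for", "while", "do", "switch",
                            "case", "return", "class", "interface"]),
          fortran: new Set(["if", "else", "do", "select", "case", "return",
                            "subroutine", "function", "end"]),
        };

        function jaccard(a: Set<string>, b: Set<string>): number {
          const common = [...a].filter(k => b.has(k)).length;
          return common / (a.size + b.size - common);
        }

        // Print the pairwise similarity for every unordered pair.
        const names = Object.keys(keywords);
        for (const x of names)
          for (const y of names)
            if (x < y)
              console.log(`${x} vs ${y}: ` +
                          jaccard(keywords[x], keywords[y]).toFixed(2));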

    A semantics-based approach requires a lot of head scratching. I made a
    start by collecting information on function definitions (mostly forms
    of argument passing). The semantic traits I looked at tended to have a
    small number of characteristics, so some form of aggregation is needed
    to create significant differences.
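
    One possible way to do that aggregation (a sketch only; the trait
    values below are invented placeholders, not measurements): encode each
    language as a vector of categorical traits and use the fraction of
    mismatching traits as a distance, so many small per-trait differences
    accumulate into something usable.

        // Distance = fraction of traits on which two languages disagree
        // (simple matching over categorical values). Trait values here
        // are invented placeholders for illustration.
        type Traits = Record<string, string>;

        const argTraits: Record<string, Traits> = {
          c:      { passing: "by-value",  defaults: "no",  varargs: "yes", named: "no"  },
          ada:    { passing: "in-out",    defaults: "yes", varargs: "no",  named: "yes" },
          python: { passing: "by-object", defaults: "yes", varargs: "yes", named: "yes" },
        };

        function traitDistance(a: Traits, b: Traits): number {
          const keys = Object.keys(a);
          const mismatches = keys.filter(k => a[k] !== b[k]).length;
          return mismatches / keys.length;
        }

        console.log("c vs ada:", traitDistance(argTraits.c, argTraits.ada).toFixed(2));
        console.log("c vs python:", traitDistance(argTraits.c, argTraits.python).toFixed(2));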

  • From Fernando@21:1/5 to All on Mon Apr 25 04:24:38 2022
    Hi Derek,

    Your repository is very nice! Can I use the "language info" part in the
    class on programming language paradigms? It would be nice to give
    students some idea about the number of keywords in different
    programming languages, for instance.

    By the way, perhaps you should also consider comparing the languages
    with regard to the static and dynamic aspects of their type systems,
    e.g.: typing discipline (static, dynamic, gradual?), type verification
    (inference, annotations, mixed?), type enforcement (weak, strong),
    static type equivalence (nominal, structural, mixed?), etc. That might
    lead to very different trees. For instance, in your keyword tree, Java
    and JavaScript are close, but they are very different semantically.

    Does anybody know of other kinds of attempts at measuring language
    similarity?

    About that: I don't know of other studies. There is the article on Wikipedia (Programming Languages Comparison), but it does not cite a paper with a comparative study.

    Regards,

    Fernando

  • From Jan Ziak@21:1/5 to Derek Jones on Mon Apr 25 06:00:12 2022
    On Monday, April 25, 2022 at 4:49:03 AM UTC+2, Derek Jones wrote:
    All,

    There has been remarkably little work that tries to measure
    programming language similarity.

    Yes, there are many multi-language runtime benchmark comparisons, and
    people extract data from Wikipedia to make dubious claims.

    Does anybody know of other kinds of attempts at measuring language similarity? ...

    Just some "food for thought" on a conceptually similar topic:

    Denis Roegel: A brief survey of 20th century logical notations (https://hal.inria.fr/hal-02340520/document)

    -atom

  • From Meshach Mitchell@21:1/5 to All on Mon Apr 25 12:06:02 2022
    I could see how that could be interesting as an academic pursuit, but I
    think the dearth of exploration here is most likely because pretty much
    anyone in a position to do that already knows that every Turing-complete
    language is equivalent. The comparison, therefore, would be a comparison
    of the placement of syntactic sugar. I have trouble visualizing a
    real-world use for such a comparison, by which I mean: what is the
    problem that I would be able to solve by knowing which languages are
    similar? In the current environment, anywhere you would work already has
    a whole tech stack mapped out.

    I have actually thought about this, and vaguely remember looking up
    articles on the subject. The article you linked is interesting, but I
    agree with your analysis; semantic similarity has some value, but IMO
    what really matters is "supported patterns", i.e., what a language
    provides "for free". Now, TINSTAAFL, so there is no real "free", but
    there is some optimization done by a language [compiler, interpreter]
    to support statements represented in the grammar. An example that comes
    to mind is in JavaScript (I know, I *know*, but I have a family, and we
    need to eat). Early implementations of async in JS used the *Promise*
    object to implement asynchronous execution, but newer versions of the
    language use the *async* and *await* keywords. The former piggy-backs
    on the existing OO architecture, while the latter, implemented as
    keywords, is available for lower-level abstraction and optimization.
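
    A minimal TypeScript sketch of the two spellings of the same operation
    (fetchUser is a hypothetical stand-in, not a real API):

        // Hypothetical stand-in that returns a Promise.
        function fetchUser(id: number): Promise<{ name: string }> {
          return Promise.resolve({ name: `user-${id}` });
        }

        // Promise style: the control flow rides on the Promise object's
        // .then() method, i.e., on the existing OO machinery.
        function greetThen(id: number): Promise<string> {
          return fetchUser(id).then(user => `hello, ${user.name}`);
        }

        // async/await style: the same control flow expressed with
        // keywords that the language implementation recognizes directly.
        async function greetAwait(id: number): Promise<string> {
          const user = await fetchUser(id);
          return `hello, ${user.name}`;
        }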

    We've been doing this long enough that a number of "higher level"
    patterns have emerged. The aforementioned asynchronous (threaded,
    maybe?) execution is one. *Events* also come to mind, which are
    generally implemented under the hood as good old-fashioned polling, or
    as function registration and hash lookup. What is actually happening in
    the machine translates to vastly different computational costs, which
    seems to me to be non-trivial. I think a meaningful categorization
    could be done based on this idea of language "provisions" rather than
    language semantics, and on some deeper analysis of how exactly a
    language [compiler, interpreter] implements what necessarily boils down
    to syntactic sugar.

    To answer your actual question: no, I don't know of other attempts, but
    I can understand the scarcity. Hope my thoughts have some value.

    -- Meshach Mitchell


  • From Derek Jones@21:1/5 to All on Mon Apr 25 19:35:43 2022
    Fernando,

    Your repository is very nice! Can I use the "language info" part in the
    class on programming language paradigms?

    Please do. The code is under a GPL license.

    It would be nice to give students some idea about the number of
    keywords in different programming languages, for instance.

    I was surprised by the diversity of words used.

    By the way, perhaps you should also consider comparing the languages
    with regard to the static and dynamic aspects of their type systems,
    e.g.: typing discipline (static, dynamic, gradual?), type verification
    (inference, annotations, mixed?), type enforcement (weak, strong),
    static type equivalence (nominal, structural, mixed?), etc. That might
    lead to very different trees.

    I looked into building a tree based on allowed implicit type
    conversions, with the hope of coming up with a measure of strong/weak
    typing.

    A list of implicit conversions performed by a language seems like a
    good start. But this approach makes Fortran 77 look like it's strongly
    typed; there are fewer implicit conversions than in other languages
    because it supports fewer types, e.g., no enums or pointers. C's
    relatively large number of integer types, and the corresponding
    implicit conversions, make it look weakly typed compared to languages
    with fewer integer types (and hence fewer implicit conversions).

    The characteristics you list might be combined in some meaningful way,
    such that a type 'distance' tree could be constructed. Lots of careful
    reading of language specifications would be needed to figure out the
    details.
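
    A rough sketch of what that combination might look like (the
    classifications below are guesses for illustration only, not the
    result of reading the specifications): encode each dimension as a
    categorical feature, use the fraction of mismatches as a distance, and
    repeatedly merge the closest pair to get a crude tree.

        // Type-system dimensions as categorical features; the values are
        // illustrative guesses, not authoritative classifications.
        const typeTraits: Record<string, Record<string, string>> = {
          java:       { discipline: "static",  verification: "annotations",
                        enforcement: "strong",  equivalence: "nominal" },
          haskell:    { discipline: "static",  verification: "inference",
                        enforcement: "strong",  equivalence: "mixed" },
          javascript: { discipline: "dynamic", verification: "none",
                        enforcement: "weak",    equivalence: "structural" },
          python:     { discipline: "dynamic", verification: "none",
                        enforcement: "strong",  equivalence: "structural" },
        };

        function dist(a: Record<string, string>, b: Record<string, string>): number {
          const keys = Object.keys(a);
          return keys.filter(k => a[k] !== b[k]).length / keys.length;
        }

        // Naive single-linkage agglomeration: report merges from closest
        // to farthest; the merge order gives the shape of the tree.
        const clusters: string[][] = Object.keys(typeTraits).map(l => [l]);
        while (clusters.length > 1) {
          let best = { d: Infinity, i: 0, j: 1 };
          for (let i = 0; i < clusters.length; i++)
            for (let j = i + 1; j < clusters.length; j++) {
              const d = Math.min(...clusters[i].flatMap(x =>
                clusters[j].map(y => dist(typeTraits[x], typeTraits[y]))));
              if (d < best.d) best = { d, i, j };
            }
          console.log(`merge [${clusters[best.i]}] + [${clusters[best.j]}]` +
                      ` at distance ${best.d.toFixed(2)}`);
          clusters[best.i] = clusters[best.i].concat(clusters[best.j]);
          clusters.splice(best.j, 1);
        }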

    About that: I don't know of other studies. There is the article on Wikipedia (Programming Languages Comparison), but it does not cite a paper with a comparative study.

    Some of the Yes/No classifications on this page are somewhat surprising
    (at least to me) https://en.wikipedia.org/wiki/Comparison_of_programming_languages

  • From Derek Jones@21:1/5 to All on Mon Apr 25 20:51:30 2022
    Jan,

    Denis Roegel: A brief survey of 20th century logical notations (https://hal.inria.fr/hal-02340520/document)

    This is an interesting collection of decisions made by authors
    over 120 years.

    What makes somebody choose a particular set of symbols?
    My guess is that their past experience is a major factor,
    i.e., the use of symbols they had previously been exposed to.

    Of course, it could be something as mundane as the characters
    available on their typewriter, or the printer of the journal
    the work was published in.

    Then again, academics do love to do their own thing. Perhaps
    the decisions are based on the need to be different.

  • From gah4@21:1/5 to Derek Jones on Mon Apr 25 14:58:12 2022
    On Monday, April 25, 2022 at 1:54:58 PM UTC-7, Derek Jones wrote:

    (snip)

    What makes somebody choose a particular set of symbols?
    My guess is that their past experience is a major factor,
    i.e., the use of symbols they had previously been exposed to.

    Early Fortran was limited by the number of characters available
    on the IBM 026 keypunch. They redefined some of the punch
    codes with different symbols for scientific use, as that was
    easier than designing a whole new machine.

    Much of that was then fixed with EBCDIC in S/360, where an 8-bit
    code allowed, and pretty much required, the commercial and scientific
    characters to have separate codes. In any case, the characters (with
    new punches) were kept. (And newer compilers have an option to accept
    the old punch codes.)

    I do remember punching ALGOL programs on the 026, where
    you had to use the multipunch key, along with big charts on
    the wall, to get the needed characters.

    In any case, character set limitations stay with us long after
    the reason for the limitation has gone.

  • From Derek Jones@21:1/5 to All on Tue Apr 26 00:50:23 2022
    gah4,

    In any case, character set limitations stay with us long after
    the reason for the limitation has gone.

    More than you probably wanted to know about character set history
    still being with us: https://archive.org/details/mackenzie-coded-char-sets
