• Programming language similarity

    From Derek Jones@21:1/5 to All on Mon Apr 25 00:00:40 2022
    All,

    There has been remarkably little work that tries to measure
    programming language similarity.

    Yes, there are many multi-language runtime benchmark comparisons, and
    people extract data from Wikipedia to make dubious claims.

    Does anybody know of other kinds of attempts at measuring language
    similarity?

    Here is one approach https://shape-of-code.com/2022/04/24/programming-language-similarity-based-on-their-traits/
    [That seems awfully simplistic. Fortran and PL/I both have FORMAT statements that look
    superficially similar but the semantics are very different. -John]

  • From Derek Jones@21:1/5 to All on Mon Apr 25 08:59:44 2022
    John,

    https://shape-of-code.com/2022/04/24/programming-language-similarity-based-on-their-traits/
    [That seems awfully simplistic.  Fortran and PL/I both have FORMAT statements that look
    superficially similar but the semantics are very different. -John]

    Many keywords have different meanings, e.g., the do keyword in Fortran/C.

    Even binary operators differ, e.g., binary plus is used for string
    concatenation in some languages.

    The blog post uses a token-based approach, which does not require
    much time to gather the data.
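
    For anyone who wants to experiment, here is a minimal sketch of what
    such a token-based comparison might look like (TypeScript; the keyword
    lists are partial and purely illustrative, not the data used in the
    blog post):

        // Jaccard similarity over keyword sets: intersection size
        // divided by union size. The keyword lists below are small,
        // illustrative samples only.
        const keywords: Record<string, Set<string>> = {
          c:       new Set(["if", "else", "for", "while", "do", "switch",
                            "case", "return", "struct", "typedef"]),
          java:    new Set(["if", "else", "for", "while", "do", "switch",
                            "case", "return", "class", "interface"]),
          fortran: new Set(["if", "else", "do", "select", "case", "return",
                            "subroutine", "function", "end"]),
        };

        function jaccard(a: Set<string>, b: Set<string>): number {
          const common = [...a].filter(k => b.has(k)).length;
          return common / (a.size + b.size - common);
        }

        // Print the pairwise similarity for every unordered pair.
        const names = Object.keys(keywords);
        for (const x of names)
          for (const y of names)
            if (x < y)
              console.log(`${x} vs ${y}: ` +
                          jaccard(keywords[x], keywords[y]).toFixed(2));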

    A semantics-based approach requires a lot of head scratching. I made a
    start by collecting information on function definitions (mostly forms
    of argument passing). The semantic traits I looked at tended to have a
    small number of characteristics, so some form of aggregation is needed
    to create significant differences.
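
    One possible way to do that aggregation (a sketch only; the trait
    values below are invented placeholders, not measurements): encode each
    language as a vector of categorical traits and use the fraction of
    mismatching traits as a distance, so many small per-trait differences
    accumulate into something usable.

        // Distance = fraction of traits on which two languages disagree
        // (simple matching over categorical values). Trait values here
        // are invented placeholders for illustration.
        type Traits = Record<string, string>;

        const argTraits: Record<string, Traits> = {
          c:      { passing: "by-value",  defaults: "no",  varargs: "yes", named: "no"  },
          ada:    { passing: "in-out",    defaults: "yes", varargs: "no",  named: "yes" },
          python: { passing: "by-object", defaults: "yes", varargs: "yes", named: "yes" },
        };

        function traitDistance(a: Traits, b: Traits): number {
          const keys = Object.keys(a);
          const mismatches = keys.filter(k => a[k] !== b[k]).length;
          return mismatches / keys.length;
        }

        console.log("c vs ada:", traitDistance(argTraits.c, argTraits.ada).toFixed(2));
        console.log("c vs python:", traitDistance(argTraits.c, argTraits.python).toFixed(2));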

  • From Fernando@21:1/5 to All on Mon Apr 25 04:24:38 2022
    Hi Derek,

    Your repository is very nice! Can I use the "language info" part in the
    class on programming language paradigms? It would be nice to give
    students some idea about the number of keywords in different
    programming languages, for instance.

    By the way, perhaps you should also consider comparing the languages
    with regard to the static and dynamic aspects of their type systems,
    e.g.: typing discipline (static, dynamic, gradual?), type verification
    (inference, annotations, mixed?), type enforcement (weak, strong),
    static type equivalence (nominal, structural, mixed?), etc. That might
    lead to very different trees. For instance, in your keyword tree, Java
    and JavaScript are close, but they are very different semantically.

    Does anybody know of other kinds of attempts at measuring language
    similarity?

    About that: I don't know of other studies. There is the article on Wikipedia (Programming Languages Comparison), but it does not cite a paper with a comparative study.

    Regards,

    Fernando

  • From Jan Ziak@21:1/5 to Derek Jones on Mon Apr 25 06:00:12 2022
    On Monday, April 25, 2022 at 4:49:03 AM UTC+2, Derek Jones wrote:
    All,

    There has been remarkably little work that tries to measure
    programming language similarity.

    Yes, there are many multi-language runtime benchmark comparisons, and
    people extract data from Wikipedia to make dubious claims.

    Does anybody know of other kinds of attempts at measuring language similarity? ...

    Just some "food for thought" on a conceptually similar topic:

    Denis Roegel: A brief survey of 20th century logical notations (https://hal.inria.fr/hal-02340520/document)

    -atom

  • From Meshach Mitchell@21:1/5 to All on Mon Apr 25 12:06:02 2022
    I could see how that could be interesting as an academic pursuit, but I
    think the dearth of exploration here is most likely because pretty much
    anyone in a position to do that already knows that every Turing-complete
    language is equivalent. The comparison, therefore, would be a comparison
    of the placement of syntactic sugar. I have trouble visualizing a
    real-world use for such a comparison, by which I mean: what is the
    problem that I would be able to solve by knowing which languages are
    similar? In the current environment, anywhere you would work already has
    a whole tech stack mapped out.

    I have actually thought about this, and vaguely remember looking up
    articles on the subject. The article you linked is interesting, but I
    agree with your analysis; semantic similarity has some value, but IMO
    what really matters is "supported patterns", i.e., what a language
    provides "for free". Now, TINSTAAFL, so there is no real "free", but
    there is some optimization done by a language [compiler, interpreter]
    to support statements represented in the grammar. An example that comes
    to mind is in JavaScript (I know, I *know*, but I have a family, and we
    need to eat). Early implementations of async in JS used the *Promise*
    object to implement asynchronous execution, but newer versions of the
    language use the *async* and *await* keywords. The former piggy-backs
    on the existing OO architecture, while the latter, implemented as
    keywords, is available for lower-level abstraction and optimization.
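
    A minimal TypeScript sketch of the two spellings of the same operation
    (fetchUser is a hypothetical stand-in, not a real API):

        // Hypothetical stand-in that returns a Promise.
        function fetchUser(id: number): Promise<{ name: string }> {
          return Promise.resolve({ name: `user-${id}` });
        }

        // Promise style: the control flow rides on the Promise object's
        // .then() method, i.e., on the existing OO machinery.
        function greetThen(id: number): Promise<string> {
          return fetchUser(id).then(user => `hello, ${user.name}`);
        }

        // async/await style: the same control flow expressed with
        // keywords that the language implementation recognizes directly.
        async function greetAwait(id: number): Promise<string> {
          const user = await fetchUser(id);
          return `hello, ${user.name}`;
        }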

    We've been doing this long enough that a number of "higher level"
    patterns have emerged. The aforementioned asynchronous (threaded,
    maybe?) execution is one. *Events* also come to mind, which are
    generally implemented under the hood as good old-fashioned polling, or
    as function registration and hash lookup. What is actually happening in
    the machine translates to vastly different computational costs, which
    seems to me to be non-trivial. I think a meaningful categorization
    could be done based on this idea of language "provisions" rather than
    language semantics, and on some deeper analysis of how exactly a
    language [compiler, interpreter] implements what necessarily boils down
    to syntactic sugar.

    To answer your actual question: no, I don't know of other attempts, but
    I can understand the scarcity. Hope my thoughts have some value.

    -- Meshach Mitchell


  • From Derek Jones@21:1/5 to All on Mon Apr 25 19:35:43 2022
    Fernando,

    Your repository is very nice! Can I use the "language info" part in the
    class on programming language paradigms?

    Please do. The code is under a GPL license.

    It would be nice to give students some idea about the number of
    keywords in different programming languages, for instance.

    I was surprised by the diversity of words used.

    By the way, perhaps you should also consider comparing the languages
    with regard to the static and dynamic aspects of their type systems,
    e.g.: typing discipline (static, dynamic, gradual?), type verification
    (inference, annotations, mixed?), type enforcement (weak, strong),
    static type equivalence (nominal, structural, mixed?), etc. That might
    lead to very different trees.

    I looked into building a tree based on allowed implicit type
    conversions, with the hope of coming up with a measure of strong/weak
    typing.

    A list of implicit conversions performed by a language seems like a
    good start. But this approach makes Fortran 77 look like it's strongly
    typed; there are fewer implicit conversions than in other languages
    because it supports fewer types, e.g., no enums or pointers. C's
    relatively large number of integer types, and the corresponding
    implicit conversions, make it look weakly typed compared to languages
    with fewer integer types (and hence fewer implicit conversions).

    The characteristics you list might be combined in some meaningful way,
    such that a type 'distance' tree could be constructed. Lots of careful
    reading of language specifications would be needed to figure out the
    details.
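
    A rough sketch of what that combination might look like (the
    classifications below are guesses for illustration only, not the
    result of reading the specifications): encode each dimension as a
    categorical feature, use the fraction of mismatches as a distance, and
    repeatedly merge the closest pair to get a crude tree.

        // Type-system dimensions as categorical features; the values are
        // illustrative guesses, not authoritative classifications.
        const typeTraits: Record<string, Record<string, string>> = {
          java:       { discipline: "static",  verification: "annotations",
                        enforcement: "strong",  equivalence: "nominal" },
          haskell:    { discipline: "static",  verification: "inference",
                        enforcement: "strong",  equivalence: "mixed" },
          javascript: { discipline: "dynamic", verification: "none",
                        enforcement: "weak",    equivalence: "structural" },
          python:     { discipline: "dynamic", verification: "none",
                        enforcement: "strong",  equivalence: "structural" },
        };

        function dist(a: Record<string, string>, b: Record<string, string>): number {
          const keys = Object.keys(a);
          return keys.filter(k => a[k] !== b[k]).length / keys.length;
        }

        // Naive single-linkage agglomeration: report merges from closest
        // to farthest; the merge order gives the shape of the tree.
        const clusters: string[][] = Object.keys(typeTraits).map(l => [l]);
        while (clusters.length > 1) {
          let best = { d: Infinity, i: 0, j: 1 };
          for (let i = 0; i < clusters.length; i++)
            for (let j = i + 1; j < clusters.length; j++) {
              const d = Math.min(...clusters[i].flatMap(x =>
                clusters[j].map(y => dist(typeTraits[x], typeTraits[y]))));
              if (d < best.d) best = { d, i, j };
            }
          console.log(`merge [${clusters[best.i]}] + [${clusters[best.j]}]` +
                      ` at distance ${best.d.toFixed(2)}`);
          clusters[best.i] = clusters[best.i].concat(clusters[best.j]);
          clusters.splice(best.j, 1);
        }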

    About that: I don't know of other studies. There is the article on Wikipedia (Programming Languages Comparison), but it does not cite a paper with a comparative study.

    Some of the Yes/No classifications on this page are somewhat surprising
    (at least to me) https://en.wikipedia.org/wiki/Comparison_of_programming_languages

  • From Derek Jones@21:1/5 to All on Mon Apr 25 20:51:30 2022
    Jan,

    Denis Roegel: A brief survey of 20th century logical notations (https://hal.inria.fr/hal-02340520/document)

    This is an interesting collection of decisions made by authors
    over 120 years.

    What makes somebody choose a particular set of symbols?
    My guess is that their past experience is a major factor,
    i.e., the use of symbols they had previously been exposed to.

    Of course, it could be something as mundane as the characters
    available on their typewriter, or the printer of the journal
    the work was published in.

    Then again, academics do love to do their own thing. Perhaps
    the decisions are based on the need to be different.

  • From gah4@21:1/5 to Derek Jones on Mon Apr 25 14:58:12 2022
    On Monday, April 25, 2022 at 1:54:58 PM UTC-7, Derek Jones wrote:

    (snip)

    What makes somebody choose a particular set of symbols?
    My guess is that their past experience is a major factor,
    i.e., the use of symbols they had previously been exposed to.

    Early Fortran was limited by the number of characters available
    on the IBM 026 keypunch. They redefined some of the punch
    codes with different symbols for scientific use, as that was
    easier than designing a whole new machine.

    Much of that was then fixed with EBCDIC in S/360, where an 8-bit
    code allowed, and pretty much required, the commercial and scientific
    characters to have separate codes. In any case, the characters (with
    new punches) were kept. (And newer compilers have an option to accept
    the old punch codes.)

    I do remember punching ALGOL programs on the 026, where
    you had to use the multipunch key, along with big charts on
    the wall, to get the needed characters.

    In any case, character set limitations stay with us long after
    the reason for the limitation has gone.

  • From Derek Jones@21:1/5 to All on Tue Apr 26 00:50:23 2022
    gah4,

    In any case, character set limitations stay with us long after
    the reason for the limitation has gone.

    More than you probably wanted to know about character set history
    still being with us: https://archive.org/details/mackenzie-coded-char-sets
