Forum: >>> Magnum BBS <<<

New meets old

From Arne Vajhøj@21:1/5 to All on Sun Mar 31 19:55:30 2024

Someone created a framework for evaluating LLMs' ability
to write COBOL.

https://bloop.ai/blog/evaluating-llms-on-cobol

For those who do not want to read the entire article,
the conclusion at the bottom is:

<quote>
GPT-4 - the best-performing model - generates a correct solution for
10.27% of problems. Compare this to HumanEval, where it solves 67% of
problems. CodeLlama, one of the best open-source coding models, fares
even worse, with the 34b variant only clocking 2%. COBOLEval is hard.

Looking at the failure cases, we can see that state-of-the-art LLMs
struggle to generate COBOL that even compiles. Only 47.94% of GPT-4
generated solutions compile with GnuCOBOL.
</quote>

Arne

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
