Some code I happen to be working with serializes Perl data structures to
JSON. Among other things, this is used for data communication for a
web-based terminal app and hence, it's a bit performance critical. Part
of this code is a loop
for (keys(%$v)) {
}
which is used to serialize Perl hashes. Serializing to JSON supports two
modes, a plain one which just generates long strings and a pretty one
whose output is supposed to be human-readable and hence, includes
structural indentation.
Due to the nature of Perl hash traversal, the
order of keys is unpredictable and likely to change between different
invocations of the same script. For pretty-printing, that's undesirable
because it's confusing to humans. Hence, the pretty-printer should sort
these keys. Due to performance considerations (according to a quick
test, a Perl method call is about 1.5 times slower than a plain
subroutine call), I didn't want to add a general method call here.
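That ratio is easy to re-check with the core Benchmark module. The
following is a minimal sketch, not the code under discussion; the
KeyGetter package and its subs are invented solely to compare the two
call styles:

use strict;
use warnings;
use Benchmark qw(cmpthese);

# Throwaway package whose only purpose is to compare the overhead of a
# method call against a plain subroutine call doing the same work.
package KeyGetter;
sub new        { bless {}, shift }
sub keys_meth  { keys(%{$_[1]}) }     # invoked as a method
sub keys_plain { keys(%{$_[0]}) }     # invoked as a plain sub

package main;
my %h   = (a => 1, b => 2, c => 3);
my $obj = KeyGetter->new;

cmpthese(-1, {
    method   => sub { my @k = $obj->keys_meth(\%h) },
    plainsub => sub { my @k = KeyGetter::keys_plain(\%h) },
});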
What I did instead was the following: Define a subroutine
sub my_keys
{
keys(%{$_[0]})
}
and use that in place of keys in the loop head. This gets resolved down
to the glob at compile time and then invokes whatever is in the
subroutine slot of the glob at the time of execution. The top-level
pretty printer data addition method then does
local *my_keys = sub {
sort(keys(%{$_[0]}))
};
This causes the formatting method several layers deeper in the call
chain to invoke this subroutine and hence, sort the keys, when being
used from a pretty-printer object.
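Put together, the pattern amounts to roughly the following. This is a
minimal, self-contained sketch: serialize_hash and serialize_pretty are
invented stand-ins for the real methods, and value handling and
indentation are omitted.

use strict;
use warnings;

# Plain subroutine used in the serializer's loop head. The call resolves
# to the *my_keys glob at compile time; what actually runs is whatever
# sits in the glob's CODE slot at call time.
sub my_keys
{
    keys(%{$_[0]})
}

# Stand-in for the formatting code several layers down the call chain.
sub serialize_hash
{
    my ($h) = @_;
    return '{' . join(',', map { qq("$_":$h->{$_}) } my_keys($h)) . '}';
}

# Stand-in for the top-level pretty-printer entry point: for the dynamic
# extent of this call, my_keys returns sorted keys.
sub serialize_pretty
{
    my ($h) = @_;
    local *my_keys = sub {
        sort(keys(%{$_[0]}))
    };
    return serialize_hash($h);
}

my %h = (b => 2, a => 1, c => 3);
print serialize_hash(\%h), "\n";    # key order unpredictable
print serialize_pretty(\%h), "\n";  # keys sorted: {"a":1,"b":2,"c":3}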
On Monday, November 7, 2022 at 11:06:02 PM UTC+1, Rainer Weikusat wrote:
Do you mean you don't use any of the JSON modules available in core Perl or CPAN?
Rainer Weikusat <rweikusat@talktalk.net> wrote:
} Some code I happen to be working with serializes Perl data structures to
} JSON. Among other things, this is used for data communication for a
} web-based terminal app and hence, it's a bit performance critical. Part
} of this code is a loop
}
} for (keys(%$v)) {
} }
}
} which is used to serialize Perl hashes. Serializing to JSON supports two
} modes, a plain one which just generates long strings and a pretty one
} whose output is supposed to be human-readable and hence, includes
} structural indentation.
}
} Due to the nature of Perl hash traversal, the
} order of keys is unpredictable and likely to change between different
} invocations of the same script.
Wouldn't the simple
for (sort keys(%$v))
{
}
do the job, or am I missing something?
- saves a significant amount of work (>> 1000 LOC)
On Tuesday, November 8, 2022 at 12:43:46 PM UTC+1, Rainer Weikusat wrote:
- saves a significant amount of work (>> 1000 LOC)
Cpanel::JSON::XS is more than 2500 lines of Perl and 5000 lines of XS. But, as the saying goes, YMMV.
"E. Choroba" <choroba@matfyz.cz> writes:
On Tuesday, November 8, 2022 at 12:43:46 PM UTC+1, Rainer Weikusat wrote:
- saves a significant amount of work (>> 1000 LOC)
Cpanel::JSON::XS is more than 2500 lines of Perl and 5000 lines of XS. But, as the saying goes, YMMV.
That J::Random::Bored::Sysadmins::Handoptimized::C::Module for solving a fairly simple task is insanely huge doesn't mean solving the task will require a comparable amount of code.
Rainer Weikusat <rwei...@talktalk.net> writes:
"E. Choroba" <cho...@matfyz.cz> writes:
On Tuesday, November 8, 2022 at 12:43:46 PM UTC+1, Rainer Weikusat wrote:
- saves a significant amount of work (>> 1000 LOC)
Cpanel::JSON::XS is more than 2500 lines of Perl and 5000 lines of XS. But, as the saying goes, YMMV.
That J::Random::Bored::Sysadmins::Handoptimized::C::Module for solving a
fairly simple task is insanely huge doesn't mean solving the task will
require a comparable amount of code.
To add some context to that: The core of the code I'm using consists of
a JSON parser (I claim to be complete and correct) of 288 lines of code
and a JSON serializer of 210. It beggars belief that someone managed to
spend more than 7500 lines of code just on that.
On Sunday, November 13, 2022 at 8:19:23 PM UTC+1, Rainer Weikusat wrote:
Rainer Weikusat <rwei...@talktalk.net> writes:
"E. Choroba" <cho...@matfyz.cz> writes:
On Tuesday, November 8, 2022 at 12:43:46 PM UTC+1, Rainer Weikusat wrote:
- saves a significant amount of work (>> 1000 LOC)
Cpanel::JSON::XS is more than 2500 lines of Perl and 5000 lines of XS. But, as the saying goes, YMMV.
That J::Random::Bored::Sysadmins::Handoptimized::C::Module for solving a
fairly simple task is insanely huge doesn't mean solving the task will
require a comparable amount of code.
To add some context to that: The core of the code I'm using consists of
a JSON parser (I claim to be complete and correct) of 288 lines of code
and a JSON serializer of 210. It beggars belief that someone managed to
spend more than 7500 lines of code just on that.
Try running your code against the test suite of the module to see how complete and correct it is. Or try to benchmark it to see which one is faster.
And regardless of the results, it would be great if you could
share your code with the rest of the world.
On 11/13/2022 14:03, Rainer Weikusat wrote:
To add some context to that: The core of the code I'm using consists of
a JSON parser (I claim to be complete and correct) of 288 lines of code
and a JSON serializer of 210. It beggars belief that someone managed to
spend more than 7500 lines of code just on that.
I kinda agree. I didn't bother checking when I needed a parser for JSON, so
I wrote my own JSON to Perl hash parser, and indented JSON printer and Perl to JSON converter and each of them is a few hundred Perl lines. Doing the same
for HTML is a lot more complicated, but I did that too. While I was at it,
I wrote an ICS to JSON converter. It's amazing how close to Perl hashes that JSON is (I'd be surprised if the JSON authors didn't model it after Perl).
I tend to depend as little as possible on other people's code for my own projects. Plus it's fun to do it yourself.
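The similarity mentioned above is easy to see side by side (a trivial
illustration, not anyone's actual code):

use strict;
use warnings;

# A nested Perl data structure and the corresponding JSON text are
# nearly identical character for character.
my $perl = { name => "demo", tags => [ "a", "b" ], nested => { n => 1 } };
my $json = '{ "name": "demo", "tags": [ "a", "b" ], "nested": { "n": 1 } }';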
On Monday, November 14, 2022 at 3:10:33 AM UTC+1, $Bill wrote:
On 11/13/2022 14:03, Rainer Weikusat wrote:
To add some context to that: The core of the code I'm using consists of
a JSON parser (I claim to be complete and correct) of 288 lines of code
and a JSON serializer of 210. It beggars belief that someone managed to
spend more than 7500 lines of code just on that.
I kinda agree. I didn't bother checking when I needed a parser for JSON, so
I wrote my own JSON to Perl hash parser, and indented JSON printer and Perl
to JSON converter and each of them is a few hundred Perl lines. Doing the same
for HTML is a lot more complicated, but I did that too. While I was at it,
I wrote an ICS to JSON converter. It's amazing how close to Perl hashes that
JSON is (I'd be surprised if the JSON authors didn't model it after Perl).
I tend to depend as little as possible on other people's code for my own
projects. Plus it's fun to do it yourself.
If you want to see what your code missed, you can try running the test
suite of Cpanel::JSON::XS against your code. If such things never
occur in the JSONs you need to process, you're lucky.
"E. Choroba" <choroba@matfyz.cz> writes:
On Monday, November 14, 2022 at 3:10:33 AM UTC+1, $Bill wrote:
On 11/13/2022 14:03, Rainer Weikusat wrote:
To add some context to that: The core of the code I'm using consists of
a JSON parser (I claim to be complete and correct) of 288 lines of code
and a JSON serializer of 210. It beggars belief that someone managed to
spend more than 7500 lines of code just on that.
I kinda agree. I didn't bother checking when I needed a parser for JSON, so
I wrote my own JSON to Perl hash parser, and indented JSON printer and Perl
to JSON converter and each of them is a few hundred Perl lines. Doing the same
for HTML is a lot more complicated, but I did that too. While I was at it,
I wrote an ICS to JSON converter. It's amazing how close to Perl hashes that
JSON is (I'd be surprised if the JSON authors didn't model it after Perl).
I tend to depend as little as possible on other people's code for my own
projects. Plus it's fun to do it yourself.
If you want to see what your code missed, you can try running the test
suite of Cpanel::JSON::XS against your code. If such things never
occur in the JSONs you need to process, you're lucky.
Chances are that the JSON specification is smaller than this module and
it's really not difficult to implement. The only thing that's a bit
hairy is Unicode surrogates. There are simply no "such things" which
could occur in it (provided the usual assumptions about whitespace are
not being made).
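For what it's worth, the hairy part boils down to something like the
following minimal sketch of decoding \uXXXX escapes, including
surrogate pairs. The sub name is invented and error handling is reduced
to die:

use strict;
use warnings;

# Decode \uXXXX escapes inside a JSON string body, combining UTF-16
# surrogate pairs into one code point and rejecting lone surrogates.
sub decode_unicode_escapes
{
    my ($s) = @_;

    # High surrogate (D800-DBFF) followed by low surrogate (DC00-DFFF).
    $s =~ s{\\u(d[89ab][0-9a-f]{2})\\u(d[c-f][0-9a-f]{2})}
           {chr(0x10000 + ((hex($1) - 0xd800) << 10) + (hex($2) - 0xdc00))}gie;

    # Anything that still looks like a surrogate escape is unpaired.
    die "lone surrogate in string\n" if $s =~ /\\u[dD][89a-fA-F]/;

    # Remaining escapes are plain BMP code points.
    $s =~ s{\\u([0-9a-fA-F]{4})}{chr(hex($1))}ge;

    return $s;
}

binmode STDOUT, ':encoding(UTF-8)';
print decode_unicode_escapes('\\ud83d\\ude00'), "\n";    # U+1F600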
To elaborate on this a little: A JSON-something is composed of a
sequence of typed tokens with optional whitespace between them. The type
of a token can always be determined by examining its first
character. All tokens except strings, numbers and literals are composed
of a single character. There are three literals, true, false and null. A
string always starts and ends with a ". A number always ends with a
digit. Hence, lexical analysis generally works as follows:
1. Skip over horizontal whitespace.
2. Look at the next character to determine the type of the next
token. There are three possible cases here:
2a) The next character cannot start a token => error.
2b) The token type is not valid in the given context => error.
2c) Process the valid token.
Token start characters and associated types are:

    {            object start
    }            object end
    [            array start
    ]            array end
    "            string
    ,            item separator
    :            key-value separator
    f, n, t      literal
    -, 0 .. 9    number
A JSON document is a JSON value. A JSON value is either
1. A literal.
2. A string.
3. A number.
4. An array.
5. An object.
An array is a possibly empty, comma-separated list of JSON values
enclosed by [].
An object is a possibly empty, comma-separated list of key-value pairs
enclosed by {}.
A key-value pair is a token sequence <string><colon><JSON value>.
And that's it (minus the inner string syntax).
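Spelled out as code, the dispatch described above looks roughly like
this. This is a minimal sketch, not the 288-line parser under
discussion: the sub name is invented, and string escapes and number
validation are simplified.

use strict;
use warnings;
use Data::Dumper;

# Minimal JSON tokenizer following the first-character dispatch above.
sub tokenize_json
{
    my ($text) = @_;
    my @tokens;

    pos($text) = 0;
    while (pos($text) < length($text)) {
        next if $text =~ /\G[ \t\r\n]+/gc;    # 1. skip whitespace

        # 2. dispatch on the first character of the next token
        if    ($text =~ /\G([{}\[\]:,])/gc)                       { push @tokens, [ punct   => $1 ] }
        elsif ($text =~ /\G(true|false|null)/gc)                  { push @tokens, [ literal => $1 ] }
        elsif ($text =~ /\G"((?:[^"\\]|\\.)*)"/gc)                { push @tokens, [ string  => $1 ] }
        elsif ($text =~ /\G(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)/gc) { push @tokens, [ number  => $1 ] }
        else  { die "cannot start a token at offset " . pos($text) . "\n" }    # 2a
    }
    return \@tokens;
}

# Each token carries a type determined solely by its first character.
print Dumper(tokenize_json('{"a": [1, 2.5e3, true], "b": null}'));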
In comp.lang.perl.misc, Rainer Weikusat <rweikusat@talktalk.net> wrote:
To elaborate on this a little: A JSON-something is composed of a
sequence of typed tokens with optional whitespace between them. The type
of a token can always be determined by examining its first
character. All tokens except strings, numbers and literals are composed
of a single character. There are three literals, true, false and null. A
string always starts and ends with a ". A number always ends with a
digit. Hence, lexical analysis generally works as follows:
1. Skip over horizontal whitespace.
2. Look at the next character to determine the type of the next
token. There are three possible cases here:
2a) The next character cannot start a token => error.
2b) The token type is not valid in the given context => error.
2c) Process the valid token.
Token start characters and associated types are:

    {            object start

Tab damage.

    }            object end
    [            array start
    ]            array end
    "            string
    ,            item separator
    :            key-value separator
    f, n, t      literal
    -, 0 .. 9    number
Seems to me you've got at least one bug here.
A JSON document is a JSON value. A JSON value is either
1. A literal.
2. A string.
3. A number.
4. An array.
5. An object.
An array is a possibly empty, comma-separated list of JSON values
enclosed by [].
An object is a possibly empty, comma-separated list of key-value pairs
enclosed by {}.
JSON is a lot stricter about commas than many other new languages, like
Perl. It's not hard to imagine someone getting it wrong based on a spec
as limited as yours.

    [1,2,3]  -> okay
    [1,2,3,] -> invalid
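A parse loop that honours this has to treat the comma strictly as a
separator, roughly as in the following minimal sketch (values are
restricted to integers, whitespace handling is omitted and the sub name
is invented):

use strict;
use warnings;

# After a comma another value is mandatory, so "[1,2,3,]" is rejected
# while "[1,2,3]" and "[]" are accepted.
sub parse_int_array
{
    my ($text) = @_;
    pos($text) = 0;

    $text =~ /\G\[/gc or die "expected '['\n";

    my @items;
    unless ($text =~ /\G\]/gc) {                      # empty array is fine
        while (1) {
            $text =~ /\G(-?\d+)/gc
                or die "expected value at offset " . pos($text) . "\n";
            push @items, $1;
            last if $text =~ /\G\]/gc;                # ']' ends the list
            $text =~ /\G,/gc
                or die "expected ',' or ']' at offset " . pos($text) . "\n";
        }
    }
    return \@items;
}

print scalar @{ parse_int_array('[1,2,3]') }, "\n";   # prints 3
eval { parse_int_array('[1,2,3,]'); 1 } or print "rejected: $@";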
A key-value pair is a token sequence <string><colon><JSON value>.
And that's it (minus the inner string syntax).
What dragons could be lurking in there?
Eli the Bearded <*@eli.users.panix.com> writes:
Seems to me you've got at least one bug here.
I don't think so.
In comp.lang.perl.misc, Rainer Weikusat <rweikusat@talktalk.net> wrote:
Eli the Bearded <*@eli.users.panix.com> writes:
Seems to me you've got at least one bug here.
I don't think so.
:r! getarticle-nntp '<eli$2211161808@qaz.wtf>' | tail -1 | /usr/games/rot13
as written the spec above does not allow floating point numbers
/^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$/
Source: https://www.json.org/json-en.html (errors made in translation
are mine).
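That translation is easy to sanity-check against a few samples; a small
test sketch (the sub name is invented):

use strict;
use warnings;

# Check candidate strings against the number pattern translated from
# the json.org grammar above.
sub is_json_number
{
    return $_[0] =~ /^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$/;
}

for my $s (qw(0 -1 12.5 1e5 1.5e-3 01 1. .5 +1)) {
    printf "%-8s %s\n", $s, is_json_number($s) ? "valid" : "invalid";
}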