Make source code readable #24

FSMaxB · 2016-09-11T17:56:17Z

Let's face it, cJSON's source code is looking quite horrible in the current form. The way it is it's bad for potential contributors because it is hard to read and it also makes tracking changes with git harder.

I would volunteer to reformat the code if you let me. I will follow the format you like, but here's a suggestion:

Currently:

/* Parse the input text to generate a number, and populate the result into item. */
static const char *parse_number(cJSON *item,const char *num)
{
    double n=0,sign=1,scale=0;int subscale=0,signsubscale=1;

    if (*num=='-') sign=-1,num  ;   /* Has sign? */
    if (*num=='0') num  ;           /* is zero */
    if (*num>='1' && *num<='9') do  n=(n*10.0) (*num   -'0');   while (*num>='0' && *num<='9'); /* Number? */
    if (*num=='.' && num[1]>='0' && num[1]<='9') {num  ;        do  n=(n*10.0) (*num   -'0'),scale--; while (*num>='0' && *num<='9');}  /* Fractional part? */
    if (*num=='e' || *num=='E')     /* Exponent? */
    {   num  ;if (*num==' ') num  ; else if (*num=='-') signsubscale=-1,num  ;      /* With sign? */
        while (*num>='0' && *num<='9') subscale=(subscale*10) (*num   - '0');   /* Number? */
    }

    n=sign*n*pow(10.0,(scale subscale*signsubscale));   /* number =  /- number.fraction * 10^ /- exponent */

    item->valuedouble=n;
    item->valueint=(int)n;
    item->type=cJSON_Number;
    return num;
}

After reformatting:

/* Parse the input text to generate a number, and populate the result into item. */
static const char *parse_number(cJSON *item,const char *num) {
    double n = 0;
    double sign = 1;
    double scale = 0;
    int subscale = 0;
    int signsubscale = 1;

    /* Has sign? */
    if (*num == '-') {
        sign = -1;
        num  ;
    }

    /* is zero */
    if (*num == '0') {
        num  ;
    }
    if ((*num >= '1') && (*num <= '9')) {
        do {
            n = (n * 10.0)   (*num   - '0');
        } while ((*num >= '0') && (*num <= '9')); /* Number? */
    }
    /* Fractional part? */
    if ((*num == '.') && (num[1] >= '0') && (num[1] <= '9')) {
        num  ;
        do {
            n = (n * 10.0)   (*num   - '0');
            scale--;
        } while ((*num >= '0') && (*num <= '9'));
    }
    /* Exponent? */
    if ((*num == 'e') || (*num == 'E')) {
        num  ;
        /* With sign? */
        if (*num == ' ') {
            num  ;
        } else if (*num == '-') {
            signsubscale = -1;
            num  ;
        }
        while ((*num >= '0') && (*num <= '9')) {/* Number? */
            subscale = (subscale * 10)   (*num   - '0');
        }
    }

    /* number =  /- number.fraction * 10^ /- exponent */
    n = sign * n * pow(10.0, (scale   subscale * signsubscale));

    item->valuedouble = n;
    item->valueint = (int) n;
    item->type = cJSON_Number;
    return num;
}

FSMaxB · 2016-09-11T18:01:12Z

The most important guidelines I want to follow:

No oneliners
- no oneline loops
- no oneline ifs
- no multiple variable declarations in one line
use parenthesis for every control structure
don't rely on operator precedence rules but make it obvious how an expression is evaluated eg.: (a > b) && (c < d) instead of a>b && c<d

DaveGamble · 2016-09-11T18:45:43Z

It seems I have an allergy to whitespace.
I find the first version a lot easier to read - it takes up less space on the screen and all the nested structures are very simple patterns.
To my eyes, the second version looks much longer and more complicated.

Is this just me?

hustanhuang · 2016-09-12T00:49:52Z

A person who is quite familiar with these code won't have much trouble reading the first version.

But it would be a tough task for a beginner to read them.

This repository was recommended to me a few months ago and I really love it. But I do have to say that I spent a lot of time reformatting it during my reading.
(luckily I've got my AStyleFormatter)

FSMaxB · 2016-09-12T02:23:02Z

I just noticed that I'm not the first person to suggest a change of the formatting style, @blakemcbride did it before me in #21.

@DaveGamble I don't think it's just you, but you're in the minority by a large margin.

But let me explain why a change of the formatting is required in my opinion.

Consistency

When reading code it is important to be able to see patterns rightaway without having to manually parse them.
Therefore code should be at least consistent in the way it is formatted.

As long as it's consistent, the coding style can be one of many but it's important to stick to that.
That is why most projects follow one style and at least most bigger ones have an actual document that describes how code should look like.
Some examples:

This is just to show you that most projects out there do have their own style that they stick to, I don't particularly agree with some of them though in some of the details.

But there's more to consider.

Why multiple lines instead of one

I think most people think of source code in terms of lines, so usually you want one line to only do one thing.
But there is another thing to consider here: What happens if you make changes to your oneliners?
Most source code management tools like git work with code in terms of lines.
They are much better at merging changes if they have been made in different lines.
Even tools like diff usually look at code as a bunch of separate lines.

Let's take this example, if for some reason num should be replaced with num (only for demonstration purposes):
Before the change:

if (*num=='e' || *num=='E')     /* Exponent? */
{   num  ;if (*num==' ') num  ; else if (*num=='-') signsubscale=-1,num  ;      /* With sign? */
    while (*num>='0' && *num<='9') subscale=(subscale*10) (*num   - '0');   /* Number? */
}

After the change:

if (*num=='e' || *num=='E')     /* Exponent? */
{   num  ;if (*num==' ')   num; else if (*num=='-') signsubscale=-1,  num;      /* With sign? */
    while (*num>='0' && *num<='9') subscale=(subscale*10) (*  num - '0');   /* Number? */
}

Now the diff looks like this:

2,3c2,3
< {   num  ;if (*num==' ') num  ; else if (*num=='-') signsubscale=-1,num  ;      /* With sign? */
<     while (*num>='0' && *num<='9') subscale=(subscale*10) (*num   - '0');   /* Number? */
---
> {   num  ;if (*num==' ')   num; else if (*num=='-') signsubscale=-1,  num;      /* With sign? */
>     while (*num>='0' && *num<='9') subscale=(subscale*10) (*  num - '0');   /* Number? */

Can you see what has changed?

Now let's take the reformatted version and do the same change:
Before the change:

/* Exponent? */
if ((*num == 'e') || (*num == 'E')) {
    num  ;
    /* With sign? */
    if (*num == ' ') {
        num  ;
    } else if (*num == '-') {
        signsubscale = -1;
        num  ;
    }
    while ((*num >= '0') && (*num <= '9')) {/* Number? */
        subscale = (subscale * 10)   (*num   - '0');
    }
}

After the change:

/* Exponent? */
if ((*num == 'e') || (*num == 'E')) {
      num;
    /* With sign? */
    if (*num == ' ') {
          num;
    } else if (*num == '-') {
        signsubscale = -1;
          num;
    }
    while ((*num >= '0') && (*num <= '9')) {/* Number? */
        subscale = (subscale * 10)   (*  num - '0');
    }
}

Look at the diff now:

3c3
<     num  ;
---
>       num;
6c6
<         num  ;
---
>           num;
9c9
<         num  ;
---
>           num;
12c12
<         subscale = (subscale * 10)   (*num   - '0');
---
>         subscale = (subscale * 10)   (*  num - '0');

And once two people start changing the same line, things get really ugly.
It's much easier to do the merging if the changes are in separate lines, which is why it is a good idea to write only one thing per line.

Why use parenthesis for every control structure

Some style guides prefer not to use parenthesis for control structures, OpenSSL and GNU do this as far as I know.

I still don't think this is a good idea.
This can mislead people for example.

/*
 * this could mislead people into thinking that both function
 * calls are covered by the condition.
 */
if (condition)
    do_something();
    do_something_outside_the_condition();

And stuff like this can happen by accident, sometimes even with severe security implications, see Goto Fail.

Also if you want to add one line to such kind of if statement, you have to actually change 3 lines.
This will also be visible in the diff.

Before the change:

if (condition)
    do_something();

After the change:

if (condition) {
    do_something();
    do_something_else()
}

Diff:

1c1
< if (condition)
---
> if (condition) {
2a3,4
>     do_something_else()
> }

If parentheses are used in the first place, the diff looks like this:

2a3
>     do_something_else()

Another argument for always using parentheses is because it is a more uniform and easier convention to follow.

Why you shouldn't rely on operator precedence

For the example code I reformatted in my earlier comment it might not look like it's necessary, but take a look at this example:

if (a||b && c||d) {
    do_something();
}

This looks like the programmer intended it to be:

if ((a || b) && (c || d)) {
    do_something();
}

But in reality it is:

if (a || (b && c) || d) {
    do_something();
}

You can avoid these kinds of errors by just always specifying your intent clearly.

I hope this explains some of the reasoning behind what I was saying earlier.

DaveGamble · 2016-09-12T18:28:28Z

Max, would you be up for taking this on? Could I make you a collaborator for the repo?

DaveGamble · 2016-09-12T18:34:52Z

NB: I agree that:

if (something)
    do_something();

is entirely dangerous. I'm less convinced that:
if (something) do_something();

is problematic.

On all other points, I think I can largely agree.

blakemcbride · 2016-09-12T23:38:19Z

if (something) do_something();
Is problematic because you can't put a break point on do_something()

On Mon, Sep 12, 2016 at 1:34 PM, Dave Gamble [email protected]
wrote:

NB: I agree that:

if (something)
do_something();

is entirely dangerous. I'm less convinced that:
if (something) do_something();

is problematic.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#24 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGKwAA1RoN28hS3be6IJVjiF3wc_nERDks5qpZtOgaJpZM4J6D9d
.

blakemcbride · 2016-09-12T23:49:05Z

On Mon, Sep 12, 2016 at 1:12 PM, Philip Taylor [email protected]
wrote:

....

Subjectively, I think cJSON's style is (in my opinion) clearly ugly, and
such an obvious lack of taste makes me question the judgement of the
developers and the quality of the code and makes me assume there are deeper
problems that aren't as obvious. Maybe that's not fair, but it's how I tend
to react when reading code like this.

Dave's response to my original comment was basically "It's my code. I do
what the hell I like with it. It's too bad if you don't like it."

My guess is he writes it like he does to make the point that he does what
he wants and isn't controlled by anyone. Do you know people like that?

I just moved on.

Blake

DaveGamble · 2016-09-12T23:57:50Z

I'm recanting on my position, having had a few more requests. Given that, your reply feels rather uncalled for. It wasn't my intention to ruin your day by not immediately acquiescing to a single request, however reasonable you felt it was.

Is there some kind of game in the open source world where you write unkind comments to people volunteering their free time, that I'm not in on?

DaveGamble · 2016-09-13T00:16:13Z

In practical terms, this comes down to making the library better. White space alone does not achieve that. On the other hand, collaboration does. Max has made excellent and valuable contributions, and hence I'm very interested to hear his input. The statistics suggest that I'm responsible for the majority of fixes and features. Perhaps that could change, although for a library with such limited scope, it's not entirely clear what that would mean. So there's balance to strike between making the code accessible to newcomers vs leaving it as something I am able to maintain for the much larger majority of users who use it without needing to trace the code.

If there were numerous issues we would see more bug reports. If people were having to debug the library, they might just mention it. Perhaps there needs to be more focus on reporting parse errors (something I've added recently), or on compliance (upcoming commit improves conformance benchmark score).

blakemcbride · 2016-09-13T00:44:09Z

On Mon, Sep 12, 2016 at 6:57 PM, Dave Gamble [email protected]
wrote:

I'm recanting on my position, having had a few more requests. Given that,
your reply feels rather uncalled for. It wasn't my intention to ruin your
day by not immediately acquiescing to a single request, however reasonable
you felt it was.

Is there some kind of game in the open source world where you write unkind
comments to people volunteering their free time, that I'm not in on?

Admittedly, I was rude. I apologize. I appreciate your work. At least
one of my contributions took more than two years of much more than 40 hours
per week. Followed by many years of enhancements. It includes a 350 page
and a 250 page manual. So, I feel your pain. (
https://github.com/blakemcbride/Dynace )

Blake

FSMaxB · 2016-09-13T01:34:23Z

Let's return to the discussion on hand.

Yes, I'd be glad to do the reformatting.

About if (condition) do_something;:
It is going against the principle of always using parenthesis for control structures. And it makes the diff look worse if a change is made. Also as blakemcbride mentioned, you cannot set a breakpoint in a debugger. Actually I haven't thought of this aspect at all.

I also found another potential problem:

#define multiline_macro(parameter) do_something(parameter);\
    do_some_more();

if (condition) multiline_macro(parameter);

Although there are no macros like that in cJSON and it's better to write macros that are surrounded by curly braces, using the curly braces for every condition is still a good safeguard.

FSMaxB · 2016-09-13T01:45:46Z

So let's discuss the style that we are going for. My suggestion:

if. while, do while, for

if (condition) {
    do_something();
}

for (int i = 0; i < 10; i  ) {
    do_something();
}

do {
    something();
} while (condition);

while (condition) {
    do_something();
}

switch case:

switch (number) {
    case '0':
        do_something();
        break;
    case '1':
    case '2'":
        do_something_else();
        break;
    default:
        whatever();
}

assignments and declarations:
```c
double one = 1;
double two = 2;
double three = one   two;

Tabs/spaces?
Currently cJSON is using both, but mostly tabs, so we should stick with tabs I guess? (I personally prefer tabs because it leaves the decision of how wide the indentation is up to the individual programmer).

conditions:

bool condition = (a || b) && (c || d);

But if you prefer less spaces this can be done of course.

Anything you'd like me to do differently?

Also you were writing that you are currently working on some more changes, so should I wait until you're done with that before I do the reformatting?

fgimian · 2016-09-13T12:12:23Z

If I could please also add just a few minor suggestions:

I think it's better to switch to 4 space soft indent instead of tabs (makes it easier to read on GitHub and will always look the same in any editor)
I personally prefer that the opening brace is on the following line (and think Dave does too)

e.g.

if (condition)
{
    do_something();
}

for (int i = 0; i < 10; i  )
{
    do_something();
}

do
{
    something();
} while (condition);

while (condition)
{
    do_something();
}

Another little point for me is:

don't rely on operator precedence rules

I would argue that we SHOULD rely on operator precedence, particularly for logical operators.

e.g.

    if (*num >= '1' && *num <= '9')
    {
        do
        {
            n = (n * 10.0)   (*num   - '0');
        } while (*num >= '0' && *num <= '9'); /* Number? */
    }

Apart from that, I personally feel that @FSMaxB has made some good suggestions here.

Kindest Regards
Fotis

benlachman · 2016-09-13T15:45:12Z

Ah we've arrived at the point where we get to niggle about age old disputes. Personally, I dislike the added (and IMO useless) whitespace that braces on their one line create. But, I think this sort of thing, as well as the indentation style, should default to @DaveGamble's preference since there are tons of people on both sides of small style nits like this and no generally agreed norm.

fgimian · 2016-09-14T10:58:42Z

Agreed, I can adapt to both styles. My suggestions were based on my knowledge of Dave's codin style (I work with him) 😄

FSMaxB · 2016-09-14T13:45:03Z

@DaveGamble As soon as you make your decision I can start reformatting.

Let's break it up in some boolean choices to make it is easier:

A) Parenthesis for functions and control structures:

if (condition) {
    do_something();
}

if (condition)
{
    do_something();
}

B) Operators

double a=b c;

double a = b   c;

C) Function parameters

function(a, b, c);

function(a,b,c);

D) Indentation

Tabs
Spaces (how many?)

And a list of other things that should be changed from my proposal of course (if there are any).

fgimian · 2016-09-15T05:51:50Z

Thanks @FSMaxB 😄

@DaveGamble just an fyi; in our codebase, I'm personally using A2, B2, C1, D 4 spaces 😄

blakemcbride · 2016-09-15T10:53:07Z

It is my opinion that going back and forth on various coding styles is antithetical to the problem. Any of the standard styles is a vast improvement over what is there now, and no one is ever going to come to a consensus as to the best style. Let the author pick his favorite standard style and run with it, period.

FSMaxB · 2016-09-27T12:13:30Z

Ping @DaveGamble

DaveGamble · 2016-09-27T12:59:19Z

@FSMaxB did you reach a decision as to whether you'd accept becoming a collaborator and taking this task on?

DaveGamble · 2016-09-27T13:01:31Z

Alternatively, if anyone wants to generate a pull request, I'm open to it :)

FSMaxB · 2016-09-27T13:13:56Z

Yes, I'd like to do this as soon as you decide on the coding style.

DaveGamble · 2016-09-27T13:33:42Z

A2, B2, C1, D4 please. :)

Collaborator request sent :)

vikerian · 2016-09-27T13:34:57Z

I'm working on it, btw i found something "hardly understandable" in
source code, i made comments like /* [email protected] .....*/ to that places.

Viker

On 09/27/2016 03:01 PM, Dave Gamble wrote:

Alternatively, if anyone wants to generate a pull request, I'm open to
it :)

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#24 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AIiwOFLmFakQnmxE7VxP0lVHxKuOaszIks5quROsgaJpZM4J6D9d.

S pozdravem Vilem Kebrt
email: [email protected]
tel: 420 604 550 132

DaveGamble · 2016-09-27T13:36:50Z

@vikerian @FSMaxB are you both happy to interact on this?

FSMaxB · 2016-09-27T13:45:53Z

I'll start working on this. It's probably best to do function by function commits so it is easier to merge later. I'll create a WIP pull request.

FSMaxB · 2016-09-27T13:47:33Z

Is it Ok if I add comments if I think something needs more explanation?

DaveGamble · 2016-09-27T13:49:39Z

Of course! That would be great! :)

FSMaxB · 2016-09-27T13:50:24Z

@vikerian: If you want to contribute, why don't you start from the bottom of the file and I do it from the top? If we do one commit per function, we can merge it later.

vikerian · 2016-09-27T14:24:16Z

already started from top.
damn. ok. i will pull actual and start from bottom :)

vikerian · 2016-09-27T14:55:27Z

pull created (2 functions from bottom), doing it in work, so not enough time to make more, will continue :-)

fgimian · 2016-09-29T05:36:38Z

clang-format will help you greatly guys 😄

Here's the config I'm using which I believe aligns with Dave's request:

---
# We'll use defaults from the LLVM style, but with 4 columns indentation.
BasedOnStyle: LLVM

# Maximum line length
ColumnLimit: 99

# Indentation settings
IndentWidth: 4

# Ensure that opening braces are always on their own line
BreakBeforeBraces: Allman

# Ensure that individual cases in a switch are indented
IndentCaseLabels: true

# Ensure that class keywords public, private and protected at all the way to the
AccessModifierOffset: -4

# Ensure that arguments for long lines start on the following line
AlignAfterOpenBracket: AlwaysBreak

# Don't allow one line functions
AllowShortFunctionsOnASingleLine: None

FSMaxB · 2016-09-29T06:04:40Z

I've been trying to use ClangFormat and uncrustify, but it messes up the comments and I take the reformatting as an opportunity to read and understand the code again and add more comments.

FSMaxB · 2016-09-30T17:33:37Z

Status:

Main file (done in Reformatting code #29)
tests
cJSON_Utils

fgimian · 2016-10-01T06:08:02Z

Looks awesome so far @FSMaxB ! 😄

DaveGamble · 2016-10-09T18:49:07Z

Awesome work @FSMaxB !!! :D:D

FSMaxB · 2016-10-10T02:55:51Z

I will continue with the tests and cJSON_Utils as soon as I find the time.

FSMaxB · 2016-10-29T15:54:06Z

Done, all the C code has been reformatted.

FSMaxB mentioned this issue Sep 11, 2016

Does not compile using the Makefile with GCC 6.2.1 #25

Closed

FSMaxB mentioned this issue Sep 27, 2016

Reformatting code #29

Merged

FSMaxB mentioned this issue Sep 29, 2016

More functions from main file #28

Closed

FSMaxB mentioned this issue Oct 2, 2016

Code formatting problematic #21

Closed

garlick mentioned this issue Oct 5, 2016

small experiment with cJSON flux-framework/flux-core#833

Closed

FSMaxB closed this as completed Oct 29, 2016

FSMaxB added the enhancement label Nov 7, 2016

FSMaxB mentioned this issue Sep 5, 2017

All uses of goto removed. #199

Closed

Make source code readable #24

Make source code readable #24

Comments

FSMaxB commented Sep 11, 2016 • edited Loading

FSMaxB commented Sep 11, 2016

DaveGamble commented Sep 11, 2016

hustanhuang commented Sep 12, 2016

FSMaxB commented Sep 12, 2016 • edited Loading

Consistency

Why multiple lines instead of one

Why use parenthesis for every control structure

Why you shouldn't rely on operator precedence

DaveGamble commented Sep 12, 2016

DaveGamble commented Sep 12, 2016 • edited Loading

blakemcbride commented Sep 12, 2016

blakemcbride commented Sep 12, 2016

DaveGamble commented Sep 12, 2016

DaveGamble commented Sep 13, 2016

blakemcbride commented Sep 13, 2016

FSMaxB commented Sep 13, 2016

FSMaxB commented Sep 13, 2016

fgimian commented Sep 13, 2016 • edited Loading

benlachman commented Sep 13, 2016

fgimian commented Sep 14, 2016

FSMaxB commented Sep 14, 2016

A) Parenthesis for functions and control structures:

B) Operators

C) Function parameters

D) Indentation

fgimian commented Sep 15, 2016 • edited Loading

blakemcbride commented Sep 15, 2016 • edited Loading

FSMaxB commented Sep 27, 2016

DaveGamble commented Sep 27, 2016 • edited Loading

DaveGamble commented Sep 27, 2016

FSMaxB commented Sep 27, 2016

DaveGamble commented Sep 27, 2016

vikerian commented Sep 27, 2016

DaveGamble commented Sep 27, 2016

FSMaxB commented Sep 27, 2016

FSMaxB commented Sep 27, 2016

DaveGamble commented Sep 27, 2016

FSMaxB commented Sep 27, 2016

vikerian commented Sep 27, 2016

vikerian commented Sep 27, 2016

fgimian commented Sep 29, 2016

FSMaxB commented Sep 29, 2016

FSMaxB commented Sep 30, 2016 • edited Loading

fgimian commented Oct 1, 2016

DaveGamble commented Oct 9, 2016

FSMaxB commented Oct 10, 2016

FSMaxB commented Oct 29, 2016

FSMaxB commented Sep 11, 2016 •

edited

Loading

FSMaxB commented Sep 12, 2016 •

edited

Loading

DaveGamble commented Sep 12, 2016 •

edited

Loading

fgimian commented Sep 13, 2016 •

edited

Loading

fgimian commented Sep 15, 2016 •

edited

Loading

blakemcbride commented Sep 15, 2016 •

edited

Loading

DaveGamble commented Sep 27, 2016 •

edited

Loading

FSMaxB commented Sep 30, 2016 •

edited

Loading