
Baseline Comparison? #26

Open
mrlooi opened this issue Dec 25, 2017 · 39 comments

@mrlooi

mrlooi commented Dec 25, 2017

Is there a baseline for comparing the learned model e.g. a benchmark software to evaluate against? It would be useful for us to know how effective the learning algorithm actually is.

For example, what do you mean by "Won the App LV x?" Does it mean that if the model beat the app even once, it counts as a win even if it loses the other times?

I downloaded your "best model" and "newest model", and played both networks against grhino AI (level 2). Sadly, both networks got destroyed by grhino on multiple tries. If you have a benchmark of levels to beat before grhino, that would be really helpful

@evalon32

In the README, it says the "App" is this: https://itunes.apple.com/ca/app/id574915961. I'm not familiar with it and don't have an iOS device, but I'm guessing it's not that strong.
For what it's worth, I've also been testing the networks against grhino, with similar results. I've had RAZ beat grhino L2 once, but only because it got lucky. That said, I think it's a good sign that RAZ can now tell that its position gradually deteriorates (the evaluation goes relatively smoothly from 0 to -1 over the course of the game). Earlier, it often had no idea. It also used to lose consistently to grhino L1; now it usually wins (sadly, usually because grhino L1 blunders in a won position).

@mokemokechicken
Owner

Hi @vincentlooi

Is there a baseline for comparing the learned model e.g. a benchmark software to evaluate against?

I use the iOS app https://itunes.apple.com/ca/app/id574915961 as the benchmark.
The app has levels 1 ~ 99.

For example, what do you mean by "Won the App LV x?" Does it mean that if the model beat the app even once, it counts as a win even if it loses the other times?

Yes.
"Won the App LV x?" means the model won the level at least once (regardless of the number of losses).

I downloaded your "best model" and "newest model", and played both networks against grhino AI (level 2). Sadly, both networks got destroyed by grhino on multiple tries. If you have a benchmark of levels to beat before grhino, that would be really helpful

I didn't know about grhino.
I confirmed that the newest model loses to grhino Lv2...

@mokemokechicken
Owner

Hi @evalon32

In the README, it says the "App" is this: https://itunes.apple.com/ca/app/id574915961. I'm not familiar with it and don't have an iOS device, but I'm guessing it's not that strong.

The app has levels 1~99.
Maybe Lv29 is not so strong.

For what it's worth, I've also been testing the networks against grhino, with similar results. I've had RAZ beat grhino L2 once, but only because it got lucky.

Could you tell me what RAZ is? (I couldn't find it on Google...)

That said, I think it's a good sign that RAZ can now tell that its position gradually deteriorates (the evaluation goes relatively smoothly from 0 to -1 over the course of the game)

I also think that is a good feature.
In my newest model, the evaluation often plummets.

@evalon32

evalon32 commented Dec 26, 2017

Could you tell me what RAZ is? (I couldn't find it on Google...)

Oh sorry, RAZ = reversi-alpha-zero :)

@mokemokechicken
Owner

RAZ = reversi-alpha-zero :)

Oh, I see! (^^

@mokemokechicken
Owner

FYI:

  • the App LV29 vs grhino Lv2: LV29 won 2 times and lost 0 times.
  • the App LV29 vs grhino Lv3: LV29 won 0 times and lost 1 time.

@evalon32

I just had the newest model play a match of 10 games vs grhino L2 (took forever, since I don't have a GPU).
It won 2 out of 5 as black and 2 out of 5 as white. Getting exciting!

@mokemokechicken
Owner

That's good!

took forever, since I don't have a GPU

FYI:
I am also evaluating on a Mac (no GPU);
TensorFlow (1.4) built from source with optimizations is about 3~5 times faster than the normal pip CPU version.
https://www.tensorflow.org/install/install_sources

@mrlooi
Author

mrlooi commented Jan 2, 2018

I managed to make some progress in training the model. I played the model against grhino lv2 5 times: 4 wins, 1 loss. It still lost to grhino lv3, though. I also played the model against the newest/best model from your download script, and had a win rate of ~85% over roughly 25 games.

I managed to train this model from scratch over the course of a week (on a 1080 GPU), by manually and repeatedly removing old data (older than 1-2 days) from the data/play_data folder while the model keeps self-playing.

The current training method in your script trains on all data in the folder regardless of when it was created, which means each training epoch gets longer as self-play generates more and more data. I'm not sure this is necessary: old data reflects an older policy rather than the newest one, so it may be redundant, at the cost of extra training steps and potential overfitting. Perhaps it would be a good idea to weight the data by how recently it was played, i.e. how closely it reflects the latest policy, or to turn the data into a fixed-size buffer (perhaps 250k-300k samples) that discards old samples as new ones are generated.
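For illustration only, here is a minimal sketch of the fixed-size buffer idea (this is not code from the repo; the class name and capacity are made up). It simply evicts the oldest samples once the capacity is reached:

from collections import deque

class ReplayBuffer:
    """Fixed-size pool of (state, policy, value) training samples; the oldest
    samples are discarded automatically as new self-play data arrives."""
    def __init__(self, capacity=300000):
        self.samples = deque(maxlen=capacity)

    def add_games(self, new_samples):
        # extend() silently evicts from the left once maxlen is reached
        self.samples.extend(new_samples)

    def __len__(self):
        return len(self.samples)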

EDIT:
Just beat grhino lv3! The model now beats grhino lv2 almost every time, getting exciting

@mokemokechicken
Owner

mokemokechicken commented Jan 4, 2018

@vincentlooi

Thank you for sharing exciting information!

EDIT: Just beat grhino lv3! The model now beats grhino lv2 almost every time, getting exciting

That's great!!

I managed to train this model from scratch over the course of a week (on a 1080 GPU), by manually and repeatedly removing old data (older than 1-2 days) from the data/play_data folder while the model keeps self-playing.

Nice try!
I also think it is one of the important hyperparameters.
The maximum number of training samples can be changed via PlayDataConfig#{nb_game_in_file,max_file_num} (used here).

I will change this parameter in my training.
In my environment, self-play generates about 100 training data files per day (500 games/day).
So it seems better to set max_file_num to around 300 (currently 2000).
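For illustration, that change would look roughly like this (not the repository's actual defaults; nb_game_in_file = 5 is only inferred from the 500 games/day and 100 files/day figures above):

class PlayDataConfig:
    def __init__(self):
        # ~5 games per file and ~100 files/day means max_file_num = 300
        # keeps roughly the last 3 days of self-play data.
        self.nb_game_in_file = 5
        self.max_file_num = 300  # reduced from 2000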

@apollo-time

What is the best Reversi program to benchmark against?
I don't have an iPhone, but I have a Mac.
My model beats all of the Android Reversi and Windows Reversi apps.

@mokemokechicken
Owner

@apollo-time

I run GRhino in Docker on my Mac.
FYI: https://github.com/mokemokechicken/grhino-docker

@gooooloo
Contributor

gooooloo commented Jan 8, 2018

@mokemokechicken @vincentlooi @evalon32 When playing against GRhino, besides the "level" setting, what is your "open book variation" setting? I am playing Ubuntu GRhino against my model, and want to make an (indirect) comparison with yours. Thanks.

@apollo-time

apollo-time commented Jan 9, 2018

My model (black) now beats GRhino lv5 with open book variation "Low" and randomness 0.
I made a web player in HTML, but I don't have a server to run the TensorFlow model on.

@mokemokechicken
Owner

@gooooloo My open book variation is "Low".

@gooooloo
Contributor

gooooloo commented Jan 9, 2018

@mokemokechicken gotcha. Thanks.

@apollo-time

apollo-time commented Jan 10, 2018

I find "Online Reversi" on the Microsoft Store is very good.
My model barely beats level 2. (2018/01/10)
My model now barely beats level 3. (2018/01/11)

@gooooloo
Contributor

gooooloo commented Jan 12, 2018

Hi everyone, I found that http://www.orbanova.com/nboard/ is very strong. It also supports many levels to play at, so it would be a good baseline to compare against.

@mokemokechicken
Owner

@gooooloo it's great! Thank you very much!

@mokemokechicken
Owner

I implemented NBoard Protocol.

@gooooloo
Contributor

@mokemokechicken Just a report: my model beats Lv99 using the 800 simulations per move setting. See https://play.lobi.co/video/17f52b6e921be174057239d39d239b6061d3c1c9. The AlphaGoZero method works. I am also using 800 simulations per move in self-play. I keep the evaluator alive, with the best-model replacement condition: Elo rating >= 150 over 400 games (draws are counted in the Elo rating). I am using 2 historical boards as Neural Network input, which means a shape of 5 * 8 * 8.

Besides, when playing against the App, I found that using 40 or 100 simulations per move is already quite strong. The 100-sims setting beats Lv98 easily. But Lv99 is harder than Lv98: I tested 40/100/400 sims and all of them lost, until I changed to 800 sims.
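For concreteness, here is a rough sketch of how two historical boards plus a colour plane give a 5 * 8 * 8 input (my own illustration, not the actual code; the exact plane order and the colour-plane convention are assumptions):

import numpy as np

def build_input_planes(own_t, enemy_t, own_prev, enemy_prev, black_to_move):
    """Stack the current and previous boards (own/enemy discs as 0-1 planes)
    plus a constant colour plane into a 5x8x8 network input.
    Pass all-zero 8x8 arrays for own_prev/enemy_prev on the first move."""
    colour = np.full((8, 8), 1.0 if black_to_move else 0.0)
    return np.stack([own_t, enemy_t, own_prev, enemy_prev, colour]).astype(np.float32)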

@mokemokechicken
Owner

@gooooloo

Great! Congratulations!!

I am surprised to hear this report!

800 simulations per move

After all, in order to become strong, it may be necessary to use a large "simulations per move" in self-play, don't you think?
I feel that "simulations per move" determines the upper bound of the model's strength.

2 historical boards as Neural Network input, which means a shape of 5 * 8 * 8

It is very interesting.
Why do you use history?
Do you think it had a good effect?

@gooooloo
Contributor

gooooloo commented Jan 22, 2018

@mokemokechicken it is partly because of your great implementation. So thank you :)

After all, in order to become strong, it may be necessary to use a large "simulations per move" in self-play, don't you think?

I also think so. At first I used 100 sims per move because I wanted fast self-play. After about 100k steps (batch_size = 3072), it seemed to get stuck and stop improving. Then I changed to 800 sims, and at about 200k steps it had become quite strong. My final model, the one beating Lv99, is at 300k steps.

I think it is also worth mentioning that, although I changed to 800 sims, overall self-play did not get too much slower. I did this by separating the MCTS and the neural network into different processes, which communicate via named pipes. That way I can run several MCTS processes and only 1 neural network process at the same time. This idea is borrowed from this repo (thanks @Akababa). By doing this, I make full use of both GPU and CPU. Although a single game gets slower due to 800 sims, multi-game parallelization wins back a lot. ---- I mention this because I think that in the AlphaGoZero method, self-play speed really matters.
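Here is a minimal, self-contained sketch of that layout (not the actual implementation, which uses named pipes; multiprocessing queues and a dummy predictor are used here so the example runs anywhere):

import multiprocessing as mp
import numpy as np

def nn_server(request_q, reply_qs):
    # A single process that would own the GPU model; a dummy uniform
    # predictor stands in for model.predict in this sketch.
    while True:
        worker_id, boards = request_q.get()            # boards: (N, 5, 8, 8)
        policy = np.full((len(boards), 65), 1.0 / 65)  # dummy policy over 8*8+1 moves
        value = np.zeros(len(boards))                  # dummy value estimates
        reply_qs[worker_id].put((policy, value))

def selfplay_worker(worker_id, request_q, reply_q, n_batches=3):
    # The CPU-bound MCTS loop would live here; we just ship random "leaf" boards.
    for _ in range(n_batches):
        boards = np.random.randint(0, 2, size=(8, 5, 8, 8)).astype(np.float32)
        request_q.put((worker_id, boards))
        policy, value = reply_q.get()                  # use these to expand/backup the tree
    print("worker", worker_id, "finished")

if __name__ == "__main__":
    request_q = mp.Queue()
    reply_qs = [mp.Queue() for _ in range(4)]          # 4 MCTS processes, 1 NN process
    mp.Process(target=nn_server, args=(request_q, reply_qs), daemon=True).start()
    workers = [mp.Process(target=selfplay_worker, args=(i, request_q, reply_qs[i]))
               for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()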

2 historical boards as Neural Network input, which means a shape of 5 * 8 * 8

Why do you use history?

Because I happened to see this reddit post from David Silver @ DeepMind. This is the quote:

it is useful to have some history to have an idea of where the opponent played recently - these can act as a kind of attention mechanism (i.e. focus on where my opponent thinks is important)

I have used this representation from the beginning and haven't tested the 3 * 8 * 8 shape, so I can't say from experience. But I believe it gives the network a chance at "attention" (by subtracting the previous board). Maybe it helps.
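As a purely illustrative example of the kind of information the extra planes make recoverable (my sketch, using fixed black/white planes rather than the alternating current-player perspective the network actually sees):

import numpy as np

def last_move_info(black_t, white_t, black_prev, white_prev):
    # Each argument is an 8x8 0/1 numpy array. The network never receives these
    # difference planes explicitly; they only show what a history lets it infer.
    placed = (black_t + white_t) - (black_prev + white_prev)   # 1 only on the square just played
    flipped = black_t * white_prev + white_t * black_prev      # discs whose colour just changed
    return placed, flipped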

Lastly, I am using 6 GPUs: 5 Tesla P40 (1 for optimization, 4 for self-play) and 1 Tesla M40 (for the evaluator). Maybe it is mostly because of this computing power...

@mokemokechicken
Owner

@gooooloo

Thank you for your reply.

I think it is also worth mentioning that, although I changed to 800 sims, overall self-play did not get too much slower. I did this by separating the MCTS and the neural network into different processes, which communicate via named pipes.

Great. I think it is the best implementation.

I believe it gives the network a chance at "attention" (by subtracting the previous board). Maybe it helps.

I see.
I had not thought of that possibility. It is very interesting.

Lastly, I am using 6 GPUs: 5 Tesla P40 (1 for optimization, 4 for self-play) and 1 Tesla M40 (for the evaluator). Maybe it is mostly because of this computing power...

That's very powerful!! :)

@apollo-time

apollo-time commented Jan 23, 2018

@gooooloo Um... is history really useful?
When history is used, the player cannot play from just a single board state.
Some games, such as chess, sometimes have to be played from a board state that is not the initial state.

@gooooloo
Contributor

@apollo-time Do you mean the first move of the game? As the AlphaGoZero paper mentions, all-zero boards are used when there are not enough history boards.

8 feature planes X_t consist of binary values indicating the presence of the current player's stones (X_t^i = 1 if intersection i contains a stone of the player's colour at time-step t; 0 if the intersection is empty, contains an opponent stone, or if t < 0)

"t < 0" is the case here.

@apollo-time

@gooooloo No, I mean that some games can be played from a board state that is not the initial state, for example chess puzzles.

@apollo-time

@gooooloo Can you beat the Windows Online Reversi game at level 5?

@gooooloo
Contributor

@apollo-time

No, I mean that some games can be played from a board state that is not the initial state, for example chess puzzles.

I see. I hadn't considered that case.

Can you beat the Windows Online Reversi game at level 5?

I don't have a Windows system (I will try to find one). But I can't beat NBoard's Novello at level 20 (I can beat level 10, though, with 1600 sims per move), nor NTest at level 30.

@apollo-time

@gooooloo Thanks. My question is the same as Cassandra120's.

@gooooloo
Contributor

@apollo-time

Can you beat the Windows Online Reversi game at level 5?

I just played against it, using the same model and simulations_per_move (800) as in the Lv99 game: I beat Online Reversi level 5 (2:0) and lost to level 6 (1:3).

@apollo-time

apollo-time commented Jan 24, 2018

@gooooloo My model (simulations_per_move=800) now beats Online Reversi level 4, and my model doesn't use history.
But do you feel your model keeps improving continuously?

@gooooloo
Contributor

@apollo-time I got another new-generation model the day before yesterday, but no better model in the last two days. Let's wait a few more days and see.

@AranKomat

@gooooloo

After about 100k steps (batch_size = 3072), it seemed to get stuck and stop improving.

That was also the case in AlphaZero: performance more or less stagnated after that point. But they had already achieved strong performance (on a different board game) at 100k iters, not only because of the 800 sims/move but also because of their large architecture and large buffer. Also, they did one iteration of update for roughly every 30 games (3M games after 100k iters), which may not be the case in the implementations of @mokemokechicken, Zeta36, and Akababa.

What about your case? Did you use the "normal" setting of the config instead of "mini"?

@gooooloo
Contributor

@AranKomat

... due to their large architecture ...

my config (the network architecture is the same as @mokemokechicken's original implementation):

class ModelConfig:
    cnn_filter_num = 256
    cnn_filter_size = 3
    res_layer_num = 10
    l2_reg = 1e-4
    value_fc_size = 256
    input_size = (5, 8, 8)
    policy_size = 8 * 8 + 1  # 64 board squares + 1 pass move

... and large buffer

mine:

class PlayDataConfig:
    def __init__(self):
        self.nb_game_in_file = 50
        self.max_file_num = 1000

class TrainerConfig:
    def __init__(self):
        self.batch_size = 3072
        self.epoch_to_checkpoint = 1
        self.epoch_steps = 100
        self.save_model_steps = 800
        self.lr_schedule = (
            (0.2,    1500),  # means being 0.2 until 1500 steps.
            (0.02,   20000),
            (0.002,  100000),
            (0.0002, 9999999999)
        )

I also changed the sampling method. I did this because, in my case (much more play data), @mokemokechicken's original implementation takes too long: it waits for all loaded data to be trained on at least once before new play data is loaded and before a new candidate model is generated.

from random import randint  # needed for the random batch sampling below

    def generate_train_data(self, batch_size):
        # Build each batch by sampling uniformly at random from the loaded play
        # data, instead of iterating over every loaded sample before reloading.
        while True:
            x = []

            for _ in range(batch_size):
                n = randint(0, data_size - 1)  # data_size: number of loaded samples
                # sample the nth data and append to x

            yield x

    def train_epoch(self, epochs):
        tc = self.config.trainer
        self.model.model.fit_generator(generator=self.generate_train_data(tc.batch_size),
                                       steps_per_epoch=tc.epoch_steps,
                                       epochs=epochs)
        return tc.epoch_steps * epochs


    def training(self):
        while True:
            self.update_learning_rate()
            steps = self.train_epoch(self.config.trainer.epoch_to_checkpoint)
            self.total_steps += steps

            if last_save_step + self.config.trainer.save_model_steps <= self.total_steps:
                self.save_current_model_as_to_eval()
                last_save_step = self.total_steps

            self.load_play_data()
So basically I am using the "normal" config, but with a lot of changes.
Other configs are listed below if you are interested:

class PlayConfig:
    def __init__(self):
        self.simulation_num_per_move = 800
        self.c_puct = 5
        self.noise_eps = 0.25
        self.dirichlet_alpha = 0.4
        self.change_tau_turn = 10
        self.virtual_loss = 3
        self.prediction_queue_size = 8
        self.parallel_search_num = 8
        self.v_resign_check_min_n = 100
        self.v_resign_init = -0.9
        self.v_resign_delta = 0.01
        self.v_resign_disable_prop = 0.1
        self.v_resign_false_positive_fraction_t_max = 0.05
        self.v_resign_false_positive_fraction_t_min = 0.04

@AranKomat

@gooooloo Thanks so much for the detailed information. It looks like you don't have self.search_threads for multi-threading. Did you find multiprocessing alone to be sufficient? It's impressive that your sampling method enabled you to finish 200k iters with your large architecture. It looks like Akababa's multiprocessing is very powerful. But I haven't been able to see how many self-play games you finished up to 100~200k iters. Have you tracked the number of games?

@mokemokechicken
Owner

@gooooloo @apollo-time @evalon32 @vincentlooi @AranKomat

I created Performance Reports to share our achievements, linked from the top of the README.
I would be grateful if you would post there.

@gooooloo
Contributor

@AranKomat

Have you tracked the number of games?

No I have not. I wish I had.

@gooooloo
Contributor

Hi everyone, the code I used to get this model is here: https://github.com/gooooloo/reversi-alpha-zero, if you are interested.
