Potential problem with L2P1 msg_data consumption #100

fei-g · 2021-03-25T02:52:05Z

Problem description:

Some noc1 requests entering L2 pipe1 carry msg_data (e.g. nc_store, atomics, interrupt forward), but the pipeline does not always consume the data immediately. When such a request is pushed into the mshr, its msg_data is not stored in the mshr array; it is left in the noc1 buffer. If another request then enters L2 pipe1 and also tries to consume msg_data, it reads out the msg_data of the previous request.
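The hazard can be illustrated with a small behavioral sketch. This is not the RTL; `ToyPipe1`, `accept`, and `consume_now` are hypothetical names, and the model assumes only what is described above: data waits in an input buffer, and a request parked in the mshr leaves its data there.

```python
from collections import deque

class ToyPipe1:
    """Toy model of the hazard (hypothetical, not the actual l2_pipe1 logic)."""
    def __init__(self):
        self.data_buf = deque()  # msg_data still sitting in the noc1 input buffer
        self.mshr = []           # parked requests; no msg_data is stored here

    def accept(self, name, data, consume_now):
        self.data_buf.append(data)           # msg_data arrives with the request
        if consume_now:
            return self.data_buf.popleft()   # reads whatever is at the buffer head
        self.mshr.append(name)               # parked; its data stays in the buffer
        return None

pipe = ToyPipe1()
pipe.accept("nc_store_A", "data_A", consume_now=False)       # A goes to the mshr
got = pipe.accept("nc_store_B", "data_B", consume_now=True)  # B reads the buffer
print(got)  # data_A
```

In this model request B ends up with A's data, which is the failure mode the description is concerned about.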

Is this problem real and triggerable?

Only two types of request both carry msg_data and can be pushed into the mshr, leaving the msg_data pending in the buffer: atomic operations and nc stores.

atomic operation

An atomic operation is divided into two internal requests in L2: xx_P1 and xx_P2. Phase 1 invalidates all sharers if the line is in S/M state; phase 2 reads the msg_data and does the arithmetic computation. L2 stalls the first stage between phase 1 and phase 2, and does not ack the msg header until it reaches phase 2. Thus no new request can be processed by the pipe while the msg_data of an atomic operation is pending. A request in the mshr can still be recovered between phase 1 and phase 2, though, and that request may even be an nc_store. But in that case it was the nc_store that arrived at L2 first and left its msg_data pending, which is the case discussed in the next section.

So the conclusion is: because the msg header is acked late, new operations are already prevented from being consumed while the msg_data of an atomic operation is pending. @morenes also wrote tests that issue many consecutive atomic operations from multiple threads to try to trigger this problem, and no failure was observed. This gives some confidence, though not proof, that this case is fine.
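A rough sketch of why the late header ack protects the atomic case. The class and method names are hypothetical and the model is heavily simplified (it ignores mshr recovery between the phases); it only captures that the header stays un-acked from phase 1 until phase 2, so the first stage admits nothing new in between.

```python
class ToyAtomicStage1:
    """Toy model of the late header ack for atomics (hypothetical names)."""
    def __init__(self):
        self.header_acked = True  # no un-acked header is holding the first stage
        self.log = []

    def start_atomic_p1(self):
        self.header_acked = False             # header held back, stage 1 stalls
        self.log.append("P1: invalidate sharers")

    def try_new_request(self, name):
        if not self.header_acked:
            return False                      # pipe cannot take a new request
        self.log.append(f"accept {name}")
        return True

    def finish_atomic_p2(self, msg_data):
        self.log.append(f"P2: compute with {msg_data}")  # msg_data consumed here
        self.header_acked = True              # header acked only now

stage = ToyAtomicStage1()
stage.start_atomic_p1()
blocked = stage.try_new_request("nc_store")   # False: atomic data still pending
stage.finish_atomic_p2("operand")
allowed = stage.try_new_request("nc_store")   # True: header has been acked
```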

non-cacheable store

If the target line of the nc_store is already in S/M state, L2 first invalidates all the sharers and pushes the nc_store into the mshr. At this point the msg_data is pending, and things can go wrong if the pipe receives another request that also carries msg_data. With the Ariane core, however, this case cannot be triggered, because the non-cacheable region is fixed and a line in the nc space is never stored in L2.

The plan is to either build a test with the SPARC core that sends an nc_store to a previously cacheable address, or use another device to send an nc_store to a cacheable region.

This fix

The idea of this fix is to track when msg_data is pending. It raises a flag, msg_data_pending, when a request that is supposed to consume msg_data does not do so and is instead pushed into the mshr. While that flag is set, the pipeline does not accept a new request that also carries msg_data.

It has been verified that this fix does not break the system. Further tests are still needed to confirm that the problem can actually be triggered and that this fix solves it.
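A minimal behavioral sketch of the gating idea, assuming the flag is cleared when the parked request is replayed from the mshr and consumes its data (the exact clear condition in the RTL is not stated here). Only `msg_data_pending` comes from the description; `ToyPipe1WithFix`, `accept`, and `replay_from_mshr` are hypothetical.

```python
from collections import deque

class ToyPipe1WithFix:
    """Toy model of the msg_data_pending gate (hypothetical, not the RTL)."""
    def __init__(self):
        self.data_buf = deque()        # msg_data waiting in the input buffer
        self.mshr = []                 # parked requests, without their data
        self.msg_data_pending = False  # set while parked data is unconsumed

    def accept(self, name, data, consume_now):
        carries_data = data is not None
        if self.msg_data_pending and carries_data:
            return ("stall", None)     # gate: hold back data-carrying requests
        if not carries_data:
            return ("accepted", None)  # requests without msg_data are unaffected
        self.data_buf.append(data)
        if consume_now:
            return ("consumed", self.data_buf.popleft())
        self.mshr.append(name)         # parked: data stays in the buffer
        self.msg_data_pending = True
        return ("parked", None)

    def replay_from_mshr(self):
        name = self.mshr.pop(0)
        self.msg_data_pending = False  # assumed clear point: data consumed
        return (name, self.data_buf.popleft())

fixed = ToyPipe1WithFix()
results = [
    fixed.accept("nc_store_A", "data_A", consume_now=False),
    fixed.accept("nc_store_B", "data_B", consume_now=True),  # stalled by the flag
    fixed.accept("load_C", None, consume_now=True),          # no data, accepted
    fixed.replay_from_mshr(),                                # A gets its own data
    fixed.accept("nc_store_B", "data_B", consume_now=True),  # retried after clear
]
```

With the gate in place, each request in this model consumes only its own data, at the cost of stalling data-carrying requests while the flag is set.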

When an op goes into mshr in L2P1, we didn't store the msg_data in mshr, thus the data left in l2_pipe1_buf_in may be consumed by later ops. This fix stalls the pipe when msg_data is waiting to be consumed in the buf_in.

Conflicts: piton/design/chip/tile/l2/rtl/l2_pipe1_ctrl.v.pyv

fei-g added 4 commits March 24, 2021 17:24

Fix the noc1 msg_data consumption in L2P1

253e023

When an op goes into mshr in L2P1, we didn't store the msg_data in mshr, thus the data left in l2_pipe1_buf_in may be consumed by later ops. This fix stalls the pipe when msg_data is waiting to be consumed in the buf_in.

wrong stage signal used

7f6fa45

Add valid check

9aca28c

Conflicts: piton/design/chip/tile/l2/rtl/l2_pipe1_ctrl.v.pyv

Cleanup code and add comments

83d5a1d
