Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transmitter #128

Merged
merged 10 commits into from
Oct 17, 2020
Merged

Transmitter #128

merged 10 commits into from
Oct 17, 2020

Conversation

CansWang
Copy link
Collaborator

Add control bits and phase interpolator to the tx_top for top-level integration



// input tx.clk_in_pi,
input tx.clk_async, // Asynchronous clock for phase measurement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest putting these debug signals into an interface, as we have done for the analog core. You could start by copying dragonphy2/vlog/chip_src/analog_core/acore_debug_intf.sv to dragonphy2/vlog/chip_src/tx_16t1/tx_debug_intf.sv and then modifying the interface as needed. I think the signals that you need are mostly the PI- and input divider-related signals from acore_debug_intf, plus a few extras.

You'll notice that acore_debug_intf has two modports, acore and dcore. I think you would rename the acore modport to tx, so that in the tx_top I/O list, you could just write:

module tx_top (
   ...
   tx_debug_intf.tx tx,
   ...
); 

instead of having to write out all of the tx.* signals individually. Within the module definition for tx_top, I don't think that you would have to change anything since you can refer to interface members using the same "dot" syntax.

This should make it easier to update the list of debug signals, and should also reduce the likelihood of typos. It also fits in nicely with the way that we have set up JTAG debug registers (each interface has a corresponding Markdown file in dragonphy2/md.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sure. I will dig into that and make corresponding changes.

I suggest putting these debug signals into an interface, as we have done for the analog core. You could start by copying dragonphy2/vlog/chip_src/analog_core/acore_debug_intf.sv to dragonphy2/vlog/chip_src/tx_16t1/tx_debug_intf.sv and then modifying the interface as needed. I think the signals that you need are mostly the PI- and input divider-related signals from acore_debug_intf, plus a few extras.

You'll notice that acore_debug_intf has two modports, acore and dcore. I think you would rename the acore modport to tx, so that in the tx_top I/O list, you could just write:

module tx_top (
   ...
   tx_debug_intf.tx tx,
   ...
); 

instead of having to write out all of the tx.* signals individually. Within the module definition for tx_top, I don't think that you would have to change anything since you can refer to interface members using the same "dot" syntax.

This should make it easier to update the list of debug signals, and should also reduce the likelihood of typos. It also fits in nicely with the way that we have set up JTAG debug registers (each interface has a corresponding Markdown file in dragonphy2/md.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks -- here's a short reference on SystemVerilog interfaces: https://www.doulos.com/knowhow/systemverilog/systemverilog-tutorials/systemverilog-interfaces-tutorial/. I think that you can mostly copy & modify acore_debug_intf, but this background might be helpful to better understand that code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing the helpful materials.

output reg clk_prbsgen, // Output clock for 16-bit prbs generator
output reg dout // Data output
input wire mdll_clk, // Clock from MDLL
input wire ext_clk, // Clock from external source
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ext_clk also goes to analog_core, please check with @sjkim85 that the additional loading by tx_top on this clock signal is OK.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ext_clk also goes to analog_core, please check with @sjkim85 that the additional loading by tx_top on this clock signal is OK.

Sure, I will check with sungjin asap, thanks for the reminder.

wire clk_qb; // q, i, qb, ib spaced evenly within a clock cycle
wire clk_ib;

wire [3:0] tx.clk_interp_slice; // Output from the phase interpolator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid using a "dot" in signal name declarations

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid using a "dot" in signal name declarations

Noted. Will modify this.

tx.rstb = ~ rst;

//Global clock gating
assign clk_prbsgen_reg = cke ? 1b'0 : clk_prbsgen; //Output clock gating
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two notes here:

  1. It is not allowed to assign to a reg -- you can only assign to a wire. I think you could solve this problem by changing clk_prbsgen_reg to a wire (and probably change the name, too, since it is not a reg). This might help to clear things up: https://www.verilogpro.com/verilog-reg-verilog-wire-systemverilog-logic/.
  2. In general, I would advise against adding logic in the clock path like this, since it may be problematic for closing timing in synthesis, and also because it can introduce clock glitches. I prefer to use cke in the sequential logic that uses the clock, e.g.
always @(posedge clk) begin
    if (rst) begin
        q <= 0;  // reset output
    end else if (cke) begin
        q <= d;  // clock enabled
    end else begin
        q <= q;  // clock disabled; do nothing
    end
end

Automated clock gating in synthesis should recognize this pattern and insert clock gating circuit if it is advantageous from a power perspective. I don't have a great reference for this, but you could look at Fig. 2 and Fig. 3 in this paper to get a better sense of what I'm talking about: https://ieeexplore-ieee-org.stanford.idm.oclc.org/stamp/stamp.jsp?tp=&arnumber=5733874

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tutorial, this helps me clear a lot of confusion...

It appears to me that this could generate an extra DFF on the data path, which I would be a little hesitant to do so since this could be problematic in timing.
The sequential logic of the Tx is mainly connected by explicit wire definitions, the DFFs in this Tx are explicitly defined and connected with other blocks...
I will go through the code with emphasis on the cke and rst once more.

Two notes here:

  1. It is not allowed to assign to a reg -- you can only assign to a wire. I think you could solve this problem by changing clk_prbsgen_reg to a wire (and probably change the name, too, since it is not a reg). This might help to clear things up: https://www.verilogpro.com/verilog-reg-verilog-wire-systemverilog-logic/.
  2. In general, I would advise against adding logic in the clock path like this, since it may be problematic for closing timing in synthesis, and also because it can introduce clock glitches. I prefer to use cke in the sequential logic that uses the clock, e.g.
always @(posedge clk) begin
    if (rst) begin
        q <= 0;  // reset output
    end else if (cke) begin
        q <= d;  // clock enabled
    end else begin
        q <= q;  // clock disabled; do nothing
    end
end

Automated clock gating in synthesis should recognize this pattern and insert clock gating circuit if it is advantageous from a power perspective. I don't have a great reference for this, but you could look at Fig. 2 and Fig. 3 in this paper to get a better sense of what I'm talking about: https://ieeexplore-ieee-org.stanford.idm.oclc.org/stamp/stamp.jsp?tp=&arnumber=5733874

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the two coding styles should be logically equivalent (Fig. 2 vs. Fig. 3 in that paper), so I don't think the style I'm suggesting should cause an extra DFF to be added. The difference is that, rather than applying clock gating logic to clk_prbsgen within tx_top, you would directly wire clk_prbsgen to an output of tx_top. Then in the digital_core logic that uses clk_prbsgen, I would implement sequential logic in a way that is compatible with automated clock gating.

Combining several comments, I'm basically suggesting that you remove lines 94-98 in tx_top (the three clock gating statements and the two statements related to residue_clk_halfrate and residue_clk_prbsgen)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the two coding styles should be logically equivalent (Fig. 2 vs. Fig. 3 in that paper), so I don't think the style I'm suggesting should cause an extra DFF to be added. The difference is that, rather than applying clock gating logic to clk_prbsgen within tx_top, you would directly wire clk_prbsgen to an output of tx_top. Then in the digital_core logic that uses clk_prbsgen, I would implement sequential logic in a way that is compatible with automated clock gating.

Combining several comments, I'm basically suggesting that you remove lines 94-98 in tx_top (the three clock gating statements and the two statements related to residue_clk_halfrate and residue_clk_prbsgen)

Yeah, I will remove them. There is a little confusion from my previous comments,
For the data path I have in the Tx top, the DFF is instantiated directly using ff_c.sv in analog_core/gates, which doesn't come with a gating/rst option. To adopt this style, I need to revise the ff_c.sv

Originally, I did not want to revise the ff_c.sv, so I just gate the input clock separately.

I will remove these gating codes and revise some of the always () statements to follow the recommendations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I'm creating the confusion here... for the code you have right now, the gated clock signal clk_prbsgen_reg goes straight to the output of tx_top and does not appear to be used within tx_top. So I'm just saying that if you make clk_prbsgen the output of tx_top and remove the gated version, I don't think that you have to change anything else. By the way, if you did want to create a new DFF with a clock enable, please do not modify ff_c, as it is used within analog_core, but instead make a copy and give the new module a different name. But from my previous comments, I don't think this is necessary; I think you can just drive clk_prbsgen as an output of tx_top, because the gating for that clock will happen in digital_core.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted, I will remove them.

I was trying to make sure that nothing comes in and nothing comes out by enabling the gating signal.
Sorry for the misunderstanding
Thank you for the elaborations

assign clk_prbsgen_reg = cke ? 1b'0 : clk_prbsgen; //Output clock gating
assign tx.ext_clk = cke ? 1b'0 : ext_clk; // Input external clock gating
assign tx.mdll_clk = cke ? 1b'0 : mdll_clk; // Input external clock gating
assign residue_buf1 = 1 ? 1b'0 : residue_clk_halfrate; // This buf act as a buffer to filter any potential interference of the floating internal clock node.
Copy link
Contributor

@sgherbst sgherbst Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't quite follow what the purpose of these two lines is -- it look like residue_clk_halfrate and residue_clk_prbsgen are driven, not floating. Could you explain what this code is doing?

Copy link
Collaborator Author

@CansWang CansWang Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The differential signal is generated by calling the 16 to 4 and 4 to 1 mux once again with inverting inputs ~din. The extra clock divider signal will be floating, just to ensure there are no uncertain connections, I put a gating on it.

Sorry, I don't quite follow what the purpose of these two lines is -- it look like residue_clk_halfrate and residue_clk_prbsgen are driven, not floating. Could you explain what this code is doing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I usually think of a "floating" signal as being connected to one or more inputs, but zero outputs. However, it looks like residue_clk_halfrate is driven by qr_mux_4t1_1.ck_b2 and residue_clk_prbsgen is driven by hr_mux_16t4_1.clk_b2. In addition, residue_clk_halfrate is connected to the input hr_mux_16t4_1.clk_hr.

For residue_clk_prbsgen at least, if you meant that it isn't connected to any input, you could use the () syntax to indicate that it is unused:

hr_16t4_mux_top hr_mux_16t4_1 (
    ...
    .clk_b2(),  // was residue_clk_prbsgen
    ...
);

.ck_b2(clk_halfrate), // Divided quarter-rate clock for 16 to 4 mux
.data(dout) // Final data output
.data(dout_p) // Final data output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to check, I think dout_p and dout_n don't yet have the output drivers or termination added yet, right?

Copy link
Collaborator Author

@CansWang CansWang Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I haven't instantiated the output buffer and termination on it yet.

Just to check, I think dout_p and dout_n don't yet have the output drivers or termination added yet, right?


//Global clock gating
assign clk_prbsgen_reg = cke ? 1b'0 : clk_prbsgen; //Output clock gating
assign tx.ext_clk = cke ? 1b'0 : ext_clk; // Input external clock gating
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think gating circuitry is needed ext_clk and mdll_clk, since they go directly to the input_divider, which already handles clock selection and can be disabled.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I didn't realize that. I will think twice about whether it is necessary to have this divider or a simpler div_b2 dff will be enough. Thanks for the note!

I don't think gating circuitry is needed ext_clk and mdll_clk, since they go directly to the input_divider, which already handles clock selection and can be disabled.

endgenerate

// Clock input divider
input_divider indiv (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mention to @sjkim85 that you want to use his input_divider block. I think that he is already making some updates so that phase_interpolator can be used as a standalone block, so please check if similar updates are needed to use input_divider in the transmitter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mention to @sjkim85 that you want to use his input_divider block. I think that he is already making some updates so that phase_interpolator can be used as a standalone block, so please check if similar updates are needed to use input_divider in the transmitter.

Hmmm, I didn't realize that. I will think twice about whether it is necessary to have this divider or a simpler div_b2 dff will be enough. Thanks for the note!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the question is whether input_divider.sv can be run through synthesis and PnR like a "normal" digital block, or whether a bunch of special settings are required, in which case it would be better for you to get the input_divider from @sjkim85 as black box layout. I'm not sure which is the case, so it would be best to check with him directly.

input wire logic clk_async,
output wire logic clk_encoder,
input wire logic ctl_valid,
// Control and input of the phase interpolator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to delete obsolete code rather than commenting it out. We can always restore the old code from the git history, but it can be confusing to read/review code when a lot of it is commented out. I would typically only leave in a commented block of code if there was a remark explaining it (i.e., "uncomment this code to create an interactive plot").

wire [3:0] qr_data_n; // Output of 16 to 4 mux, negative
wire residue_clk_halfrate;
// wire residue_clk_prbsgen;
wire residue_buf1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

residue_buf1 and residue_buf2 don't appear to be used anywhere

wire clk_halfrate; // Input clock for 16 to 4 mux
wire clk_q; // The clock inout must follow this order, (rising edge order) Q->I->QB->IB-Q
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clk_q, clk_i, clk_qb, and clk_ib don't appear to be used anywhere


// Global reset
wire rstb;
rstb = ~ rst;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an assign is needed here, i.e. assign rstb=~rst;

);
// portion 4 checked | Yes
assign tx.pi_out_meas[k] = (tx.sel_meas_pi[k] ? clk_interp_slice[k] : clk_interp_sw[k]) & tx.en_meas_pi[k];
assign en_unit_pi[k] = ~tx.enb_unit_pi[k];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

en_unit_pi doesn't appear to be connected to anything -- in analog_core, each en_unit_pi[k] is wired to the en_unit pin of an instance of phase_interpolator. is that not needed here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I missed one connection.
en_unit_pi added.


// output reg tx.inbuf_out_meas,
output wire clk_prbsgen, // Output clock for 16-bit prbs generator
output reg dout_p, // Data output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think dout_p and dout_n should both be wire rather than reg

@@ -2,6 2,8 @@

module div_b2 (
input wire clkin,
input wire rst,
Copy link
Contributor

@sgherbst sgherbst Oct 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you have go back through your design adding cke, because you can already shut down the transmitter clock via the en signal of the input_divider. For rst, if you keep it, then please delete the initial block that sets clkout to zero.

@@ -10,8 12,14 @@ initial begin
end

always@(posedge clkin) begin
#0.3;
if (rst) begin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the previous comments, I think that you would just want this to be

if (rst) begin
    clkout <= 0;
end else begin
    clkout <= ~clkout;
end

By the way, blocking (=) vs. <= (non-blocking) assignments can be confusing. In general, if you're implementing sequential logic like this, then you should use the <= non-blocking assignment, while combination logic would use = (blocking). There is some more information on this here: http://courses.csail.mit.edu/6.111/f2007/handouts/L06.pdf

.clk_IB(tx.clk_interp_slice[3]),
.din(qr_data_p), // Quarter-rate data from half-rate 16 to 4 mux
.rst(rst),
.cke(cke),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as earlier comment -- I don't think you have to go back through your design to add cke here, because you can already shut down the TX clock via the input_divider

// Input clock signal measurement
logic inbuf_out_meas; // to output buffer

// logic en_TDC_phase_reverse;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as in tx_top -- please delete obsolete code rather than commenting it out, since it makes the code harder to read & review

Copy link
Contributor

@sgherbst sgherbst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@sgherbst sgherbst merged commit fa8c737 into master Oct 17, 2020
CansWang pushed a commit that referenced this pull request Oct 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants