-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transmitter #128
Transmitter #128
Conversation
…and add control bits
…and add control bits
vlog/chip_src/tx_16t1/tx_top.sv
Outdated
|
||
|
||
// input tx.clk_in_pi, | ||
input tx.clk_async, // Asynchronous clock for phase measurement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest putting these debug signals into an interface, as we have done for the analog core. You could start by copying dragonphy2/vlog/chip_src/analog_core/acore_debug_intf.sv
to dragonphy2/vlog/chip_src/tx_16t1/tx_debug_intf.sv
and then modifying the interface as needed. I think the signals that you need are mostly the PI- and input divider-related signals from acore_debug_intf
, plus a few extras.
You'll notice that acore_debug_intf
has two modports
, acore
and dcore
. I think you would rename the acore
modport to tx
, so that in the tx_top
I/O list, you could just write:
module tx_top (
...
tx_debug_intf.tx tx,
...
);
instead of having to write out all of the tx.*
signals individually. Within the module definition for tx_top
, I don't think that you would have to change anything since you can refer to interface members using the same "dot" syntax.
This should make it easier to update the list of debug signals, and should also reduce the likelihood of typos. It also fits in nicely with the way that we have set up JTAG debug registers (each interface has a corresponding Markdown file in dragonphy2/md
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, sure. I will dig into that and make corresponding changes.
I suggest putting these debug signals into an interface, as we have done for the analog core. You could start by copying
dragonphy2/vlog/chip_src/analog_core/acore_debug_intf.sv
todragonphy2/vlog/chip_src/tx_16t1/tx_debug_intf.sv
and then modifying the interface as needed. I think the signals that you need are mostly the PI- and input divider-related signals fromacore_debug_intf
, plus a few extras.You'll notice that
acore_debug_intf
has twomodports
,acore
anddcore
. I think you would rename theacore
modport totx
, so that in thetx_top
I/O list, you could just write:module tx_top ( ... tx_debug_intf.tx tx, ... );instead of having to write out all of the
tx.*
signals individually. Within the module definition fortx_top
, I don't think that you would have to change anything since you can refer to interface members using the same "dot" syntax.This should make it easier to update the list of debug signals, and should also reduce the likelihood of typos. It also fits in nicely with the way that we have set up JTAG debug registers (each interface has a corresponding Markdown file in
dragonphy2/md
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks -- here's a short reference on SystemVerilog interfaces: https://www.doulos.com/knowhow/systemverilog/systemverilog-tutorials/systemverilog-interfaces-tutorial/. I think that you can mostly copy & modify acore_debug_intf
, but this background might be helpful to better understand that code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sharing the helpful materials.
output reg clk_prbsgen, // Output clock for 16-bit prbs generator | ||
output reg dout // Data output | ||
input wire mdll_clk, // Clock from MDLL | ||
input wire ext_clk, // Clock from external source |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since ext_clk
also goes to analog_core
, please check with @sjkim85 that the additional loading by tx_top
on this clock signal is OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since
ext_clk
also goes toanalog_core
, please check with @sjkim85 that the additional loading bytx_top
on this clock signal is OK.
Sure, I will check with sungjin asap, thanks for the reminder.
vlog/chip_src/tx_16t1/tx_top.sv
Outdated
wire clk_qb; // q, i, qb, ib spaced evenly within a clock cycle | ||
wire clk_ib; | ||
|
||
wire [3:0] tx.clk_interp_slice; // Output from the phase interpolator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid using a "dot" in signal name declarations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid using a "dot" in signal name declarations
Noted. Will modify this.
vlog/chip_src/tx_16t1/tx_top.sv
Outdated
tx.rstb = ~ rst; | ||
|
||
//Global clock gating | ||
assign clk_prbsgen_reg = cke ? 1b'0 : clk_prbsgen; //Output clock gating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two notes here:
- It is not allowed to
assign
to areg
-- you can only assign to awire
. I think you could solve this problem by changingclk_prbsgen_reg
to awire
(and probably change the name, too, since it is not areg
). This might help to clear things up: https://www.verilogpro.com/verilog-reg-verilog-wire-systemverilog-logic/. - In general, I would advise against adding logic in the clock path like this, since it may be problematic for closing timing in synthesis, and also because it can introduce clock glitches. I prefer to use
cke
in the sequential logic that uses the clock, e.g.
always @(posedge clk) begin
if (rst) begin
q <= 0; // reset output
end else if (cke) begin
q <= d; // clock enabled
end else begin
q <= q; // clock disabled; do nothing
end
end
Automated clock gating in synthesis should recognize this pattern and insert clock gating circuit if it is advantageous from a power perspective. I don't have a great reference for this, but you could look at Fig. 2 and Fig. 3 in this paper to get a better sense of what I'm talking about: https://ieeexplore-ieee-org.stanford.idm.oclc.org/stamp/stamp.jsp?tp=&arnumber=5733874
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the tutorial, this helps me clear a lot of confusion...
It appears to me that this could generate an extra DFF on the data path, which I would be a little hesitant to do so since this could be problematic in timing.
The sequential logic of the Tx is mainly connected by explicit wire
definitions, the DFFs in this Tx are explicitly defined and connected with other blocks...
I will go through the code with emphasis on the cke and rst
once more.
Two notes here:
- It is not allowed to
assign
to areg
-- you can only assign to awire
. I think you could solve this problem by changingclk_prbsgen_reg
to awire
(and probably change the name, too, since it is not areg
). This might help to clear things up: https://www.verilogpro.com/verilog-reg-verilog-wire-systemverilog-logic/.- In general, I would advise against adding logic in the clock path like this, since it may be problematic for closing timing in synthesis, and also because it can introduce clock glitches. I prefer to use
cke
in the sequential logic that uses the clock, e.g.always @(posedge clk) begin if (rst) begin q <= 0; // reset output end else if (cke) begin q <= d; // clock enabled end else begin q <= q; // clock disabled; do nothing end endAutomated clock gating in synthesis should recognize this pattern and insert clock gating circuit if it is advantageous from a power perspective. I don't have a great reference for this, but you could look at Fig. 2 and Fig. 3 in this paper to get a better sense of what I'm talking about: https://ieeexplore-ieee-org.stanford.idm.oclc.org/stamp/stamp.jsp?tp=&arnumber=5733874
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the two coding styles should be logically equivalent (Fig. 2 vs. Fig. 3 in that paper), so I don't think the style I'm suggesting should cause an extra DFF to be added. The difference is that, rather than applying clock gating logic to clk_prbsgen
within tx_top
, you would directly wire clk_prbsgen
to an output of tx_top
. Then in the digital_core
logic that uses clk_prbsgen
, I would implement sequential logic in a way that is compatible with automated clock gating.
Combining several comments, I'm basically suggesting that you remove lines 94-98 in tx_top
(the three clock gating statements and the two statements related to residue_clk_halfrate
and residue_clk_prbsgen
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the two coding styles should be logically equivalent (Fig. 2 vs. Fig. 3 in that paper), so I don't think the style I'm suggesting should cause an extra DFF to be added. The difference is that, rather than applying clock gating logic to
clk_prbsgen
withintx_top
, you would directly wireclk_prbsgen
to an output oftx_top
. Then in thedigital_core
logic that usesclk_prbsgen
, I would implement sequential logic in a way that is compatible with automated clock gating.Combining several comments, I'm basically suggesting that you remove lines 94-98 in
tx_top
(the three clock gating statements and the two statements related toresidue_clk_halfrate
andresidue_clk_prbsgen
)
Yeah, I will remove them. There is a little confusion from my previous comments,
For the data path I have in the Tx top, the DFF is instantiated directly using ff_c.sv
in analog_core/gates
, which doesn't come with a gating/rst option. To adopt this style, I need to revise the ff_c.sv
Originally, I did not want to revise the ff_c.sv, so I just gate the input clock separately.
I will remove these gating codes and revise some of the always ()
statements to follow the recommendations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if I'm creating the confusion here... for the code you have right now, the gated clock signal clk_prbsgen_reg
goes straight to the output of tx_top
and does not appear to be used within tx_top
. So I'm just saying that if you make clk_prbsgen
the output of tx_top
and remove the gated version, I don't think that you have to change anything else. By the way, if you did want to create a new DFF with a clock enable, please do not modify ff_c
, as it is used within analog_core
, but instead make a copy and give the new module a different name. But from my previous comments, I don't think this is necessary; I think you can just drive clk_prbsgen
as an output of tx_top
, because the gating for that clock will happen in digital_core
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted, I will remove them.
I was trying to make sure that nothing comes in and nothing comes out by enabling the gating signal.
Sorry for the misunderstanding
Thank you for the elaborations
vlog/chip_src/tx_16t1/tx_top.sv
Outdated
assign clk_prbsgen_reg = cke ? 1b'0 : clk_prbsgen; //Output clock gating | ||
assign tx.ext_clk = cke ? 1b'0 : ext_clk; // Input external clock gating | ||
assign tx.mdll_clk = cke ? 1b'0 : mdll_clk; // Input external clock gating | ||
assign residue_buf1 = 1 ? 1b'0 : residue_clk_halfrate; // This buf act as a buffer to filter any potential interference of the floating internal clock node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't quite follow what the purpose of these two lines is -- it look like residue_clk_halfrate
and residue_clk_prbsgen
are driven, not floating. Could you explain what this code is doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The differential signal is generated by calling the 16 to 4 and 4 to 1 mux once again with inverting inputs ~din
. The extra clock divider signal will be floating, just to ensure there are no uncertain connections, I put a gating on it.
Sorry, I don't quite follow what the purpose of these two lines is -- it look like
residue_clk_halfrate
andresidue_clk_prbsgen
are driven, not floating. Could you explain what this code is doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I usually think of a "floating" signal as being connected to one or more inputs, but zero outputs. However, it looks like residue_clk_halfrate
is driven by qr_mux_4t1_1.ck_b2
and residue_clk_prbsgen
is driven by hr_mux_16t4_1.clk_b2
. In addition, residue_clk_halfrate
is connected to the input hr_mux_16t4_1.clk_hr
.
For residue_clk_prbsgen
at least, if you meant that it isn't connected to any input, you could use the ()
syntax to indicate that it is unused:
hr_16t4_mux_top hr_mux_16t4_1 (
...
.clk_b2(), // was residue_clk_prbsgen
...
);
vlog/chip_src/tx_16t1/tx_top.sv
Outdated
.ck_b2(clk_halfrate), // Divided quarter-rate clock for 16 to 4 mux | ||
.data(dout) // Final data output | ||
.data(dout_p) // Final data output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check, I think dout_p
and dout_n
don't yet have the output drivers or termination added yet, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I haven't instantiated the output buffer and termination on it yet.
Just to check, I think
dout_p
anddout_n
don't yet have the output drivers or termination added yet, right?
vlog/chip_src/tx_16t1/tx_top.sv
Outdated
|
||
//Global clock gating | ||
assign clk_prbsgen_reg = cke ? 1b'0 : clk_prbsgen; //Output clock gating | ||
assign tx.ext_clk = cke ? 1b'0 : ext_clk; // Input external clock gating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think gating circuitry is needed ext_clk
and mdll_clk
, since they go directly to the input_divider
, which already handles clock selection and can be disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I didn't realize that. I will think twice about whether it is necessary to have this divider or a simpler div_b2 dff will be enough. Thanks for the note!
I don't think gating circuitry is needed
ext_clk
andmdll_clk
, since they go directly to theinput_divider
, which already handles clock selection and can be disabled.
endgenerate | ||
|
||
// Clock input divider | ||
input_divider indiv ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mention to @sjkim85 that you want to use his input_divider
block. I think that he is already making some updates so that phase_interpolator
can be used as a standalone block, so please check if similar updates are needed to use input_divider
in the transmitter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mention to @sjkim85 that you want to use his
input_divider
block. I think that he is already making some updates so thatphase_interpolator
can be used as a standalone block, so please check if similar updates are needed to useinput_divider
in the transmitter.
Hmmm, I didn't realize that. I will think twice about whether it is necessary to have this divider or a simpler div_b2 dff will be enough. Thanks for the note!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the question is whether input_divider.sv
can be run through synthesis and PnR like a "normal" digital block, or whether a bunch of special settings are required, in which case it would be better for you to get the input_divider
from @sjkim85 as black box layout. I'm not sure which is the case, so it would be best to check with him directly.
vlog/chip_src/tx_16t1/tx_top.sv
Outdated
input wire logic clk_async, | ||
output wire logic clk_encoder, | ||
input wire logic ctl_valid, | ||
// Control and input of the phase interpolator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to delete obsolete code rather than commenting it out. We can always restore the old code from the git history, but it can be confusing to read/review code when a lot of it is commented out. I would typically only leave in a commented block of code if there was a remark explaining it (i.e., "uncomment this code to create an interactive plot").
vlog/chip_src/tx_16t1/tx_top.sv
Outdated
wire [3:0] qr_data_n; // Output of 16 to 4 mux, negative | ||
wire residue_clk_halfrate; | ||
// wire residue_clk_prbsgen; | ||
wire residue_buf1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
residue_buf1
and residue_buf2
don't appear to be used anywhere
vlog/chip_src/tx_16t1/tx_top.sv
Outdated
wire clk_halfrate; // Input clock for 16 to 4 mux | ||
wire clk_q; // The clock inout must follow this order, (rising edge order) Q->I->QB->IB-Q |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clk_q
, clk_i
, clk_qb
, and clk_ib
don't appear to be used anywhere
vlog/chip_src/tx_16t1/tx_top.sv
Outdated
|
||
// Global reset | ||
wire rstb; | ||
rstb = ~ rst; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an assign
is needed here, i.e. assign rstb=~rst;
vlog/chip_src/tx_16t1/tx_top.sv
Outdated
); | ||
// portion 4 checked | Yes | ||
assign tx.pi_out_meas[k] = (tx.sel_meas_pi[k] ? clk_interp_slice[k] : clk_interp_sw[k]) & tx.en_meas_pi[k]; | ||
assign en_unit_pi[k] = ~tx.enb_unit_pi[k]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
en_unit_pi
doesn't appear to be connected to anything -- in analog_core
, each en_unit_pi[k]
is wired to the en_unit
pin of an instance of phase_interpolator
. is that not needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I missed one connection.
en_unit_pi
added.
vlog/chip_src/tx_16t1/tx_top.sv
Outdated
|
||
// output reg tx.inbuf_out_meas, | ||
output wire clk_prbsgen, // Output clock for 16-bit prbs generator | ||
output reg dout_p, // Data output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think dout_p
and dout_n
should both be wire
rather than reg
@@ -2,6 2,8 @@ | |||
|
|||
module div_b2 ( | |||
input wire clkin, | |||
input wire rst, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you have go back through your design adding cke
, because you can already shut down the transmitter clock via the en
signal of the input_divider
. For rst
, if you keep it, then please delete the initial
block that sets clkout
to zero.
@@ -10,8 12,14 @@ initial begin | |||
end | |||
|
|||
always@(posedge clkin) begin | |||
#0.3; | |||
if (rst) begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the previous comments, I think that you would just want this to be
if (rst) begin
clkout <= 0;
end else begin
clkout <= ~clkout;
end
By the way, blocking (=
) vs. <=
(non-blocking) assignments can be confusing. In general, if you're implementing sequential logic like this, then you should use the <=
non-blocking assignment, while combination logic would use =
(blocking). There is some more information on this here: http://courses.csail.mit.edu/6.111/f2007/handouts/L06.pdf
vlog/chip_src/tx_16t1/tx_top.sv
Outdated
.clk_IB(tx.clk_interp_slice[3]), | ||
.din(qr_data_p), // Quarter-rate data from half-rate 16 to 4 mux | ||
.rst(rst), | ||
.cke(cke), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as earlier comment -- I don't think you have to go back through your design to add cke
here, because you can already shut down the TX clock via the input_divider
// Input clock signal measurement | ||
logic inbuf_out_meas; // to output buffer | ||
|
||
// logic en_TDC_phase_reverse; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as in tx_top
-- please delete obsolete code rather than commenting it out, since it makes the code harder to read & review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Add control bits and phase interpolator to the tx_top for top-level integration