In network communication equipment such as ATM switches, core routers, Gigabit Ethernet devices and gateways, system data rates and clock rates keep rising, and processor operating frequencies rise with them. Data, voice and image traffic is carried at rates far above 500Mbps, and backplanes running at hundreds of megabits or even gigabits per second are increasingly common. Faster digital systems mean shorter signal rise and fall times, and the high-speed design problems caused by higher signal frequencies and edge rates are becoming more prominent. When the interconnect delay of a signal exceeds 20% of its edge transition time, the traces on the board exhibit transmission-line effects, and the design becomes a high-speed design.
High-speed problems pose a greater challenge to hardware design. Many designs that are logically correct will fail as a whole if high-speed effects are not handled properly in the actual PCB design, and this is especially evident in network communications, where ever higher speeds are pursued. Experts predict that in future hardware circuit design the cost of logic function design will fall sharply, while the cost associated with high-speed design will account for 80% or more of the total. High-speed problems have become one of the key factors in whether a system design succeeds.
The signal overshoot, undershoot, reflection, ringing and crosstalk that arise in high-speed designs can seriously disturb the normal timing of a system. Shrinking timing margins force designers to pay attention to every phenomenon that affects the timing and quality of digital waveforms. When higher speeds make timing tight, any oversight or simplification can have serious consequences for the system, no matter how thoroughly its principles were understood beforehand. Timing problems are therefore especially critical in high-speed design. This article discusses timing analysis and simulation strategies for high-speed design.
1 Timing analysis and simulation of common clock synchronization
In high-speed digital circuits, data transfer is generally controlled by a clock so that data signals are sent and received in an orderly manner, and a chip can only send and receive data according to the prescribed timing. Excessive signal delay or poorly matched signal delays can cause timing violations and functional errors. In a low-speed system, interconnect delay and ringing are negligible because the signal has enough time to settle to a stable state. In a high-speed system, however, edge rates and clock rates are higher, the time available for signal transfer between devices and for synchronization is shorter, and the equivalent capacitance and inductance of the transmission line delay and distort the signal's digital transitions. Together with signal delay mismatch and other factors, this affects the setup and hold times at the chip, so that the chip cannot send and receive data correctly and the system fails to work properly.
Common clock synchronization means that during data transfer the driving end and the receiving end of the bus share one clock source, and a single clock buffer (CLOCK BUFFER) sends out in-phase clocks to time both the sending and the receiving of data. Figure 1 is a schematic diagram of typical common clock synchronous data transmission and reception. In Figure 1, the crystal oscillator CRYSTAL generates an output signal CLK_IN that goes to the clock distributor CLOCK BUFFER. After distribution and buffering, the CLOCK BUFFER sends out two in-phase clocks: CLKB, which clocks the data output of the DRIVER, and CLKA, which samples and latches the data sent from the DRIVER to the RECEIVER. The clock CLKB arrives at the DRIVER after a flight time (FLIGHT TIME) of Tflt_CLKB. The internal data of the DRIVER is latched by CLKB and appears at the output port of the DRIVER after a time TCO_DATA. The output data then reaches the input port of the RECEIVER after a flight time Tflt_DATA. At the input port of the RECEIVER, the other clock generated by the CLOCK BUFFER, CLKA (delayed by its own flight time Tflt_CLKA), samples and latches the data from the DRIVER, which completes the data transfer of one COMMON CLOCK cycle.
The process above shows that data arriving at the RECEIVER is sampled by the rising edge of the next clock cycle. Two necessary conditions for data transfer follow from this:
1. The data at the RECEIVER input must meet the required setup time Tsetup, that is, the data must be valid at least this long before the clock edge. The data signal must therefore arrive at the input before the clock signal, which gives the inequality for the setup time.
2. To latch the data into the device successfully, the data signal must remain valid at the input of the receiving chip long enough to be sampled correctly by the clock. This period is called the hold time. The delay of CLKA must be less than the time at which the data becomes invalid (INVALID), which gives the inequality for the hold time.
1.1 Timing analysis of data setup time
According to the first condition, the data signal must arrive at the receiving end before the clock CLKA so that the data is latched correctly. On a common clock bus, the first clock cycle latches the data onto the output of the DRIVER and the second clock cycle latches the data into the RECEIVER, so the data signal must reach the RECEIVER input sufficiently earlier than the clock signal CLKA. To meet this condition, the delays with which the clock and data signals reach the RECEIVER must be determined, and the receiver's setup time requirement must be satisfied. Any time in excess of the required setup time is the setup timing margin Tmargin. In the timing diagram of Figure 1, every arrowed line indicates a delay incurred by a data or clock signal inside a chip or on a transmission line. The lower arrowed lines represent the total delay from the first valid clock edge until the data reaches the RECEIVER input, and the upper arrowed line represents the total delay of the receive clock CLKA. The total delay from the first valid clock edge until the data arrives at the RECEIVER input is:
TDATA_DELAY=TCO_CLKB+Tflt_CLKB+TCO_DATA+Tflt_DATA
The total delay of the next cycle of the receiving clock CLKA is:
TCLKA_DELAY=TCYCLE+TCO_CLKA+Tflt_CLKA
To meet the data setup time, the following must hold:
TCLKA_DELAY_MIN-TDATA_DELAY_MAX-Tsetup-Tmargin>0
Expanding this and taking clock jitter Tjitter and similar factors into account gives:
TCYCLE+(TCO_CLKA_MIN-TCO_CLKB_MAX)+ (Tflt_CLKA_MIN-Tflt_CLKB_MAX)-TCO_DATA_MAX-Tflt_DATA_SETTLE_DELAY_MAX-Tjitter-Tsetup-Tmargin>0 (1)
In formula (1), TCYCLE is one clock period. The first bracket is the maximum phase difference between the clocks CLKA and CLKB output by the CLOCK BUFFER chip, which data sheets call output-to-output skew. The second bracket is the maximum difference between the delays with which the two clocks CLKA and CLKB output by the CLOCK BUFFER chip reach the RECEIVER and the DRIVER respectively.
In formula (1), TCO_DATA is the time from the triggering clock edge until the data appears at the output port and reaches the measurement threshold voltage Vmeas (or VREF) under specified test load and test conditions. TCO_DATA depends directly on the chip's internal logic delay, the characteristics of its output buffer (OUTPUT BUFFER) and the output load conditions. TCO can be found in the chip data sheet.
According to formula (1), only two parts are actually adjustable: Tflt_CLKA_MIN-Tflt_CLKB_MAX and Tflt_DATA_SETTLE_DELAY_MAX. As far as setup time alone is concerned, Tflt_CLKA_MIN should be as large as possible, while Tflt_CLKB_MAX and Tflt_DATA_SETTLE_DELAY_MAX should be as small as possible. In essence, the receive clock should arrive later and the data should arrive earlier.
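As a minimal sketch, inequality (1) can be evaluated numerically to see how much setup margin a given set of delays leaves. The function name, the parameter names and every number in the usage example are illustrative assumptions, not values from any data sheet; all times are in ns. The function returns the left-hand side of (1) with Tmargin left out, so the result is the margin actually available.

```python
def setup_margin(t_cycle, tco_clka_min, tco_clkb_max,
                 tflt_clka_min, tflt_clkb_max,
                 tco_data_max, tflt_data_settle_max,
                 t_jitter, t_setup):
    """Available setup margin in ns: left side of (1) without Tmargin."""
    buffer_skew = tco_clka_min - tco_clkb_max           # first bracket of (1)
    clock_flight_skew = tflt_clka_min - tflt_clkb_max   # second bracket of (1)
    return (t_cycle + buffer_skew + clock_flight_skew
            - tco_data_max - tflt_data_settle_max
            - t_jitter - t_setup)


# Hypothetical numbers, chosen only to illustrate the arithmetic.
example = dict(t_cycle=10.0, tco_clka_min=1.0, tco_clkb_max=1.2,
               tflt_clka_min=0.7, tflt_clkb_max=0.7,
               tco_data_max=5.0, tflt_data_settle_max=1.1,
               t_jitter=0.1, t_setup=2.0)
margin = setup_margin(**example)
print(f"available setup margin: {margin:.3f} ns")

# A later receive clock (larger Tflt_CLKA_MIN) increases the setup margin.
later_clock = setup_margin(**{**example, "tflt_clka_min": 0.9})
print(f"with a later receive clock: {later_clock:.3f} ns")
```

A positive result means the setup requirement is met with that much to spare; the second call shows the trend stated above, that delaying the receive clock helps setup.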
1.2 Timing analysis of data hold time
To latch the data into the device successfully, the data signal must remain valid at the input of the receiving chip long enough to be sampled correctly by the clock; this period is called the hold time. On a common clock bus, the receiving buffer latches the data on the second clock edge, and on the same edge the driver latches the next data onto its output. To meet the hold time at the receiving end, the valid data must therefore be latched into the receiver's flip-flop before the next data signal arrives. This requires the delay of the receive clock CLKA to be less than the delay of the data signal at the receiver.
The data delay is:
TDATA_DELAY=TCO_CLKB+Tflt_CLKB+TCO_DATA+Tflt_DATA_SWITCH_DELAY
To meet the data hold time, the following must hold:
TDATA_DELAY_MIN-TCLKA_DELAY_MAX-Thold-Tmargin>0
Expanding, rearranging and taking clock jitter Tjitter and similar factors into account gives the following relationship:
(TCO_CLKB_MIN-TCO_CLKA_MAX)+(Tflt_CLKB_MIN-Tflt_CLKA_MAX)+TCO_DATA_MIN+Tflt_DATA_SWITCH_DELAY_MIN-Thold-Tmargin-Tjitter>0 (2)
In formula (2), the first bracket is again the maximum phase difference between the clocks output by the CLOCK BUFFER chip, and the second bracket is again the difference between the delays with which the two clocks CLKA and CLKB output by the clock chip reach the RECEIVER and the DRIVER respectively. To meet the data hold time, only two parts are actually adjustable: Tflt_CLKB_MIN-Tflt_CLKA_MAX and Tflt_DATA_SWITCH_DELAY_MIN. As far as hold time alone is concerned, Tflt_CLKB_MIN and Tflt_DATA_SWITCH_DELAY_MIN should be as large as possible, and Tflt_CLKA_MAX should be as small as possible. In other words, to meet the hold time the receive clock should arrive early and the data should become invalid late.
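The hold check can be sketched in the same way. The function below evaluates the left-hand side of (2) with Tmargin left out; the names and the sample numbers are hypothetical and serve only to illustrate the arithmetic, with all times in ns.

```python
def hold_margin(tco_clkb_min, tco_clka_max,
                tflt_clkb_min, tflt_clka_max,
                tco_data_min, tflt_data_switch_min,
                t_jitter, t_hold):
    """Available hold margin in ns: left side of (2) without Tmargin."""
    buffer_skew = tco_clkb_min - tco_clka_max           # first bracket of (2)
    clock_flight_skew = tflt_clkb_min - tflt_clka_max   # second bracket of (2)
    return (buffer_skew + clock_flight_skew
            + tco_data_min + tflt_data_switch_min
            - t_hold - t_jitter)


# Hypothetical numbers, not taken from any data sheet.
example = dict(tco_clkb_min=1.0, tco_clka_max=1.2,
               tflt_clkb_min=0.7, tflt_clka_max=0.7,
               tco_data_min=1.0, tflt_data_switch_min=-0.1,
               t_jitter=0.1, t_hold=0.5)
margin = hold_margin(**example)
print(f"available hold margin: {margin:.3f} ns")

# An earlier receive clock (smaller Tflt_CLKA_MAX) increases the hold margin.
earlier_clock = hold_margin(**{**example, "tflt_clka_max": 0.5})
print(f"with an earlier receive clock: {earlier_clock:.3f} ns")
```

Note that the clock flight-time terms enter (1) and (2) with opposite signs, which is why a change that helps setup tends to cost hold margin, and why both inequalities have to be checked together.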
To receive data correctly, the setup time and the hold time must be considered together, that is, (1) and (2) must both be satisfied. Analysis of these two inequalities shows that only three quantities can be adjusted: the send clock delay, the receive clock delay and the data delay. The adjustment can proceed as follows: first assume that the send clock delay is strictly equal to the receive clock delay, that is, Tflt_CLKA_MIN-Tflt_CLKB_MAX=0 and Tflt_CLKB_MIN-Tflt_CLKA_MAX=0 (the timing deviation introduced by these two assumptions is considered later), and then obtain the allowable data delay range through simulation. If no data delay satisfies the inequalities, return to the two equations above and adjust the send clock delay or the receive clock delay.
The following example is the common clock synchronous data transfer on the GLINK bus of a broadband network switch. With the send clock delay assumed strictly equal to the receive clock delay, the data delay range is determined first. Substituting the parameters, (1) and (2) become:
1.5-Tflt_DATA_SETTLE_DELAY_MAX-Tmargin>0
0.5+Tflt_DATA_SWITCH_DELAY_MIN-Tmargin>0
Guided by these inequalities and the actual PCB layout, set Tflt_DATA_SETTLE_DELAY_MAX<1.1 and Tflt_DATA_SWITCH_DELAY_MIN>-0.1 (in ns); the remaining 0.4ns of margin is allocated to the time difference between the two clocks and to Tmargin. Extract the topology in SPECCTRAQUEST and run signal integrity simulation to determine the topology and the line length of each segment. A full sweep simulation of this structure (12 combinations in total) gives Tflt_DATA_SETTLE_DELAY_MAX=1.0825 and Tflt_DATA_SWITCH_DELAY_MIN=-0.0835004, which satisfy the 1.1 and -0.1 limits set above. From this, the constraint rules for the GLINK bus data lines can be drawn up:
1. The delay from the matching resistor to the sending end should not be greater than 0.1ns;
2. The data lines must be matched to within 0.1ns, that is, the delay of each data line must be between 0.65ns and 0.75ns.
These constraint rules can then guide the routing.
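The budget arithmetic of the GLINK example can be retraced in a few lines. This is only a sketch: the constants 1.5 and 0.5 are the constant terms of the two reduced inequalities, the limits and simulated values are the figures quoted in the text, and the function name is made up for illustration. All values are in ns.

```python
SETUP_CONSTANT_NS = 1.5  # constant term of the reduced setup inequality
HOLD_CONSTANT_NS = 0.5   # constant term of the reduced hold inequality


def remaining_margins(settle_delay_max, switch_delay_min):
    """Margin left for clock mismatch plus Tmargin, as (setup, hold) in ns."""
    setup_left = SETUP_CONSTANT_NS - settle_delay_max
    hold_left = HOLD_CONSTANT_NS + switch_delay_min
    return setup_left, hold_left


# Design limits chosen in the text: settle delay < 1.1, switch delay > -0.1.
limit_setup, limit_hold = remaining_margins(1.1, -0.1)

# Values reported by the full sweep simulation.
sim_setup, sim_hold = remaining_margins(1.0825, -0.0835004)

print(f"at the design limits: setup {limit_setup:.4f} ns, hold {limit_hold:.4f} ns")
print(f"from simulation:      setup {sim_setup:.4f} ns, hold {sim_hold:.4f} ns")
```

Both simulated margins come out slightly above the 0.4ns reserved at the design limits, which is what the statement that the simulated values satisfy the limits amounts to.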
Next, consider the effect of the hard assumptions Tflt_CLKA_MIN-Tflt_CLKB_MAX=0 and Tflt_CLKB_MIN-Tflt_CLKA_MAX=0. Constrain the send clock and the receive clock to equal length in advance (matched to within 0.02ns in practice). Clock simulation in the CADENCE environment gives |Tflt_CLKA_MIN-Tflt_CLKB_MAX|<0.2 and |Tflt_CLKB_MIN-Tflt_CLKA_MAX|<0.2. Of the 0.4ns reserved earlier, up to 0.2ns is therefore consumed by the clock mismatch, and the margin left for Tmargin is 0.2ns.
The final simulation results are:
1. The delay from the matching resistor to the sending end should not be greater than 0.1ns;
2. The data lines are matched to within 0.1ns, that is, the delay of each data line must be between 0.65ns and 0.75ns;
3. The send clock and the receive clock are length-matched to within 0.02ns;
4. Tmargin=0.2ns.
With the topology template and constraint rules above, the constraints can be imported into the Constraint Manager of SPECCTRAQUEST or ALLEGRO. Once these design constraint rules are set up, the auto-router can be used for rule-driven automatic routing, or the traces can be adjusted manually.
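Independently of any particular tool, the first three rules can be expressed as a simple check on extracted delays. The sketch below is not tied to SPECCTRAQUEST or ALLEGRO; the data layout, the net names and the sample delays are hypothetical, and it assumes that the 0.65ns to 0.75ns window applies to the total delay of each data line. All values are in ns.

```python
MAX_RESISTOR_TO_DRIVER_NS = 0.1         # rule 1
DATA_DELAY_WINDOW_NS = (0.65, 0.75)     # rule 2
MAX_CLOCK_MISMATCH_NS = 0.02            # rule 3


def check_constraints(data_nets, send_clk_delay, recv_clk_delay):
    """Return a list of rule violations.

    data_nets maps a net name to a tuple
    (resistor_to_driver_delay, total_delay), both in ns.
    """
    violations = []
    low, high = DATA_DELAY_WINDOW_NS
    for name, (stub_delay, total_delay) in sorted(data_nets.items()):
        if stub_delay > MAX_RESISTOR_TO_DRIVER_NS:
            violations.append(f"{name}: resistor-to-driver delay {stub_delay} ns > "
                              f"{MAX_RESISTOR_TO_DRIVER_NS} ns")
        if not low <= total_delay <= high:
            violations.append(f"{name}: data delay {total_delay} ns outside "
                              f"{low}..{high} ns")
    if abs(send_clk_delay - recv_clk_delay) > MAX_CLOCK_MISMATCH_NS:
        violations.append("send and receive clock delays differ by more than "
                          f"{MAX_CLOCK_MISMATCH_NS} ns")
    return violations


# Hypothetical extracted delays for two data nets and the two clock nets.
nets = {"D0": (0.05, 0.70), "D1": (0.12, 0.80)}
problems = check_constraints(nets, send_clk_delay=0.70, recv_clk_delay=0.71)
for line in problems:
    print(line)
```

In this made-up data set D0 passes, D1 violates rules 1 and 2, and the clocks are within the 0.02ns matching window.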
2 Source synchronization timing relationship and simulation examples
Source synchronization means that the clock strobe signal CLK is sent by the driver chip together with the data, rather than coming from an independent clock source as in common clock synchronization. In source synchronous data transfer, the data is sent to the receiving end first, and a short time later the strobe clock is sent to the receiving end to sample and latch that data. The schematic diagram is shown in Figure 2. Timing analysis for source synchronization is simpler than for common clock synchronization, and the method is very similar. The formulas are given directly below:
Setup time: Tvb_min+(Tflt_clk_min-Tflt_data_settle_delay_max)-Tsetup-Tmargin>0
Hold time: Tva_min+(Tflt_data_switch_delay_min-Tflt_clk_max)-Thold-Tmargin>0
Here, Tvb is the setup time at the driver, that is, how long the driver's data is valid before the clock becomes valid; Tva is the hold time at the driver, that is, how long the driver's data remains valid after the clock becomes valid; the other parameters have the same meaning as before.
The TBI interface, which is very common in communication circuits, is used below as an example of source synchronous timing analysis and simulation. The TBI interface mainly comprises a transmit clock with 10-bit transmit data, and two receive clocks with 10-bit receive data. RBC0 and RBC1 are the two receive clocks. In Gigabit Ethernet these two clocks run at 62.5MHz with a phase difference of 180°, and their rising edges latch the data alternately. Substituting the timing parameters from the data sheet into the formulas above gives:
2.5+Tflt_clk_min-Tflt_data_settle_delay_max-1-Tmargin>0
1.5+Tflt_data_switch_delay_min-Tflt_clk_max-0.5-Tmargin>0
Following the earlier analysis method, first assume that the flight times of the clock and data signal lines are strictly equal, that is, that clock and data are perfectly matched, and analyze the effect of their mismatch afterwards. The formulas above become:
1.5-Tmargin>0
1-Tmargin>0
Both the setup time and the hold time therefore have generous margin. Simulation shows that even when the data and clock traces are made the same length (matched to within 0.02ns, for example), a flight-time difference of up to 0.3ns remains, that is,
Tflt_clk_min-Tflt_data_settle_delay_max<0.3
Tflt_data_switch_delay_min-Tflt_clk_max<0.3
Taking Tmargin=0.5ns then leaves 0.2ns for clock-to-data matching, that is, the mismatch between the data and clock traces should not exceed 0.2ns.
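The TBI numbers can be retraced with a short calculation. This is a sketch under two assumptions that the text does not spell out: the 0.3ns residual difference is treated as a worst case that subtracts from both the setup and the hold budget, and the tighter of the two budgets (hold) sets the matching limit. Function and variable names are illustrative; all values are in ns.

```python
def source_sync_slack(tvb_min, tva_min, t_setup, t_hold,
                      clk_minus_data=0.0, data_minus_clk=0.0):
    """Return (setup, hold) slack in ns before Tmargin is subtracted.

    clk_minus_data stands for Tflt_clk_min - Tflt_data_settle_delay_max,
    data_minus_clk for Tflt_data_switch_delay_min - Tflt_clk_max.
    """
    setup = tvb_min + clk_minus_data - t_setup
    hold = tva_min + data_minus_clk - t_hold
    return setup, hold


# Data sheet figures used in the text: Tvb=2.5, Tva=1.5, Tsetup=1, Thold=0.5.
setup_ideal, hold_ideal = source_sync_slack(2.5, 1.5, 1.0, 0.5)

T_MARGIN_NS = 0.5   # margin chosen in the text
RESIDUAL_NS = 0.3   # residual clock/data difference found by simulation

# Assumption: the residual difference eats into the tighter budget.
length_match_budget = min(setup_ideal, hold_ideal) - T_MARGIN_NS - RESIDUAL_NS

print(f"ideal slack: setup {setup_ideal:.1f} ns, hold {hold_ideal:.1f} ns")
print(f"budget left for clock/data matching: {length_match_budget:.1f} ns")
```

With perfectly matched flight times the slack is 1.5ns for setup and 1.0ns for hold; after subtracting Tmargin and the residual difference, 0.2ns remains for length matching, in line with the figure given above.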
In the actual simulation, signal integrity analysis and simulation of the clock and data is carried out first, and a better received waveform is obtained through proper termination matching. Figure 3 compares simulated waveforms of a clock line with passive end matching and with active end matching, and shows why signal integrity simulation has to come first.
In common clock synchronization, data transmission and reception must be completed within one clock cycle, and the device delays and PCB trace delays limit the maximum theoretical operating frequency of the common clock bus. Common clock synchronization is therefore generally used for transfer rates below 200MHz to 300MHz, and source synchronization should generally be introduced for higher rates. Source synchronization works in a relative clock system in which data and clock are transmitted in parallel, so the transfer rate is determined mainly by the time difference between the data and clock signals, and the system can reach a higher transfer rate.
By applying signal integrity analysis, timing analysis and simulation to the main board and daughter card of a broadband Ethernet switch, the author greatly shortened the product design cycle and effectively resolved the signal integrity, timing and other high-speed design problems. This ensured both design quality and design speed, and the PCBs worked on the first pass. The main board and daughter card have been debugged and successfully transferred to production.