Research Journal of Recent Sciences _________________________________________________ ISSN 2277-2502 Vol. 3(7), 39-43, July (2014) Res.J.Recent Sci. International Science Congress Association 39 Clock Mesh Synthesis with Blockage Placement for Stub and Mesh Wire MinimisationKoundinya C. and J Selva KumarVLSI Design, SRM University, Chennai, Tamil Nadu, INDIA Electronics and Communication Dept.,SRM University, Chennai,Tamil Nadu, INDIA Available online at: www.isca.in , www.isca.me Received 11th April 2014, revised 26th May 2014, accepted 28th May 2014Abstract A method for optimizing clock mesh is proposed which diminishes the power in network significantly while considering timing slack. The proposed paper implements the methodology by introducing placement blockages during the IC design flow. Stub wire minimization is achieved by using register placement. Placement of blockages in order to further reduce the power dissipation along with optimized power density and guaranteed non negative timing slack is the main essence of the paper. The advantages of the proposed method when implemented over ISCAS’89 benchmark circuits are reduced power dissipation—26% on an average when analogized to previous works and significant reduction in wire length by 50% when compared to earlier uniform mesh works. Further, reduction of timing slack (1.2% of the clock period) provides added benefit. Keywords: Clock mesh, CTS, tree hierarchy, Iterative placement. Introduction In real time designs where high performance and low skew variations are required, recurrence in clock structures is preferred. Spines within clock tree or cross links among these clock structures are implemented to minimize clock skew in local areas while methods like clock mesh sizing of clock distribution networks for high performance, clock distribution network for microprocessors 2 and the design and analysis of the clock distribution network for a 1.2 GHz Alpha microprocessor assures low skew variation in global level. The shorting of all the sink registers to the horizontal and vertical metal lines induces redundancy which minimizes the skew at the cost of high power consumption compared to other clock structures. Design automation methods have been evolved in clock mesh synthesis and optimization due to the popularity of clock mesh structures in high performance designs. In method proposed in combinatorial algorithms for fast clock mesh optimization, insertion and sizing off buffer drivers as well as reduction of mesh were proposed for power reduction. The method proposed in, meshworks: An efficient framework for planning, synthesis and optimization of clock mesh networks involves reduction of mesh wires under skew constraints. In Timing slack aware incremental register placement with non-uniform grid generation for clock mesh synthesis and in non-uniform clock mesh optimization with linear programming buffer insertioniterative movement of the mesh wires is introduced in order to reduce the stub wires from sink to the mesh. In the paper, an algorithm for routing with capacitance/distance constraints for clock distribution in microprocessors, connections similar to steiner tree between register and mesh are developed to reduce the stub wire length. The novel approach proposed in this paper is to synthesize clock mesh combines traditional physical design flow of placement and clock network synthesis through iterative placement. In this paper, the placement of blockages is done while taking the timing slack into account such that the mesh wire along with the stub wire is reduced significantly thereby improving the power consumption of the overall design. The rest of the paper is categorized as follows. In section II, the history of clock mesh is discussed. Later in section III, the proposed method is presented. The results are summarized in section IV. The paper is concluded in section V. Preliminaries: Clock meshes are preferred to minimize the crosstalk in physical IC design. Optimization is done by using the full redundancy of the mesh tracks. The two major factors when defining a clock mesh are total wire length which in turn has an impact over the total power and the clock skew which is caused due to both internal and environmental variations. In section II-A the wire length of the mesh is discussed and in section II-B skew variations and timing slack are presented. Wirelength Determination in Clock Mesh: Clock mesh network is illustrated in the figure 1. The mesh wire length is contributed by the stub wire length and grid wire length. These play an important role in the dynamic power dissipation of the Research Journal of Recent Sciences ______ _ Vol. 3(7), 39-43, July (2014) International Science Congress Association design. The total wire length depends directly on the switching capacitance in the clock network. Thus various methods have been developed in order to reduce the power consumption in clock networks. The works done in combinatorial algori fast clock mesh optimization, meshworks: An efficient framework for planning, synthesis and optimization of clock mesh networks minimizes mesh wire and the methods implemented in, An algorithm for routing with capacitance/distance constraints fo r clock distribution in microprocessors, Non- uniform clock mesh optimization with linear programming buffer insertion diminish stub wire. The method proposed in this work implements placement blockages along with the method proposed by J.Lu et.al, which both stub wire and mesh wire length. Analysis of Clock Skew in Mesh Networks: is estimated as follows: maxmaxstubstubmeshbufskewskewWhere bufskew, maxmesh and max stub stub introduced by the buffer drivers of mesh, the maximum delay from a buffer driver on the mesh to the stub wire tapping point, and the utmost delay from a tapping point of a stub wire to the sink register, respectively. According to (1), the contradicting feature is if low skew is required then a denser mesh is opted which in turn increases the power consumption. Thus a tradeoff is required between power and skew. This can be maintained by using the method proposed by J.Lu, et.al10. Timing Constraints: As shown in figure 1 a local data path consisting of two registers R (initial and combinational block is considered.ifPMin and the minimum and maximum propagation delays on the combinational block respectively. CQ represents initial register ) clock-to-output delay, whereas f is setup time of clock delays to the registers R and R are represented by parameters and t , respectively, and T represents the clock period. By complying with the setup and hold timing constraints the timing analysis is performed for each data path Setup: ifPMaxCQ - Hold: ifPMinCQ The placement and routing is done after choosing the time period T, which guarantees the setup and hold time constraints. The slackness in the setup time constraint is con more vulnerable as the hold timing violations can be fixed by _ ________________________________ ______________ Congress Association design. The total wire length depends directly on the switching capacitance in the clock network. Thus various methods have been developed in order to reduce the power consumption in combinatorial algori thms for meshworks: An efficient framework for planning, synthesis and optimization of clock minimizes mesh wire and the methods An algorithm for routing with r clock distribution in uniform clock mesh optimization with diminish stub wire. The method proposed in this work implements placement blockages along with the method proposed by J.Lu et.al, which reduces Analysis of Clock Skew in Mesh Networks: Skew of the mesh (1) max stub are the skew introduced by the buffer drivers of mesh, the maximum delay from a buffer driver on the mesh to the stub wire tapping point, and the utmost delay from a tapping point of a stub wire to the contradicting feature is if low skew is required then a denser mesh is opted which in turn increases the power consumption. Thus a tradeoff is required between power and skew. This can be maintained by using the method proposed As shown in figure 1 a local data path Ri and (finaland a and ifPMax represent the minimum and maximum propagation delays on the represents initial register is setup time of . The are represented by , respectively, and T represents the clock By complying with the setup and hold timing constraints the timing analysis is performed for each data path if - (2) (3) The placement and routing is done after choosing the time period T, which guarantees the setup and hold time constraints. The slackness in the setup time constraint is con sidered to be more vulnerable as the hold timing violations can be fixed by inserting delay on a data path. The timing slack path in timing analysis can be evaluated as follows ifPMaxCQif Figure - Register to register timing path illustration Proposed Method The proposed method modifies the work done in Lu,et.al., timing analysis is performed and placement blockages are generated. This method considers the usage of uniform mesh and thus the wire minimization is done by placing blockages. An algorithm is then formulated depending on the timing constraints to place the blockages. Finally, top level tree is generated by inserting buffer drivers. T method are discussed through the following sections. Feasible Moving Regions Generation: implements iterative placement of registers towards the clock mesh. The timing slacks are taken into consideration in order to e nsure functional correctness of the design. The feasible moving region of each register is based on the timing slack to implement the clock mesh. The feasible moving region of a register generated on the assumption that the other registers are unmove d. The register placement is done later after the placement blockages are generated depending on the feasible moving regions of the registers. The timing slack of any path is associated with the physical paths between the registers. Thus the movement of t affects the timing slack of the entire design i.e. only the initial register’s fan- out and the final register’s fan Thus the path delay ifPMax is composed of three parts as illustrated in figure 4. fiiffoifPMax fo is the delay of the wire from output of the initial register to the input of the th fan- out of the register the wire delay bet ween the input of i and the fan- in gate input of register between the input of the fan- in gate to the input of register fi. ______________ _________ ISSN 2277-2502 Res. J. Recent Sci. 40 inserting delay on a data path. The timing slack if on each path in timing analysis can be evaluated as follows (4) - 1 Register to register timing path illustration The proposed method modifies the work done in Lu,et.al., 10. A timing analysis is performed and placement blockages are considers the usage of uniform mesh and thus the wire minimization is done by placing blockages. An algorithm is then formulated depending on the timing constraints to place the blockages. Finally, top level tree is generated by inserting buffer drivers. T he stages of the proposed method are discussed through the following sections. Feasible Moving Regions Generation: This method implements iterative placement of registers towards the clock mesh. The timing slacks are taken into consideration in order to nsure functional correctness of the design. The feasible moving region of each register is based on the timing slack to implement the clock mesh. The feasible moving region of a register is generated on the assumption that the other registers are d. The register placement is done later after the placement blockages are generated depending on the feasible The timing slack of any path is associated with the physical paths between the registers. Thus the movement of t he registers affects the timing slack of the entire design i.e. only the initial out and the final register’s fan -in are affected. is composed of three parts as (5) is the delay of the wire from output of the initial register i out of the register i . The gates and ween the input of th fan-out gate of register in gate input of register f is if. The delay in gate to the input of register f is Research Journal of Recent Sciences ______ _ Vol. 3(7), 39-43, July (2014) International Science Congress Association Thus using (5) the timing slack if in (4) can be rewritten as follows: fiiffoCQif In order to ensure functional correctness the timing slack of each register should be non negative. Figure-4 Delay compositions Delays CQfoand fi are monotonically increasing functions as they are dependent on the wire lengths i.e. the fan-out and fan- in wire length, respectively. Considering a non negative slack, the maximum wirelength for every fan- out gate and maximum wirelength fan- in gate can be calculated for register moving region generation is illustrated in fig. 5. A rectangular region with a radius of fo is generated such that the manhattan distance of any point on the register is equal to the radius Thus as long as the register stays within the rectangular boundary the timing slack of the path is satisfied. This process is repeated for all the fan-ins and fan- outs associated with the register i. After the creation of the feasible moving region, it is not necessary that the moving of the register always guarantees minimum timing slack due to the following reasons: the delay is generally affected by the total wirelength but here only the radius of the rectangular region is considered and when one register m oves it changes the feasible regions of the other registers. In case of some registers the timing slack is best when unmoved. The iterative placement blockages presented in following section considers the feasible moving region generation and then generate s the placement blockages. Generation of Placement Blockages: The placement blockages are inserted so as not to allow free movement of registers due to several factors. Some of them include, aggressive movement may introduce overlapping and it may negatively affect the timing slack of the data path. In order to over comings a flow is proposed for placing blockages: i. Stub wire minimization problem as illustrated in , Integrated clock mesh synthesis with incremental register placement iterative register placement results10 . ii. If the registers introduces negative effect on the timing slack then a _ ________________________________ ______________ Congress Association in (4) can be rewritten as (6) In order to ensure functional correctness the timing slack of are monotonically increasing functions as they are dependent on the wire lengths foand fi in wire length, respectively. Considering a non negative slack, the maximum wirelength fo out gate and maximum wirelength fi for each in gate can be calculated for register . The feasible moving region generation is illustrated in fig. 5. A rectangular is generated such that the manhattan distance of any point on the register is equal to the radius fo. long as the register stays within the rectangular boundary the timing slack of the path is satisfied. This process is outs associated with the After the creation of the feasible moving region, it is not necessary that the moving of the register always guarantees minimum timing slack due to the following reasons: the delay is generally affected by the total wirelength but here only the radius of the rectangular region is considered and when one oves it changes the feasible regions of the other registers. In case of some registers the timing slack is best when unmoved. The iterative placement blockages presented in following section considers the feasible moving region s the placement blockages. The placement blockages are inserted so as not to allow free movement of registers due to several factors. Some of them include, aggressive movement may introduce overlapping and it may negatively affect the timing slack of the data path. In order to over come these short comings a flow is proposed for placing blockages: i. Stub wire , Integrated clock mesh synthesis with incremental register placement is used to obtain . ii. If the moving of registers introduces negative effect on the timing slack then a placement blockage is introduced. iii. If the feasible regions of the two regions are overlapping then depending on the overlapping constraints placement blocks are introduced. iv. given two registers do not have any kind of intersection then the placement blockages are eliminated. Figure - Feasible moving region of a register Creation of Clock Mesh and placement: comprises of mesh network formed by horizontal and vertical placement of grids along with stub wires for connecting the registers to the grids. In this paper a uniform mesh is generated in order to minimize the complexity due to placement b lockages. The main objective of the mesh is to minimize the global skew. The mesh tracks are created on the chip area where mesh track represent the placement of the mesh wire on the chip. The figure 6 illustrates the placement of the mesh tracks as well moving regions. hi and vi represent the horizontal and vertical locations where mesh can be placed, respectively. The clock mesh implemented on s13207 benchmark circuit is illustrated in figure 7. The cloc k mesh is implemented over 6*6 mesh track. Experimental Results To drive the mesh grid a top level clock tree is generated using H- tree algorithm. The proposed method flow is implemented using encounter 11.13 of cadence placement, routing, timing analysis and power analysis. The benchmark circuits over which the method in this paper is implemented are the five largest circuits from ISCAS’89 benchmark. ______________ _________ ISSN 2277-2502 Res. J. Recent Sci. 41 placement blockage is introduced. iii. If the feasible regions of the two regions are overlapping then depending on the overlapping constraints placement blocks are introduced. iv. If given two registers do not have any kind of intersection then the placement blockages are eliminated. - 5 Feasible moving region of a register Creation of Clock Mesh and placement: The clock mesh comprises of mesh network formed by horizontal and vertical placement of grids along with stub wires for connecting the registers to the grids. In this paper a uniform mesh is generated in order to minimize the complexity due to placement lockages. The main objective of the mesh is to minimize the The mesh tracks are created on the chip area where mesh track represent the placement of the mesh wire on the chip. The figure 6 illustrates the placement of the mesh tracks as well as feasible represent the horizontal and vertical locations where mesh can be placed, respectively. The clock mesh implemented on s13207 benchmark circuit is k mesh is implemented over 6*6 To drive the mesh grid a top level clock tree is generated using tree algorithm. The proposed method flow is implemented cadence in order to perform initial placement, routing, timing analysis and power analysis. The benchmark circuits over which the method in this paper is implemented are the five largest circuits from ISCAS’89 Research Journal of Recent Sciences ______________________________________________________________ ISSN 2277-2502Vol. 3(7), 39-43, July (2014) Res. J. Recent Sci. International Science Congress Association 42 Figure-6 Mesh tracks and feasible moving regions Figure-7 Clock mesh placement of s13207 benchmark circuit with 6*6 mesh Table-1 Wire length comparison between proposed and optimized method in non -uniform clock mesh optimization with linear programming buffer insertionCircuit Proposed method in clock mesh optimization with linear programmingProposed method Improvement Grid Stub wire (µm) Mesh (µm) Grid Stub wire (µm) Mesh (µm) Stub Mesh Total s13207 8*8 3281 4848 8*8 489 3538 85.3% 27.06% 50.4% S15850 8*8 2226 4062 8*8 288 2285 87.06% 43.7% 59.08% S35932 12*12 10112 10871 12*12 1085 8157 89.2% 24.1% 55.9% S38417 12*12 8839 10794 12*12 1392 8332 84.2% 22.8% 50.4% S38584 11*11 8533 12668 11*11 774 10794 90.9% 14.7% 45.4% Table-2 Capacitance and clock skew Circuit Proposed method in clock mesh optimization with linear programmingProposed method Skew (ps) Cap (fF) Skew (ps) Cap (fF) S13207 0.7 5656 0.4 3127 S15850 0.8 4935 0.7 1837 S35932 0.7 13169 1.7 7112 S38417 1.4 12791 1.0 7280 S38584 1.1 13531 0.9 7428 Research Journal of Recent Sciences ______________________________________________________________ ISSN 2277-2502Vol. 3(7), 39-43, July (2014) Res. J. Recent Sci. International Science Congress Association 43 The proposed placement of blockages flow is implemented using C. The LP formulation as proposed in, Integrated clock mesh synthesis with incremental register placement was implemented using GLPK in Linux machine10. The results obtained are compared with the results obtained in proposed method in clock mesh optimization with linear programming after performing the iterative register placement for uniform meshes. Since the proposed flow doesn’t consider the non uniform mesh placement only the uniform mesh analysis of the proposed method in, Non-uniform clock mesh optimization with linear programming buffer insertion is considered. In order to demonstrate the reduction in the power consumption and the wire length these comparisons are performed. The results demonstrated in table I illustrate the comparison of the bench mark circuits with uniform clock mesh. The stub wire reduction is by 89% and the mesh wirelength by 25%. The overall reduction on an average is by 53.2%. The power and slack comparisons are shown in table II. The reduction in power is synchronous to the length of the wire reduced. The switching capacitance as well implies the power consumption in the circuit. Compared to the method used in, Non-uniform clock mesh optimization with linear programming buffer insertion 7 the total capacitance is reduced by 43.5% . The overall skew on an average is reduced by 0.3ps using the proposed method. Conclusion A clock mesh synthesis along with placement flow is proposed in the paper. This method considers power density as well as timing slack and creates placement blockages for iterative register placement for a uniform clock mesh. This method is able to reduce the power by reducing the switching capacitance by wirelength reduction. The proposed method discusses methodology flow on uniform meshes, further it can be extended to non uniform clock mesh and yet powerful algorithm for register placement can be formulated which allows simultaneous movement of registers. The results propose effective flow and can be integrated into industrial designs. References 1.Desai M., Cvijetic R. and Jensen J., Sizing of clock distribution networks for high performance cpu chips, in Proc. ACM/IEEE DAC, 389–394 (1996)2.Restle P.J., Mcnamara T.G., Webber D.A., Camporese P.J., Eng K.F., Jenkins K.A., Member S., Allen D.H., Rohn M.J., Quaranta M.P., Boerstler D.W., Alpert C.J., Carter C.A., Bailey R.N., Petrovick J.G., Krauter B.L. and Mccredie B.D., A clock distribution network for microprocessors, IEEE J. Solid-State Circuits, 36(5), 792–799 (2001)3.Xanthopoulos J.T., Bailey D., Gangwar A., Gowan M., Jain A. and Prewitt B., The design and analysis of the clock distribution network for a 1.2 GHz Alpha microprocessor, in Proc. IEEE ISSCC, 402–403 (2001) 4.Venkataraman G., Feng Z., Hu J. and Li P., Combinatorial algorithms for fast clock mesh optimization, IEEE Trans. Very Large Scale Integr. Syst., 18(1), 131–141 (2010)5.Rajaram A. and Pan D., MeshWorks: An efficient framework for planning, synthesis and optimization of clock mesh networks, in Proc. ASPDAC, 250–257 (2008) 6.Lu J., Mao X. and Taskin B., Timing slack aware incremental register placement with non-uniform grid generation for clock mesh synthesis, in Proc. ISPD, 131–138 (2011) 7.Guthaus M.R., Wilke G. and Reis R., Non-uniform clock mesh optimization with linear programming buffer insertion, in Proc. ACM/IEEE DAC 74–79 (2010) 8.Shelar R.S., An algorithm for routing with capacitance/distance constraints for clock distribution in microprocessors, in Proc. ISPD, 141–148 (2009)9.Lu J., Aksehir Y. and Taskin B., Register on MEsh (ROME): A novel approach for clock mesh network synthesis, in Proc. IEEE ISCAS, 1219–1222 (2011) 10.Lu J., Mao X. and Taskin B., Integrated clock mesh synthesis with incremental register placement, in Proc. IEEE, 217–227 (2012) 11.Sumitha J., Routing Algorithms in Networks, Res. J. Recent Sci.,3(ISC-2013), 1-3 (2014)