Stopwatch Tutorial Is Finally Posted! Lobster Productions



This simple tutorial shows you how to use Stopwatch, a diagnostic class in C#, to measure how long your application (or part of it) takes to run. It is a simple way to check the speed of your code.

Introduction

Stopwatch class. The Stopwatch class may look like a primitive class that does some date arithmetic behind the scenes, but that is not the case. With the Stopwatch class it is possible to make very accurate measurements if the operating system and the computer hardware support a high-resolution performance counter.

On this page, we will present a stopwatch design. It is similar to the design in the Xilinx ISE tutorial. We will tackle it 'the MyHDL way' and take it from spec to implementation.

This is an extensive example, and we will use it to present all aspects of a MyHDL-based design flow. It is also relatively advanced. If you have difficulties understanding the material on this page, consider reading the first chapters of the manual or the earlier examples in this Cookbook first.

Specification

Compared to the design in the Xilinx ISE tutorial, our design is somewhat simplified. The intention is not to avoid complexity, but merely to make the code and the explanations better fit on a single web page. In particular, our stopwatch will only have three digits: two digits for the seconds, and one for the tenths of a second. Also, we will not consider clock generation issues and simply assume that a 10 Hz clock is available.
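The three-digit format can be sketched in plain Python before any hardware is described. The following is a behavioral model only, not MyHDL code. The function name tick and the tuple layout (tens, ones, tenths) are assumptions made for illustration, and so is the wrap-around after 99.9 seconds, which the specification above does not state. One call corresponds to one period of the assumed 10 Hz clock.

```python
def tick(digits):
    """Advance a (tens, ones, tenths) BCD count by one tenth of a second.

    Hypothetical helper: each digit stays in the range 0..9 and the
    count is assumed to wrap from 99.9 back to 00.0.
    """
    tens, ones, tenths = digits
    tenths += 1
    if tenths == 10:          # carry into the seconds digit
        tenths = 0
        ones += 1
    if ones == 10:            # carry into the tens-of-seconds digit
        ones = 0
        tens += 1
    if tens == 10:            # assumed wrap-around after 99.9 seconds
        tens = 0
    return (tens, ones, tenths)


count = (0, 0, 0)
for _ in range(125):          # 125 ticks of a 10 Hz clock = 12.5 s
    count = tick(count)
print(count)                  # (1, 2, 5)
```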

The interface of the stopwatch design looks as follows:

Architecture

A stopwatch system is naturally partitioned as follows:

  • a subsystem that counts time, expressed as digits in bcd (binary coded decimal) code
  • a subsystem that displays the count, by converting each bcd digit to a 7 segment led display

A natural partitioning often works best, and that's how we will approach the design. We will first design a time counter and then a bcd to led convertor.

Time counter design

Approach


One of the goals of the MyHDL project is to promote the use of modern software development techniques for hardware design. One such technique is the concept of unit testing, a cornerstone of extreme programming (XP).

Unit testing means writing a dedicated test for each building block of a design, and aggregating all tests in a regression test suite using a unit test framework. Moreover, the XP idea is to write the unit test first, before the actual implementation. This makes sure that the test writer concentrates on all aspects of the high level specification, without being influenced by lower level implementation details.

At the start of an implementation, the existing unit test will fail, and it will continue to do so until a valid implementation is achieved. The unit test thus serves as a metric for completion. Moreover, to see the unit test fail on incomplete or invalid designs enhances the confidence in the test quality itself. This is of crucial importance when making design changes later on.

Unit test

To write a unit test for a building block, we need two things: the specification and the interface. The specification was described in previous sections. The interface of the time counter looks as follows:

The actual implementation is left open for now. We will first write the test, using the interface.

The following code is the unit test for the time counter subsystem:


dut is the design under test. clkgen is a clock generator. action defines the stopwatch state, based on a rising edge on either of the input signals startstop or reset. counter maintains the expected time count. monitor is the actual test: it asserts that the actual time count from the design equals the expected time count. Finally, stimulus defines a number of test cases for the stopwatch. Note that it has an inner for loop over signals, as a concise way to define test patterns. This is straightforward in Python, but think for a moment about how you would do it in Verilog or VHDL.
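The roles of action and counter can be illustrated without a simulator. The sketch below is plain Python rather than MyHDL, and the names expected_counts, ON and OFF are hypothetical. It assumes that a rising edge on startstop toggles counting and that reset clears the count and stops the stopwatch; the real test may behave differently. It also shows the kind of loop over patterns that stimulus relies on.

```python
from itertools import product

OFF, ON = "OFF", "ON"


def expected_counts(events):
    """Return the expected count after each clock cycle.

    `events` is a sequence of (startstop, reset) pairs, one per clock
    cycle, where 1 marks a rising edge in that cycle. Assumed behavior
    (not taken from the original test): reset wins over startstop,
    clears the count and stops the stopwatch.
    """
    state, count, history = OFF, 0, []
    for startstop, reset in events:
        if reset:
            state, count = OFF, 0
        elif startstop:
            state = ON if state == OFF else OFF
        elif state == ON:
            count += 1
        history.append(count)
    return history


# Every combination of the two inputs, generated with a loop over
# patterns instead of hand-written test vectors.
patterns = list(product((0, 1), repeat=2))
print(patterns)               # [(0, 0), (0, 1), (1, 0), (1, 1)]

# start, run three cycles, stop, idle, reset
run = [(1, 0), (0, 0), (0, 0), (0, 0), (1, 0), (0, 0), (0, 1)]
print(expected_counts(run))   # [0, 1, 2, 3, 3, 3, 0]
```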

Also in stimulus, note the yield clock.negedge statement. This statement synchronizes signal changes with the falling clock edge. This is needed to avoid race conditions when signals change 'simultaneously' with the rising clock edge, and it is commonly done in digital tests. As you can expect, this statement was not present in the first version of the test: it was added after the test was run against the implementation and found to fail occasionally, even when the implementation was believed to be correct. This shows that in practice there may be a good reason why a test needs to be adapted to get everything working. But in any case it is better to start with a 'general' unit test that is not influenced by an implementation.

Our unit test is now ready to run. We could actually run it directly against an implementation. However, we will use it via the unit testing framework py.test instead. The framework provides the following functionality:

  • it redefines the Python assert statement for extensive error reporting
  • it looks up and runs each method whose name starts with 'test_'
  • it looks up test modules by searching for modules whose name starts with 'test_'
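As a minimal sketch of these conventions, the module below could be saved as test_bcd.py (a hypothetical file name). The helper bcd_digits is an assumption introduced for illustration; it is not part of the stopwatch design.

```python
# test_bcd.py (hypothetical file name following the 'test_' convention)

def bcd_digits(tenths_total):
    """Split a count of tenths of a second into (tens, ones, tenths)."""
    seconds, tenths = divmod(tenths_total, 10)
    tens, ones = divmod(seconds % 100, 10)
    return (tens, ones, tenths)


def test_zero():
    assert bcd_digits(0) == (0, 0, 0)


def test_carry():
    # 9.9 s plus one tenth carries into the tens digit
    assert bcd_digits(100) == (1, 0, 0)


def test_typical_value():
    assert bcd_digits(125) == (1, 2, 5)
```

Running the test runner in the directory that contains this file should collect the three test_ functions; no class or registration step is needed.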

There's a lot more to say about py.test and you are probably also curious where to get it from. You can find that info further on this page, in the section More about py.test.

Design

The following is an implementation of the time counter, in file TimeCount.py:

py.test confirms that this is a valid implementation:

bcd to led convertor design

Approach

For the design of the bcd to led convertor, we will follow a similar approach as before. We will write a unit test first, and then use it to complete the design.

We first put the encoding data in a separate module, seven_segment.py, to make it reusable. The appropriate data structure for the encoding is a dictionary:
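A dictionary of this kind might look as follows. This is a sketch of the data structure only: it assumes active-high segments listed in the order a to g, and the real seven_segment.py may use a different bit order or polarity.

```python
# Seven-segment patterns keyed by digit. Assumption: bits are listed
# in segment order a, b, c, d, e, f, g and '1' means "segment lit".
encoding = {
    0: "1111110",
    1: "0110000",
    2: "1101101",
    3: "1111001",
    4: "0110011",
    5: "1011011",
    6: "1011111",
    7: "1110000",
    8: "1111111",
    9: "1111011",
}

# Digit 8 lights every segment; digit 1 lights only b and c.
print(encoding[8])              # 1111111
print(encoding[1].count("1"))   # 2
```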

Unit test

This is the unit test, in test_bcd2led.py:

This test asserts that the led output from the design matches the appropriate encoding for a digit.

Design

Here is an implementation, in bcd2led.py:

Note how we derive the tuple code from the encoding dictionary. We need a tuple because that's the data structure that the Verilog convertor supports. It maps tuple indexing to a case statement to support ROM inferencing by synthesis tools.
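Deriving a tuple from a dictionary can be sketched in plain Python. The helper name to_rom and the four-entry table are hypothetical; the actual bcd2led.py may build its tuple differently.

```python
def to_rom(table):
    """Turn a {index: bit-string} dictionary into a tuple of integers.

    The tuple is ordered by index, so rom[i] is the integer value of
    table[i]. Indices are assumed to run from 0 to len(table) - 1.
    """
    return tuple(int(table[i], 2) for i in range(len(table)))


# Tiny hypothetical table, only for demonstration.
demo = {0: "000", 1: "011", 2: "101", 3: "111"}
rom = to_rom(demo)
print(rom)       # (0, 3, 5, 7)
print(rom[2])    # 5
```

Because a tuple is immutable and indexed by position, it behaves like a small read-only memory, which is what makes the mapping to a case statement possible.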

When we run py.test, we get the following output:

Note that when run with no arguments, py.test finds and runs all test modules. This is done recursively through all subdirectories, making it straightforward to run a full regression test suite.

Top level design

The top-level design in StopWatch.py is just an assembly of the previously designed modules:

Implementation

Automatic conversion to Verilog

To go to an implementation, we first convert the design to Verilog automatically, using MyHDL's toVerilog function:

The resulting Verilog code is included in full:

Note how the Verilog convertor expands the hierarchical design into a 'flat netlist of always blocks'. The Verilog output is really an intermediate step towards an implementation. The whole design is flat and contained in a single file, which may make it easier to hand it off to back-end synthesis and implementation tools.

Note also how the convertor expands tuple indexing in MyHDL into a case statement in Verilog.

Synthesis

We will synthesize the design with Xilinx ISE 8.1. We first create a project in the ISE environment, add the source of the Verilog file to it, and we are ready to go.

The following is extracted from the synthesis report. It shows how the synthesis tool recognizes higher-level functions such as ROMs and counters:

How these blocks are actually implemented depends on the target technology and the capabilities of the synthesis tool.


You can review the full FPGA synthesis report here.

FPGA implementation

The FPGA implementation report can be reviewed here.

CPLD implementation

The same design was also targeted to a CPLD technology. The detailed report can be viewed here.

More about py.test

To verify the stopwatch design, we have been using py.test. However, this is not the only unit testing framework available for Python. In fact, the standard unit testing framework that comes with Python is the unittest module. The unittest framework is presented in the MyHDL manual, and is used to verify MyHDL itself. On the other hand, py.test is currently not part of the standard Python library. Why then did we use py.test in this case?

The reason is that I believe that py.test will be a better option in the future. As demonstrated on this page, py.test is non-intrusive. The only thing we need to do for basic usage is to obey some simple naming conventions and to use the assert statement for testing, things we might want to do without a testing framework anyway. In contrast, unittest requires us to wrap our tests into dedicated subclasses and to use special test methods. This can be especially awkward with MyHDL, because MyHDL hardware is typically described using top-level and embedded functions, not classes and methods.
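The contrast can be made concrete with one check written in both styles. This is a minimal sketch; the function is_bcd is a hypothetical example and not part of the stopwatch design.

```python
import unittest


def is_bcd(value):
    """Return True if value fits in a single BCD digit (0..9)."""
    return 0 <= value <= 9


# unittest style: the check lives in a method of a TestCase subclass
# and uses dedicated assertion methods.
class TestIsBcd(unittest.TestCase):
    def test_upper_bound(self):
        self.assertTrue(is_bcd(9))
        self.assertFalse(is_bcd(10))


# py.test style: a top-level function with plain assert statements.
def test_upper_bound():
    assert is_bcd(9)
    assert not is_bcd(10)
```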

In short, it is much easier to develop unit tests with py.test than it is with unittest, in particular in the case of MyHDL code. However, py.test also has its disadvantages:

  • As py.test is not part of the standard Python library, it has to be installed separately.
  • py.test is currently not distributed in a conventional way such as a tar file. It is part of the py.lib library that has to be checked out from a subversion repository. This requires the installation of a subversion client.
  • The use of the assert statement for unit testing is controversial in Python. The assert statement is originally intended for programmer usage, to make programs safer. However, in my opinion the use of assert for testing is natural and warranted.
  • py.test uses a lot of 'magic' behind the scenes to modify Python's behavior for its purposes, such as extensive error reporting.

However, I believe that the benefits are far more important than the disadvantages. Moreover, some disadvantages may disappear over time. Consequently, I plan to promote py.test as the unit testing framework of choice for MyHDL in the future.

More info on the usage and installation of py.test can be found here.

Before we dive into the different ways of measuring time in Python, we need to understand the various types of time in the computing world. The first type is called CPU time or execution time, which measures how much time a CPU spent executing a program. The second type is called wall-clock time, which measures the total time it takes to execute a program on a computer. The wall-clock time is also called elapsed time or running time. Compared to the CPU time, the wall-clock time is often longer, because the CPU executing the measured program may also be executing other programs' instructions during the same period.
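The distinction can be observed directly. The sketch below uses time.perf_counter() and time.process_time(), which exist in Python 3.3 and later; it is an illustration added here, not one of the original examples, and the exact numbers depend on the machine.

```python
import time

wall_start = time.perf_counter()   # wall-clock style timer
cpu_start = time.process_time()    # CPU time of the current process

time.sleep(0.2)                               # waiting uses almost no CPU time
total = sum(i * i for i in range(200000))     # this part does use the CPU

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print("wall-clock: %.3f s" % wall_elapsed)
print("CPU:        %.3f s" % cpu_elapsed)
```

The wall-clock figure includes the time spent sleeping, while the CPU figure covers only the computation.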

Another important concept is the so-called system time, which is measured by the system clock. System time represents a computer system's notion of the passing of time. One should remember that the system clock could be modified by the operating system, thus modifying the system time.
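Python 3.3 and later can report whether a given clock may be adjusted in this way. The following sketch uses time.get_clock_info(); the printed values depend on the platform, so no output is shown.

```python
import time

# 'time' is the system clock behind time.time(); 'monotonic' and
# 'perf_counter' are clocks intended for measuring intervals.
for name in ("time", "monotonic", "perf_counter"):
    info = time.get_clock_info(name)
    print(name, "adjustable:", info.adjustable, "monotonic:", info.monotonic)
```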

Python's time module provides various time-related functions. Since most of the time functions call platform-specific C library functions with the same name, the semantics of these functions are platform-dependent.

Two useful functions for time measurement are time.time and time.clock. time.time returns the time in seconds since the epoch, i.e., the point where the time starts. On any operating system, you can run time.gmtime(0) to find out what the epoch is on that system. On both Unix and Windows, Python's epoch is January 1, 1970 (the Windows system APIs count from January 1, 1601, but time.time does not). time.time is often used to benchmark a program on Windows.

While time.time behaves the same on Unix and on Windows, time.clock has different meanings. On Unix, time.clock returns the current processor time expressed in seconds, i.e., the CPU time the current process has used so far. On Windows, it returns the wall-clock time, expressed in seconds, elapsed since the first call to this function, based on the Win32 function QueryPerformanceCounter. Another difference between time.time and time.clock is that time.time can return a lower value than a previous call if the system clock has been set back between the two calls, while time.clock always returns non-decreasing values. Note that time.clock was deprecated in Python 3.3 and removed in Python 3.8, so the time.clock examples in this section apply to Python 2.
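Checking the epoch takes only a couple of lines. This is a small sketch using the standard time module; on Windows and on most Unix systems it should print 1970-01-01 00:00:00.

```python
import time

# time.gmtime(0) converts "zero seconds since the epoch" to a UTC date.
epoch = time.gmtime(0)
print(time.strftime("%Y-%m-%d %H:%M:%S", epoch))
```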

Here is an example of running time.time and time.clock on a Unix machine:

# On a Unix-based OS
>>> import time
>>> print(time.time(), time.clock())  # first reading (output omitted)
>>> time.sleep(1)
>>> print(time.time(), time.clock())
1359147653.31 0.02168

time.time() shows that approximately one second of wall-clock time has passed, while time.clock() shows that the CPU time spent on the current process is far less than one second. In this output, time.clock() is also reported with more decimal places than time.time().

Running the same program under Windows gives back completely different results:

# On Windows
>>> import time
>>> print(time.time(), time.clock())  # first reading (output omitted)
>>> time.sleep(1)
>>> print(time.time(), time.clock())
1359147764.04 1.01088769662

Both time.time() and time.clock() show that approximately one second of wall-clock time has passed. Unlike on Unix, time.clock() does not return the CPU time; instead, it returns the wall-clock time with a higher precision than time.time().

Given the platform-dependent behavior of time.time() and time.clock(), which one should we use to measure the 'exact' performance of a program? Well, it depends. If the program is expected to run on a system that dedicates more than enough resources to it, e.g., a dedicated web server running a Python-based web application, then measuring the program with time.clock() makes sense, since the web application will probably be the major program running on the server. If the program is expected to run on a system that also runs lots of other programs at the same time, then measuring the program with time.time() makes sense. More often than not, we should use a wall-clock-based timer to measure a program's performance, since it often reflects the production environment.
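A wall-clock-based timer can be wrapped in a small stopwatch helper. The sketch below is one possible shape, assuming Python 3.3 or later for time.perf_counter(); the name stopwatch and the results dictionary are hypothetical and introduced only for this illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results):
    """Record the wall-clock duration of a with-block in results[label].

    Hypothetical helper; time.perf_counter() requires Python 3.3+.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        # Store the elapsed time even if the block raises an exception.
        results[label] = time.perf_counter() - start


timings = {}
with stopwatch("build list", timings):
    values = [v for v in range(10000)]

print("%s: %.6f s" % ("build list", timings["build list"]))
```

A single measurement like this is convenient for a quick check, but it is noisy; the timeit module described next repeats the statement many times.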

Instead of dealing with the different behaviors of time.time() and time.clock() on different platforms, which is often error-prone, Python's timeit module provides a simple way for timing. Besides calling it directly from code, you can also call it from the command-line.

For example:

# On a Unix-based OS
% python -m timeit -n 10000 '[v for v in range(10000)]'
10000 loops, best of 3: 365 usec per loop
% python -m timeit -n 10000 'map(lambda x: x^2, range(1000))'

# On Windows
C:\Python27>python.exe -m timeit -n 10000 "[v for v in range(10000)]"
C:\Python27>python.exe -m timeit -n 10000 "map(lambda x: x^2, range(1000))"

# In IDLE
>>> import timeit
>>> total_time = timeit.timeit('[v for v in range(10000)]', number=10000)
>>> print(total_time)
3.60528302192688 # total wall-clock time to execute the statement 10000 times
>>> print(total_time / 10000)
0.00036052830219268796 # average time per loop
>>> total_time = timeit.timeit('[v for v in range(10000)]', number=10000)
>>> print(total_time)
3.786295175552368 # total wall-clock time to execute the statement 10000 times
>>> print(total_time / 10000)
0.0003786295175552368 # average time per loop


Which timer is timeit using? According to timeit’s source code, it uses the best timer available:

if sys.platform == "win32":
    # On Windows, the best timer is time.clock
    default_timer = time.clock
else:
    # On most other platforms the best timer is time.time
    default_timer = time.time

Another important mechanism of timeit is that it disables the garbage collector during execution, as shown in the following code:

gcold = gc.isenabled()
gc.disable()
timing = self.inner(it, self.timer)
if gcold:
    gc.enable()

If garbage collection should be enabled to measure the program's performance more accurately, e.g., when the program allocates and de-allocates lots of objects, then you should enable it in the setup statement:


>>> timeit.timeit('[v for v in range(10000)]', setup='gc.enable()', number=10000)


Except for very special cases, you should always use the timeit module to benchmark a program. In addition, remember that measuring the performance of a program is always context-dependent, since no program executes on a system with boundless computing resources, and that an average time measured over a number of loops is always better than a single time measured in one execution.
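The advice about measuring over many loops can be sketched with timeit.repeat(), which runs the whole measurement several times. The repeat and number values below are arbitrary choices for illustration, and the results vary from machine to machine.

```python
import timeit

# Five independent measurements of 1000 executions each.
runs = timeit.repeat("[v for v in range(1000)]", repeat=5, number=1000)

per_loop = [total / 1000 for total in runs]
print("best:    %.2e s per loop" % min(per_loop))
print("average: %.2e s per loop" % (sum(per_loop) / len(per_loop)))
```

Comparing the best and the average figure gives a rough idea of how much the other activity on the system disturbs the measurement.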