Debugging in TensorFlow

This article covers the basics of TensorFlow and then looks closely at debugging TensorFlow programs in Python: debugging techniques, debugging tools, and common TensorFlow errors.

TensorFlow

TensorFlow is an open-source library for developing, training, and deploying machine learning and deep learning models, including neural networks. It is developed by the Google Brain team and written in Python, C++, and CUDA. It provides a flexible framework with high-level APIs that make it easy for developers to build models. So what exactly is TensorFlow? At its core, it is a software library for numerical computation using data flow graphs.

Nodes in the graph represent mathematical operations.
Edges in the graph represent the multidimensional data arrays (known as tensors) that flow between those operations.

What sets it apart is its ability to run on multiple platforms, including CPUs, GPUs, and mobile devices. Its flexibility and scalability make it one of the most widely used machine-learning libraries. Eager execution, the default in TensorFlow 2.x, also makes it well suited to fast debugging and easy prototyping.

TensorFlow code is easy to write and understand, but debugging it can be a tedious and challenging task if you don't use the right techniques and tools. Here are some of the most useful techniques:

Debug TensorFlow by Printing values within session.run

In TensorFlow 1.x-style graph mode, defining an operation does not compute its value. To see the value, you create a Session object and print the result of session.run(). This method is easy and fast. The Session encapsulates the environment in which operations are executed and tensors are evaluated, which keeps graph execution separate from the rest of the program.

Python3

import tensorflow as tf

# Eager execution is automatic in TensorFlow 2.x
# and needs to be disabled in this instance:
tf.compat.v1.disable_eager_execution()

# Create a graph:
x = tf.constant(6.0)
y = tf.constant(4.0)
z = x * y

# Launch the graph in a session:
sess = tf.compat.v1.Session()

# Evaluate tensor 'z' in this session and print the result:
print(sess.run(z))

# Release the session's resources when done:
sess.close()

Output:

24.0
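For comparison, here is a minimal sketch of the same computation in TensorFlow 2.x's default eager mode. It assumes a fresh Python process in which disable_eager_execution() has not been called. Operations execute immediately, so you can print the value directly without building a graph or opening a session.

```python
import tensorflow as tf

# In TensorFlow 2.x, eager execution is on by default,
# so each operation is evaluated as soon as it is called.
x = tf.constant(6.0)
y = tf.constant(4.0)
z = x * y

# The tensor already holds its value, so no session.run() is needed.
print(z)          # the tensor together with its shape and dtype
print(z.numpy())  # the plain NumPy value
```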

Debug TensorFlow Using the tf.print Operation

Inside a graph (for example, a tf.function), use the tf.print operation instead of Python's print. Python's print runs only once, while the function is being traced. tf.print becomes part of the graph and prints tensor values every time the graph executes. tf.print is useful when we want to inspect values at runtime without explicitly fetching them with session.run(), and it helps us track how values change during evaluation. The arguments this function accepts are listed below.

inputs - The inputs to print, passed as positional arguments. In the printed output they are separated by spaces.
output_stream - The output stream, logging level, or file to print to. Defaults to sys.stderr.
summarize - The number of first and last elements printed in each dimension, applied recursively per tensor. If None, the first and last 3 elements of each dimension are printed; if -1, all elements are printed.
sep - The string used to separate the inputs. Defaults to " ".
end - The character appended to the end of the printed string. Defaults to the newline character.
name - A name for the operation (optional).

When executing eagerly, tf.print returns None. During graph tracing, it returns a TensorFlow operator that prints the specified inputs to the given output stream or logging level.

Python3

import sys
import tensorflow as tf

# tf.print inside a tf.function runs every time the graph executes
@tf.function
def f():
    tensor = tf.range(10)
    tf.print(tensor,
             output_stream=sys.stderr)
    return tensor

range_tensor = f()

Output:

(This prints "[0 1 2 ... 7 8 9]" to sys.stderr)
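The following sketch shows the difference between Python's print and tf.print inside a tf.function. The function name square is purely illustrative. Python's print runs only while the function is traced, so it shows a symbolic tensor once. tf.print is part of the graph and prints the actual value on every call. The experimental_get_tracing_count() method reports how many times the function has been traced.

```python
import tensorflow as tf

@tf.function
def square(x):
    # Python's print runs only while the function is being traced.
    print("print (trace time only):", x)
    # tf.print is added to the graph and runs on every call.
    tf.print("tf.print (every call):", x)
    return x * x

square(tf.constant(2))
square(tf.constant(3))  # same input signature, so no retracing happens

print("Number of traces:", square.experimental_get_tracing_count())
```

Both calls pass scalar int32 tensors, so the second call reuses the first trace. As a result, the Python print message appears once, while the tf.print message appears for both calls.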

Debug TensorFlow by Increasing Logging

Log output is a valuable source of debugging information. TensorFlow's logging facilities support five standard severity levels, listed here in order of increasing severity: DEBUG, INFO, WARN, ERROR, and FATAL. In TensorFlow 1.x they are exposed as tf.logging. In TensorFlow 2.x the same API is available as tf.compat.v1.logging, and tf.get_logger() returns the underlying Python logging.Logger.

tf.logging.DEBUG: Detailed information for debugging.
tf.logging.INFO: Informational messages.
tf.logging.WARN: Warning messages.
tf.logging.ERROR: Error messages.
tf.logging.FATAL: Critical error messages.

We can also use the tf.debugging.* functions to add extra debug-specific information, such as checking tensor values or enabling TensorFlow's runtime options for debugging. Remember to adjust the logging level back to an appropriate setting once the debugging process is complete to avoid excessive logging overhead.

Python3

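import tensorflow as tf

# In TensorFlow 2.x, tf.logging is available as tf.compat.v1.logging
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.DEBUG)

tf.compat.v1.logging.log(tf.compat.v1.logging.DEBUG,
                         "Debugging message")
tf.compat.v1.logging.info("Informational message")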
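In TensorFlow 2.x, another option is to configure the logger that TensorFlow uses (returned by tf.get_logger()) with Python's standard logging module. This is one possible approach, not the only one. It controls only Python-side messages; log output from the C++ runtime is filtered separately, for example through the TF_CPP_MIN_LOG_LEVEL environment variable.

```python
import logging
import tensorflow as tf

# tf.get_logger() returns the standard Python logger named "tensorflow".
logger = tf.get_logger()

# Show everything from DEBUG upwards while debugging.
logger.setLevel(logging.DEBUG)
logger.debug("Debugging message")
logger.info("Informational message")

# Restore a quieter level once debugging is finished.
logger.setLevel(logging.WARNING)
```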

TensorBoard visualization

TensorBoard is a visualization toolkit for debugging and measuring TensorFlow models. It provides a graphical user interface for analyzing and visualizing different aspects of a model, such as loss, accuracy, and gradients. Its main purpose is to monitor model performance, which is why it is also called a monitoring tool.

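TensorBoard can be installed with either pip or conda, using one of the commands below. It is also installed automatically as a dependency of the tensorflow pip package.

Shell

# Install with pip:
pip install tensorboard

# Or install with conda:
conda install -c conda-forge tensorboard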

To use TensorBoard, you need to add summary operations to your code that record the values of your tensors and variables. Before the example, let's look at how to create a summary file writer for a given log directory.

Syntax:

tf.summary.create_file_writer(logdir, max_queue=None, flush_millis=None, filename_suffix=None, name=None, experimental_trackable=False)

Arguments

logdir - A string specifying the directory in which to write the event file.
max_queue - The largest number of summaries to keep in a queue. The writer flushes once the queue grows larger than this. Defaults to 10.
flush_millis - The largest interval, in milliseconds, between flushes. Defaults to 120,000.
filename_suffix - An optional suffix for the event file name. Defaults to .v2.
name - A name for the op that creates the writer.

Return:

A SummaryWriter object.

Python3

# importing tensorflow
import tensorflow as tf

# create two constant nodes a and b
a = tf.constant(5)
b = tf.constant(3)
# add them to create a third node c
c = tf.add(a, b)

# create a summary writer for the logs/ directory
summary_writer = tf.summary.create_file_writer('logs/')
with summary_writer.as_default():
    # record the value of c at step 0
    tf.summary.scalar('c', c, step=0)

Output:

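The code writes an event file, whose name starts with events.out.tfevents, to the logs/ directory. The file contains the scalar c, recorded with the value 8 at step 0. Run tensorboard --logdir logs/ to view it.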
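Summaries are most useful when written at every training step. The sketch below makes three assumptions: a hypothetical one-weight linear model, toy data, and plain gradient descent. It also writes to a temporary directory in place of logs/. At each step it logs the loss, the weight, and its gradient so that TensorBoard can plot how they evolve.

```python
import os
import tempfile
import tensorflow as tf

# Hypothetical toy problem: learn w so that w * x approximates y = 3 * x.
x = tf.constant([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x
w = tf.Variable(0.0)
learning_rate = 0.01

# Assumption: a temporary directory stands in for your real log directory.
logdir = tempfile.mkdtemp()
writer = tf.summary.create_file_writer(logdir)

losses = []
with writer.as_default():
    for step in range(20):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(w * x - y))
        grad = tape.gradient(loss, w)
        # Plain gradient-descent update of the single weight.
        w.assign_sub(learning_rate * grad)

        # Record loss, weight and gradient so TensorBoard can plot them per step.
        tf.summary.scalar("loss", loss, step=step)
        tf.summary.scalar("w", w, step=step)
        tf.summary.scalar("grad_w", grad, step=step)
        losses.append(float(loss))

writer.flush()
event_files = os.listdir(logdir)
print("Event files written:", event_files)
```

To view the curves, point TensorBoard at the directory you wrote to (for example, tensorboard --logdir logs/ if you write to logs/) and open the Scalars dashboard.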

TensorBoard Debugger

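TensorFlow also provides the tf.debugging module. It is a collection of assertion and checking functions that raise an error as soon as a tensor violates an expected condition, which lets you pinpoint where values go wrong. For interactive inspection, TensorBoard includes a Debugger V2 dashboard. It reads debug data dumped by tf.debugging.experimental.enable_dump_debug_info() and lets you examine individual graph nodes (operations) and the tensors they produce.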
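Below is a sketch of how data for TensorBoard's Debugger V2 dashboard can be produced, assuming TensorFlow 2.3 or later. tf.debugging.experimental.enable_dump_debug_info() instruments later eager and tf.function execution and writes debug data to a dump directory. Here that directory is a temporary one, which is an assumption. The function scaled_log is a hypothetical example that deliberately produces -inf.

```python
import os
import tempfile
import tensorflow as tf

dump_dir = tempfile.mkdtemp()  # assumption: any writable directory works

# Start dumping debug information. FULL_HEALTH records per-tensor
# summaries such as counts of NaN and Inf values.
tf.debugging.experimental.enable_dump_debug_info(
    dump_dir,
    tensor_debug_mode="FULL_HEALTH",
    circular_buffer_size=-1,  # keep all execution events, not just the latest
)

@tf.function
def scaled_log(x):
    # Hypothetical computation: log(0) yields -inf, which the dashboard flags.
    return 2.0 * tf.math.log(x)

result = scaled_log(tf.constant([1.0, 0.0, 4.0]))

# Stop instrumenting execution and flush the dump to disk.
tf.debugging.experimental.disable_dump_debug_info()

dump_files = sorted(f for _, _, files in os.walk(dump_dir) for f in files)
print("Dump files:", dump_files)
```

Afterwards, running tensorboard --logdir with the dump directory and selecting Debugger V2 from the dashboards menu should let you browse the recorded operations and tensors.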

Syntax:

tf.debugging.assert_equal(
x, y, message=None, summarize=None, name=None
)

Arguments:

x - A numeric Tensor.
y - A numeric Tensor of the same dtype as x and broadcastable to x.
summarize - The number of entries to print for each tensor.
message - A string to prefix to the default message.
name - A name for this operation (optional). Defaults to "assert_equal".

Returns:

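An Op that raises InvalidArgumentError if x == y is False. In eager mode, the check runs immediately and the function returns None. Inside a tf.function, the returned Op can be used with tf.control_dependencies to block follow-up computation until the check has executed.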
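The tf.control_dependencies pattern mentioned above can be sketched as follows. The function add_same_shape is hypothetical. It uses assert_equal on the two input shapes so that mismatched inputs fail loudly instead of being silently broadcast.

```python
import tensorflow as tf

@tf.function
def add_same_shape(x, y):
    # Inside a tf.function, assert_equal returns an Op instead of None.
    check = tf.debugging.assert_equal(
        tf.shape(x), tf.shape(y), message="x and y must have the same shape")
    # The addition is only computed after the check has run.
    with tf.control_dependencies([check]):
        return x + y

print(add_same_shape(tf.constant([1.0, 2.0, 3.0]), tf.constant([1.0, 1.0, 1.0])))

try:
    # Shapes [3] and [1] would otherwise broadcast without complaint.
    add_same_shape(tf.constant([1.0, 2.0, 3.0]), tf.constant([1.0]))
    mismatch_caught = False
except tf.errors.InvalidArgumentError:
    mismatch_caught = True
    print("Caught a shape mismatch")
```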

Here is an example of using tf.debugging.assert_equal:

Python3

import tensorflow as tf

a = tf.constant(5)
b = tf.constant(3)
c = tf.add(a, b)

tf.debugging.assert_equal(c, 8)

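Output:

<tf.Operation 'assert_equal_1/Assert/Assert' type=Assert>

This output appears in graph mode, for example when eager execution has been disabled as in the first example. In TensorFlow 2.x's default eager mode, the call returns None because 5 + 3 equals 8. Had the values differed, it would have raised InvalidArgumentError.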
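The following sketch shows what happens when an assertion fails. It assumes TensorFlow 2.x with eager execution enabled and runs the same check with a correct and an incorrect expected value. The message string is only an illustrative choice.

```python
import tensorflow as tf

a = tf.constant(5)
b = tf.constant(3)
c = tf.add(a, b)

# Passing check: in eager mode the call simply returns None.
passed = tf.debugging.assert_equal(c, 8)
print("Passing check returned:", passed)

# Failing check: in eager mode the error is raised immediately.
try:
    tf.debugging.assert_equal(c, 9, message="c has an unexpected value")
    failed = False
except tf.errors.InvalidArgumentError as err:
    failed = True
    print("Assertion failed:", err.message.splitlines()[0])
```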

Debug TensorFlow Using the API Functions

TensorFlow's tf.debugging API provides functions that check whether conditions on your tensors evaluate to True and raise an error when they do not. These checks help expose bugs early.

For example, tf.debugging.assert_shapes asserts shape and dimension-size relationships between tensors. It checks that the shapes of a collection of tensors satisfy the given constraints.
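Here is a small sketch of tf.debugging.assert_shapes, assuming a hypothetical dense layer with an input batch x, weights w, and bias b. Each string names a symbolic dimension, and the same name must have the same size everywhere it appears. When shapes are statically known, a violation may surface as ValueError rather than InvalidArgumentError, so the sketch catches both.

```python
import tensorflow as tf

x = tf.zeros([4, 3])  # batch of N=4 examples with D=3 features
w = tf.zeros([3, 2])  # weights mapping D=3 inputs to K=2 outputs
b = tf.zeros([2])     # one bias per output

# "N", "D" and "K" are symbolic dimensions; this check passes.
tf.debugging.assert_shapes([
    (x, ("N", "D")),
    (w, ("D", "K")),
    (b, ("K",)),
])

# A weight matrix with the wrong number of rows breaks the "D" relationship.
w_bad = tf.zeros([5, 2])
try:
    tf.debugging.assert_shapes([(x, ("N", "D")), (w_bad, ("D", "K"))])
    mismatch_detected = False
except (ValueError, tf.errors.InvalidArgumentError) as err:
    mismatch_detected = True
    print("Shape check failed:", type(err).__name__)
```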

Advantages of TensorFlow Debugging

TensorBoard is easy to understand and very user-friendly.
With the help of the debugger, we can find out what cleaning our training data requires.
The TensorBoard GUI lets us inspect each step of our model.
During training, we can use debugging to inspect output values at a specific stage.
The performance of our model can be visualized graphically in TensorBoard.

Disadvantages of TensorFlow Debugging

Complexity: It might be complex for developers who are not familiar with the underlying computation graph or the various components of the TensorFlow framework.
Time-consuming: For complex models, debugging can take a long time, and the source of an error can be hard to find.
Interruptions: Debugging in TensorFlow can interrupt the flow of development and force users to switch their tools.
False Positives: Debugging can also produce false positives and unnecessary warnings, which can distract the developer.
Overhead: It can add overhead to the training process, which may increase resource usage.

Tools Used to Debug TensorFlow

tf.debugging: TensorFlow's tf.debugging module provides several functions that can help you identify and resolve issues in your models. For example, the assert_all_finite function can help you identify if there are any NaN or Inf values in your tensors.

tf.debugging.check_numerics: This function can help you check if your tensors contain any NaN or Inf values. It throws an exception if it finds any.
tf.debugging.enable_check_numerics: This function enables NaN and Inf checking for all TensorFlow operations executed afterwards. tf.debugging.disable_check_numerics turns the checking off again.
tf.debugging.assert_equal: This function can be used to compare the values of two tensors and throw an exception if they are not equal.
tf.debugging.assert_greater: This function can be used to check if a tensor is greater than a given value and throw an exception if it is not.
tf.debugging.assert_less: This function can be used to check if a tensor is less than a given value and throw an exception if it is not.
tf.debugging.assert_rank: This function can be used to check the rank of a tensor and throw an exception if it is not equal to a given value.
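Several of the tools listed above can be combined as in this sketch. It assumes TensorFlow 2.x in eager mode and deliberately produces -inf by taking log(0).

```python
import tensorflow as tf

x = tf.constant([1.0, 0.0, 4.0])
y = tf.math.log(x)  # log(0) is -inf

# check_numerics raises as soon as the given tensor contains NaN or Inf.
try:
    tf.debugging.check_numerics(y, message="log(x) is not finite")
    numerics_error = False
except tf.errors.InvalidArgumentError:
    numerics_error = True

# Rank and value checks on x pass silently in eager mode.
tf.debugging.assert_rank(x, 1)
tf.debugging.assert_less(x, 5.0)
tf.debugging.assert_greater(x + 1.0, 0.0)

# enable_check_numerics instruments every later op, not just one tensor.
tf.debugging.enable_check_numerics()
try:
    tf.math.log(x)
    global_error = False
except tf.errors.InvalidArgumentError:
    global_error = True
finally:
    # Turn the global checks off again, since they slow execution down.
    tf.debugging.disable_check_numerics()

print("check_numerics caught:", numerics_error)
print("enable_check_numerics caught:", global_error)
```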

By using these debugging tools in TensorFlow, you can quickly identify and resolve issues in your models, ensuring that they are performing as expected.

Conclusion

Debugging is an essential skill for any TensorFlow developer. By enabling verbose logging, using TensorBoard, using the TensorFlow Debugger, and checking data and shapes, you can quickly identify and fix bugs in your code. These techniques can help you become a more productive and efficient TensorFlow developer.
