Report: GSoC with TensorFlow

Official Work Product Submission for the 2020 edition of Google Summer of Code with TensorFlow.

Published: June 6, 2020

Updated: August 24, 2020

Status: Finished

Google Summer of Code requires a "work product submission" that publicly documents the results of the project. This post summarizes the goals achieved and the work still in progress, along with a concise description of the challenges encountered along the way. The project mainly dealt with tf1.x to tf2.x migration, improvements to the Mobile Video Object Detection models, GitHub issue support, and documentation improvements. Steps for future work are listed at the end of the report.

1: https://developers.google.com/open-source/gsoc/help/work-product

A repository with most of the code and some thoughts noted down during the summer can be found here.

Overview

During Google Summer of Code 2020, I contributed to the TensorFlow Model Garden. I worked in collaboration with the maintainers of the models I had to improve or build, which has been an excellent learning experience for me. Apart from developing my programming skills, I also learnt a lot about code quality, structuring and maintenance. The project aimed to provide tf2.x versions of the code for SotA models in the tensorflow/models repository and to improve the documentation for existing models.

Over the summer, I worked on the migration of the AttentionOCR model and tested the DeepSpeech conversion. Both run successfully on TF2.3 and TF1.15. Other work includes improvements to TensorFlow Mobile Video Object Detection, documentation updates for Google Cloud Platform, and general GitHub support.

Acknowledgements

Before I start with my report, I would like to thank my mentor Jaeyoun Kim, who was very supportive throughout the summer. I got a lot of help structuring my projects and prioritizing my next steps well in advance. I also learnt a lot from the individual project mentors, Xavier Gibert and Liangzhe Yuan. I thank them for their constant guidance, code reviews, timely feedback and, most importantly, for their dedicated advice and encouragement throughout GSoC. I would definitely love to contribute more to the TensorFlow Model Garden in the future.

SotA models in TensorFlow 2.x

TensorFlow has an extensive Model Garden covering many algorithms that are necessary for the generic machine-learning practitioner. The algorithms are mainly hosted under the official and research folders.

2: Note that some of the models in the research folder were shifted to an archive branch as referenced by a8ba923.

Many of the architectures currently in the tf/models repository are written in TensorFlow 1.x. Converting the existing tf1.x code to tf2.x would do the entire user community an excellent service. For this, I was fortunate to receive help from the original authors of the models and from the people currently maintaining the model folders. The end goal of the project was to make TensorFlow more user-friendly from the perspective of researchers.

General Conversion

The significant changes when TensorFlow 2 came out were 1) API cleanup, 2) Eager execution, 3) no global namespaces and 4) functions in place of sessions.

3: https://www.tensorflow.org/guide/effective_tf2.

The logical flow for converting to native TF2 is therefore to first identify the deprecated APIs and replace them with their newer equivalents. TensorFlow has a list hosted here. The next step is to break the code down into smaller functions. This has the additional advantage of letting the code gain the benefits of graph mode in TensorFlow 2.x.

# TensorFlow 1.x
outputs = session.run(f(placeholder), feed_dict={placeholder: input})

# TensorFlow 2.x
outputs = f(input)
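
To make the contrast concrete, here is a minimal runnable sketch of both styles under TensorFlow 2.x. The function f and the input values are made up for illustration. The 1.x style is reproduced through tf.compat.v1 inside an explicit graph, which is only one assumed way of running legacy code next to eager code.

```python
import numpy as np
import tensorflow as tf


def f(x):
    # Hypothetical computation standing in for a model's forward pass.
    return x * 2.0 + 1.0


values = np.array([1.0, 2.0, 3.0], dtype=np.float32)

# TensorFlow 1.x style: build a graph, then feed a placeholder through a session.
graph = tf.Graph()
with graph.as_default():
    placeholder = tf.compat.v1.placeholder(tf.float32, shape=[None])
    result = f(placeholder)
    with tf.compat.v1.Session(graph=graph) as session:
        outputs_v1 = session.run(result, feed_dict={placeholder: values})

# TensorFlow 2.x style: call the function directly on the input (eager execution).
outputs_v2 = f(tf.constant(values)).numpy()

print(outputs_v1, outputs_v2)
```

Both calls compute the same values; the difference is that the 2.x call needs no placeholder, no session and no feed_dict.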

We also have the option of using tf.function to make graphs out of our programs. tf.function wraps a Python function to create an object which selects the appropriate graph for the inputs. For example,

import tensorflow as tf

@tf.function
def double(a):
  # The print runs only while tracing, i.e. when a new graph is built.
  print("Tracing with", a)
  return a + a

print(double(tf.constant(1)))
print()
print(double(tf.constant(1.1)))
print()
print(double(tf.constant("a")))
print()

Output:

Tracing with Tensor("a:0", shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)

Tracing with Tensor("a:0", shape=(), dtype=float32)
tf.Tensor(2.2, shape=(), dtype=float32)

Tracing with Tensor("a:0", shape=(), dtype=string)
tf.Tensor(b'aa', shape=(), dtype=string)
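
A related point is that a graph is reused when the input signature does not change. The following sketch, which is my own illustration and uses a hypothetical Python counter, counts how often the function body is actually traced.

```python
import tensorflow as tf

trace_count = 0  # counts how many times Python actually runs the function body


@tf.function
def double(a):
    global trace_count
    trace_count += 1  # Python side effect: executed only during tracing
    return a + a


double(tf.constant(1))    # first int32 scalar: traces a new graph
double(tf.constant(2))    # same dtype and shape: reuses the existing graph
double(tf.constant(1.1))  # float32 scalar: traces a second graph

print("number of traces:", trace_count)
```

Three calls lead to only two traces, because the second call matches the dtype and shape of the first.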

AttentionOCR (Paper)

Methods

For converting the AttentionOCR model from TF1.x to TF2.x, I followed a few steps that I think would be very useful to readers when they're upgrading their own models or code.

  1. Prepare the code: Any dependency that we have should be made compatible with TF2.x first. This is important so that we don't end up making mistakes while converting the code. A good practice is to have unit tests written with good coverage.
  2. Install TensorFlow 1.15: We need to upgrade our TensorFlow to tf1.15, which is the final TF1.x release.
  3. Run tests on tf1.15: Confirm that all tests pass before changing anything, so that there is a known-good baseline.
  4. Run tf_upgrade_v2: Run this script on both the code and the unit tests. Most of the code will be converted to symbols available in TensorFlow 2.x; where symbols are deprecated, tf.compat.v1 will be used, which we will come to shortly. Usage of the upgrade script is as follows:
$ tf_upgrade_v2 \
  --intree my_project/ \
  --outtree my_project_v2/ \
  --reportfile report.txt
  5. Run the tests again on tf1.15: This will weed out any bugs introduced by the upgrade.
  6. Install TensorFlow 2.3: Even tf-nightly should do the job.
  7. Test with v1.disable_v2_behavior: Re-running your tests with v1.disable_v2_behavior() should give the same results as running under 1.15.
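
As a rough sketch of the last step, this is how session-based code can be run with TF2 behaviors switched off. The small graph below is a hypothetical stand-in for a real unit test and is not taken from the AttentionOCR code.

```python
import tensorflow.compat.v1 as tf

# Switch off TF2 behaviors (such as eager execution) so that session-based
# code runs as it did under 1.15. Call this once, at program startup.
tf.disable_v2_behavior()

# Hypothetical legacy graph code standing in for a real unit test.
x = tf.placeholder(tf.float32, shape=[None])
sum_of_squares = tf.reduce_sum(tf.square(x))

with tf.Session() as sess:
    value = sess.run(sum_of_squares, feed_dict={x: [1.0, 2.0, 3.0]})

print(value)
```

If this stage gives the same results as the run under 1.15, any remaining differences come from the move to native TF2 behavior rather than from the installation itself.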

After going through the above steps for the AttentionOCR model, I found a few issues which I will list out below.

AttentionOCR Migration Issues

  1. Using member tf.flags.FLAGS.dataset_dir in deprecated module tf.flags.
  2. Using member tf.contrib.metrics.streaming_mean in deprecated module tf.contrib.
  3. Using member tf.contrib.lookup.index_to_string_table_from_tensor in deprecated module tf.contrib.
  4. Using member tf.contrib.legacy_seq2seq.sequence_loss in deprecated module tf.contrib.
  5. *.save requires a manual check (should the model be saved as SavedModel or HDF5?).
  6. Using member tf.flags.FLAGS.checkpoint in deprecated module tf.flags.

These are the deprecated APIs that I need to sort out manually, by restructuring the code into functions and removing Session.run calls.
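
A quick way to get an overview of such usages before fixing them by hand is a plain text scan. The sketch below is only an illustration: the helper name find_deprecated_symbols and the sample source are hypothetical, and a regular expression is a much cruder tool than the report produced by tf_upgrade_v2.

```python
import re

# Deprecated module prefixes reported for AttentionOCR above.
DEPRECATED_PREFIXES = ("tf.contrib", "tf.flags")

# Matches a dotted symbol that starts with one of the prefixes.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in DEPRECATED_PREFIXES) + r")(?:\.\w+)*"
)


def find_deprecated_symbols(source):
    """Return (line_number, symbol) pairs for deprecated symbols in source text."""
    hits = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


example_source = """\
import tensorflow as tf
dataset_dir = tf.flags.FLAGS.dataset_dir
mean = tf.contrib.metrics.streaming_mean(values)
loss = tf.reduce_mean(values)
"""

for line_number, symbol in find_deprecated_symbols(example_source):
    print(line_number, symbol)
```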

Work Done

This section lists the commits, along with the deprecated APIs found during each conversion. See PR #8843.

Commit                                                 Link
README updates according to template                   7a09beb
data_provider, model_export conversions                e8ca3cd
demo_inference, tf.flags found                         ab293d9
inception_processing converted, tf.image deprecated    26c5304
minor conversions, tf.contrib deprecated               042ae38
metrics and model converted                            2c391b9
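
To give a flavour of what a metrics conversion can look like, here is a sketch of one possible TF2 replacement for a streaming mean. It is an illustration only and not the code from the PR; the batch values are made up.

```python
import tensorflow as tf

# Running mean across batches, kept as object state instead of the local
# variables and update ops that the tf.contrib streaming metric relied on.
running_mean = tf.keras.metrics.Mean()

for batch in ([1.0, 2.0], [3.0]):  # hypothetical per-batch values
    running_mean.update_state(tf.constant(batch))

mean_value = float(running_mean.result())
print(mean_value)
```

The metric averages over every value seen so far, so the result is the mean of 1.0, 2.0 and 3.0.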

The second part of the conversion is still underway. More specifically, we are waiting for the Object Detection API team to release their new code, which will bring substantial improvements over the existing interfaces. Blockers such as seq2seq will then no longer be a hindrance. Also, tf.summary.text has been added. See PR #8992.

Commit                                       Link
tf.summary.text addition                     1ee526f
compat.v1 for tf.summary.text                7efca34
demo_inference, tf.flags found               ab293d9
transfer urls from deprecated branch         18bf2ba
fixed README to reflect dataset transfer     722e576
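
For reference, writing a text summary with the native TF2 API can be sketched as follows. The commits above go through compat.v1, so this is not the code from the PR; the tag name, the logged string and the log directory are all hypothetical.

```python
import glob
import os
import tempfile

import tensorflow as tf

log_dir = tempfile.mkdtemp()  # hypothetical log directory
writer = tf.summary.create_file_writer(log_dir)

with writer.as_default():
    # Log a made-up decoded string for step 0.
    tf.summary.text("predicted_text", tf.constant("hello world"), step=0)

writer.close()  # flushes pending events to disk

event_files = glob.glob(os.path.join(log_dir, "events.out.tfevents.*"))
print(event_files)
```

Pointing TensorBoard at the log directory should then show the string under the Text tab.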

DeepSpeech (Paper)

I was not the primary author of the DeepSpeech conversion, which was submitted in PR #8696, but I was involved in writing tests and checking that the model reached reference accuracy on both tf1.15 and tf2.3.

We can see a good example of conversion to Keras layers here:


# TensorFlow 1.x
    "lstm": tf.contrib.rnn.BasicLSTMCell,
    "rnn": tf.contrib.rnn.RNNCell,
    "gru": tf.contrib.rnn.GRUCell,

# TensorFlow 2.x
    "lstm": tf.keras.layers.LSTMCell,
    "rnn": tf.keras.layers.SimpleRNNCell,
    "gru": tf.keras.layers.GRUCell,
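
To show how such a mapping can be used, here is a small self-contained sketch. The names SUPPORTED_RNN_CELLS and build_rnn_layer, as well as the tensor sizes, are assumptions made for this example and are not taken from the DeepSpeech code.

```python
import tensorflow as tf

# Same mapping as above, from a config string to a Keras cell class.
SUPPORTED_RNN_CELLS = {
    "lstm": tf.keras.layers.LSTMCell,
    "rnn": tf.keras.layers.SimpleRNNCell,
    "gru": tf.keras.layers.GRUCell,
}


def build_rnn_layer(cell_type, hidden_size):
    """Wrap the chosen cell in a Keras RNN layer that returns every time step."""
    cell = SUPPORTED_RNN_CELLS[cell_type](hidden_size)
    return tf.keras.layers.RNN(cell, return_sequences=True)


# Hypothetical batch: 2 sequences, 5 time steps, 3 features per step.
inputs = tf.zeros([2, 5, 3])
outputs = build_rnn_layer("gru", hidden_size=8)(inputs)
print(outputs.shape)
```

Because the cells share one interface, switching the recurrent unit only requires a different key.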

Future Work

The main priorities for the immediate future are as follows:

  • Convert AttentionOCR to native TensorFlow 2 and run tests.

  • Prepare checkpoints for public usage and consolidate documentation.

  • Provide interactive tutorials in the form of Google Colab notebooks.

  • Convert DeepSpeech to native tf2 in collaboration with the author of PR #8696.

Challenges encountered

A lack of proper documentation was the biggest roadblock in this project. Worked examples of previous migrations were incredibly hard to find. In fact, a key deliverable when I finish the conversion will be a detailed write-up of the migration process. Even though it will cover only one model, it will still help the user community port their own code to tf2.x.


Mobile Video Object Detection (Paper, Paper)

I started work on this project in the second half of the summer, and my mentors had some exact goals for it. I have to admit that some parts took more time than I expected, but all of that helped me learn a lot. The main things we wanted done were:

  • Check all GitHub issues and PRs related to lstm_object_detection and summarize key issues and asks from the community.

  • Rerun model training/evaluation/inference from the tensorflow repo, both locally and in the cloud, and fix potential issues.

  • Make and/or summarize necessary tools for data conversion, dense labeling, etc.

  • Reproduce paper results and prepare ckpt for public usage.

  • Prepare documentation and tutorial for the lstm_object_detection models.

Work Done

The first task was critical to the completion of this project, as it meant that we prioritized the user community and made sure all the major bugs were weeded out. I collected most of the main issues on GitHub (open issues as well as those closed by the bot due to inactivity) and also a few relevant Stack Overflow questions. I left out some redundant issues but noted their key asks. The next step was to review and prioritize the major fixes. The key issues found were:

  1. Problems with creating tf-records
  2. More documentation needed (specifically training related: prepping data, etc)
  3. More support required in general

A few of the issues I went through to collect the major problems were #6253, #5869, #7967, #7288, #6027, #6777, and #7109. I also looked at a few Stack Overflow questions for both answers and other problems. Most users agree that the above three issues were causing the most problems.

For the next steps, I started with reading the Mobile Video Object Detection with Temporally-Aware Feature Maps paper and tried running the models locally. I faced a few issues with creating VID2015 tfrecords. So I went and checked the config for the Looking Fast and Slow paper to see if there was anything wrong with my code.
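
Since tfrecord creation was a recurring problem, here is a generic sketch of writing and reading back a TFRecord file. It is not the VID2015 format used by lstm_object_detection: the feature keys, the fake frame bytes and the file name are all hypothetical, and a real pipeline must use the keys that its decoder expects.

```python
import os
import tempfile

import tensorflow as tf


def make_example(encoded_frame, label):
    """Pack one frame and its label into a tf.train.Example."""
    feature = {
        # Hypothetical keys; a real pipeline must use the keys its decoder expects.
        "image/encoded": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[encoded_frame])),
        "image/label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


record_path = os.path.join(tempfile.mkdtemp(), "frames.tfrecord")

with tf.io.TFRecordWriter(record_path) as writer:
    for index in range(3):
        fake_frame = ("frame-%d" % index).encode("utf-8")  # placeholder bytes
        writer.write(make_example(fake_frame, index).SerializeToString())

# Read the file back and parse the labels to check the round trip.
schema = {
    "image/encoded": tf.io.FixedLenFeature([], tf.string),
    "image/label": tf.io.FixedLenFeature([], tf.int64),
}
labels = [
    int(tf.io.parse_single_example(record, schema)["image/label"])
    for record in tf.data.TFRecordDataset(record_path)
]
print(labels)
```

Reading the records back right after writing them is a cheap check that the keys and types agree on both sides.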

Future Work

I need to provide additional documentation for the lstm_object_detection models. New additions will include a training script and notes on problems with custom data. The first thing to do will be training with ImageNet VID2015 as the train/eval set. Watch this space!!

GitHub Issue Support

Our mentors encouraged all of us to get more deeply involved with the community by supporting GitHub issues: attempting to fix minor bugs, answering questions, closing stale issues, etc. In the initial days, this helped me immensely by giving me a feel for how the TensorFlow Model Garden team moved and worked. The TensorFlow developer community, as well as the user community, is very active on the official Google Groups and Stack Overflow forums. A key responsibility was to answer questions and provide support on GitHub issues. From my experience waiting for answers on the TensorFlow Stack Overflow forums etc., I think the problem is not so much a lack of quality answers as the large number of repetitive questions and too few active members to answer them.

We had a good fix for this problem: together with the other students, I created a sheet in which we classified frequently asked questions into three categories, namely Object Detection API, Official models and Research models. This sheet was shared with the OD API team with comments on possible fixes etc. I'm very hopeful that this will help them for the next release.
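
The same three-way grouping could also be approximated in code. The sketch below is purely illustrative: the keyword rules and issue titles are invented for this example and do not reproduce the contents of the sheet.

```python
from collections import defaultdict

# Hypothetical keyword rules, invented for this example.
CATEGORY_KEYWORDS = {
    "Object Detection API": ("object_detection", "object detection"),
    "Official models": ("official/",),
    "Research models": ("research/",),
}


def classify(title):
    """Return the first category whose keyword occurs in the issue title."""
    lowered = title.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "Uncategorized"


issue_titles = [  # made-up titles for illustration
    "object_detection: error while creating tf-records",
    "official/nlp: how do I fine-tune BERT?",
    "research/deep_speech: training documentation missing",
]

sheet = defaultdict(list)
for title in issue_titles:
    sheet[classify(title)].append(title)

for category, titles in sheet.items():
    print(category, len(titles))
```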


Documentation Improvements

The TensorFlow 2.x codebase is relatively new, and large parts of the code could be given more support than is currently the case. As part of this objective, I wanted to contribute in some way to improving the docs for TensorFlow models. The main reason code goes undocumented is a lack of time. Apart from designing and writing the code itself, we also need to go through code reviews and automated tests and add unit tests (to name a few things). Documentation is pretty much left out of the equation.

Here are some of the things I kept in mind while writing some docs:

  • Understand the audience of the documentation. TensorFlow is genuinely a universal framework, used by everyone from students in India detecting disease in farmlands to SotA researchers at Google AI. We need to keep the barrier to entry as low as we can.
  • List and describe the main features of our model/tutorial while making sure to point out any dependencies on other components.
  • We need to inform our readers about the validity of the documentation that we're writing, so we need to include a timestamp along with the versions of the libraries that we're using.
  • The tutorials and documentation that we write for TensorFlow should have a large number of coding examples: the more ways we can show of using a particular feature, the better.
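
The point about timestamps and library versions can be automated. Here is a minimal sketch using only the Python standard library; the helper name environment_stamp and the package list are my own choices for this example.

```python
import datetime
import platform
from importlib import metadata


def environment_stamp(packages):
    """Build a short text block with today's date and installed versions."""
    lines = [
        "Last verified: " + datetime.date.today().isoformat(),
        "Python: " + platform.python_version(),
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(name + ": " + version)
    return "\n".join(lines)


# Hypothetical package list; adjust it to what the tutorial actually imports.
stamp = environment_stamp(["tensorflow", "numpy"])
print(stamp)
```

The resulting block can be pasted at the top of a README or tutorial so that readers can tell when, and against which versions, it was last checked.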

Get started with Google Cloud Platform

I combined two existing GCP documents to make a comprehensive tutorial for anyone looking to leverage GCP resources for their ML project. The document lives here.

Minor improvements

Apart from the AttentionOCR README updates, I made a few minor doc updates, which can be seen at #8677, #8654, #8846, #8675, and #40012.


Final Thoughts

There is a lot more work to do. I have a list of deliverables I will complete soon:

  1. Continue with model conversion from TF1.x, if any models are left over, and also from PyTorch. The TensorFlow team has a list of projects that are in the pipeline.
  2. Create a to-do list for developers interested in converting models from tf1.x to tf2.x or between frameworks so that things move smoothly.
  3. Provide support and maintenance for more models in the research folder.

In closing, I'd like to continue contributing to TensorFlow and improve the Model Garden. Exciting things are on the way.

May all Tensors Flow Securely!!
