Google Summer of Code requires a "work product submission" (https://developers.google.com/open-source/gsoc/help/work-product) that publicly documents the results of the project. I'm writing this post to summarize the goals achieved and work still in progress along with a concise description of the challenges encountered along the way. The project mainly dealt with tf1.x to tf2.x migration, improvements to Mobile Video Object Detection models, GitHub issue support and documentation improvements. I will also add steps for future work at the end of the report.
A repository with most of the code and some thoughts noted down during the summer can be found here.
Overview
During Google Summer of Code 2020, I contributed to the TensorFlow Model Garden. The work was done in collaboration with the maintainers of the various models I had to improve or build, and it has been an excellent learning experience for me. Apart from developing my programming skills, I also learnt a lot about code quality, structuring and maintenance. The project aimed to provide tf2.x versions of the code for SotA models in the tensorflow/models repository and to improve documentation for existing models.
Over the summer, I worked on the migration of the AttentionOCR model and tested the DeepSpeech conversion. Both run successfully on TF2.3 and TF1.15. Other work includes improvements to TensorFlow Mobile Video Object Detection, documentation updates for Google Cloud Platform and general GitHub support.
Acknowledgements
Before I start with my report, I would like to thank my mentor Jaeyoun Kim, who was very supportive throughout the summer. I got a lot of help structuring my projects and prioritizing my next steps well in advance. I also learnt a lot from the individual project mentors, Xavier Gibert and Liangzhe Yuan respectively. I thank them for their constant guidance, code reviews, timely feedback, help and most importantly, for their dedicated advice and encouragement throughout GSoC. I would definitely love to contribute more in the future to the TensorFlow Model Garden.
SotA models in TensorFlow 2.x
TensorFlow has an extensive Model Garden covering many algorithms that are necessary for the generic machine-learning practitioner. The algorithms are mainly hosted under the official and research folders. (Note that some of the models in the research folder were shifted to an archive branch, as referenced by a8ba923.)
Many of the current architectures in the tf/models repository are written in TensorFlow 1.x. It would do the entire user community an excellent service to convert the existing tf1.x code to tf2.x. For this, I was fortunate to receive the help of the original authors who wrote the models or are currently maintaining the model folders. The end goal of the project was to make TensorFlow more user-friendly from the perspective of researchers.
General Conversion
The significant changes when TensorFlow 2 came out were 1) API cleanup, 2) eager execution, 3) no global namespaces and 4) functions in place of sessions (see https://www.tensorflow.org/guide/effective_tf2). So the logical flow for conversion to native TF2 would be to first sort out the APIs that were deprecated and replace them with the newer ones. TensorFlow has a list hosted here. The next step would be to break down the code into smaller functions. This has the additional advantage of allowing TensorFlow 2.x to gain all the benefits of graph mode.
# TensorFlow 1.x
outputs = session.run(f(placeholder), feed_dict={placeholder: input})
# TensorFlow 2.x
outputs = f(input)
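To make the contrast above concrete, here is a minimal runnable sketch. The function `f` and the input values are hypothetical stand-ins, and the TensorFlow 1.x half is written against `tf.compat.v1` on the assumption that TensorFlow 2.x is the installed version.

```python
import tensorflow as tf


def f(x):
    """Toy computation standing in for a model's forward pass (hypothetical)."""
    return x * 2.0 + 1.0


values = [1.0, 2.0, 3.0]

# TensorFlow 1.x style (via tf.compat.v1): build a graph, then run it in a session.
graph = tf.Graph()
with graph.as_default():
    placeholder = tf.compat.v1.placeholder(tf.float32, shape=[None])
    result = f(placeholder)
    with tf.compat.v1.Session(graph=graph) as session:
        outputs_v1 = session.run(result, feed_dict={placeholder: values})

# TensorFlow 2.x style: call the function directly on eager tensors.
outputs_v2 = f(tf.constant(values)).numpy()

print(outputs_v1, outputs_v2)
```

Both paths compute the same values; the difference is only in how the computation is triggered.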
We also have the option of using tf.function to make graphs out of our programs. tf.function wraps a Python function to create an object which selects the appropriate graph for the inputs. For example,
import tensorflow as tf

@tf.function
def double(a):
    print("Tracing with", a)
    return a + a
print(double(tf.constant(1)))
print()
print(double(tf.constant(1.1)))
print()
print(double(tf.constant("a")))
print()
Tracing with Tensor("a:0", shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
Tracing with Tensor("a:0", shape=(), dtype=float32)
tf.Tensor(2.2, shape=(), dtype=float32)
Tracing with Tensor("a:0", shape=(), dtype=string)
tf.Tensor(b'aa', shape=(), dtype=string)
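A related point that the output above hints at: the Python body only runs while a new graph is being traced. The sketch below is my own illustration (the `trace_log` list is a hypothetical helper) of how a second call with the same dtype and shape reuses the existing graph instead of tracing again.

```python
import tensorflow as tf

trace_log = []  # filled only while the Python body runs, i.e. during tracing


@tf.function
def double(a):
    trace_log.append(a.dtype.name)  # Python side effect: happens once per trace
    return a + a


double(tf.constant(1))    # traces a graph for int32 scalars
double(tf.constant(2))    # same dtype and shape: reuses that graph, no new trace
double(tf.constant(1.1))  # new dtype: traces a second graph

print(trace_log)
```

Three calls are made, but the log should hold only two entries, one per distinct input signature.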
AttentionOCR
Methods
For converting the AttentionOCR model from TF1.x to TF2.x, I followed a few steps that I think would be very useful to readers when they're upgrading their own models or code.
- Preparing the code: Any dependency that we have should be made compatible with TF2.x. This is important so that we don't end up making mistakes while converting the code. A good practice is to have unit tests written with good coverage.
- Install TensorFlow 1.15: We need to upgrade our TensorFlow to tf1.15, which is the latest TF1.x version.
- Run tests on tf1.15: Just to be careful.
- Run tf_upgrade_v2: Run this script on both the code and unit tests. Most of the code will convert to symbols only available in TensorFlow 2.x, and in cases where symbols are deprecated, tf.compat.v1 will be used, which we will come to shortly. Usage of the upgrade script is as follows:
$ tf_upgrade_v2 \
--intree my_project/ \
--outtree my_project_v2/ \
--reportfile report.txt
- Run the tests again on tf1.15: This will weed out any possible bugs.
- Install TensorFlow 2.3: Even tf-nightly should do the job.
- Test with v1.disable_v2_behavior: Re-running your tests with v1.disable_v2_behavior() should give the same results as running under 1.15.
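As a rough illustration of the last step, the sketch below assumes the compatibility module is imported under the alias `v1` and that `disable_v2_behavior()` is called once, at the very start of the program. The `add_one` function is a hypothetical stand-in for real model code.

```python
import tensorflow.compat.v1 as v1

# Must run before any other TensorFlow code in the process.
v1.disable_v2_behavior()


def add_one(x):
    """Hypothetical stand-in for code produced by tf_upgrade_v2."""
    return x + 1.0


# With v2 behavior disabled, graph-and-session code runs as it did under 1.15.
placeholder = v1.placeholder(v1.float32, shape=[None])
result = add_one(placeholder)
with v1.Session() as session:
    outputs = session.run(result, feed_dict={placeholder: [1.0, 2.0]})

print(outputs)
```

Once the tests pass in this mode, the call can be removed and the remaining `compat.v1` symbols replaced one at a time.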
After going through the above steps for the AttentionOCR model, I found a few issues which I will list out below.
AttentionOCR Migration Issues
- Using member tf.flags.FLAGS.dataset_dir in deprecated module tf.flags.
- Using member tf.contrib.metrics.streaming_mean in deprecated module tf.contrib. link, link.
- Using member tf.contrib.lookup.index_to_string_table_from_tensor in deprecated module tf.contrib. link
- Using member tf.contrib.legacy_seq2seq.sequence_loss in deprecated module tf.contrib. link
- *.save requires manual check. (do I save to SavedModel or HDF5??) link
- Using member tf.flags.FLAGS.checkpoint in deprecated module tf.flags.
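When the list of warnings grows, it can help to group them by deprecated module to see where the manual work is concentrated. The following is only a sketch using the standard library. The message wording is copied from the list above, and I am assuming each warning is available as a single line of text, which may not match the exact layout of the upgrade report.

```python
import re
from collections import defaultdict

# Matches warnings of the form "Using member X in deprecated module Y".
MESSAGE_PATTERN = re.compile(
    r"Using member (?P<member>[\w.]+) in deprecated module (?P<module>[\w.]+)$"
)


def group_by_module(messages):
    """Map each deprecated module to the members reported under it."""
    grouped = defaultdict(list)
    for message in messages:
        match = MESSAGE_PATTERN.search(message.strip().rstrip("."))
        if match:
            grouped[match.group("module")].append(match.group("member"))
    return dict(grouped)


messages = [
    "Using member tf.flags.FLAGS.dataset_dir in deprecated module tf.flags.",
    "Using member tf.contrib.metrics.streaming_mean in deprecated module tf.contrib.",
    "Using member tf.contrib.lookup.index_to_string_table_from_tensor in deprecated module tf.contrib.",
    "Using member tf.contrib.legacy_seq2seq.sequence_loss in deprecated module tf.contrib.",
    "Using member tf.flags.FLAGS.checkpoint in deprecated module tf.flags.",
]

grouped = group_by_module(messages)
for module, members in grouped.items():
    print(module, len(members))
```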
These are the deprecated APIs that I need to sort out manually by using functions and removing Session.run calls.
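For two of the tf.contrib members above, TensorFlow 2.x has core APIs that can play a similar role. The sketch below shows one possible substitution for each, with made-up data. It is not necessarily what the AttentionOCR PRs ended up using, and the character set is a hypothetical example.

```python
import tensorflow as tf

# A running mean, in the spirit of tf.contrib.metrics.streaming_mean.
mean = tf.keras.metrics.Mean()
mean.update_state([1.0, 3.0])
mean.update_state([5.0])
running_mean = float(mean.result())  # mean over all values seen so far

# An index-to-string lookup, in the spirit of
# tf.contrib.lookup.index_to_string_table_from_tensor.
charset = ["a", "b", "c"]  # hypothetical character set
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.range(len(charset), dtype=tf.int64),
        values=tf.constant(charset),
    ),
    default_value="?",  # returned for indices outside the table
)
decoded = table.lookup(tf.constant([2, 0], dtype=tf.int64))

print(running_mean, decoded.numpy())
```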
This section will have a commit list and the deprecated API list found along with each conversion. See PR #8843.
Commit | Link |
---|---|
README updates according to template | 7a09beb |
data_provider, model_export conversions | e8ca3cd |
demo_inference, tf.flags found | ab293d9 |
inception_processing converted, tf.image deprecated | 26c5304 |
minor conversions, tf.contrib deprecated | 042ae38 |
metrics and model converted | 2c391b9 |
The second part of the conversion is still underway. More specifically, we are waiting for the Object Detection API team to release their new code, which would have substantial improvements over the existing interfaces. Blockers such as seq2seq will no longer be a hindrance. Also, tf.summary.text has been added. See PR #8992.
Commit | Link |
---|---|
tf.summary.text addition | 1ee526f |
compat.v1 for tf.summary.text | 7efca34 |
demo_inference, tf.flags found | ab293d9 |
transfer urls from deprecated branch | 18bf2ba |
fixed README to reflect dataset transfer | 722e576 |
DeepSpeech
I was not the primary author of the conversion for the DeepSpeech model, which was submitted in PR #8696. But I was involved in writing tests and checking that the model reached reference accuracy on both tf1.15 and tf2.3.
We can see a good example of conversion to Keras layers here:
# TensorFlow 1.x
"lstm": tf.contrib.rnn.BasicLSTMCell,
"rnn": tf.contrib.rnn.RNNCell,
"gru": tf.contrib.rnn.GRUCell,
# TensorFlow 2.x
"lstm": tf.keras.layers.LSTMCell,
"rnn": tf.keras.layers.SimpleRNNCell,
"gru": tf.keras.layers.GRUCell,
Future Work
The main priorities for the immediate future are as follows:
- Convert AttentionOCR to native TensorFlow 2 and run tests.
- Prepare checkpoints for public usage and consolidate documentation.
- Provide interactive tutorials in the form of Google Colab.
- Convert DeepSpeech to native tf2 in collaboration with said PR author.
Challenges encountered
A lack of proper documentation was the biggest roadblock in this project. Previous working examples were incredibly hard to find. In fact, a key deliverable when I finish the conversion will be a detailed write-up of the migration process. Even though it will cover only one model, it should still help the user community port their own code to tf2.x.
Mobile Video Object Detection
I started work on this project in the second half of the summer, and my mentors had some exact goals for it. I have to admit that some parts took more time than I expected, but all of that helped me learn a lot. The main things we wanted to get done were:
- Check all GitHub issues and PRs related to lstm_object_detection and summarize key issues and asks from the community.
- Rerun model training/evaluation/inference from the tensorflow repo and fix potential issues, both locally and in the cloud.
- Make and/or summarize necessary tools for data conversion, dense labeling, etc.
- Reproduce paper results and prepare checkpoints for public usage.
- Prepare documentation and a tutorial for the lstm_object_detection models.
Work Done
The first task was critical to the completion of this project, as it meant prioritizing the user community and making sure all the major bugs were weeded out. I found most of the main issues on GitHub (open issues as well as ones closed by the bot due to inactivity) and also a few relevant Stack Overflow questions. I left out some redundant issues but noted their key asks. The next step was to review and prioritize major fixes. Key issues found were:
- Problems with creating tf-records
- More documentation needed (specifically training related: prepping data, etc)
- More support required in general
A few of the issues I used to identify the major problems were #6253, #5869, #7967, #7288, #6027, #6777, and #7109. I also looked at a few Stack Overflow questions for both answers and other problems. Most users agree that the above three issues were causing most problems.
For the next steps, I started with reading the Mobile Video Object Detection with Temporally-Aware Feature Maps paper and tried running the models locally. I faced a few issues with creating VID2015 tfrecords. So I went and checked the config for the Looking Fast and Slow paper to see if there was anything wrong with my code.
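Since creating tfrecords came up repeatedly, here is a generic sketch of writing and reading back a TFRecord file. The feature keys, the fake image bytes and the file name are assumptions made for illustration. This is not the exact record layout that lstm_object_detection expects for VID2015.

```python
import os
import tempfile

import tensorflow as tf


def make_example(encoded_image, label):
    """Build one tf.train.Example with hypothetical feature keys."""
    feature = {
        "image/encoded": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[encoded_image])),
        "image/class/label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


# Write three records with placeholder image bytes.
path = os.path.join(tempfile.mkdtemp(), "frames.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for label in range(3):
        writer.write(make_example(b"fake-jpeg-bytes", label).SerializeToString())

# Read the records back and parse them with a matching feature spec.
feature_spec = {
    "image/encoded": tf.io.FixedLenFeature([], tf.string),
    "image/class/label": tf.io.FixedLenFeature([], tf.int64),
}
labels = [
    int(tf.io.parse_single_example(record, feature_spec)["image/class/label"])
    for record in tf.data.TFRecordDataset(path)
]

print(labels)
```

A round trip like this is a quick way to check that the writer and the parsing spec agree before starting a long training run.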
Future Work
I need to provide additional documentation for the lstm_object_detection models. New additions will include a training script and notes on problems with custom data. The first thing to do will be training with ImageNet VID2015 as the train/eval set. Watch this space!!
GitHub Issue Support
Our mentors encouraged all of us to get more deeply involved with the community by supporting GitHub issues, attempting to fix minor bugs, answering questions, closing stale issues and so on. In the initial days, this helped me immensely by giving me a feel for how the TensorFlow Model Garden team moved and worked. The TensorFlow developer community, as well as the user community, is very active on the official Google Groups and Stack Overflow forums. A key responsibility was to answer questions and provide support on GitHub issues. From my experience waiting for answers on the TensorFlow Stack Overflow forums, I think the problem is not so much a lack of quality answers as that there are many repetitive questions and too few active members to answer them.
Along with the other students, we had a good fix for this problem: we created a sheet where we classified frequently asked questions into three categories: Object Detection API, Official models and Research models. This sheet was shared with the OD API team with comments on possible fixes. I'm very hopeful that this will help them with the next release.
Documentation Improvements
The TensorFlow 2.x codebase is relatively new, and large parts of the code could be given more support than is currently the case. As part of this objective, I wanted to contribute in some way to improving the docs for TensorFlow models. The main reason code goes undocumented is lack of time. Apart from designing and writing the code itself, we also need to go through code reviews, run automated tests and add unit tests (to name a few things). Documentation is pretty much left out of the equation.
Here are some of the things I kept in mind while writing some docs:
- Understand the audience of the documentation. TensorFlow is genuinely a universal framework, used by everyone from students in India detecting disease in farmlands to SotA researchers at Google AI. We need to keep the barrier to entry as low as we can.
- List and describe the main features of our model/tutorial while making sure to point out any dependencies that exist with other components.
- We need to inform our readers about the validity of the documentation that we’re writing, and hence we need to include a timestamp. We also need to list the versions of the libraries that we’re using.
- The tutorials and documentation that we write for TensorFlow should have a large number of coding examples. The more ways we can show of using a particular feature, the better.
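The timestamp and version point can be partly automated. Below is a small sketch, using only the standard library, that builds a "last verified" stamp to paste into a README or tutorial. The function name `version_stamp` and the package list are hypothetical.

```python
import datetime
import platform
from importlib import metadata


def version_stamp(packages):
    """Return a short text block with today's date and installed versions."""
    lines = [
        f"Last verified: {datetime.date.today().isoformat()}",
        f"Python: {platform.python_version()}",
    ]
    for name in packages:
        try:
            lines.append(f"{name}: {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing.
            lines.append(f"{name}: not installed")
    return "\n".join(lines)


stamp = version_stamp(["tensorflow", "numpy"])
print(stamp)
```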
Get started with Google Cloud Platform
I combined two existing GCP documents to make a comprehensive tutorial for anyone looking to leverage GCP resources for their ML project. The document lives here.
Minor improvements
Apart from the AttentionOCR README updates, I made a few minor doc updates, which can be seen at #8677, #8654, #8846, #8675, and #40012.
Final Thoughts
There is a lot more work to do. I have a list of deliverables I will complete soon:
- Continue with the model conversion from TF1.x if any models are left over, and also from PyTorch. The TensorFlow team has a list of projects that are in the pipeline.
- Create a to-do list for developers interested in converting models from tf1.x to tf2.x or between frameworks so that things move smoothly.
- Provide support and maintenance for more models in the research folder.
In closing, I'd like to continue contributing to TensorFlow and improve the Model Garden. Exciting things are on the way.
May all Tensors Flow Securely!!
Published: June 6, 2020
Updated: August 24, 2020
Status: Finished