Friday, March 16, 2018

Gradients explode - Deep Networks are shallow - ResNet explained

So last night at the Paris Machine Learning meetup, the good folks from Snips announced the open sourcing of their Natural Language Understanding code. Joseph also mentioned that, after searching over many architectures, a simple CRF model, a single-layer model, did as well as other commercial models. It's NLP so the representability issue has already been parsed. In a different corner of the galaxy, the following paper seems to suggest that the usual techniques have not solved the gradient explosion problem, and that ResNets, while rendering these deep networks effectively shallower, circumvent it rather than eliminate it.

Abstract: Whereas it is believed that techniques such as Adam, batch normalization and, more recently, SeLU nonlinearities "solve" the exploding gradient problem, we show that this is not the case and that in a range of popular MLP architectures, exploding gradients exist and that they limit the depth to which networks can be effectively trained, both in theory and in practice. We explain why exploding gradients occur and highlight the collapsing domain problem, which can arise in architectures that avoid exploding gradients. ResNets have significantly lower gradients and thus can circumvent the exploding gradient problem, enabling the effective training of much deeper networks, which we show is a consequence of a surprising mathematical property. By noticing that any neural network is a residual network, we devise the residual trick, which reveals that introducing skip connections simplifies the network mathematically, and that this simplicity may be the major cause for their success.
TL;DR: We show that in contrast to popular wisdom, the exploding gradient problem has not been solved and that it limits the depth to which MLPs can be effectively trained. We show why gradients explode and how ResNet handles them.
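
To make the basic phenomenon concrete, here is a minimal NumPy sketch that backpropagates through a random ReLU MLP and reports the norm of the gradient with respect to the input. It is not the analysis from the paper: the function name `input_gradient_norm`, the width, the seed and the `gain` multiplier on the weight scale are all assumptions made for illustration. It only shows how a per-layer factor compounds with depth.

```python
import numpy as np

def input_gradient_norm(depth, width=64, gain=1.0, seed=0):
    """Norm of d(sum of outputs)/d(input) for a random ReLU MLP (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    weights, masks = [], []
    # Forward pass: keep the weights and ReLU masks needed for backpropagation.
    for _ in range(depth):
        w = rng.standard_normal((width, width)) * gain * np.sqrt(2.0 / width)
        pre = w @ x
        mask = (pre > 0).astype(float)
        x = pre * mask
        weights.append(w)
        masks.append(mask)
    # Backward pass: apply the chain rule one layer at a time.
    grad = np.ones(width)
    for w, mask in zip(reversed(weights), reversed(masks)):
        grad = w.T @ (grad * mask)
    return float(np.linalg.norm(grad))

for depth in (10, 20, 40):
    print(depth,
          input_gradient_norm(depth, gain=1.0),
          input_gradient_norm(depth, gain=1.5))
```

Because ReLU is positively homogeneous and both runs share the same seed, the two gradient norms differ by a factor of `gain ** depth`, so even a modest per-layer factor becomes enormous in a deep stack.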

In this work we propose a novel interpretation of residual networks showing that they can be seen as a collection of many paths of differing length. Moreover, residual networks seem to enable very deep networks by leveraging only the short paths during training. To support this observation, we rewrite residual networks as an explicit collection of paths. Unlike traditional models, paths through residual networks vary in length. Further, a lesion study reveals that these paths show ensemble-like behavior in the sense that they do not strongly depend on each other. Finally, and most surprising, most paths are shorter than one might expect, and only the short paths are needed during training, as longer paths do not contribute any gradient. For example, most of the gradient in a residual network with 110 layers comes from paths that are only 10-34 layers deep. Our results reveal one of the key characteristics that seem to enable the training of very deep networks: Residual networks avoid the vanishing gradient problem by introducing short paths which can carry gradient throughout the extent of very deep networks.
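
The "collection of paths" view can be sketched with a few lines of counting. If every residual block is either taken or skipped, a network with n blocks has 2^n paths, and the number of paths going through exactly k blocks is the binomial coefficient C(n, k). The helper names and the block count of 54 below are assumptions for illustration, and lengths are counted in blocks, not in layers.

```python
from math import comb

def path_length_counts(n_blocks):
    """counts[k] = number of paths that pass through exactly k residual branches."""
    return [comb(n_blocks, k) for k in range(n_blocks + 1)]

def fraction_of_paths_between(n_blocks, lo, hi):
    """Fraction of all 2**n_blocks paths whose length (in blocks) lies in [lo, hi]."""
    counts = path_length_counts(n_blocks)
    return sum(counts[lo:hi + 1]) / 2 ** n_blocks

n_blocks = 54  # hypothetical block count, chosen for illustration
print(sum(path_length_counts(n_blocks)) == 2 ** n_blocks)
print(fraction_of_paths_between(n_blocks, 19, 35))
```

The counts are concentrated around n/2, so very long paths are rare. The counting alone says nothing about how much gradient each path carries, which is the part the abstract addresses empirically.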

Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers.
The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
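
The "residual functions with reference to the layer inputs" idea boils down to computing y = x + F(x) instead of y = F(x). Here is a minimal NumPy sketch under stated assumptions: F is a small two-layer ReLU map with no convolutions and no batch normalization, and the names `residual_block`, `w1` and `w2` are hypothetical.

```python
import numpy as np

def residual_block(x, w1, w2):
    """y = x + F(x), where F is a two-layer ReLU map (a simplification)."""
    hidden = np.maximum(0.0, w1 @ x)
    return x + w2 @ hidden

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w1 = rng.standard_normal((16, 8))
w2_zero = np.zeros((8, 16))

# With the residual branch switched off, the block is exactly the identity.
print(np.allclose(residual_block(x, w1, w2_zero), x))
```

With the second weight matrix set to zero the block passes its input through unchanged, which is one way to see why stacking more such blocks need not make the identity mapping harder to represent.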

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !

Wednesday, March 14, 2018

Paris Machine Learning Meetup #7 Season 5, Natural Language Understanding (NLU), AI for HR, decentralized AI

Tonight we will be hosted by Urban Linker ! The video of the streaming is here and presentation slides will be available here as well before the meetup. Stay tuned.


Joseph Dureau, Snips NLU, an Open Source, Private by Design alternative to cloud-based solutions

As part of its mission to expand the use of privacy-preserving AI solutions, the Snips team has decided to fully open source its solution for Natural Language Understanding. Snips NLU is an alternative to all cloud-based NLU solutions powering chatbots or voice assistants: Dialogflow, Recast, Amazon Lex, Watson, etc. You can run it on the edge or on premises, thus avoiding giving away your user data to a third party service.

Erik Mathiesen, An AI Careers Advisor: Using Machine Learning to Predict Your Career Path

We specialize in smart solutions for recruitment. In this talk, I will describe how we use AI, and in particular Neural Networks and Deep Learning, to analyse and predict people’s career paths. Having analysed millions of CVs, our system can predict from a person’s CV what jobs are most likely to be next in the career path of that individual, as well as when the next job move is most likely to happen. By doing this, we enable companies to predict and find better candidates as well as forecast future hiring needs within an organisation. I will outline the technologies and techniques used in this application and give a few illustrative examples of its usage.

An open-source community focused on building technology to facilitate the decentralized ownership of data and intelligence.


Monday, March 12, 2018

Random projections in gravitational wave searches of compact binaries

Randomized Matrix factorization and gravitational waves, this is cool !

Random projection (RP) is a powerful dimension reduction technique widely used in analysis of high dimensional data. We demonstrate how this technique can be used to improve the computational efficiency of gravitational wave searches from compact binaries of neutron stars or black holes. Improvements in low-frequency response and bandwidth due to detector hardware upgrades pose a data analysis challenge in the advanced LIGO era as they result in increased redundancy in template databases and longer templates due to higher number of signal cycles in band. The RP-based methods presented here address both these issues within the same broad framework. We first use RP for an efficient, singular value decomposition inspired template matrix factorization and develop a geometric intuition for why this approach works. We then use RP to calculate approximate time-domain correlations in a lower dimensional vector space. For searches over parameters corresponding to non-spinning binaries with a neutron star and a black hole, a combination of the two methods can reduce the total on-line computational cost by an order of magnitude over a nominal baseline. This can, in turn, help free-up computational resources needed to go beyond current spin-aligned searches to more complex ones involving generically spinning waveforms.
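
The two uses of random projection mentioned in the abstract can be sketched generically in NumPy. This is a textbook-style illustration and not the pipeline of the paper: the function names, the synthetic "template" matrix, the oversampling amount and the dimensions are all assumptions. The first function builds an SVD-like low-rank factorization from a random sketch of the matrix; the second estimates an inner product (a correlation, for unit vectors) in a lower dimensional space.

```python
import numpy as np

def randomized_low_rank(templates, rank, oversample=10, seed=0):
    """Approximate templates (m x n) as q @ b using a random range finder."""
    rng = np.random.default_rng(seed)
    n = templates.shape[1]
    omega = rng.standard_normal((n, rank + oversample))  # random test matrix
    q, _ = np.linalg.qr(templates @ omega)               # orthonormal basis of the sketch
    b = q.T @ templates                                  # coefficients in that basis
    return q, b

def projected_inner_product(x, y, k, seed=0):
    """Estimate x . y after projecting both vectors down to k dimensions."""
    rng = np.random.default_rng(seed)
    r = rng.standard_normal((k, x.size)) / np.sqrt(k)
    return float((r @ x) @ (r @ y))

rng = np.random.default_rng(1)
# Synthetic stand-in for a redundant template bank: exactly rank 5.
templates = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 1000))
q, b = randomized_low_rank(templates, rank=5)
rel_err = np.linalg.norm(templates - q @ b) / np.linalg.norm(templates)
print(rel_err)

x = rng.standard_normal(2048)
x /= np.linalg.norm(x)
noise = rng.standard_normal(2048)
y = x + 0.5 * noise / np.linalg.norm(noise)
y /= np.linalg.norm(y)
exact = float(x @ y)
approx = projected_inner_product(x, y, k=512)
print(exact, approx)
```

The factorization is essentially exact here because the synthetic matrix has exact low rank; the projected inner product is only approximate, with an error that shrinks as k grows.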

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Saturday, March 10, 2018

Saturday Morning Videos: NIPS2017 Meta Learning Symposium videos

Pieter Abbeel mentioned that the #nips2017 Meta Learning Symposium videos are now available here.

Thanks to Risto Miikkulainen, Quoc Le, Kenneth Stanley, and Chrisantha Fernando for organizing and getting the videos online !

Opening remarks, Quoc Le (slides, video)

Topic I: Evolutionary Optimization

  • Evolving Multitask Neural Network Structure, Risto Miikkulainen (slides, video)
  • Evolving to Learn through Synaptic Plasticity, Ken Stanley (slides, video)
  • PathNet and Beyond, Chrisantha Fernando (slides, video)

Topic II: Bayesian Optimization

  • Bayesian Optimization for Automated Model Selection, Roman Garnett (slides, video)
  • Automatic Machine Learning (AutoML) and How To Speed It Up, Frank Hutter (slides, video)

Topic III: Gradient Descent

  • Contrasting Model- and Optimization-based Metalearning, Oriol Vinyals (slides, video)
  • Population-based Training for Neural Network Meta-Optimization, Max Jaderberg (slides, video)
  • Learning to Learn for Robotic Control, Pieter Abbeel (slides, video)
  • On Learning How to Learn Learning Strategies, Juergen Schmidhuber (slides, video)

Topic IV: Reinforcement Learning

  • Intrinsically Motivated Reinforcement Learning, Satinder Singh (video)
  • Self-Play, Ilya Sutskever (slides, video)
  • Neural Architecture Search, Quoc Le (slides, video)
  • Multiple scales of reward and task learning, Jane Wang (slides, video)

Panel discussion, Moderator: Risto Miikkulainen, Panelists: Frank Hutter, Juergen Schmidhuber, Ken Stanley, Ilya Sutskever (video)

Photo credit: NASA, Starshine 2, more on project Starshine.


Tuesday, March 06, 2018

Randomness in Deconvolutional Networks for Visual Representation

So random weight networks seem to have better generalization properties, uh.

Toward a deeper understanding on the inner work of deep neural networks, we investigate CNN (convolutional neural network) using DCN (deconvolutional network) and randomization technique, and gain new insights for the intrinsic property of this network architecture. For the random representations of an untrained CNN, we train the corresponding DCN to reconstruct the input images. Compared with the image inversion on pre-trained CNN, our training converges faster and the yielding network exhibits higher quality for image reconstruction. It indicates there is rich information encoded in the random features; the pre-trained CNN may discard information irrelevant for classification and encode relevant features in a way favorable for classification but harder for reconstruction. We further explore the property of the overall random CNN-DCN architecture. Surprisingly, images can be inverted with satisfactory quality. Extensive empirical evidence as well as theoretical analysis are provided.
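
As a toy analogy for "random features still carry enough information to reconstruct the input", here is a NumPy sketch with an untrained random ReLU layer as the encoder and a linear decoder fitted by least squares. It is far simpler than the CNN-DCN setup of the abstract: there are no convolutions, the data are Gaussian vectors instead of images, and every name and size below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_feat, n_train, n_test = 16, 256, 2000, 200

# Random encoder weights: drawn once and never trained.
w = rng.standard_normal((n_feat, n_in))

def encode(x):
    """Random ReLU features of the rows of x."""
    return np.maximum(0.0, x @ w.T)

x_train = rng.standard_normal((n_train, n_in))
x_test = rng.standard_normal((n_test, n_in))

# Only the decoder is fitted, here by ordinary least squares.
decoder, *_ = np.linalg.lstsq(encode(x_train), x_train, rcond=None)

x_hat = encode(x_test) @ decoder
rel_err = np.linalg.norm(x_hat - x_test) / np.linalg.norm(x_test)
print(rel_err)
```

A relative error of 1.0 would correspond to predicting all zeros, so anything well below that on held-out inputs indicates that the random features did not throw the input away.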


Sunday, February 18, 2018

Sunday Morning Insight: LightOn Cloud: Light Based Technology for AI on the Cloud

As some of you may know, part of the reason I am a little less active on Nuit Blanche these days stems from being involved with LightOn. At LightOn, we build hardware that uses light to perform computations of interest to Machine Learning. In short, we bring light to AI.

Quite simply, we are building a hardware product that does random projections... for now. If you are a student of history, or if you know how technologies begin and thrive, you know that it is essential for a technology to meet its eventual end users very early on.

At LightOn, we want to get as much feedback as possible from the Machine Learning community as early as possible. And so for the past year, we have been working on integrating our technology so that it can be accessible on the web.  

Thanks to the OVH Labs program, we got one of our prototypes to run in a nearby data center. On December 20th, we had our first light and it was beautiful.

Since then we have been going through our Verification and Validation (V&V) program and have started to run some algorithms on it. On Friday, we issued a press release on opening up our cloud to the Machine Learning community. If you want to be a beta user on our cluster, please register your interest here.

Forward we go !

How to find us on the web ?


Saturday, February 17, 2018

Posters: SysML 2018 Conference

The SysML 2018 Conference is currently taking place at Stanford and, while the live stream is over, the poster session is under way. Here are the presentations for each poster:

Session I: 4:30pm - 6:00pm
1-1 A SIMD-MIMD Acceleration with Access-Execute Decoupling for Generative Adversarial Networks Amir Yazdanbakhsh, Kambiz Samadi, Hadi Esmaeilzadeh, Nam Sung Kim
1-2 Slice Finder: Automated Data Slicing for Model Interpretability Yeounoh Chung, Tim Kraska, Steven Euijong Whang, Neoklis Polyzotis
1-3 Data Infrastructure for Machine Learning Eric Breck, Neoklis Polyzotis, Sudip Roy, Steven Euijong Whang, Martin Zinkevich
1-4 Speeding up ImageNet Training on Supercomputers Yang You, Zhao Zhang, Cho-Jui Hsieh, James Demmel, Kurt Keutzer
1-5 Aloha: A Machine Learning Framework for Engineers Ryan M Deak, Jonathan H Morra
1-6 Parameter Hub: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy
1-7 Stitch-X: An Accelerator Architecture for Exploiting Unstructured Sparsity in Deep Neural Networks Ching-En Lee, Yakun Sophia Shao, Jie-Fang Zhang, Angshuman Parashar, Joel Emer, Stephen W. Keckler, Zhengya Zhang
1-8 DeepVizdom: Deep Interactive Data Exploration Carsten Binnig, Kristian Kersting, Alejandro Molina, Emanuel Zgraggen
1-9 Massively Parallel Video Networks João Carreira, Viorica Pătrăucean, Andrew Zisserman, Simon Osindero
1-10 EVA: An Efficient System for Exploratory Video Analysis Ziqiang Feng, Junjue Wang, Jan Harkes, Padmanabhan Pillai, Mahadev Satyanarayanan
1-11 Declarative Metadata Management: A Missing Piece in End-To-End Machine Learning Sebastian Schelter, Joos-Hendrik Böse, Johannes Kirschnick, Thoralf Klein, Stephan Seufert
1-12 Runway: machine learning model experiment management tool Jason Tsay, Todd Mummert, Norman Bobroff, Alan Braz, Peter Westerink, Martin Hirzel
1-13 STRADS-AP: Simplifying Distributed Machine Learning Programming Jin Kyu Kim, Garth A. Gibson, Eric P. Xing
1-14 A Deeper Look at FFT and Winograd Convolutions Aleksandar Zlateski, Zhen Jia, Kai Li, Fredo Durand
1-15 Efficient Deep Learning Inference on Edge Devices Ziheng Jiang, Tianqi Chen, Mu Li
1-16 On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems Besmira Nushi, Ece Kamar, Eric Horvitz, Donald Kossmann
1-17 DeepThin: A Self-Compressing Library for Deep Neural Networks Matthew Sotoudeh, Sara S. Baghsorkhi
1-18 MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Programmable Interconnects Hyoukjun Kwon, Ananda Samajdar, Tushar Krishna
1-19 On Machine Learning and Programming Languages Mike Innes, Stefan Karpinski, Viral Shah, David Barber, Pontus Stenetorp, Tim Besard, James Bradbury, Valentin Churavy, Simon Danisch, Alan Edelman, Jon Malmaud, Jarrett Revels, Deniz Yuret
1-20 "I Like the Way You Think!" - Inspecting the Internal Logic of Recurrent Neural Networks Thibault Sellam, Kevin Lin, Ian Yiran Huang, Carl Vondrick, Eugene Wu
1-21 Automatic Differentiation in Myia Olivier Breuleux, Bart van Merriënboer
1-22 TFX Frontend: A Graphical User Interface for a Production-Scale Machine Learning Platform Peter Brandt, Josh Cai, Tommie Gannert, Pushkar Joshi, Rohan Khot, Chiu Yuen Koo, Chenkai Kuang, Sammy Leong, Clemens Mewald, Neoklis Polyzotis, Herve Quiroz, Sudip Roy, Po-Feng Yang, James Wexler, Steven Euijong Whang
1-23 Learned Index Structures Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis
1-24 Towards Optimal Winograd Convolution on Manycores Zhen Jia, Aleksandar Zlateski, Fredo Durand, Kai Li
1-25 Mobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC) Perspective Yuhao Zhu, Matthew Mattina, Paul Whatmough
1-26 Deep Learning with Apache SystemML Niketan Pansare, Michael Dusenberry, Nakul Jindal, Matthias Boehm, Berthold Reinwald, Prithviraj Sen
1-27 Scalable Language Modeling: WikiText-103 on a Single GPU in 12 hours Stephen Merity, Nitish Shirish Keskar, James Bradbury, Richard Socher
1-28 PipeDream: Pipeline Parallelism for DNN Training Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Gregory R. Ganger, Phillip B. Gibbons
1-29 Efficient Mergeable Quantile Sketches using Moments Edward Gan, Jialin Ding, Peter Bailis
1-30 Systems Optimizations for Learning Certifiably Optimal Rule Lists Nicholas Larus-Stone, Elaine Angelino, Daniel Alabi, Margo Seltzer, Vassilios Kaxiras, Aditya Saligrama, Cynthia Rudin
1-31 Accelerating Model Search with Model Batching Deepak Narayanan, Keshav Santhanam, Matei Zaharia
1-32 Programming Language Support for Natural Language Interaction Alex Renda, Harrison Goldstein, Sarah Bird, Chris Quirk, Adrian Sampson
1-33 Factorized Deep Retrieval and Distributed TensorFlow Serving Xinyang Yi, Yi-Fan Chen, Sukriti Ramesh, Vinu Rajashekhar, Lichan Hong, Noah Fiedel, Nandini Seshadri, Lukasz Heldt, Xiang Wu, Ed H. Chi
1-34 Relaxed Pruning: Memory-Efficient LSTM Inference Engine by Limiting the Synaptic Connection Patterns Jaeha Kung, Junki Park, Jae-Joon Kim
1-35 Deploying Deep Ranking Models for Search Verticals Rohan Ramanath, Gungor Polatkan, Liqin Xu, Harold Lee, Bo Hu, Shan Zhou
1-36 Understanding the Error Structure as a Key to Regularize Convolutional Neural Networks Bilal Alsallakh, Amin Jourabloo, Mao Ye, Xiaoming Liu, Liu Ren
1-37 On Scale-out Deep Learning Training for Cloud and HPC Srinivas Sridharan, Karthikeyan Vaidyanathan, Dhiraj Kalamkar, Dipankar Das, Mikhail E. Smorkalov, Mikhail Shiryaev, Dheevatsa Mudigere, Naveen Mellempudi, Sasikanth Avancha, Bharat Kaul, Pradeep Dubey
1-38 In-network Neural Networks Giuseppe Siracusano, Roberto Bifulco
1-39 Compressing Deep Neural Networks with Probabilistic Data Structures Brandon Reagen, Udit Gupta, Robert Adolf, Michael M. Mitzenmacher, Alexander M. Rush, Gu-Yeon Wei, David Brooks
1-40 Greenhouse: A Zero-Positive Machine Learning System for Time-Series Anomaly Detection Tae Jun Lee, Justin Gottschlich, Nesime Tatbul, Eric Metcalf, Stan Zdonik
1-41 Precision and Recall for Range-Based Anomaly Detection Tae Jun Lee, Justin Gottschlich, Nesime Tatbul, Eric Metcalf, Stan Zdonik
1-42 Whetstone: An accessible, platform-independent method for training spiking deep neural networks for neuromorphic processors William M. Severa, Craig M. Vineyard, Ryan Dellana, James B. Aimone
1-43 SparseCore: An Accelerator for Structurally Sparse CNNs Sharad Chole, Ramteja Tadishetti, Sree Reddy
1-44 SGD on Random Mixtures: Private Machine Learning under Data Breach Threats Kangwook Lee, Kyungmin Lee, Hoon Kim, Changho Suh, Kannan Ramchandran
1-45 Towards High-Performance Prediction Serving Systems Yunseong Lee, Alberto Scolari, Matteo Interlandi, Markus Weimer, Byung-Gon Chun
1-46 Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization Fabian Pedregosa, Rémi Leblond, Simon Lacoste–Julien
1-47 Corpus Conversion Service: A machine learning platform to ingest documents at scale. Peter W J Staar, Michele Dolfi, Christoph Auer, Costas Bekas
1-48 Representation Learning for Resource Usage Prediction Florian Schmidt, Mathias Niepert, Felipe Huici
1-49 TVM: End-to-End Compilation Stack for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Haichen Shen, Eddie Yan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
1-50 vectorflow: a minimalist neural-network library Benoît Rostykus, Yves Raimond
1-51 Learning Heterogeneous Cloud Storage Configuration for Data Analytics Ana Klimovic, Heiner Litz, Christos Kozyrakis
1-52 Salus: Fine-Grained GPU Sharing Among CNN Applications Peifeng Yu, Mosharaf Chowdhury
1-53 OpenCL Acceleration for TensorFlow Mehdi Goli, Luke Iwanski, John Lawson, Uwe Dolinsky, Andrew Richards
1-54 Picking Interesting Frames in Streaming Video Christopher Canel, Thomas Kim, Giulio Zhou, Conglong Li, Hyeontaek Lim, David G. Andersen, Michael Kaminsky, Subramanya R. Dulloor
1-55 SLAQ: Quality-Driven Scheduling for Distributed Machine Learning Haoyu Zhang, Logan Stafman, Andrew Or, Michael J. Freedman
1-56 A Comparison of Bottom-Up Approaches to Grounding for Templated Markov Random Fields Eriq Augustine, Lise Getoor
1-57 Growing Cache Friendly Decision Trees Niloy Gupta, Adam Johnston
1-58 Parallelizing Hyperband for Large-Scale Tuning Lisha Li, Kevin Jamieson, Afshin Rostamizadeh, Ameet Talwalkar
1-59 Towards Interactive Curation and Automatic Tuning of ML Pipelines Carsten Binnig, Benedetto Buratti, Yeounoh Chung, Cyrus Cousins, Dylan Ebert, Tim Kraska, Zeyuan Shang, Isabella Tromba, Eli Upfal, Linnan Wang, Robert Zeleznik, Emanuel Zgraggen

Session II: 6:00pm - 7:30pm
2-1 Ternary Residual Networks Abhisek Kundu, Kunal Banerjee, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey
2-2 Neural Architect: A Multi-objective Neural Architecture Search with Performance Prediction Yanqi Zhou, Gregory Diamos
2-3 Federated Kernelized Multi-Task Learning Sebastian Caldas, Virginia Smith, Ameet Talwalkar
2-4 Materialization Trade-offs for Feature Transfer from Deep CNNs for Multimodal Data Analytics Supun Nakandala, Arun Kumar
2-5 Scaling HDBSCAN Clustering with kNN Graph Approximation Jacob Jackson, Aurick Qiao, Eric P. Xing
2-6 BlazeIt: An Optimizing Query Engine for Video at Scale Daniel Kang, Peter Bailis, Matei Zaharia
2-7 Time Travel based Feature Generation Kedar Sadekar, Hua Jiang
2-8 Controlling AI Engines in Dynamic Environments Nikita Mishra, Connor Imes, Henry Hoffmann, John D. Lafferty
2-9 Intermittent Deep Neural Network Inference Graham Gobieski, Nathan Beckmann, Brandon Lucia
2-10 CascadeCNN: Pushing the performance limits of quantisation Alexandros Kouris, Stylianos I. Venieris, Christos-Savvas Bouganis
2-11 Making Machine Learning Easy with Embeddings Dan Shiebler, Abhishek Tayal
2-12 CrossBow: Scaling Deep Learning on Multi-GPU Servers Alexandros Koliousis, Pijika Watcharapichat, Matthias Weidlich, Paolo Costa, Peter Pietzuch
2-13 Better Caching with Machine Learned Advice Thodoris Lykouris, Sergei Vassilvitskii
2-14 Large Model Support for Deep Learning in Caffe and Chainer Minsik Cho, Tung D. Le, Ulrich A. Finkler, Haruiki Imai, Yasushi Negishi, Taro Sekiyama, Saritha Vinod, Vladimir Zolotov, Kiyokuni Kawachiya, David S. Kung, Hillery C. Hunter
2-15 Learning Graph-based Cluster Scheduling Algorithms Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Mohammad Alizadeh
2-16 Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning Scott Cyphers, Arjun K. Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, Will Constable, Christian Convey, Leona Cook, Omar Kanawi, Robert Kimball, Jason Knight, Nikolay Korovaiko, Varun Kumar, Yixing Lao, Christopher R. Lishka, Jaikrishnan Menon, Jennifer Myers, Sandeep Aswath Narayana, Adam Procter, Tristan J. Webb
2-17 Efficient Multi-Tenant Inference on Video using Microclassifiers Giulio Zhou, Thomas Kim, Christopher Canel, Conglong Li, Hyeontaek Lim, David G. Andersen, Michael Kaminsky, Subramanya R. Dulloor
2-18 Abstractions for Containerized Machine Learning Workloads in the Cloud Balaji Subramaniam, Niklas Nielsen, Connor Doyle, Ajay Deshpande, Jason Knight, Scott Leishman
2-19 Not All Ops Are Created Equal! Liangzhen Lai, Naveen Suda, Vikas Chandra
2-20 Robust Gradient Descent via Moment Encoding with LDPC Codes Raj Kumar Maity, Ankit Singh Rawat, Arya Mazumdar
2-21 Buzzsaw: A System for High Speed Feature Engineering Andrew Stanton, Liangjie Hong, Manju Rajashekhar
2-22 Predicate Optimization for a Visual Analytics Database Michael R. Anderson, Michael Cafarella, Thomas F. Wenisch, German Ros
2-23 Understanding the Limitations of Current Energy-Efficient Design Approaches for Deep Neural Networks Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, Vivienne Sze
2-24 Compiling machine learning programs via high-level tracing Roy Frostig, Matthew James Johnson, Chris Leary
2-25 Dynamic Stem-Sharing for Multi-Tenant Video Processing Angela Jiang, Christopher Canel, Daniel Wong, Michael Kaminsky, Michael A. Kozuch, Padmanabhan Pillai, David G. Andersen, Gregory R. Ganger
2-26 A Hierarchical Model for Device Placement Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc V. Le, Jeff Dean
2-27 Blink: A fast NVLink-based collective communication library Guanhua Wang, Amar Phanishayee, Shivaram Venkataraman, Ion Stoica
2-28 TOP: A Compiler-Based Framework for Optimizing Machine Learning Algorithms through Generalized Triangle Inequality Yufei Ding, Lin Ning, Hui Guang, Xipeng Shen, Madanlal Musuvathi, Todd Mytkowicz
2-29 UberShuffle: Communication-efficient Data Shuffling for SGD via Coding Theory Jichan Chung, Kangwook Lee, Ramtin Pedarsani, Dimitris Papailiopoulos, Kannan Ramchandran
2-30 Toward Scalable Verification for Safety-Critical Deep Networks Lindsey Kuper, Guy Katz, Justin Gottschlich, Kyle Julian, Clark Barrett, Mykel J. Kochenderfer
2-31 DAWNBench: An End-to-End Deep Learning Benchmark and Competition Cody Coleman, Deepak Narayanan, Daniel Kang, Tian Zhao, Jian Zhang, Luigi Nardi, Peter Bailis, Kunle Olukotun, Chris Ré, Matei Zaharia
2-32 Learning Network Size While Training with ShrinkNets Guillaume Leclerc, Raul Castro Fernandez, Samuel Madden
2-33 Have a Larger Cake and Eat It Faster Too: A Guideline to Train Larger Models Faster Newsha Ardalani, Joel Hestness, Gregory Diamos
2-34 Retrieval as a defense mechanism against adversarial examples in convolutional neural networks Junbo Zhao, Jinyang Li, Kyunghyun Cho
2-35 DNN-Train: Benchmarking and Analyzing Deep Neural Network Training Hongyu Zhu, Bojian Zheng, Bianca Schroeder, Gennady Pekhimenko, Amar Phanishayee
2-36 High Accuracy SGD Using Low-Precision Arithmetic and Variance Reduction (for Linear Models) Alana Marzoev, Christopher De Sa
2-37 SkipNet: Learning Dynamic Routing in Convolutional Networks Xin Wang, Fisher Yu, Zi-Yi Dou, Joseph E. Gonzalez
2-38 Memory-Efficient Data Structures for Learning and Prediction Damian Eads, Paul Baines, Joshua S. Bloom
2-39 Efficient and Programmable Machine Learning on Distributed Shared Memory via Static Analysis Jinliang Wei, Garth A. Gibson, Eric P. Xing
2-40 Parle: parallelizing stochastic gradient descent Pratik Chaudhari, Carlo Baldassi, Riccardo Zecchina, Stefano Soatto, Ameet Talwalkar, Adam Oberman
2-41 Optimal Message Scheduling for Aggregation Leyuan Wang, Mu Li, Edo Liberty, Alex J. Smola
2-42 Analog electronic deep networks for fast and efficient inference Jonathan Binas, Daniel Neil, Giacomo Indiveri, Shih-Chii Liu, Michael Pfeiffer
2-43 Network Evolution for DNNs Michael Alan Chang, Aurojit Panda, Domenic Bottini, Lisa Jian, Pranay Kumar, Scott Shenker
2-44 BinaryCmd: Keyword Spotting with deterministic binary basis Javier Fernández-Marqués, Vincent W.-S. Tseng, Sourav Bhattachara, Nicholas D. Lane
2-45 YellowFin: Adaptive Optimization for (A)synchronous Systems Jian Zhang, Ioannis Mitliagkas
2-46 GPU-acceleration for Large-scale Tree Boosting Huan Zhang, Si Si, Cho-Jui Hsieh
2-47 Treelite: toolbox for decision tree deployment Hyunsu Cho, Mu Li
2-48 On Importance of Execution Ordering in Graph-Based Distributed Machine Learning Systems Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, Roy Campbell
2-49 Draco: Robust Distributed Training against Adversaries Lingjiao Chen, Hongyi Wang, Dimitris Papailiopoulos
2-50 Clustering System Data using Aggregate Measures Johnnie C-N. Chang, Robert H-J. Chen, Jay Pujara, Lise Getoor
2-51 A Framework for Searching a Predictive Model Yoshiki Takahashi, Masato Asahara, Kazuyuki Shudo
2-52 Distributed Placement of Machine Learning Operators for IoT applications spanning Edge and Cloud Resources Tarek Elgamal, Atul Sandur, Klara Nahrstedt, Gul Agha
2-53 Finding Heavily-Weighted Features with the Weight-Median Sketch Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant
2-54 Flexible Primitives for Distributed Deep Learning in Ray Yaroslav Bulatov, Robert Nishihara, Philipp Moritz, Melih Elibol, Ion Stoica, Michael I. Jordan
2-55 BLAS-on-flash: an alternative for training large ML models? Suhas Jayaram Subramanya, Srajan Garg, Harsha Vardhan Simhadri
2-56 Treating Machine Learning Algorithms As Declaratively Specified Circuits Jason Eisner, Nathaniel Wesley Filardo
2-57 Tasvir: Distributed Shared Memory for Machine Learning Amin Tootoonchian, Aurojit Panda, Aida Nematzadeh, Scott Shenker

Rest of the program:

  • 9:00 am - 9:15 am Opening Remarks: Ameet Talwalkar
  • Session I (moderator: Virginia Smith)
  • 9:15 am - 9:55 am Invited talk: Michael I. Jordan
  • 9:55 am - 10:05 am Contributed talk: TVM: End-to-End Compilation Stack for Deep Learning, Tianqi Chen
  • 10:05 am - 10:15 am Contributed talk: Robust Gradient Descent via Moment Encoding with LDPC Codes, Arya Mazumdar
  • 10:15 am - 10:25 am Contributed talk: Analog electronic deep networks for fast and efficient inference, Jonathan Binas
  • 10:25 am - 10:50 am Coffee Break
  • Session II (moderator: Virginia Smith)
  • 10:50 am - 11:30 am Invited talk: Hardware for Deep Learning, Bill Dally
  • 11:30 am - 11:40 am Contributed talk: YellowFin: Adaptive Optimization for (A)synchronous Systems, Ioannis Mitliagkas
  • 11:40 am - 12:20 pm Invited talk: Security, Privacy, and Democratization: Challenges & Future Directions for ML Systems beyond Scalability, Dawn Song
  • 12:20 pm - 1:30 pm Lunch
  • Session III (moderator: Sarah Bird)
  • 1:30 pm - 2:10 pm Invited talk: Structured ML: Opportunities and Challenges for the SysML Community, Lise Getoor
  • 2:10 pm - 2:20 pm Contributed talk: Understanding the Limitations of Current Energy-Efficient Design Approaches for Deep Neural Networks, Vivienne Sze
  • 2:20 pm - 2:30 pm Contributed talk: Towards High-Performance Prediction Serving Systems, Matteo Interlandi
  • 2:30 pm - 2:55 pm Coffee Break
  • Session IV (moderator: Sarah Bird)
  • 2:55 pm - 3:05 pm Contributed talk: "I Like the Way You Think!" - Inspecting the Internal Logic of Recurrent Neural Networks, Thibault Sellam
  • 3:05 pm - 3:45 pm Invited talk: Systems and Machine Learning Symbiosis, Jeff Dean
  • 3:45 pm - 4:00 pm Closing Remarks: Matei Zaharia


Saturday Morning Videos: IPAM Workshop on New Deep Learning Techniques

Yann mentioned it on his Twitter feed: the videos and slides of the IPAM workshop on New Deep Learning Techniques are out. Enjoy !

Samuel Bowman (New York University)
Toward natural language semantics in learned representations

Emily Fox (University of Washington)
Interpretable and Sparse Neural Network Time Series Models for Granger Causality Discovery

Ellie Pavlick (University of Pennsylvania)
Should we care about linguistics?

Leonidas Guibas (Stanford University)
Knowledge Transport Over Visual Data

Yann LeCun (New York University)
Public Lecture: Deep Learning and the Future of Artificial Intelligence

Alán Aspuru-Guzik (Harvard University)
Generative models for the inverse design of molecules and materials

Daniel Rueckert (Imperial College)
Deep learning in medical imaging: Techniques for image reconstruction, super-resolution and segmentation

Kyle Cranmer (New York University)
Deep Learning in the Physical Sciences

Stéphane Mallat (École Normale Supérieure)
Deep Generative Networks as Inverse Problems

Michael Elad (Technion - Israel Institute of Technology)
Sparse Modeling in Image Processing and Deep Learning

Yann LeCun (New York University)
Public Lecture: AI Breakthroughs & Obstacles to Progress, Mathematical and Otherwise

Xavier Bresson (Nanyang Technological University, Singapore)
Convolutional Neural Networks on Graphs

Federico Monti (Universita della Svizzera Italiana)
Deep Geometric Matrix Completion: a Geometric Deep Learning approach to Recommender Systems

Joan Bruna (New York University)
On Computational Hardness with Graph Neural Networks

Jure Leskovec (Stanford University)
Large-scale Graph Representation Learning

Arthur Szlam (Facebook)
Composable planning with attributes

Yann LeCun (New York University)
A Few (More) Approaches to Unsupervised Learning

Sanja Fidler (University of Toronto)
Teaching Machines with Humans in the Loop

Raquel Urtasun (University of Toronto)
Deep Learning for Self-Driving Cars

Pratik Chaudhari (University of California, Los Angeles (UCLA))
Unraveling the mysteries of stochastic gradient descent on deep networks

Stefano Soatto (University of California, Los Angeles (UCLA))
Emergence Theory of Deep Learning

Tom Goldstein (University of Maryland)
What do neural net loss functions look like?

Stanley Osher (University of California, Los Angeles (UCLA))
New Techniques in Optimization and Their Applications to Deep Learning and Related Inverse Problems

Michael Bronstein (USI Lugano, Switzerland)
Deep functional maps: intrinsic structured prediction for dense shape correspondence

Sainbayar Sukhbaatar (New York University)
Deep Architecture for Sets and Its Application to Multi-agent Communication

Zuowei Shen (National University of Singapore)
Deep Learning: Approximation of functions by composition

Wei Zhu (Duke University)
LDMnet: low dimensional manifold regularized neural networks
