Robot Learning Seminar

The objectives of the class are to:
(1) learn about robotics and control,
(2) understand why machine learning is a necessary tool for building autonomous and intelligent robots,
(3) get familiar with recent research articles on robot learning,
(4) learn advanced machine learning techniques for robotics and control, and
(5) gain experience in implementing such techniques on representative challenges.

The course is intended for computer science graduate students who have prior exposure to artificial intelligence. Programming experience in Python or C++ is required for the project, which uses a Baxter robot.

Abdeslam Boularias


Abdeslam Boularias: Fridays 2:00-3:00 PM in CBIM 07


The class presents recent developments in machine learning that are related to robotics. Example topics include:

(a) Classical Robotics and Optimal Control: Kinematics, Dynamics, Representing Trajectories, Control in Joint Space and in Task Space
(b) Reinforcement Learning, Learning from Demonstrations, Model Learning
(c) Grasping and Manipulation
(d) Robot Vision

No particular textbook will be used in the course. The material will be primarily based on research papers. Some suggested classical background reading:
  • on robotics: B. Siciliano, L. Sciavicco, L. Villani, G. Oriolo: Robotics: Modelling, Planning and Control, Springer, 2009.
  • on machine learning: C. Bishop: Pattern Recognition and Machine Learning, Springer, 2006.
  • on reinforcement learning: R. Sutton, A. Barto: Reinforcement Learning: An Introduction, MIT Press, 1998.


Regular readings and homework, written and oral presentations, projects.

Participation: 10%
Written presentation: 15%
Oral presentation: 15%
Homework: 20%
Project: 40%


Lecture 1: Introduction and Overview

Course data. Challenges in deploying robots into real-world environments. Progress in humanoid robots. Industrial vs. autonomous robots. Programming robots vs. machine learning. What can robots learn? Challenges in robot learning. Schedule overview.

Lecture 2: Classical Robotics

What is a robot? Basic Terminology (joints, degrees of freedom, redundancy, forces, torques, controls, task-space, end-effector). Modeling Robots: (1) Kinematics (rotations and translations, singularities, inverse kinematics, Jacobian), (2) Dynamics (essential equations, mass matrix, Coriolis and centrifugal forces, gravity compensation, Lagrange’s equations of the second kind, Newton-Euler equations, Newton-Euler recursive algorithm). Trajectory representation (splines, potential fields, dynamical systems, dynamical motor primitives). Control in joint space (linear control, PID controllers, model-based control). Control in task space (inverse kinematics, differential inverse kinematics).
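To make the kinematics and task-space control topics concrete, here is a minimal Python sketch for a hypothetical two-link planar arm: forward kinematics, its analytic Jacobian, and iterative differential inverse kinematics, q <- q + step * J(q)^-1 (x_target - x(q)). The link lengths, step size, and function names are illustrative assumptions, not course material; a seven-joint arm such as Baxter's would need a pseudo-inverse rather than the 2x2 inverse used here.

```python
import math

# Hypothetical two-link planar arm; the link lengths are illustrative values.
L1, L2 = 1.0, 1.0


def forward_kinematics(q):
    """End-effector position (x, y) for joint angles q = (q1, q2) in radians."""
    q1, q2 = q
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y


def jacobian(q):
    """2x2 Jacobian d(x, y)/d(q1, q2) of the forward kinematics above."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-L1 * s1 - L2 * s12, -L2 * s12],
            [L1 * c1 + L2 * c12, L2 * c12]]


def differential_ik(q, target, step=0.5, iters=100, tol=1e-10):
    """Repeat q <- q + step * J(q)^-1 (target - x(q)) until the task-space error is small."""
    q1, q2 = q
    for _ in range(iters):
        x, y = forward_kinematics((q1, q2))
        ex, ey = target[0] - x, target[1] - y
        if math.hypot(ex, ey) < tol:
            break
        (a, b), (c, d) = jacobian((q1, q2))
        det = a * d - b * c  # equals L1 * L2 * sin(q2): zero at q2 = 0 or pi
        if abs(det) < 1e-9:
            raise ValueError("Jacobian is singular at this configuration")
        # Solve J dq = e with the explicit 2x2 inverse.
        dq1 = (d * ex - b * ey) / det
        dq2 = (-c * ex + a * ey) / det
        q1 += step * dq1
        q2 += step * dq2
    return q1, q2
```

For example, differential_ik((0.3, 0.5), (1.0, 1.0)) drives the end-effector to (1, 1). The determinant L1 * L2 * sin(q2) shows why the fully stretched (q2 = 0) and fully folded (q2 = pi) configurations are singular.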

Lecture 3: Optimal Control

Decision-making. Interaction between a robot and its environment. Example of a decision-making problem: robot navigation. Value (Utility). Markov Decision Process (MDP). Example of a Markov Decision Process. Finite or Infinite Horizons, Discount Factors. Policies and Value Functions. Bellman Equation. Optimal policies. Planning with Markov Decision Processes. Policy iteration. Value iteration. Bellman backup as a contraction operator. Fixed point theorem and convergence of value and policy iteration. Partially Observable MDP (POMDP). General problem. Hamilton-Jacobi-Bellman equation. Linear Quadratic Regulators (LQR). Finite-horizon, continuous-time LQR. Infinite-horizon, continuous-time LQR. Finite-horizon, discrete-time LQR. Infinite-horizon, discrete-time LQR. Algebraic Riccati equation. Real-world examples of LQR problems in physical systems (pushing an object from A to B). Value iteration solution to LQR. Generalized LQR assumptions. Affine systems. Systems with stochasticity. Regulation around non-zero fixed point for non-linear systems. Penalizing changes in control inputs. Linear time varying (LTV) systems. Trajectory following for non-linear systems. Differential dynamic programming.
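Value iteration on a small navigation problem illustrates the Bellman backup and its convergence. The sketch below assumes a hypothetical deterministic corridor with states 0 to n-1, an absorbing goal at the right end, a reward of -1 per move, and discount gamma = 0.9; none of these choices come from the course.

```python
# Hypothetical 1-D navigation MDP: states 0..n-1 along a corridor, goal at n-1.
ACTIONS = {"left": -1, "right": +1}


def step(s, move, n_states):
    """Deterministic transition: move one cell, clipped to the corridor ends."""
    return min(max(s + move, 0), n_states - 1)


def value_iteration(n_states=5, gamma=0.9, tol=1e-12):
    goal = n_states - 1
    V = [0.0] * n_states  # the goal is absorbing, so V[goal] stays 0
    while True:
        # Synchronous Bellman backup: V(s) = max_a [r + gamma * V(s')], with r = -1 per move.
        new_V = V[:]
        for s in range(n_states):
            if s != goal:
                new_V[s] = max(-1.0 + gamma * V[step(s, m, n_states)]
                               for m in ACTIONS.values())
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < tol:  # the backup is a gamma-contraction, so this loop terminates
            break
    # Greedy policy: the reward is the same for every action, so compare successor values.
    policy = [max(ACTIONS, key=lambda a: V[step(s, ACTIONS[a], n_states)])
              for s in range(goal)]
    return V, policy
```

With the defaults, the greedy policy moves right in every state, and V(s) = -(1 - gamma^k) / (1 - gamma), where k is the distance to the goal (for example V(3) = -1 and V(2) = -1.9).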

Lecture 4: Machine Learning

What is Learning? Adaptive Behavior. What is Machine Learning? Empirical Inference. Generalization. Principle of Occam’s razor. Function class complexity. Overfitting vs. Generalization. Bias-Variance error decomposition. A Taste of Machine Learning: Applications, Data, Problems (binary and multiclass classification, regression, online vs. batch learning, transduction, active learning, semi-supervised learning, covariate shift correction, domain adaptation, co-training). Basic machine learning algorithms. Naive Bayes. Nearest Neighbors (distances, Voronoi diagram, KD-tree data structure, tuning k in k-NN and relationship to bias-variance trade-off, memory issues, curse of dimensionality, adaptability to local changes in robotic applications, classification vs. regression versions of k-NN). Kernel Regression (Parzen Windows, Nadaraya-Watson estimator, metric learning, application to learning inverse dynamics). Gaussian Process Regression. The Mean Classifier (kernelized version). Perceptron. Kernel Perceptron. Novikoff’s theorem. K-means. Mean-shift. Spectral clustering. Neural networks. Back-propagation algorithm.
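The Nadaraya-Watson estimator listed under kernel regression predicts a kernel-weighted average of the training targets, y_hat(x) = sum_i K(x - x_i) y_i / sum_i K(x - x_i). A minimal sketch, assuming 1-D inputs and a Gaussian kernel with bandwidth h (the function name and default bandwidth are illustrative):

```python
import math


def nadaraya_watson(x_query, xs, ys, bandwidth=0.5):
    """Kernel-weighted average of training targets ys observed at inputs xs (1-D inputs).

    Uses a Gaussian kernel K(u) = exp(-u^2 / (2 h^2)), where h is the bandwidth.
    """
    weights = [math.exp(-((x_query - x) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

The bandwidth plays a role similar to k in k-NN: a small bandwidth tracks local changes (approaching nearest-neighbor prediction, with higher variance), while a large one averages over more data (smoother, with higher bias).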

Lecture 5: Reinforcement Learning

Reminder on Markov Decision Processes. Reinforcement Learning (RL) Setup. RL in behavioral psychology. The Q-learning algorithm. Robbins-Monro conditions for convergence. An example of the exploration-exploitation trade-off. On-policy vs. off-policy learning. Value prediction problems. Temporal-difference algorithms for policy evaluation: TD(0). Monte Carlo. Example comparing Monte Carlo with TD(0) (idea of bootstrapping in TD). TD(λ) algorithm. Value function approximation. Polynomial features. Radial basis functions. State aggregation. Tile coding. Non-parametric value function approximation. Kernel RL. TD(λ) algorithm with linear value function approximation. Least-Squares Temporal Differences (LSTD). Online version of LSTD. LSTD vs. TD (stability, accuracy, and computational cost). Learning control policies. Multi-armed bandits. Notion of regret minimization. ε-greedy policies. Boltzmann policies. UCB algorithm. Combining lower and upper bounds for action pruning. UCT algorithm. Q-learning with function approximation. SARSA algorithm. Fitted Q-iteration. Deep Q-Networks.
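A minimal tabular Q-learning sketch with an ε-greedy behavior policy, on an assumed toy corridor (start at state 0, goal at the right end, reward -1 per move). The learning rate, ε, discount, and episode count are illustrative. The constant learning rate is a simplification: the Robbins-Monro conditions call for step sizes alpha_t with sum alpha_t = infinity and sum alpha_t^2 < infinity.

```python
import random

MOVES = (-1, +1)  # action 0 = move left, action 1 = move right


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical corridor MDP (start at 0, goal at n_states - 1)."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy behavior policy (ties go to action 0)
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1
            s_next = min(max(s + MOVES[a], 0), goal)
            r = -1.0
            # Off-policy target: bootstrap from the greedy value of the next state.
            target = r if s_next == goal else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q
```

Because the update bootstraps from max Q(s', .) rather than from the action the ε-greedy policy actually takes next, this is off-policy learning; using Q(s', a') for the action actually taken instead gives SARSA.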

Lecture 6: Policy Search

Actor-critic architectures. Implementing a critic. Least-Squares Policy Iteration (LSPI) as an example of an actor-critic model. Parameterized policies. Policy gradient theorem. Finite-differences method. Likelihood ratio methods (REINFORCE algorithms). Natural Actor-Critic (NAC). RL in robotics with Reward-Weighted Regression (RWR). Cost-Regularized Kernel Regression (CRKR). Policy Search for Motor Primitives in Robotics. Policy learning by Weighting Exploration with the Returns (PoWER) for Motor Primitives. Relative Entropy Policy Search (REPS) algorithms. Policy search as an inference problem. Monte-Carlo EM-based policy search. Policy Improvement with Path Integrals. Trust Region Policy Optimization (TRPO). Real Robot Applications with Model-free Policy Search. Learning Baseball with NAC. Learning Ball-in-the-Cup with PoWER. Learning Pancake Flipping with PoWER/RWR. Learning Dart Throwing with CRKR. Learning Table Tennis with CRKR. Learning Tetherball with hierarchical REPS.
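The likelihood-ratio (REINFORCE) estimator is easiest to see in its simplest setting, a softmax policy over the arms of a bandit, where the gradient of log pi(a) with respect to theta_k has the closed form 1[k = a] - pi(k). The sketch assumes deterministic arm rewards and a running-average reward baseline for variance reduction; it is a toy illustration, not one of the robot-scale policy search methods listed above.

```python
import math
import random


def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_bandit(rewards=(0.0, 1.0), steps=2000, lr=0.1, baseline_rate=0.05, seed=0):
    """Likelihood-ratio (REINFORCE) updates for a softmax policy over bandit arms."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]
        r = rewards[a]
        advantage = r - baseline
        # d/d theta_k of log pi(a | theta) is 1[k == a] - pi(k)
        for k in range(len(theta)):
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log
        baseline += baseline_rate * (r - baseline)  # running-average reward baseline
    return softmax(theta)
```

With rewards (0, 1), the probability of the better arm rises toward 1. Since the baseline does not depend on the action being sampled, it leaves the expected gradient unchanged and only reduces its variance.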

Lecture 7: Model Learning

General loop in model-based RL. Model Learning in finite spaces, Dyna algorithm. Approaches to Dealing with Uncertain Models. Challenges. Models: Locally Weighted Bayesian Regression and Gaussian Process Regression. Policy Evaluation-of-Goodness And Search Using Scenarios (PEGASUS). Linearization. Moment Matching. Gradient-free Model-based Policy Updates. Sampling-based policy gradients. Analytic policy gradients. Probabilistic Inference for Learning Control (PILCO). Guided Policy Search (GPS). Robot Applications. Learning to hover helicopters with model-based policy search and stochastic inference. Combining a parametric prior with GPs to model and control an autonomous blimp. Pile stacking with PILCO. Inverted pendulum with PILCO. Control of tensegrity robots with Guided Policy Search.
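The model-learning step of the model-based RL loop (collect transitions, fit a model, improve the policy on the model) reduces in the simplest case to regression. As a deliberately simple stand-in for the locally weighted and Gaussian-process models listed above, this sketch assumes a scalar linear model x' = a*x + b*u and fits a and b by least squares; the function name is hypothetical.

```python
def fit_linear_dynamics(xs, us, xs_next):
    """Least-squares fit of a scalar model x' = a*x + b*u from transition samples.

    Solves the 2x2 normal equations [Sxx Sxu; Sxu Suu] [a; b] = [Sxy; Suy] by Cramer's rule.
    """
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(xs, xs_next))
    suy = sum(u * y for u, y in zip(us, xs_next))
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("data do not excite both x and u; collect more varied samples")
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b
```

Noise-free transitions generated with a = 0.9 and b = 0.5 are recovered up to rounding. With noisy data, the same normal equations give the least-squares estimate, but unlike a GP they provide no model uncertainty, which is what methods such as PILCO rely on.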

Lecture 8: Learning from Demonstrations

Overview of learning from demonstration. What is learning from demonstration (LfD)? Advantages of LfD vs. Reinforcement Learning. Design and setup choices. Dealing with dataset limitations. Human-robot correspondence problem. Record mapping. Embodiment mapping. Demonstrations (teleoperation, shadowing, sensors on teacher, external observations). Statistical models for learning policies. Behavioral cloning vs. inverse reinforcement learning (IRL). Motivation for inverse RL. Example applications in robotics (highway driving, aerial-imagery-based navigation, parking lot navigation, urban navigation, human path planning, human goal inference, quadruped locomotion). Max margin algorithms. Feature expectation matching methods (MaxEnt IRL and Relative Entropy IRL). Reward function as policy parameterization. Bayesian IRL. Nonlinear IRL. GP-IRL. Deep maximum entropy IRL. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization.
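Feature-expectation methods compare a learned policy with the demonstrator through discounted feature expectations, mu = (1/m) sum_i sum_t gamma^t phi(s_t^i), averaged over the m demonstrations. A minimal sketch of the empirical estimate, assuming demonstrations are given as lists of states and phi is a user-supplied feature function (both hypothetical interfaces):

```python
def feature_expectations(demonstrations, features, gamma=0.9):
    """Empirical discounted feature expectations mu = (1/m) sum_i sum_t gamma^t phi(s_t^i).

    demonstrations: list of state sequences; features: maps a state to a list of floats.
    """
    mu = None
    for traj in demonstrations:
        for t, s in enumerate(traj):
            phi = features(s)
            if mu is None:
                mu = [0.0] * len(phi)
            for k, v in enumerate(phi):
                mu[k] += (gamma ** t) * v
    if mu is None:
        raise ValueError("need at least one non-empty demonstration")
    return [v / len(demonstrations) for v in mu]
```

Max-margin methods search for reward weights under which the demonstrator's mu scores better than that of other policies; MaxEnt IRL instead matches expected feature counts, often in undiscounted form (gamma = 1).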

Lecture 9: Grasping and Manipulation

Mathematical models of grasping. Velocity kinematics. Grasp Matrix and Hand Jacobian. Contact Modeling. Planar Simplifications. Dynamics and Equilibrium. Grasp Classifications. Form Closure vs. Force Closure. Example: Grasped Sphere. Analytic vs. Data-Driven Approaches. Offline generation of a grasp experience database for known objects. Learning from humans. Reinforcement learning approaches to grasping. Discriminative approaches for grasping familiar objects. Grasp synthesis by comparison. Generative models for grasp synthesis. Grasping unknown objects. Features for learning to grasp unknown objects. End-to-end deep learning approaches. Grasp affordances. Anthropomorphic dexterous hands. Grasp planning and learning. Non-rigid object manipulation.

Lecture 10: Robot Vision

Object Recognition: a machine learning approach. Overview of training and testing processes for object recognition. Simplest features: pixel values, image gradients, color histograms. Spatial histograms. Haar-like features. Integral images. Gabor filters. GIST Features. Textons. Learning Textons from data. Histogram of Oriented Gradients (HOG). 3D Pose Estimation. Point Set Registration. Scale-Invariant Feature Transform (SIFT). Convolutional Neural Networks. Region-based Convolutional Networks (R-CNN). Image segmentation. Tracking. Interactive perception. Pixel-to-torque: an end-to-end deep learning approach to robot learning.
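Integral images are what make Haar-like features cheap: after one pass over the image, the sum over any axis-aligned rectangle costs four table lookups. A minimal sketch with a zero-padded summed-area table and one assumed two-rectangle feature (left half minus right half of a window); the function names are hypothetical.

```python
def integral_image(img):
    """Summed-area table: ii[r][c] = sum of img[0..r-1][0..c-1] (zero row/column padding)."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = img[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii


def box_sum(ii, r0, c0, r1, c1):
    """Sum of img over rows r0..r1-1 and columns c0..c1-1 using four lookups."""
    return ii[r1][c1] - ii[r0][c1] - ii[r1][c0] + ii[r0][c0]


def haar_left_minus_right(ii, r0, c0, h, w):
    """Two-rectangle Haar-like feature: left half minus right half of an h x w window."""
    half = w // 2
    left = box_sum(ii, r0, c0, r0 + h, c0 + half)
    right = box_sum(ii, r0, c0 + half, r0 + h, c0 + 2 * half)
    return left - right
```

A feature like this responds strongly to vertical edges, and its cost does not grow with the window size, which is why cascades of such features can be evaluated at many positions and scales.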

List of topics for presentations

Topic 1: Reinforcement Learning in Brain-Machine Interfaces
Assigned to: Mingwen Dong

Topic 2: Deep Reinforcement Learning
Assigned to: Raghav Bhardwaj

Topic 3: Active Vision for Object Search
Assigned to: Wei Tang

Topic 4: Model Learning
Assigned to: TBD

Topic 5: Belief Space Planning
Assigned to: Ankush Bhalotia

Topic 6: Learning to Grasp
Assigned to: TBD

Topic 7: Learning to Manipulate
Assigned to: Rui Wang

Topic 8: Learning to Walk
Assigned to: Shikhar Dev Gupta

Topic 9: Inverse Reinforcement Learning
Assigned to: Poornima Suresh

Topic 10: Interactive Segmentation
Assigned to: Jay Kalyanaraman

Topic 11: Translating Commands in Natural Language into Plans
Assigned to: Hao Yan

Topic 12: High-speed Robots
Assigned to: TBD

Topic 13: Autonomous Driving
Assigned to: Ze Liu

Topic 14: Imitation Learning I
Assigned to: TBD

Topic 15: Imitation Learning II
Assigned to: TBD

Topic 16: Robot Reinforcement Learning I
Assigned to: TBD

Topic 17: Robot Reinforcement Learning II
Assigned to: Timothy Yong

Topic 18: Human-Robot Interaction
Assigned to: Zhe Chang

Topic 19: Learning-based Control for Quadcopters
Assigned to: Merrill Edmonds

Presentation dates: TBD (December 2017)