COURSE LEARNING GOALS
The objective of the class is to: (1) learn about robotics and control, (2) understand why machine learning is a necessary tool for building autonomous and intelligent robots, (3) become familiar with recent research articles on robot learning, (4) learn advanced machine learning techniques for robotics and control, and (5) gain experience in implementing such techniques on representative challenges. The course is intended for computer science graduate students who have been exposed to artificial intelligence material in the past. Experience in programming in Python/C++ is needed for a project with a Baxter robot.
INSTRUCTOR
Abdeslam Boularias
The class presents recent developments in machine learning that are related to robotics. Example topics include: (a) Classical Robotics and Optimal Control: kinematics, dynamics, representing trajectories, control in joint space and in task space; (b) Reinforcement Learning, Learning from Demonstrations, Model Learning; (c) Grasping and Manipulation; (d) Robot Vision.
BOOKS
No particular textbook will be used in the course. The material will be primarily based on research papers. Some suggested classical background reading:
GRADING SCHEME
Participation: 10%
Written presentation: 15%
Oral presentation: 15%
Homework: 20%
Project: 40%
SCHEDULE
Lecture 1: Introduction and Overview
Course data. Challenges in deploying robots into real-world environments. Progress in humanoid robots. Industrial vs. autonomous robots. Programming robots vs. machine learning. What can robots learn? Challenges in robot learning. Schedule overview.

Lecture 2: Classical Robotics
What is a robot? Basic terminology (joints, degrees of freedom, redundancy, forces, torques, controls, task space, end-effector). Modeling robots: (1) kinematics (rotations and translations, singularities, inverse kinematics, Jacobian), (2) dynamics (essential equations, mass matrix, Coriolis and centrifugal forces, gravity compensation, Lagrange's equations of the second kind, Newton-Euler equations, Newton-Euler recursive algorithm). Trajectory representation (splines, potential fields, dynamical systems, dynamical motor primitives). Control in joint space (linear control, PID controllers, model-based control). Control in task space (inverse kinematics, differential inverse kinematics).

Lecture 3: Optimal Control
Decision-making. Interaction between a robot and its environment. Example of decision-making problems: robot navigation. Value (utility). Markov Decision Process (MDP). Example of a Markov Decision Process. Finite vs. infinite horizons, discount factors. Policies and value functions. Bellman equation. Optimal policies. Planning with Markov Decision Processes. Policy iteration. Value iteration. Bellman backup as a contraction operator. Fixed-point theorem and convergence of value and policy iteration. Partially Observable MDPs (POMDPs). General problem. Hamilton-Jacobi-Bellman equation. Linear Quadratic Regulators (LQR). Finite-horizon, continuous-time LQR. Infinite-horizon, continuous-time LQR. Finite-horizon, discrete-time LQR. Infinite-horizon, discrete-time LQR. Algebraic Riccati equation. Real-world examples of LQR problems in physical systems (pushing an object from A to B). Value iteration solution to LQR. Generalized LQR assumptions. Affine systems. Systems with stochasticity. Regulation around a non-zero fixed point for non-linear systems. Penalization for changes in control inputs. Linear time-varying (LTV) systems. Trajectory following for non-linear systems. Differential dynamic programming.
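As a concrete illustration of the MDP planning material above (policy and value iteration), here is a minimal value iteration sketch in Python with NumPy. The small two-state, two-action MDP at the end is a hypothetical example used only to exercise the function; it is not part of the course materials.

import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Value iteration for a finite MDP.

    P: transition tensor of shape (A, S, S), with P[a, s, s'] = Pr(s' | s, a).
    R: expected immediate reward matrix of shape (S, A).
    Returns the optimal state values and a greedy policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman backup: Q(s, a) = R(s, a) + gamma * E[V(s') | s, a]
        Q = R + gamma * (P @ V).T              # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:    # the backup is a contraction, so this converges
            return V_new, Q.argmax(axis=1)
        V = V_new

# Hypothetical 2-state, 2-action MDP used only to exercise the function.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],       # transitions under action 0
              [[0.1, 0.9], [0.8, 0.2]]])      # transitions under action 1
R = np.array([[1.0, 0.0], [0.0, 2.0]])        # R[s, a]
V, policy = value_iteration(P, R)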
Lecture 4: Machine Learning
What is learning? Adaptive behavior. What is machine learning? Empirical inference. Generalization. Principle of Occam's razor. Function class complexity. Overfitting vs. generalization. Bias-variance error decomposition. A taste of machine learning: applications, data, problems (binary and multiclass classification, regression, online vs. batch learning, transduction, active learning, semi-supervised learning, covariate shift correction, domain adaptation, co-training). Basic machine learning algorithms. Naive Bayes. Nearest neighbors (distances, Voronoi diagram, KD-tree data structure, tuning k in k-NN and its relationship to the bias-variance trade-off, memory issues, curse of dimensionality, adaptability to local changes in robotic applications, classification vs. regression versions of k-NN). Kernel regression (Parzen windows, Nadaraya-Watson estimator, metric learning, application to learning inverse dynamics). Gaussian process regression. The mean classifier (kernelized version). Perceptron. Kernel perceptron. Novikoff's theorem. K-means. Mean-shift. Spectral clustering. Neural networks. Back-propagation algorithm.

Lecture 5: Reinforcement Learning
Reminder on Markov Decision Processes. Reinforcement learning (RL) setup. RL in behavioral psychology. The Q-learning algorithm. Robbins-Monro conditions for convergence. An example of the exploration-exploitation tradeoff. On-policy vs. off-policy learning. Value prediction problems. Temporal-difference algorithms for policy evaluation: TD(0). Monte Carlo. Example comparing Monte Carlo with TD(0) (the idea of bootstrapping in TD). TD(λ) algorithm. Value function approximation. Polynomial features. Radial basis functions. State aggregation. Tile coding. Non-parametric value function approximation. Kernel RL. TD(λ) with linear value function approximation. Least-Squares Temporal Differences (LSTD). Online version of LSTD. LSTD vs. TD (stability, accuracy, and computational cost). Learning control policies. Multi-armed bandits. Notion of regret minimization. ε-greedy policies. Boltzmann policies. UCB algorithm. Combining lower and upper bounds for action pruning. UCT algorithm. Q-learning with function approximation. SARSA algorithm. Fitted Q-iteration. Deep Q-Networks.
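To make the tabular part of Lecture 5 concrete, the following is a minimal sketch of ε-greedy Q-learning in Python with NumPy. The environment interface is an assumed Gym-style stub (reset() returning a discrete state, step(action) returning next state, reward, and a done flag), not a specific course environment. A constant step size alpha is used for simplicity; the Robbins-Monro conditions mentioned above call for a decaying step size to guarantee convergence.

import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behavior policy.

    Assumes a Gym-style environment: reset() -> state,
    step(action) -> (next_state, reward, done).
    """
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # off-policy TD target uses the greedy value of the next state
            target = r + gamma * (0.0 if done else np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q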
Lecture 6: Policy Search
Actor-critic architectures. Implementing a critic. Least-Squares Policy Iteration (LSPI) as an example of an actor-critic model. Parameterized policies. Policy gradient theorem. Finite-differences method. Likelihood-ratio methods (REINFORCE algorithms). Natural Actor-Critic (NAC). RL in robotics by Reward-Weighted Regression (RWR). Cost-Regularized Kernel Regression (CRKR). Policy Search for Motor Primitives in Robotics. Policy learning by Weighting Exploration with the Returns (PoWER) for motor primitives. Relative Entropy Policy Search (REPS) algorithms. Policy search as an inference problem. Monte-Carlo EM-based policy search. Policy Improvement with Path Integrals. Trust Region Policy Optimization (TRPO). Real-robot applications with model-free policy search: learning baseball with NAC, learning ball-in-the-cup with PoWER, learning pancake flipping with PoWER/RWR, learning dart throwing with CRKR, learning table tennis with CRKR, learning tetherball with hierarchical REPS.

Lecture 7: Model Learning
General loop in model-based RL. Model learning in finite spaces, the Dyna algorithm. Approaches to dealing with uncertain models. Challenges. Models: locally weighted Bayesian regression and Gaussian process regression. Policy Evaluation-of-Goodness And Search Using Scenarios (PEGASUS). Linearization. Moment matching. Gradient-free model-based policy updates. Sampling-based policy gradients. Analytic policy gradients. Probabilistic Inference for Learning Control (PILCO). Guided Policy Search (GPS). Robot applications: model-based policy search with stochastic inference for learning to hover helicopters, combining a parametric prior with GPs for modeling and learning to control an autonomous blimp, pile stacking with PILCO, inverted pendulum with PILCO, control of tensegrity robots with Guided Policy Search.

Lecture 8: Learning from Demonstrations
Overview of learning from demonstration. What is learning from demonstration (LfD)? Advantages of LfD vs. reinforcement learning. Design and setup choices. Dealing with dataset limitations. The human-robot correspondence problem. Record mapping. Embodiment mapping. Demonstrations (teleoperation, shadowing, sensors on the teacher, external observations). Statistical models for learning policies. Behavioral cloning vs. inverse reinforcement learning (IRL). Motivation for inverse RL. Example applications in robotics (highway driving, navigation from aerial imagery, parking lot navigation, urban navigation, human path planning, human goal inference, quadruped locomotion). Max-margin algorithms. Feature expectation matching methods (MaxEnt IRL and Relative Entropy IRL). Reward function as policy parameterization. Bayesian IRL. Nonlinear IRL. GP-IRL. Deep maximum entropy IRL. Guided Cost Learning: deep inverse optimal control via policy optimization.

Lecture 9: Grasping and Manipulation
Mathematical models of grasping. Velocity kinematics. Grasp matrix and hand Jacobian. Contact modeling. Planar simplifications. Dynamics and equilibrium. Grasp classifications. Form closure vs. force closure. Example: a grasped sphere. Analytic vs. data-driven approaches. Offline generation of a grasp experience database for known objects. Learning from humans. Reinforcement learning approaches to grasping. Discriminative approaches for grasping familiar objects. Grasp synthesis by comparison. Generative models for grasp synthesis. Grasping unknown objects. Features for learning to grasp unknown objects. End-to-end deep learning approaches. Grasp affordances, anthropomorphic dexterous hands, grasp planning and learning. Non-rigid object manipulation.

Lecture 10: Robot Vision
Object recognition: a machine learning approach. Overview of training and testing processes for object recognition. Simplest features: pixel values, image gradients, color histograms. Spatial histograms. Haar-like features. Integral images. Gabor filters. GIST features. Textons. Learning textons from data. Histogram of Oriented Gradients (HOG). 3D pose estimation. Point set registration. Scale-Invariant Feature Transform (SIFT). Convolutional Neural Networks. Region-based Convolutional Networks (R-CNN). Image segmentation. Tracking. Interactive perception. Pixels-to-torques: an end-to-end deep learning approach to robot learning.
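As a toy end-to-end illustration of the recognition pipeline sketched in Lecture 10 (a simple feature combined with the nearest-neighbor rule from Lecture 4), the following Python/NumPy sketch classifies images using per-channel color histograms and a 1-nearest-neighbor rule. The random images and labels are hypothetical placeholders; a real pipeline would use richer features such as HOG or CNN activations.

import numpy as np

def color_histogram(image, bins=8):
    """Concatenated per-channel histogram of an HxWx3 uint8 image, L1-normalized."""
    feats = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(feats).astype(float)
    return h / h.sum()

def nearest_neighbor_predict(train_feats, train_labels, query_feat):
    """1-NN classification with Euclidean distance in feature space."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    return train_labels[int(np.argmin(dists))]

# Hypothetical data: random "images" standing in for a labeled training set.
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(10, 64, 64, 3), dtype=np.uint8)
train_labels = np.array([0, 1] * 5)
train_feats = np.stack([color_histogram(im) for im in train_images])

query = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(nearest_neighbor_predict(train_feats, train_labels, color_histogram(query)))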
LIST OF TOPICS FOR PRESENTATIONS
Topic 1: Reinforcement Learning in Brain-Machine Interfaces Assigned to: Mingwen Dong
Topic 2: Deep Reinforcement Learning Assigned to: Raghav Bhardwaj
Topic 3: Active Vision for Object Search Assigned to: Wei Tang
Topic 4: Model Learning Assigned to: TBD
Topic 5: Belief Space Planning Assigned to: Ankush Bhalotia
Topic 6: Learning to Grasp Assigned to: TBD
Topic 7: Learning to Manipulate Assigned to: Rui Wang
Topic 8: Learning to Walk Assigned to: Shikhar Dev Gupta
Topic 9: Inverse Reinforcement Learning Assigned to: Poornima Suresh
Topic 10: Interactive Segmentation Assigned to: Jay Kalyanaraman
Topic 11: Translating Commands in Natural Language into Plans Assigned to: Hao Yan
Topic 12: High-speed Robots Assigned to: TBD
Topic 13: Autonomous Driving Assigned to: Ze Liu
Topic 14: Imitation Learning I Assigned to: TBD
Topic 15: Imitation Learning II Assigned to: TBD
Topic 16: Robot Reinforcement Learning I Assigned to: TBD
Topic 17: Robot Reinforcement Learning II Assigned to: Timothy Yong
Topic 18: Human-Robot Interaction Assigned to: Zhe Chang
Topic 19: Learning-based Control for Quadcopters Assigned to: Merrill Edmonds
Presentation dates: TBD (December 2017)