Robot Learning Seminar

COURSE LEARNING GOALS

The objectives of the class are to:
(1) learn about robotics and control,
(2) understand why machine learning is a necessary tool for building autonomous and intelligent robots,
(3) become familiar with recent research articles on robot learning,
(4) learn advanced machine learning techniques for robotics and control, and
(5) gain experience implementing such techniques on representative challenges.

The course is intended for computer science graduate students who have previously been exposed to artificial intelligence material. Programming experience in Python/C++ is needed for a project with a Baxter robot.

INSTRUCTOR

Abdeslam Boularias

OFFICE HOURS

Abdeslam Boularias: Fridays 2:00-3:00 PM in CBIM 07

TOPICS

The class presents recent developments in machine learning that are related to robotics. Example topics include:

(a) Classical Robotics and Optimal Control: Kinematics, Dynamics, Representing Trajectories, Control in Joint Space and in Task Space
(b) Reinforcement Learning, Learning from Demonstrations, Model Learning
(c) Grasping and Manipulation

(d) Robot Vision

BOOKS

No particular textbook will be used in the course. The material will be primarily based on research papers. Some suggested classical background reading:

on robotics: B. Siciliano, L. Sciavicco: Robotics: Modelling, Planning and Control, Springer, 2009.
on machine learning: C. Bishop: Pattern Recognition and Machine Learning, Springer, 2006.
on reinforcement learning: R. Sutton, A. Barto: Reinforcement Learning: An Introduction, MIT Press, 1998.

EXPECTED WORK

Regular readings and homework, written and oral presentations, projects.

GRADING SCHEME

Participation: 10%

Written presentation: 15%

Oral presentation: 15%

Homework: 20%

Project: 40%

SCHEDULE

Lecture 1 : Introduction and Overview

Course data. Challenges in deploying robots into real-world environments. Progress in humanoid robots. Industrial vs. autonomous robots. Programming robots vs Machine Learning. What can robots learn? Challenges in robot learning. Schedule overview.

Lecture 2 : Classical Robotics

What is a robot? Basic Terminology (joints, degrees of freedom, redundancy, forces, torques, controls, task-space, end-effector). Modeling Robots: (1) Kinematics (rotations and translations, singularities, inverse kinematics, Jacobian), (2) Dynamics (essential equations, mass matrix, Coriolis and centrifugal forces, gravity compensation, Lagrange’s equations of the second kind, Newton-Euler equations, Newton-Euler recursive algorithm). Trajectory representation (splines, potential fields, dynamical systems, dynamical motor primitives). Control in joint space (linear control, PID controllers, model-based control). Control in task space (inverse kinematics, differential inverse kinematics).
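
As an illustration of the kinematics and task-space control topics listed for this lecture, the following is a minimal sketch of forward kinematics, the Jacobian, and differential inverse kinematics for a planar two-link arm. It is not taken from the course material: the link lengths, the gain, the starting configuration, and all function names are assumptions chosen for the example.

```python
import math

def forward_kinematics(q1, q2, l1=1.0, l2=1.0):
    """Return the end-effector position (x, y) for joint angles q1, q2 (radians)."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y

def jacobian(q1, q2, l1=1.0, l2=1.0):
    """2x2 Jacobian d(x, y)/d(q1, q2) of the forward kinematics."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def differential_ik(target, q=(0.3, 0.5), gain=0.5, steps=100, l1=1.0, l2=1.0):
    """Iterate dq = gain * J^{-1} * (target - x) from an assumed start q."""
    q1, q2 = q
    for _ in range(steps):
        x, y = forward_kinematics(q1, q2, l1, l2)
        ex, ey = target[0] - x, target[1] - y
        (a, b), (c, d) = jacobian(q1, q2, l1, l2)
        det = a * d - b * c  # equals l1 * l2 * sin(q2)
        if abs(det) < 1e-9:
            raise ValueError("arm is at a kinematic singularity")
        # Apply the closed-form inverse of the 2x2 Jacobian to the error.
        dq1 = (d * ex - b * ey) / det
        dq2 = (-c * ex + a * ey) / det
        q1 += gain * dq1
        q2 += gain * dq2
    return q1, q2

q1, q2 = differential_ik((1.0, 1.0))
print(forward_kinematics(q1, q2))
```

The determinant vanishes when the arm is fully stretched or folded (q2 equal to 0 or pi), which is the singularity case mentioned in the topic list; a practical controller would typically use a damped pseudo-inverse there instead of raising an error.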

Lecture 3 : Optimal Control

Decision-making. Interaction between a robot and its environment. Example of a decision-making problem: robot navigation. Value (Utility). Markov Decision Process (MDP). Example of a Markov Decision Process. Finite or Infinite Horizons, Discount Factors. Policies and Value Functions. Bellman Equation. Optimal policies. Planning with Markov Decision Processes. Policy iteration. Value iteration. Bellman backup as a contraction operator. Fixed point theorem and convergence of value and policy iteration. Partially Observable MDP (POMDP). General problem. Hamilton-Jacobi-Bellman equation. Linear Quadratic Regulators (LQR). Finite-horizon, continuous-time LQR. Infinite-horizon, continuous-time LQR. Finite-horizon, discrete-time LQR. Infinite-horizon, discrete-time LQR. Algebraic Riccati equation. Real-world examples of LQR problems in physical systems (pushing an object from A to B). Value iteration solution to LQR. Generalized LQR assumptions. Affine systems. System with stochasticity. Regulation around non-zero fixed point for non-linear systems. Penalization for change in control inputs. Linear time varying (LTV) systems. Trajectory following for non-linear systems. Differential dynamic programming.
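
To make the value iteration topic concrete, here is a minimal sketch on a toy four-state chain MDP. The MDP itself (deterministic moves, a reward of 1 for entering the last state, discount factor 0.9) and all names are assumptions made for illustration, not part of the course material. The sweep updates values in place, which is one common variant of the Bellman backup.

```python
def build_chain(n=4):
    """Deterministic chain: action 0 moves left, action 1 moves right.

    P[s][a] is a list of (probability, next_state, reward) triples.
    The last state is terminal (absorbing, zero reward).
    """
    P = []
    for s in range(n):
        if s == n - 1:
            P.append([[(1.0, s, 0.0)], [(1.0, s, 0.0)]])
            continue
        left, right = max(s - 1, 0), s + 1
        reward = 1.0 if right == n - 1 else 0.0
        P.append([[(1.0, left, 0.0)], [(1.0, right, reward)]])
    return P

def backup(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def value_iteration(P, gamma=0.9, tol=1e-10):
    """Repeat Bellman backups until the largest change falls below tol."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            best = max(backup(P, V, s, a, gamma) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [max(range(len(P[s])), key=lambda a: backup(P, V, s, a, gamma))
              for s in range(len(P))]
    return V, policy

V, policy = value_iteration(build_chain())
print(V, policy)
```

Because the Bellman backup is a contraction for a discount factor below 1, the loop is guaranteed to terminate; on this chain the values decay geometrically with distance from the rewarding transition.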

Lecture 4 : Machine Learning

What is Learning? Adaptive Behavior. What is Machine Learning? Empirical Inference. Generalization. Principle of Occam’s razor. Function class complexity. Overfitting vs Generalization. Bias-Variance error decomposition. A Taste of Machine Learning: Applications, Data, Problems (binary and multiclass classification, regression, online vs batch learning, transduction, active learning, semi-supervised learning, covariate shift correction, domain adaptation, co-training). Basic machine learning algorithms. Naive Bayes. Nearest Neighbors (distances, Voronoi diagram, KD-tree data structure, tuning k in k-NN and relationship to bias-variance trade-off, memory issues, curse of dimensionality, adaptability to local changes in robotic applications, classification vs regression versions of k-NN). Kernel Regression (Parzen Windows, Nadaraya-Watson estimator, metric learning, application to learning inverse dynamics). Gaussian Process Regression. The Mean Classifier (kernelized version). Perceptron. Kernel Perceptron. Novikoff’s theorem. K-means. Mean-shift. Spectral clustering. Neural networks. Back-propagation algorithm.
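
As a small illustration of the kernel regression topic, the sketch below implements a one-dimensional Nadaraya-Watson estimator with a Gaussian kernel. The kernel choice, the bandwidth value, and the toy data are assumptions for the example only.

```python
import math

def nadaraya_watson(x_query, xs, ys, bandwidth=0.5):
    """Predict y at x_query as a kernel-weighted average of the training targets."""
    # Gaussian kernel weight of each training input relative to the query.
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    # Note: for queries very far from all data, total can underflow to zero.
    return sum(w * y for w, y in zip(weights, ys)) / total

xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]
print(nadaraya_watson(1.0, xs, ys))
print(nadaraya_watson(0.5, xs, ys))
```

The bandwidth plays a role similar to k in k-NN: a small bandwidth follows the data closely (low bias, high variance), while a large one averages over many points.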

Lecture 5 : Reinforcement Learning

Reminder on Markov Decision Processes. Reinforcement Learning (RL) Setup. RL in behavioral psychology. The Q-learning algorithm. Robbins-Monro conditions for convergence. An example of exploration-exploitation tradeoff. On-policy vs off-policy learning. Value prediction problems. Temporal-difference algorithms for policy evaluation: TD(0). Monte Carlo. Example comparing Monte Carlo with TD(0) (idea of bootstrapping in TD). TD(λ) algorithm. Value function approximation. Polynomial features. Radial basis functions. State aggregation. Tile coding. Non-parametric value function approximation. Kernel RL. TD(λ) algorithm with linear value function approximation. Least-Squares Temporal Differences (LSTD). Online version of LSTD. LSTD vs TD (stability, accuracy, and computational cost). Learning control policies. Multi-armed bandits. Notion of regret minimization. ε-greedy policies. Boltzmann policies. UCB algorithm. Combining lower and upper bounds for action pruning. UCT algorithm. Q-learning with function approximation. SARSA algorithm. Fitted Q-iteration. Deep Q-Networks.
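
The tabular Q-learning update can be sketched on a toy four-state chain as follows. Everything here is an assumption for illustration: the environment, the constant step size, and the uniformly random behavior policy. The random behavior policy is used instead of ε-greedy to keep the example short; Q-learning still learns the greedy values because it is off-policy. A constant step size does not satisfy the Robbins-Monro conditions in general, but it suffices here because the toy environment is deterministic.

```python
import random

def step(state, action, n_states=4):
    """Toy chain: action 1 moves right, action 0 moves left (floor at state 0)."""
    nxt = state + 1 if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done

def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.randrange(2)
            s2, r, done = step(s, a, n_states)
            # Bootstrap from the best next action unless the episode ended.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(3)]
print(Q, greedy)
```

Replacing the max over next actions with the value of the action actually taken next would turn this update into SARSA, the on-policy counterpart listed above.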

Lecture 6 : Policy Search

Actor-critic architectures. Implementing a critic. Least-Squares Policy Iteration (LSPI) as an example of an actor-critic model. Parameterized policies. Policy gradient theorem. Finite-differences method. Likelihood ratio methods (REINFORCE algorithms). Natural Actor-Critic (NAC). RL in robotics by Reward-weighted Regression. Cost-Regularized Kernel Regression (CRKR). Policy Search for Motor Primitives in Robotics. Policy learning by Weighting Exploration with the Returns (PoWER) for Motor Primitives. Relative Entropy Policy Search (REPS) algorithms. Policy search as an inference problem. Monte-Carlo EM-based policy search. Policy Improvements by Path Integrals. Trust Region Policy Optimization (TRPO). Real Robot Applications with Model-free Policy Search. Learning Baseball with NAC. Learning Ball-in-the-Cup with PoWER. Learning Pancake Flipping with PoWER/RWR. Learning Dart Throwing with CRKR. Learning Table Tennis with CRKR. Learning Tetherball with hierarchical REPS.

Lecture 7 : Model Learning

General loop in model-based RL. Model Learning in finite spaces, Dyna algorithm. Approaches to Dealing with Uncertain Models. Challenges. Models: Locally Weighted Bayesian Regression and Gaussian Process Regression. Policy Evaluation-of-Goodness and Search Using Scenarios (PEGASUS). Linearization. Moment Matching. Gradient-free Model-based Policy Updates. Sampling-based policy gradients. Analytic policy gradients. Probabilistic Inference for Learning Control (PILCO). Guided Policy Search (GPS). Robot Applications. Model-based policy search methods with stochastic inference were used for learning to hover helicopters. Combination of a parametric prior and GPs for modeling and learning to control an autonomous blimp. Pile stacking with PILCO. Inverted pendulum with PILCO. Control of Tensegrity robots with Guided Policy Search.

Lecture 8 : Learning from Demonstrations

Overview of learning from demonstration. What is learning from demonstration (LfD)? Advantages of LfD vs Reinforcement Learning. Design and setup choices. Dealing with dataset limitations. Human-robot correspondence problem. Record mapping. Embodiment mapping. Demonstrations (teleoperation, shadowing, sensors on teacher, external observations). Statistical models for learning policies. Behavioral cloning vs inverse reinforcement learning (IRL). Motivation for inverse RL. Example applications in robotics (highway driving, aerial imagery based navigation, parking lot navigation, urban navigation, human path planning, human goal inference, quadruped locomotion). Max margin algorithms. Feature expectation matching methods (MaxEnt IRL and Relative Entropy IRL). Reward function as policy parameterization. Bayesian IRL. Nonlinear IRL. GP-IRL. Deep maximum entropy IRL. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization.

Lecture 9 : Grasping and Manipulation

Mathematical models of grasping. Velocity kinematics. Grasp Matrix and Hand Jacobian. Contact Modeling. Planar Simplifications. Dynamics and Equilibrium. Grasp Classifications. Form Closure vs Force Closure. Example: Grasped Sphere. Analytic vs. Data-Driven Approaches. Offline generation of a grasp experience database for known objects. Learning from humans. Reinforcement learning approaches to grasping. Discriminative approaches for grasping familiar objects. Grasp synthesis by comparison. Generative models for grasp synthesis. Grasping unknown objects. Features for learning to grasp unknown objects. End-to-end deep learning approaches. Grasp affordances, anthropomorphic dexterous hands, Grasp planning and learning. Non-rigid object manipulation.

Lecture 10 : Robot Vision

Object Recognition: a machine learning approach. Overview of training and testing processes for object recognition. Simplest features: pixel values, image gradients, color histograms. Spatial histograms. Haar-like features. Integral images. Gabor filter. GIST Features. Textons. Learning Textons from data. Histogram of Oriented Gradients (HOG). 3D Pose Estimation. Point Set Registration. Scale-Invariant Feature Transform (SIFT). Convolutional Neural Networks. Region-based Convolutional Networks (R-CNN). Image segmentation. Tracking. Interactive perception. Pixel-to-torque: an end-to-end deep learning approach to robot learning.

LIST OF TOPICS FOR PRESENTATIONS

Topic 1: Reinforcement Learning in Brain-Machine Interfaces

Assigned to: Mingwen Dong

Jack DiGiovanna, Babak Mahmoudi, Jose Fortes, Jose C. Principe, Justin C. Sanchez. Co-adaptive Brain–Machine Interface via Reinforcement Learning. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4540104
Eric A. Pohlmeyer, Babak Mahmoudi, Shijia Geng, Noeline W. Prins, Justin C. Sanchez. Using Reinforcement Learning to Provide Stable Brain-Machine Interface Control Despite Neural Input Reorganization. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0087253

Topic 2: Deep Reinforcement Learning

Assigned to: Raghav Bhardwaj

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

Volodymyr Mnih et al. Human-level control through deep reinforcement learning. http://www.readcube.com/articles/10.1038%2Fnature14236

Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard Lewis, and Satinder Singh. Action-Conditional Video Prediction Using Deep Networks in ATARI Games. http://web.eecs.umich.edu/~baveja/Papers/NIPS2015.pdf

Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, and Xiaoshi Wang. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning. http://web.eecs.umich.edu/~baveja/Papers/UCTtoCNNsAtariGames-FinalVersion.pdf

Guillaume Lample and Devendra Singh Chaplot. Playing FPS Games with Deep Reinforcement Learning. https://arxiv.org/pdf/1609.05521v1.pdf

Topic 3: Active Vision for Object Search

Assigned to: Wei Tang

Ksenia Shubina, John K. Tsotsos. Visual search for an object in a 3d environment using a mobile robot. http://www.sciencedirect.com/science/article/pii/S1077314210000378
Shengyong Chen, Youfu Li, and Ngai Ming Kwok. Active vision in robotic systems: A survey of recent developments. http://ijr.sagepub.com/content/30/11/1343.full.pdf+html
Lars Kunze, Michael Beetz, Manabu Saito, Haseru Azuma, Kei Okada, and Masayuki Inaba. Searching objects in large-scale indoor environments: A decision-theoretic approach. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6224965
Gregory Kahn, Peter Sujan, Sachin Patil, Shaunak D. Bopardikar, Julian Ryde, Ken Goldberg, and Pieter Abbeel. Active exploration using trajectory optimization for robotic grasping in the presence of occlusions. http://rll.berkeley.edu/~sachin/papers/Kahn-ICRA2015.pdf

Topic 4: Model Learning

Assigned to: TBD

Duy Nguyen-Tuong and Jan Peters. Model Learning for Robot Control: A Survey. https://pdfs.semanticscholar.org/f3c2/0b3219beb62abb56ce833f5337ace119a7d3.pdf
Arjan Gijsberts and Giorgio Metta. Incremental Learning of Robot Dynamics using Random Features. http://www.tech.plym.ac.uk/socce/italk/publications/Gijsbert-icra2011.pdf
E. Gribovskaya, S.M. Khansari-Zadeh, Aude Billard. Learning Nonlinear Multivariate Dynamics of Motion in Robotic Manipulators. http://lasa.epfl.ch/publications/uploadedFiles/IJRR_Motion_Learning_v2.pdf
Duy Nguyen-Tuong, Jan Peters, Matthias Seeger, Bernhard Schoelkopf. Learning Inverse Dynamics: a Comparison. http://infoscience.epfl.ch/record/175477/files/esann08_nguyenetal.pdf

Topic 5: Belief Space Planning

Assigned to: Ankush Bhalotia

Robert Platt Jr., Russ Tedrake, Leslie Kaelbling, Tomas Lozano-Perez. Belief space planning assuming maximum likelihood observations. http://groups.csail.mit.edu/robotics-center/public_papers/Platt10.pdf
Samuel Prentice and Nicholas Roy. The Belief Roadmap: Efficient Planning in Belief Space by Factoring the Covariance. https://courses.cs.washington.edu/courses/cse571/12wi/slides/ijrr09-brm.pdf
Robert Platt, Leslie Kaelbling, Tomas Lozano-Perez, and Russ Tedrake. Simultaneous Localization and Grasping as a Belief Space Control Problem. http://www.isrr-2011.org/ISRR-2011/Program_files/Papers/Platt-ISRR-2011.pdf
Joelle Pineau and Geoff Gordon. POMDP Planning for Robust Robot Control. http://www.cs.mcgill.ca/~jpineau/files/jpineau-isrr05.pdf

Topic 6: Learning to Grasp

Assigned to: TBD

Jeannette Bohg, Antonio Morales, Tamim Asfour, Danica Kragic. Data-Driven Grasp Synthesis - A Survey. http://arxiv.org/pdf/1309.2660.pdf
Ashutosh Saxena, Justin Driemeyer, Justin Kearns, Chioma Osondu, Andrew Y. Ng. Learning to Grasp Novel Objects using Vision. http://pr.cs.cornell.edu/grasping/ISER_LearningGrasp.pdf
Lerrel Pinto, Abhinav Gupta. Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours. http://arxiv.org/abs/1509.06825
Ian Lenz, Honglak Lee and Ashutosh Saxena. Deep Learning for Detecting Robotic Grasps. http://www.roboticsproceedings.org/rss09/p12.pdf

Topic 7: Learning to Manipulate

Assigned to: Rui Wang

Yezhou Yang, Yi Li, Cornelia Fermüller, Yiannis Aloimonos. Robot Learning Manipulation Action Plans by “Watching” Unconstrained Videos from the World Wide Web. http://www.umiacs.umd.edu/~yzyang/paper/YouCookMani_CameraReady.pdf
Hao Dang and Peter K. Allen. Robot Learning of Everyday Object Manipulations via Human Demonstration. http://www.cs.columbia.edu/~dang/papers/dang_iros2010.pdf
Dov Katz, Yuri Pyuro and Oliver Brock. Learning to Manipulate Articulated Objects in Unstructured Environments Using a Grounded Relational Representation. http://roboticsproceedings.org/rss04/p33.pdf
Abdeslam Boularias, James Andrew Bagnell, and Anthony Stentz. Learning to manipulate unknown objects in clutter by reinforcement. http://www.ri.cmu.edu/pub_files/2015/1/AbdeslamAAAI2015.pdf

Topic 8: Learning to Walk

Assigned to: Shikhar Dev Gupta

D. Belter and P. Skrzypczynski. A biologically inspired approach to feasible gait learning for a hexapod robot. http://matwbn.icm.edu.pl/ksiazki/amc/amc20/amc2015.pdf
M. Kalakrishnan, J. Buchli, P. Pastor, M. Mistry, and S. Schaal. Learning, planning, and control for quadruped locomotion over challenging terrain. http://www.cse.unr.edu/robotics/bekris/cs773_s12/sites/cse.unr.edu.robotics.bekris.cs773_s12/files/paper_09.pdf
Jun Nakanishi, Jun Morimoto, Gen Endo, Gordon Cheng, Stefan Schaal, Mitsuo Kawato. Learning from demonstration and adaptation of biped locomotion. http://www.sciencedirect.com/science/article/pii/S0921889004000399
J. Morimoto and C. G. Atkeson. Learning Biped Locomotion. http://www.cs.cmu.edu/~cga/papers/morimoto-ram.pdf

Topic 9: Inverse Reinforcement Learning

Assigned to: Poornima Suresh

Andrew Y. Ng and Stuart Russell. Algorithms for Inverse Reinforcement Learning. http://ai.stanford.edu/~ang/papers/icml00-irl.pdf
Pieter Abbeel and Andrew Y. Ng. Apprenticeship Learning via Inverse Reinforcement Learning. http://ai.stanford.edu/~ang/papers/icml04-apprentice.pdf
Zico Kolter, Pieter Abbeel, Andrew Y. Ng. Hierarchical Apprenticeship Learning, with Application to Quadruped Locomotion. http://www.cs.stanford.edu/groups/littledog/pubs/kolter-nips08.pdf
Nathan D. Ratliff, J. Andrew Bagnell, Martin A. Zinkevich. Maximum Margin Planning. http://martin.zinkevich.org/publications/maximummarginplanning.pdf

Topic 10: Interactive Segmentation

Assigned to: Jay Kalyanaraman

Karol Hausman, Ferenc Balint-Benczedi, Dejan Pangercic, Zoltan-Csaba Marton, Ryohei Ueda, Kei Okada, Michael Beetz. Tracking-based Interactive Segmentation of Textureless Objects. http://robotics.usc.edu/~hausmankarol/hausman13interactive.pdf
Herke van Hoof, Oliver Kroemer and Jan Peters. Probabilistic Interactive Segmentation for Anthropomorphic Robots in Cluttered Environments. http://www.ausy.informatik.tu-darmstadt.de/uploads/Publications/hoof-HUMANOIDS.pdf
Dov Katz, Moslem Kazemi, J. Andrew Bagnell and Anthony Stentz. Interactive Segmentation, Tracking, and Kinematic Modeling of Unknown 3D Articulated Objects. https://www.ri.cmu.edu/pub_files/2013/5/ICRA13_1616_FI.pdf
Dov Katz and Andreas Orthey and Oliver Brock. Interactive Perception of Articulated Objects. http://www.robotics.tu-berlin.de/fileadmin/fg170/Publikationen_pdf/iser2010_Katz_Orthey_Brock.pdf

Topic 11: Translating Commands in Natural Language into Plans

Assigned to: Hao Yan

Stefanie Tellex, Ross Knepper, Adrian Li, Daniela Rus, and Nicholas Roy. Asking for Help Using Inverse Semantics. http://cs.brown.edu/~stefie10/publications/tellex14.pdf
Walter, M.R., Hemachandra, S., Homberg, B., Tellex, S., Teller, S. A Framework for Learning Semantic Maps from Grounded Natural Language Descriptions. http://cs.brown.edu/~stefie10/publications/walter13.pdf
Stefanie Tellex, Pratiksha Thaker, Joshua Joseph, Nicholas Roy. Learning Perceptually Grounded Word Meanings From Unaligned Parallel Data. http://cs.brown.edu/~stefie10/publications/tellex13.pdf
Stefanie Tellex, Pratiksha Thaker, Robin Deits, Dimitar Simeonov, Thomas Kollar, Nicholas Roy. Toward Information Theoretic Human-Robot Dialog. http://cs.brown.edu/~stefie10/publications/tellex12.pdf

Topic 12: High-speed Robots

Assigned to: TBD

Muelling, K.; Kober, J.; Kroemer, O.; Peters, J. Learning to Select and Generalize Striking Movements in Robot Table Tennis. http://www.ausy.informatik.tu-darmstadt.de/uploads/Publications/Muelling_IJRR_2013.pdf
Mirrazavi Salehian, S. S., Khoramshahi, M. and Billard, A. A Dynamical System Approach for Catching Softly a Flying Object: Theory and Experiment. http://lasa.epfl.ch/publications/uploadedFiles/TRO_final.pdf
Kober, Jens; Glisson, Matthew; Mistry, Michael (2012). Playing Catch and Juggling with a Humanoid Robot. http://www.disneyresearch.com/wp-content/uploads/ICHR12_0136_FI.pdf
X. Chen, Y. Tian, Q. Huang, W. Zhang, Z. Yu. Dynamic model based ball trajectory prediction for a robot ping-pong player. https://www.researchgate.net/profile/Xiaopeng_Chen4/publication/251992166_Dynamic_model_based_ball_trajectory_prediction_for_a_robot_ping-pong_player/links/54c0aa400cf28a6324a32dee.pdf

Topic 13: Autonomous Driving

Assigned to: Ze Liu

Paolo Falcone, Francesco Borrelli, Jahan Asgari, Hongtei Eric Tseng, and Davor Hrovat. Predictive Active Steering Control for Autonomous Vehicle Systems. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4162483
Dean A. Pomerleau. Efficient Training of Artificial Neural Networks for Autonomous Navigation. https://www.ri.cmu.edu/pub_files/pub3/pomerleau_dean_1991_1/pomerleau_dean_1991_1.pdf
Sebastian Thrun et al. Stanley: the robot that won the DARPA grand challenge. http://isl.ecst.csuchico.edu/DOCS/darpa2005/DARPA%202005%20Stanley.pdf
Chris Urmson et al. Autonomous driving in urban environments: Boss and the Urban Challenge. http://onlinelibrary.wiley.com/doi/10.1002/rob.20255/epdf

Topic 14: Imitation Learning I

Assigned to: TBD

Stefan Schaal. Is Imitation Learning the Route to Humanoid Robots? http://web.media.mit.edu/~cynthiab/Readings/schaal-TICS1999.pdf
Brenna D. Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. A survey of robot learning from demonstration. http://www.sciencedirect.com/science/article/pii/S0921889008001772
Christopher G. Atkeson and Stefan Schaal. Robot Learning From Demonstration. http://www.mcgovern-fagg.org/amy/courses/cs5973_fall2005/lfd.pdf
Sylvain Calinon, Florent D’halluin, Eric L. Sauser, Darwin G. Caldwell and Aude G. Billard. Learning and reproduction of gestures by imitation. http://programming-by-demonstration.org/papers/Calinon-RAM2010.pdf

Topic 15: Imitation Learning II

Assigned to: TBD

Monica N. Nicolescu and Maja J. Mataric. Task Learning Through Imitation and Human-Robot Interaction. http://marvin.cs.uidaho.edu/Teaching/CS504/Papers/robotTrainingHumanInteraction.pdf
George Konidaris and Andrew Barto. Building Portable Options: Skill Transfer in Reinforcement Learning. http://www-anw.cs.umass.edu/pubs/2006/konidaris_b_TECH06.pdf
Cynthia Breazeal and Brian Scassellati. Robots that imitate humans. http://scazlab.yale.edu/sites/default/files/files/TICS-02-Imitation.pdf
Sonya Alexandrova, Maya Cakmak, Kaijen Hsiao, and Leila Takayama. Robot Programming by Demonstration with Interactive Action Visualizations. http://www.roboticsproceedings.org/rss10/p48.pdf

Topic 16: Robot Reinforcement Learning I

Assigned to: TBD

Petar Kormushev, Sylvain Calinon and Darwin G. Caldwell. Robot Motor Skill Coordination with EM-based Reinforcement Learning. http://kormushev.com/papers/Kormushev-IROS2010.pdf
Jens Kober, J. Andrew Bagnell, and Jan Peters. Reinforcement learning in robotics: A survey. http://www.ias.tu-darmstadt.de/uploads/Publications/Kober_IJRR_2013.pdf
Jens Kober and Jan Peters. Policy search for motor primitives in robotics. http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/kober_MACH_2011.pdf
William D. Smart and Leslie Pack Kaelbling. Effective Reinforcement Learning for Mobile Robots. http://people.csail.mit.edu/lpk/papers/2002/SmartKaelbling-ICRA2002.pdf

Topic 17: Robot Reinforcement Learning II

Assigned to: Timothy Yong

Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel. Deep Spatial Autoencoders for Visuomotor Learning. http://arxiv.org/pdf/1509.06113.pdf
Sergey Levine, Nolan Wagener, Pieter Abbeel. Learning Contact-Rich Manipulation Skills with Guided Policy Search. http://rll.berkeley.edu/icra2015gps/robotgps.pdf
Niklas Wahlström, Thomas B. Schön, Marc P. Deisenroth. Learning Deep Dynamical Models From Image Pixels. http://www.doc.ic.ac.uk/~mpd37/publications/sysid-2015-hmd.pdf
Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel. End-to-End Training of Deep Visuomotor Policies. http://arxiv.org/pdf/1504.00702v4.pdf

Topic 18: Human-Robot Interaction

Assigned to: Zhe Chang

David Feil-Seifer and Maja J Mataric. Human-Robot Interaction. http://robotics.usc.edu/publications/media/uploads/pubs/585.pdf
Michael A. Goodrich and Alan C. Schultz. Human–Robot Interaction: A Survey. http://liris.cnrs.fr/alain.mille/survey_robotique.pdf
Stefanos Nikolaidis, Ramya Ramakrishnan, Keren Gu, Julie Shah. Efficient Model Learning from Joint-Action Demonstrations for Human-Robot Collaborative Tasks. http://arxiv.org/pdf/1405.6341.pdf
Nikolaidis, Stefanos; Shah, Julie. Human-Robot Cross-Training: Computational Formulation, Modeling and Evaluation of a Human Team Training Strategy. http://www.stefanosnikolaidis.net/papers/HRI2013_Nikol_Shah.pdf

Topic 19: Learning-based Control for Quadcopters

Assigned to: Merrill Edmonds

J. Hwangbo, I. Sa, R. Siegwart, and M. Hutter, “Control of a Quadrotor with Reinforcement Learning,” IEEE Robotics and Automation Letters, vol. 2, no. 4, pp. 2096–2103, Oct. 2017.
https://www.youtube.com/watch?v=T0A9voXzhng
M. Hehn and R. D’Andrea, “A frequency domain iterative learning algorithm for high-performance, periodic quadrocopter maneuvers,” Mechatronics, vol. 24, no. 8, pp. 954–965, Dec. 2014.
https://www.youtube.com/watch?v=sWilGsWQ1jo
P. Bouffard, A. Aswani, and C. Tomlin, “Learning-based model predictive control on a quadrotor: Onboard implementation and experimental results,” in 2012 IEEE International Conference on Robotics and Automation, 2012, pp. 279–284.
https://www.youtube.com/watch?v=dL_ZFSvLXlU
Stephane Ross, Narek Melik-Barkhudarov, Kumar Shaurya Shankar, Andreas Wendel, Debadeepta Dey, J. Andrew Bagnell and Martial Hebert. “Learning Monocular Reactive UAV Control in Cluttered Natural Environments”. https://www.youtube.com/watch?v=hNsP6-K3Hn4 https://www.ri.cmu.edu/pub_files/2013/3/icra_camera_ready.pdf

Presentation dates: TBD, December 2017
