COURSE LEARNING GOALS
The objective of the class is to: (1) learn about robotics and control, (2) understand why machine learning is a necessary tool for building autonomous and intelligent robots, (3) become familiar with recent research articles on robot learning, (4) learn advanced machine learning techniques for robotics and control, and (5) gain experience in implementing such techniques on representative challenges. The course is intended for computer science graduate students who have been exposed to artificial intelligence material in the past. Experience in programming in Python/C++ is needed for a project with a Baxter robot.
INSTRUCTOR
Abdeslam Boularias
The class presents recent developments in machine learning that are related to robotics. Example topics include: (a) Classical Robotics and Optimal Control: Kinematics, Dynamics, Representing Trajectories, Control in Joint Space and in Task Space; (b) Reinforcement Learning, Learning from Demonstrations, Model Learning; (c) Grasping and Manipulation; (d) Robot Vision.
BOOKS
No particular textbook will be used in the course. The material will be primarily based on research papers. Some suggested classical background reading:
- on robotics: B. Siciliano, L. Sciavicco: Robotics: Modelling, Planning and Control, Springer, 2009
- on machine learning: C. Bishop: Pattern Recognition and Machine Learning, Springer, 2006.
- on reinforcement learning: R. Sutton, S. Barto: Reinforcement Learning: An Introduction, MIT Press, 1998.
GRADING SCHEME
Participation: 10%
Written presentation: 15%
Oral presentation: 15%
Homework: 20%
Project: 40%
SCHEDULE
Course data. Challenges in deploying robots into real-world environments. Progress in humanoid robots. Industrial vs. autonomous robots. Programming robots vs Machine Learning. What can robots learn? Challenges in robot learning. Schedule overview.
What is a robot? Basic Terminology (joints, degrees of freedom, redundancy, forces, torques, controls, task-space, end-effector). Modeling Robots: (1) Kinematics (rotations and translations, singularities, inverse kinematics, Jacobian), (2) Dynamics (essential equations, mass matrix, Coriolis and centrifugal forces, gravity compensation, Lagrange’s equations of the second kind, Newton-Euler equations, Newton-Euler recursive algorithm). Trajectory representation (splines, potential fields, dynamical systems, dynamic movement primitives). Control in joint space (linear control, PID controllers, model-based control). Control in task space (inverse kinematics, differential inverse kinematics).
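To make the kinematics topics above concrete, here is a small illustrative sketch (not course material) of forward kinematics, the Jacobian, and differential inverse kinematics for a planar 2-link arm; the link lengths and target position are arbitrary values chosen for the example:

```python
import numpy as np

def forward_kinematics(q, l1=1.0, l2=1.0):
    """End-effector position of a planar 2-link arm with joint angles q = (q1, q2)."""
    q1, q2 = q
    return np.array([l1 * np.cos(q1) + l2 * np.cos(q1 + q2),
                     l1 * np.sin(q1) + l2 * np.sin(q1 + q2)])

def jacobian(q, l1=1.0, l2=1.0):
    """Analytic Jacobian d(end-effector position)/dq."""
    q1, q2 = q
    return np.array([[-l1 * np.sin(q1) - l2 * np.sin(q1 + q2), -l2 * np.sin(q1 + q2)],
                     [ l1 * np.cos(q1) + l2 * np.cos(q1 + q2),  l2 * np.cos(q1 + q2)]])

# Differential inverse kinematics: Newton-style steps toward a reachable target,
# using the pseudo-inverse of the Jacobian (which also handles singular configurations).
q = np.array([0.3, 0.5])
target = np.array([1.2, 0.8])
for _ in range(100):
    q = q + np.linalg.pinv(jacobian(q)) @ (target - forward_kinematics(q))
print(forward_kinematics(q))  # converges to the target position
```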
Decision-making. Interaction between a robot and its environment. Example of decision-making problems: robot navigation. Value (Utility). Markov Decision Process (MDP). Example of a Markov Decision Process. Finite or Infinite Horizons, Discount Factors. Policies and Value Functions. Bellman Equation. Optimal policies. Planning with Markov Decision Processes. Policy iteration. Value iteration. Bellman backup as a contraction operator. Fixed point theorem and convergence of value and policy iteration. Partially Observable MDP (POMDP). General problem. Hamilton-Jacobi-Bellman equation. Linear Quadratic Regulators (LQR). Finite-horizon, continuous-time LQR. Infinite-horizon, continuous-time LQR. Finite-horizon, discrete-time LQR. Infinite-horizon, discrete-time LQR. Algebraic Riccati equation. Real-world examples of LQR problems in physical systems (pushing an object from A to B). Value iteration solution to LQR. Generalized LQR assumptions. Affine systems. Systems with stochasticity. Regulation around a non-zero fixed point for non-linear systems. Penalization for change in control inputs. Linear time-varying (LTV) systems. Trajectory following for non-linear systems. Differential dynamic programming.
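The value-iteration procedure listed above can be sketched on a tiny MDP; the two-state transition and reward numbers below are arbitrary illustrative values:

```python
import numpy as np

# A made-up 2-state, 2-action MDP: P[a][s, s'] is the transition probability
# and R[s, a] the expected immediate reward.
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.1, 0.9], [0.7, 0.3]])}
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Value iteration: the Bellman optimality backup is a gamma-contraction, so
# repeated application converges to the unique fixed point V*.
V = np.zeros(2)
for _ in range(1000):
    Q = np.array([[R[s, a] + gamma * P[a][s] @ V for a in (0, 1)] for s in (0, 1)])
    if np.max(np.abs(Q.max(axis=1) - V)) < 1e-12:
        break
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)  # greedy policy with respect to V*
print(V, policy)
```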
What is Learning? Adaptive Behavior. What is Machine Learning? Empirical Inference. Generalization. Principle of Occam’s razor. Function class complexity. Overfitting vs Generalization. Bias-Variance error decomposition. A Taste of Machine Learning: Applications, Data, Problems (binary and multiclass classification, regression, online vs batch learning, transduction, active learning, semi-supervised learning, covariate shift correction, domain adaptation, co-training). Basic machine learning algorithms. Naive Bayes. Nearest Neighbors (distances, Voronoi diagram, KD-tree data structure, tuning k in k-NN and its relationship to the bias-variance trade-off, memory issues, curse of dimensionality, adaptability to local changes in robotic applications, classification vs regression versions of k-NN). Kernel Regression (Parzen windows, Nadaraya-Watson estimator, metric learning, application to learning inverse dynamics). Gaussian Process Regression. The Mean Classifier (kernelized version). Perceptron. Kernel Perceptron. Novikov’s theorem. K-means. Mean-shift. Spectral clustering. Neural networks. Back-propagation algorithm.
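As one concrete instance of the kernel-regression topic above, a minimal Nadaraya-Watson sketch on synthetic one-dimensional data (the data, bandwidth, and query point are arbitrary choices for the example):

```python
import numpy as np

# Synthetic 1-D regression data: noisy samples of sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, 200)
y = np.sin(X) + 0.1 * rng.normal(size=200)

def nadaraya_watson(x_query, X, y, bandwidth=0.3):
    """Prediction = average of targets, weighted by a Gaussian (Parzen-window) kernel."""
    w = np.exp(-0.5 * ((x_query - X) / bandwidth) ** 2)
    return np.sum(w * y) / np.sum(w)

pred = nadaraya_watson(np.pi / 2, X, y)
print(pred)  # close to sin(pi/2) = 1
```

Shrinking the bandwidth moves the estimator toward nearest-neighbor behavior (low bias, high variance); widening it smooths more aggressively (high bias, low variance) — the same trade-off as tuning k in k-NN.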
Reminder on Markov Decision Processes. Reinforcement Learning (RL) Setup. RL in behavioral psychology. The Q-learning algorithm. Robbins-Monro conditions for convergence. An example of the exploration-exploitation tradeoff. On-policy vs off-policy learning. Value prediction problems. Temporal-difference algorithms for policy evaluation: TD(0). Monte Carlo. Example comparing Monte Carlo with TD(0) (the idea of bootstrapping in TD). TD(λ) algorithm. Value function approximation. Polynomial features. Radial basis functions. State aggregation. Tile coding. Non-parametric value function approximation. Kernel RL. TD(λ) algorithm with linear value function approximation. Least-Squares Temporal Differences (LSTD). Online version of LSTD. LSTD vs TD (stability, accuracy, and computational cost). Learning control policies. Multi-armed bandits. Notion of regret minimization. ε-greedy policies. Boltzmann policies. UCB algorithm. Combining lower and upper bounds for action pruning. UCT algorithm. Q-learning with function approximation. SARSA algorithm. Fitted Q-iteration. Deep Q-Networks.
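A minimal sketch of tabular Q-learning on a made-up 5-state chain task (move left or right, reward only for reaching the right end). The behavior policy here is uniform random, which suffices because Q-learning is off-policy; an ε-greedy behavior policy would also work:

```python
import numpy as np

n_states, n_actions, gamma, alpha = 5, 2, 0.9, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(s, a):
    """Action 0 moves left, action 1 moves right; reward 1 for reaching the goal."""
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)

s = 0
for _ in range(20000):
    a = int(rng.integers(n_actions))              # uniform-random behavior policy
    s_next, r = step(s, a)
    # Off-policy TD update toward the greedy (max) bootstrap target.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = 0 if s_next == n_states - 1 else s_next   # restart the episode at the goal

print(Q[:-1].argmax(axis=1))  # greedy policy on non-goal states: always move right
```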
Actor-critic architectures. Implementing a critic. Least-Squares Policy Iteration (LSPI) as an example of an actor-critic model. Parameterized policies. Policy gradient theorem. Finite-differences method. Likelihood ratio methods (REINFORCE algorithms). Natural Actor-Critic (NAC). RL in robotics by Reward-Weighted Regression. Cost-Regularized Kernel Regression (CRKR). Policy Search for Motor Primitives in Robotics. Policy learning by Weighting Exploration with the Returns (PoWER) for Motor Primitives. Relative Entropy Policy Search (REPS) algorithms. Policy search as an inference problem. Monte-Carlo EM-based policy search. Policy Improvement with Path Integrals. Trust Region Policy Optimization (TRPO). Real Robot Applications with Model-free Policy Search. Learning Baseball with NAC. Learning Ball-in-a-Cup with PoWER. Learning Pancake Flipping with PoWER/RWR. Learning Dart Throwing with CRKR. Learning Table Tennis with CRKR. Learning Tetherball with hierarchical REPS.
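The likelihood-ratio (REINFORCE) update listed above can be sketched on a toy two-armed bandit with a softmax policy; the arm means, learning rate, and noise level are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.2, 0.8])   # expected reward of each arm
theta = np.zeros(2)            # softmax policy parameters

def softmax(t):
    e = np.exp(t - t.max())
    return e / e.sum()

for _ in range(5000):
    p = softmax(theta)
    a = rng.choice(2, p=p)                # sample an action from the current policy
    r = means[a] + 0.1 * rng.normal()     # noisy reward
    grad_logp = -p.copy()
    grad_logp[a] += 1.0                   # gradient of log pi(a) for a softmax policy
    theta += 0.1 * r * grad_logp          # REINFORCE update (no baseline)

p = softmax(theta)
print(p)  # probability mass concentrates on the better arm (index 1)
```

In practice a baseline (as in actor-critic methods) is subtracted from r to reduce the variance of this gradient estimate.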
General loop in model-based RL. Model Learning in finite spaces, the Dyna algorithm. Approaches to Dealing with Uncertain Models. Challenges. Models: Locally Weighted Bayesian Regression and Gaussian Process Regression. Policy Evaluation-of-Goodness And Search Using Scenarios (PEGASUS). Linearization. Moment Matching. Gradient-free Model-based Policy Updates. Sampling-based policy gradients. Analytic policy gradients. Probabilistic Inference for Learning Control (PILCO). Guided Policy Search (GPS). Robot Applications. Model-based policy search methods with stochastic inference were used for learning to hover helicopters. Combination of a parametric prior and GPs for modeling and learning to control an autonomous blimp. Pile stacking with PILCO. Inverted pendulum with PILCO. Control of tensegrity robots with Guided Policy Search.
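The Dyna loop mentioned above can be sketched in tabular form: after each real transition, a learned (here deterministic) model generates extra simulated updates. The chain task and all constants are arbitrary illustrative choices:

```python
import numpy as np

n_states, n_actions, gamma, alpha = 5, 2, 0.9, 0.1
Q = np.zeros((n_states, n_actions))
model = {}  # learned model: (s, a) -> (r, s_next)
rng = np.random.default_rng(0)

def step(s, a):
    """Toy chain: action 0 moves left, action 1 moves right; reward 1 at the right end."""
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return (1.0 if s_next == n_states - 1 else 0.0), s_next

s = 0
for _ in range(2000):
    a = int(rng.integers(n_actions))                    # random exploratory behavior
    r, s_next = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])  # direct RL update
    model[(s, a)] = (r, s_next)                         # model learning
    keys = list(model)
    for _ in range(10):                                 # planning with simulated experience
        ps, pa = keys[rng.integers(len(keys))]
        pr, ps_next = model[(ps, pa)]
        Q[ps, pa] += alpha * (pr + gamma * Q[ps_next].max() - Q[ps, pa])
    s = 0 if s_next == n_states - 1 else s_next

print(Q[:-1].argmax(axis=1))  # the greedy policy emerges with far fewer real steps
```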
Overview of learning from demonstration. What is learning from demonstration (LfD)? Advantages of LfD vs Reinforcement Learning. Design and setup choices. Dealing with dataset limitations. The human-robot correspondence problem. Record mapping. Embodiment mapping. Demonstrations (teleoperation, shadowing, sensors on the teacher, external observations). Statistical models for learning policies. Behavioral cloning vs inverse reinforcement learning (IRL). Motivation for inverse RL. Example applications in robotics (highway driving, aerial-imagery-based navigation, parking lot navigation, urban navigation, human path planning, human goal inference, quadruped locomotion). Max-margin algorithms. Feature expectation matching methods (MaxEnt IRL and Relative Entropy IRL). Reward function as policy parameterization. Bayesian IRL. Nonlinear IRL. GP-IRL. Deep maximum entropy IRL. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization.
Mathematical models of grasping. Velocity kinematics. Grasp Matrix and Hand Jacobian. Contact Modeling. Planar Simplifications. Dynamics and Equilibrium. Grasp Classifications. Form Closure vs Force Closure. Example: Grasped Sphere. Analytic vs. Data-Driven Approaches. Offline generation of a grasp experience database for known objects. Learning from humans. Reinforcement learning approaches to grasping. Discriminative approaches for grasping familiar objects. Grasp synthesis by comparison. Generative models for grasp synthesis. Grasping unknown objects. Features for learning to grasp unknown objects. End-to-end deep learning approaches. Grasp affordances, anthropomorphic dexterous hands, Grasp planning and learning. Non-rigid object manipulation.
Object Recognition: a machine learning approach. Overview of training and testing processes for object recognition. Simplest features: pixel values, image gradients, color histograms. Spatial histograms. Haar-like features. Integral images. Gabor filter. GIST Features. Textons. Learning Textons from data. Histogram of Oriented Gradients (HOG). 3D Pose Estimation. Point Set Registration. Scale-Invariant Feature Transform (SIFT). Convolutional Neural Networks. Region-based Convolutional Networks (R-CNN). Image segmentation. Tracking. Interactive perception. Pixel-to-torque: an end-to-end deep learning approach to robot learning.
LIST OF TOPICS FOR PRESENTATIONS
Topic 1: Reinforcement Learning in Brain-Machine Interfaces
Assigned to: Mingwen Dong
- Jack DiGiovanna, Babak Mahmoudi, Jose Fortes, Jose C. Principe, Justin C. Sanchez. *Co-adaptive Brain–Machine Interface via Reinforcement Learning*. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4540104
- Eric A. Pohlmeyer, Babak Mahmoudi, Shijia Geng, Noeline W. Prins, Justin C. Sanchez. *Using Reinforcement Learning to Provide Stable Brain-Machine Interface Control Despite Neural Input Reorganization*. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0087253
Topic 2: Deep Reinforcement Learning
Assigned to: Raghav Bhardwaj
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller. *Playing Atari with Deep Reinforcement Learning*. https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
- Volodymyr Mnih et al. *Human-level control through deep reinforcement learning*. http://www.readcube.com/articles/10.1038%2Fnature14236
- Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard Lewis, and Satinder Singh. *Action-Conditional Video Prediction Using Deep Networks in ATARI Games*. http://web.eecs.umich.edu/~baveja/Papers/NIPS2015.pdf
- Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, and Xiaoshi Wang. *Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning*. http://web.eecs.umich.edu/~baveja/Papers/UCTtoCNNsAtariGames-FinalVersion.pdf
- Guillaume Lample and Devendra Singh Chaplot. *Playing FPS Games with Deep Reinforcement Learning*. https://arxiv.org/pdf/1609.05521v1.pdf
Topic 3: Active Vision for Object Search
Assigned to: Wei Tang
- Ksenia Shubina, John K. Tsotsos. *Visual search for an object in a 3D environment using a mobile robot*. http://www.sciencedirect.com/science/article/pii/S1077314210000378
- Shengyong Chen, Youfu Li, and Ngai Ming Kwok. *Active vision in robotic systems: A survey of recent developments*. http://ijr.sagepub.com/content/30/11/1343.full.pdf+html
- Lars Kunze, Michael Beetz, Manabu Saito, Haseru Azuma, Kei Okada, and Masayuki Inaba. *Searching objects in large-scale indoor environments: A decision-theoretic approach*. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6224965
- Gregory Kahn, Peter Sujan, Sachin Patil, Shaunak D. Bopardikar, Julian Ryde, Ken Goldberg, and Pieter Abbeel. *Active exploration using trajectory optimization for robotic grasping in the presence of occlusions*. http://rll.berkeley.edu/~sachin/papers/Kahn-ICRA2015.pdf
Topic 4: Model Learning
Assigned to: TBD
- Duy Nguyen-Tuong and Jan Peters. *Model Learning for Robot Control: A Survey*. https://pdfs.semanticscholar.org/f3c2/0b3219beb62abb56ce833f5337ace119a7d3.pdf
- Arjan Gijsberts and Giorgio Metta. *Incremental Learning of Robot Dynamics using Random Features*. http://www.tech.plym.ac.uk/socce/italk/publications/Gijsbert-icra2011.pdf
- E. Gribovskaya, S. M. Khansari-Zadeh, Aude Billard. *Learning Nonlinear Multivariate Dynamics of Motion in Robotic Manipulators*. http://lasa.epfl.ch/publications/uploadedFiles/IJRR_Motion_Learning_v2.pdf
- Duy Nguyen-Tuong, Jan Peters, Matthias Seeger, Bernhard Schoelkopf. *Learning Inverse Dynamics: a Comparison*. http://infoscience.epfl.ch/record/175477/files/esann08_nguyenetal.pdf
Topic 5: Belief Space Planning
Assigned to: Ankush Bhalotia
- Robert Platt Jr., Russ Tedrake, Leslie Kaelbling, Tomas Lozano-Perez. *Belief space planning assuming maximum likelihood observations*. http://groups.csail.mit.edu/robotics-center/public_papers/Platt10.pdf
- Samuel Prentice and Nicholas Roy. *The Belief Roadmap: Efficient Planning in Belief Space by Factoring the Covariance*. https://courses.cs.washington.edu/courses/cse571/12wi/slides/ijrr09-brm.pdf
- Robert Platt, Leslie Kaelbling, Tomas Lozano-Perez, and Russ Tedrake. *Simultaneous Localization and Grasping as a Belief Space Control Problem*. http://www.isrr-2011.org/ISRR-2011/Program_files/Papers/Platt-ISRR-2011.pdf
- Joelle Pineau and Geoff Gordon. *POMDP Planning for Robust Robot Control*. http://www.cs.mcgill.ca/~jpineau/files/jpineau-isrr05.pdf
Topic 6: Learning to Grasp
Assigned to: TBD
- Jeannette Bohg, Antonio Morales, Tamim Asfour, Danica Kragic. *Data-Driven Grasp Synthesis - A Survey*. http://arxiv.org/pdf/1309.2660.pdf
- Ashutosh Saxena, Justin Driemeyer, Justin Kearns, Chioma Osondu, Andrew Y. Ng. *Learning to Grasp Novel Objects using Vision*. http://pr.cs.cornell.edu/grasping/ISER_LearningGrasp.pdf
- Lerrel Pinto, Abhinav Gupta. *Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours*. http://arxiv.org/abs/1509.06825
- Ian Lenz, Honglak Lee and Ashutosh Saxena. *Deep Learning for Detecting Robotic Grasps*. http://www.roboticsproceedings.org/rss09/p12.pdf
Topic 7: Learning to Manipulate
Assigned to: Rui Wang
- Yezhou Yang, Yi Li, Cornelia Fermüller, Yiannis Aloimonos. *Robot Learning Manipulation Action Plans by “Watching” Unconstrained Videos from the World Wide Web*. http://www.umiacs.umd.edu/~yzyang/paper/YouCookMani_CameraReady.pdf
- Hao Dang and Peter K. Allen. *Robot Learning of Everyday Object Manipulations via Human Demonstration*. http://www.cs.columbia.edu/~dang/papers/dang_iros2010.pdf
- Dov Katz, Yuri Pyuro and Oliver Brock. *Learning to Manipulate Articulated Objects in Unstructured Environments Using a Grounded Relational Representation*. http://roboticsproceedings.org/rss04/p33.pdf
- Abdeslam Boularias, James Andrew Bagnell, and Anthony Stentz. *Learning to manipulate unknown objects in clutter by reinforcement*. http://www.ri.cmu.edu/pub_files/2015/1/AbdeslamAAAI2015.pdf
Topic 8: Learning to Walk
Assigned to: Shikhar Dev Gupta
- D. Belter and P. Skrzypczynski. *A biologically inspired approach to feasible gait learning for a hexapod robot*. http://matwbn.icm.edu.pl/ksiazki/amc/amc20/amc2015.pdf
- M. Kalakrishnan, J. Buchli, P. Pastor, M. Mistry, and S. Schaal. *Learning, planning, and control for quadruped locomotion over challenging terrain*. http://www.cse.unr.edu/robotics/bekris/cs773_s12/sites/cse.unr.edu.robotics.bekris.cs773_s12/files/paper_09.pdf
- Jun Nakanishi, Jun Morimoto, Gen Endo, Gordon Cheng, Stefan Schaal, Mitsuo Kawato. *Learning from demonstration and adaptation of biped locomotion*. http://www.sciencedirect.com/science/article/pii/S0921889004000399
- J. Morimoto and C. G. Atkeson. *Learning Biped Locomotion*. http://www.cs.cmu.edu/~cga/papers/morimoto-ram.pdf
Topic 9: Inverse Reinforcement Learning
Assigned to: Poornima Suresh
- Andrew Y. Ng and Stuart Russell. *Algorithms for Inverse Reinforcement Learning*. http://ai.stanford.edu/~ang/papers/icml00-irl.pdf
- Pieter Abbeel and Andrew Y. Ng. *Apprenticeship Learning via Inverse Reinforcement Learning*. http://ai.stanford.edu/~ang/papers/icml04-apprentice.pdf
- Zico Kolter, Pieter Abbeel, Andrew Y. Ng. *Hierarchical Apprenticeship Learning, with Application to Quadruped Locomotion*. http://www.cs.stanford.edu/groups/littledog/pubs/kolter-nips08.pdf
- Nathan D. Ratliff, J. Andrew Bagnell, Martin A. Zinkevich. *Maximum Margin Planning*. http://martin.zinkevich.org/publications/maximummarginplanning.pdf
Topic 10: Interactive Segmentation
Assigned to: Jay Kalyanaraman
- Karol Hausman, Ferenc Balint-Benczedi, Dejan Pangercic, Zoltan-Csaba Marton, Ryohei Ueda, Kei Okada, Michael Beetz. *Tracking-based Interactive Segmentation of Textureless Objects*. http://robotics.usc.edu/~hausmankarol/hausman13interactive.pdf
- Herke van Hoof, Oliver Kroemer and Jan Peters. *Probabilistic Interactive Segmentation for Anthropomorphic Robots in Cluttered Environments*. http://www.ausy.informatik.tu-darmstadt.de/uploads/Publications/hoof-HUMANOIDS.pdf
- Dov Katz, Moslem Kazemi, J. Andrew Bagnell and Anthony Stentz. *Interactive Segmentation, Tracking, and Kinematic Modeling of Unknown 3D Articulated Objects*. https://www.ri.cmu.edu/pub_files/2013/5/ICRA13_1616_FI.pdf
- Dov Katz, Andreas Orthey and Oliver Brock. *Interactive Perception of Articulated Objects*. http://www.robotics.tu-berlin.de/fileadmin/fg170/Publikationen_pdf/iser2010_Katz_Orthey_Brock.pdf
Topic 11: Translating Commands in Natural Language into Plans
Assigned to: Hao Yan
- Stefanie Tellex, Ross Knepper, Adrian Li, Daniela Rus, and Nicholas Roy. *Asking for Help Using Inverse Semantics*. http://cs.brown.edu/~stefie10/publications/tellex14.pdf
- M. R. Walter, S. Hemachandra, B. Homberg, S. Tellex, S. Teller. *A Framework for Learning Semantic Maps from Grounded Natural Language Descriptions*. http://cs.brown.edu/~stefie10/publications/walter13.pdf
- Stefanie Tellex, Pratiksha Thaker, Joshua Joseph, Nicholas Roy. *Learning Perceptually Grounded Word Meanings From Unaligned Parallel Data*. http://cs.brown.edu/~stefie10/publications/tellex13.pdf
- Stefanie Tellex, Pratiksha Thaker, Robin Deits, Dimitar Simeonov, Thomas Kollar, Nicholas Roy. *Toward Information Theoretic Human-Robot Dialog*. http://cs.brown.edu/~stefie10/publications/tellex12.pdf
Topic 12: High-speed Robots
Assigned to: TBD
- K. Muelling, J. Kober, O. Kroemer, and J. Peters. *Learning to Select and Generalize Striking Movements in Robot Table Tennis*. http://www.ausy.informatik.tu-darmstadt.de/uploads/Publications/Muelling_IJRR_2013.pdf
- S. S. Mirrazavi Salehian, M. Khoramshahi, and A. Billard. *A Dynamical System Approach for Catching Softly a Flying Object: Theory and Experiment*. http://lasa.epfl.ch/publications/uploadedFiles/TRO_final.pdf
- Jens Kober, Matthew Glisson, Michael Mistry (2012). *Playing Catch and Juggling with a Humanoid Robot*. http://www.disneyresearch.com/wp-content/uploads/ICHR12_0136_FI.pdf
- X. Chen, Y. Tian, Q. Huang, W. Zhang, Z. Yu. *Dynamic model based ball trajectory prediction for a robot ping-pong player*. https://www.researchgate.net/profile/Xiaopeng_Chen4/publication/251992166_Dynamic_model_based_ball_trajectory_prediction_for_a_robot_ping-pong_player/links/54c0aa400cf28a6324a32dee.pdf
Topic 13: Autonomous Driving
Assigned to: Ze Liu
- Paolo Falcone, Francesco Borrelli, Jahan Asgari, Hongtei Eric Tseng, and Davor Hrovat. *Predictive Active Steering Control for Autonomous Vehicle Systems*. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4162483
- Dean A. Pomerleau. *Efficient Training of Artificial Neural Networks for Autonomous Navigation*. https://www.ri.cmu.edu/pub_files/pub3/pomerleau_dean_1991_1/pomerleau_dean_1991_1.pdf
- Sebastian Thrun et al. *Stanley: the robot that won the DARPA grand challenge*. http://isl.ecst.csuchico.edu/DOCS/darpa2005/DARPA%202005%20Stanley.pdf
- Chris Urmson et al. *Autonomous driving in urban environments: Boss and the Urban Challenge*. http://onlinelibrary.wiley.com/doi/10.1002/rob.20255/epdf
Topic 14: Imitation Learning I
Assigned to: TBD
- Stefan Schaal. *Is Imitation Learning the Route to Humanoid Robots?* http://web.media.mit.edu/~cynthiab/Readings/schaal-TICS1999.pdf
- Brenna D. Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. *A survey of robot learning from demonstration*. http://www.sciencedirect.com/science/article/pii/S0921889008001772
- Christopher G. Atkeson and Stefan Schaal. *Robot Learning From Demonstration*. http://www.mcgovern-fagg.org/amy/courses/cs5973_fall2005/lfd.pdf
- Sylvain Calinon, Florent D’halluin, Eric L. Sauser, Darwin G. Caldwell and Aude G. Billard. *Learning and reproduction of gestures by imitation*. http://programming-by-demonstration.org/papers/Calinon-RAM2010.pdf
Topic 15: Imitation Learning II
Assigned to: TBD
- Monica N. Nicolescu and Maja J. Mataric. *Task Learning Through Imitation and Human-Robot Interaction*. http://marvin.cs.uidaho.edu/Teaching/CS504/Papers/robotTrainingHumanInteraction.pdf
- George Konidaris and Andrew Barto. *Building Portable Options: Skill Transfer in Reinforcement Learning*. http://www-anw.cs.umass.edu/pubs/2006/konidaris_b_TECH06.pdf
- Cynthia Breazeal and Brian Scassellati. *Robots that imitate humans*. http://scazlab.yale.edu/sites/default/files/files/TICS-02-Imitation.pdf
- Sonya Alexandrova, Maya Cakmak, Kaijen Hsiao, and Leila Takayama. *Robot Programming by Demonstration with Interactive Action Visualizations*. http://www.roboticsproceedings.org/rss10/p48.pdf
Topic 16: Robot Reinforcement Learning I
Assigned to: TBD
- Petar Kormushev, Sylvain Calinon and Darwin G. Caldwell. *Robot Motor Skill Coordination with EM-based Reinforcement Learning*. http://kormushev.com/papers/Kormushev-IROS2010.pdf
- Jens Kober, J. Andrew Bagnell, and Jan Peters. *Reinforcement learning in robotics: A survey*. http://www.ias.tu-darmstadt.de/uploads/Publications/Kober_IJRR_2013.pdf
- Jens Kober and Jan Peters. *Policy search for motor primitives in robotics*. http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/kober_MACH_2011.pdf
- William D. Smart and Leslie Pack Kaelbling. *Effective Reinforcement Learning for Mobile Robots*. http://people.csail.mit.edu/lpk/papers/2002/SmartKaelbling-ICRA2002.pdf
Topic 17: Robot Reinforcement Learning II
Assigned to: Timothy Yong
- Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel. *Deep Spatial Autoencoders for Visuomotor Learning*. http://arxiv.org/pdf/1509.06113.pdf
- Sergey Levine, Nolan Wagener, Pieter Abbeel. *Learning Contact-Rich Manipulation Skills with Guided Policy Search*. http://rll.berkeley.edu/icra2015gps/robotgps.pdf
- Niklas Wahlström, Thomas B. Schön, Marc P. Deisenroth. *Learning Deep Dynamical Models From Image Pixels*. http://www.doc.ic.ac.uk/~mpd37/publications/sysid-2015-hmd.pdf
- Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel. *End-to-End Training of Deep Visuomotor Policies*. http://arxiv.org/pdf/1504.00702v4.pdf
Topic 18: Human-Robot Interaction
Assigned to: Zhe Chang
- David Feil-Seifer and Maja J. Mataric. *Human-Robot Interaction*. http://robotics.usc.edu/publications/media/uploads/pubs/585.pdf
- Michael A. Goodrich and Alan C. Schultz. *Human–Robot Interaction: A Survey*. http://liris.cnrs.fr/alain.mille/survey_robotique.pdf
- Stefanos Nikolaidis, Ramya Ramakrishnan, Keren Gu, Julie Shah. *Efficient Model Learning from Joint-Action Demonstrations for Human-Robot Collaborative Tasks*. http://arxiv.org/pdf/1405.6341.pdf
- Stefanos Nikolaidis and Julie Shah. *Human-Robot Cross-Training: Computational Formulation, Modeling and Evaluation of a Human Team Training Strategy*. http://www.stefanosnikolaidis.net/papers/HRI2013_Nikol_Shah.pdf
Topic 19: Learning-based Control for Quadcopters
Assigned to: Merrill Edmonds
- J. Hwangbo, I. Sa, R. Siegwart, and M. Hutter. *Control of a Quadrotor with Reinforcement Learning*. IEEE Robotics and Automation Letters, vol. 2, no. 4, pp. 2096–2103, Oct. 2017. Video: https://www.youtube.com/watch?v=T0A9voXzhng
- M. Hehn and R. D’Andrea. *A frequency domain iterative learning algorithm for high-performance, periodic quadrocopter maneuvers*. Mechatronics, vol. 24, no. 8, pp. 954–965, Dec. 2014. Video: https://www.youtube.com/watch?v=sWilGsWQ1jo
- P. Bouffard, A. Aswani, and C. Tomlin. *Learning-based model predictive control on a quadrotor: Onboard implementation and experimental results*. In 2012 IEEE International Conference on Robotics and Automation, pp. 279–284. Video: https://www.youtube.com/watch?v=dL_ZFSvLXlU
- Stephane Ross, Narek Melik-Barkhudarov, Kumar Shaurya Shankar, Andreas Wendel, Debadeepta Dey, J. Andrew Bagnell and Martial Hebert. *Learning Monocular Reactive UAV Control in Cluttered Natural Environments*. https://www.ri.cmu.edu/pub_files/2013/3/icra_camera_ready.pdf Video: https://www.youtube.com/watch?v=hNsP6-K3Hn4
Presentation dates: TBD, December 2017