## Sequential Decision Problems

Learning to operate in the world is a high-level goal; we can be more specific. Reinforcement learning is about the agent’s behavior. It can find solutions for sequential decision problems, or optimal control problems, as they are known in engineering. There are many situations in the real world where, in order to reach a goal, a sequence of decisions must be made, whether it is baking a cake, building a house, or playing a card game. Reinforcement learning provides efficient ways to learn solutions to sequential decision problems.

Many real-world problems can be modeled as a sequence of decisions [33]. For example, in autonomous driving, an agent is faced with questions of speed control, finding drivable areas, and, most importantly, avoiding collisions. In healthcare, treatment plans contain many sequential decisions, and the effects of delayed treatment can be factored in and studied. In customer centers, natural language processing can help improve chatbot dialogue, question answering, and even machine translation. In marketing and communication, recommender systems recommend news, personalize suggestions, deliver notifications to users, or otherwise optimize the product experience. In trading and finance, systems decide to hold, buy, or sell financial assets in order to optimize future reward. In politics and governance, the effects of policies can be simulated as a sequence of decisions before they are implemented. In mathematics and entertainment, playing board games, card games, and strategy games consists of a sequence of decisions. In computational creativity, making a painting requires a sequence of esthetic decisions. In industrial robotics and engineering, the grasping of items and the manipulation of materials consist of a sequence of decisions. In chemical manufacturing, the optimization of production processes consists of many decision steps that influence the yield and quality of the product. Finally, in energy grids, the efficient and safe distribution of energy can be modeled as a sequential decision problem.

In all these situations, we must make a sequence of decisions, and making the wrong decision can be very costly.
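To make the idea of a sequential decision problem concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than something taken from the text: a hypothetical five-cell corridor, the names `step`, `total_reward`, and `best_sequence`, and the reward values (a cost of 1 per step, a reward of 10 at the goal). It shows that the outcome depends on the whole sequence of decisions, and that a wrong decision early on lowers the total reward.

```python
from itertools import product

GOAL = 4                              # hypothetical goal cell in a corridor 0..4
ACTIONS = {"left": -1, "right": +1}   # hypothetical decisions available in each cell

def step(state, action):
    """Apply one decision and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action], 0), GOAL)  # walls at both ends
    done = next_state == GOAL
    reward = 10 if done else -1       # assumed rewards: each step costs 1, the goal pays 10
    return next_state, reward, done

def total_reward(decisions, start=0):
    """Sum the rewards collected by following a sequence of decisions."""
    state, total = start, 0
    for action in decisions:
        state, reward, done = step(state, action)
        total += reward
        if done:                      # the episode ends once the goal is reached
            break
    return total

def best_sequence(horizon=6):
    """Brute-force search over every decision sequence of a fixed length."""
    return max(product(ACTIONS, repeat=horizon), key=total_reward)

if __name__ == "__main__":
    print(total_reward(("right",) * 4))                  # direct route to the goal
    print(total_reward(("left",) + ("right",) * 4))      # one wasted decision first
    print(best_sequence())
```

The brute-force search enumerates all 2^6 sequences, which is only feasible because the example is tiny; the number of sequences grows exponentially with the horizon. Reinforcement learning methods aim to find good sequences without this exhaustive enumeration.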

The algorithmic research on sequential decision making has focused on two types of applications: (1) robotic problems and (2) games. Let us have a closer look at these two domains, starting with robotics.

## Robotics

In principle, all actions that a robot should take can be pre-programmed step by step by a programmer in meticulous detail. In highly controlled environments, such as a welding robot in a car factory, this can conceivably work, although any small change or any new task requires reprogramming the robot.

It is surprisingly hard to manually program a robot to perform a complex task. Humans are not aware of their own operational knowledge, such as what “voltages” we put on which muscles when we pick up a cup. It is much easier to define a desired goal state and let the system find the complicated solution by itself. Furthermore, even in environments that are only slightly challenging, where the robot must be able to respond more flexibly to different conditions, an adaptive program is needed.
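The idea of specifying only a goal state and letting the system find the solution can be illustrated with a small sketch. The text does not name a particular algorithm here; the example below uses tabular Q-learning as one possible choice, on a hypothetical six-cell corridor. All names (`step`, `train`, `greedy_policy`) and parameter values are assumptions made for illustration. The programmer states only that reaching the last cell is rewarded; the action to take in each cell is learned by trial and error.

```python
import random

N_STATES = 6             # hypothetical corridor with cells 0..5
GOAL = N_STATES - 1      # the only thing we specify: the desired goal state
ACTIONS = (-1, +1)       # move one cell left or one cell right

def step(state, action):
    """Environment dynamics: reward 1 on reaching the goal, 0 otherwise."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both values are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """Best learned action for every non-goal cell."""
    return [ACTIONS[0 if qs[0] > qs[1] else 1] for qs in q[:GOAL]]

if __name__ == "__main__":
    print(greedy_policy(train()))
```

Nothing in `train` encodes which direction leads to the goal; that knowledge ends up in the learned table `q`. Changing `GOAL` or the corridor length requires retraining but no reprogramming of the behavior, which is the kind of adaptivity the paragraph above describes.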

It will be no surprise that the application area of robotics is an important driver for machine learning research, and robotics researchers turned early on to finding methods by which the robots could teach themselves certain behavior.

The literature on robotics experiments is varied and rich. A robot can teach itself how to navigate a maze, how to perform manipulation tasks, and how to perform locomotion tasks.

Research into adaptive robotics has made considerable progress. For example, recent achievements include flipping pancakes [29] and flying an aerobatic model helicopter [1,2]; see Figs. 1.1 and 1.2. Frequently, learning tasks are combined with computer vision, where a robot has to learn by visually interpreting the consequences of its own actions.

