Modelling Human-like Behavior through Reward-based Approach in a First-Person Shooter Game
We present two examples of how human-like behavior can be implemented in a model of a computer player to improve its characteristics and decision-making patterns in a video game. First, we describe a reinforcement learning model that helps to choose the best weapon depending on reward values obtained from shooting combat situations. Second, we consider obstacle-avoiding path planning adapted to a tactical visibility measure. We describe an implementation of a path-smoothing model that allows the use of penalties (negative rewards) for walking through ``bad'' tactical positions. We also study path-finding algorithms, such as an improved I-ARA* search algorithm for dynamic graphs, which mimics the human discrete decision-making model of reconsidering goals in a manner similar to the PageRank algorithm. All these approaches demonstrate how human behavior can be modeled in applications where the actions of an intelligent agent are clearly perceived by the player.
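The reward-based weapon-selection mechanism can be illustrated with a short sketch. The Python fragment below is a minimal illustration under our own assumptions, not the authors' implementation: it uses a tabular update in which each combat situation is discretized (here, by distance to the opponent only) and the reward is whatever the game reports after an exchange of fire (e.g. damage dealt); the weapon list, the discretization, and all names are hypothetical.

```python
import random
from collections import defaultdict

WEAPONS = ["pistol", "shotgun", "rocket_launcher", "railgun"]  # hypothetical arsenal

class WeaponSelector:
    """Tabular reward-based weapon choice, keyed by a discretized combat situation."""

    def __init__(self, alpha=0.1, epsilon=0.1):
        self.alpha = alpha           # learning rate
        self.epsilon = epsilon       # exploration probability
        self.q = defaultdict(float)  # (situation, weapon) -> estimated reward

    @staticmethod
    def situation(distance):
        """Discretize the combat situation; only distance is used here."""
        if distance < 5.0:
            return "close"
        if distance < 20.0:
            return "medium"
        return "far"

    def choose(self, distance):
        s = self.situation(distance)
        if random.random() < self.epsilon:                     # explore
            return random.choice(WEAPONS)
        return max(WEAPONS, key=lambda w: self.q[(s, w)])      # exploit best estimate

    def update(self, distance, weapon, reward):
        """Move the estimate toward the reward observed after a combat exchange."""
        key = (self.situation(distance), weapon)
        self.q[key] += self.alpha * (reward - self.q[key])
```

After each combat exchange the agent calls `update` with the obtained reward, so weapons that perform well in a given situation come to be chosen more often, which is the kind of situation-dependent preference the abstract describes.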
In this paper we outline an approach to solving a special type of navigation task for robotic systems, in which a coalition of robots (agents) acts in a 2D environment that can be modified by their actions, and all agents share the same goal location. The latter is originally unreachable for some members of the coalition, but the common task can still be accomplished because the agents can assist each other (e.g. by modifying the environment). We call such tasks smart relocation tasks (as they cannot be solved by pure path planning methods) and study the spatial and behavioral interaction of robots while solving them. We take a cognitive approach and introduce a semiotic knowledge representation (the sign world model) that underlies the behavioral planning methodology. Planning is viewed as a recursive search process in the hierarchical state space induced by signs, with path planning signs residing at the lowest level. Reaching this level triggers path planning, which is accomplished by state-of-the-art grid-based planners focused on producing smooth paths (e.g. LIAN), thus indirectly guaranteeing the feasibility of those paths with respect to the agent's dynamic constraints.
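The role of the low-level planner can be sketched as follows. The fragment below is not LIAN itself but a simplified A*-style grid search with a turn penalty, intended only to illustrate how a grid-based planner can be biased toward smooth paths; the grid representation, the cost model, and the penalty weight are all assumptions made for this sketch.

```python
import heapq
import itertools
import math

def smooth_grid_path(grid, start, goal, turn_penalty=0.5):
    """A*-style search over (cell, heading) states; changing heading adds a
    penalty, biasing the search toward smoother paths. grid[y][x] == 0 is
    free; start and goal are (x, y) tuples."""
    def h(cell):  # Euclidean distance-to-goal heuristic
        return math.hypot(goal[0] - cell[0], goal[1] - cell[1])

    moves = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
             if (dx, dy) != (0, 0)]
    tie = itertools.count()  # heap tie-breaker to avoid comparing states
    open_heap = [(h(start), 0.0, next(tie), (start, None), [start])]
    best_g = {(start, None): 0.0}

    while open_heap:
        _, g, _, (cell, heading), path = heapq.heappop(open_heap)
        if cell == goal:
            return path
        for dx, dy in moves:
            nxt = (cell[0] + dx, cell[1] + dy)
            x, y = nxt
            if not (0 <= y < len(grid) and 0 <= x < len(grid[0])) or grid[y][x]:
                continue
            # a change of heading is penalized to favor smooth trajectories
            turn = turn_penalty if heading not in (None, (dx, dy)) else 0.0
            ng = g + math.hypot(dx, dy) + turn
            state = (nxt, (dx, dy))
            if ng < best_g.get(state, float("inf")):
                best_g[state] = ng
                heapq.heappush(open_heap,
                               (ng + h(nxt), ng, next(tie), state, path + [nxt]))
    return None  # no path on this grid
```

LIAN enforces an explicit bound on the turn angle between consecutive path segments; the sketch replaces that hard constraint with a soft penalty, which conveys the same intuition of trading path length against smoothness.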
Efficient regulation of internal homeostasis and its defense against perturbations require adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how the learning of motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, as well as several other behavioral patterns. Finally, we suggest a computational role for the interaction between the hypothalamus and the brain reward system.
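The core construction of such a theory can be stated compactly. The following formalization follows the standard homeostatic reinforcement learning setup and is consistent with the abstract, but the particular notation and exponents are our assumptions: let $H_t = (h_{1,t}, \dots, h_{N,t})$ be the internal physiological state, $H^{*} = (h_1^{*}, \dots, h_N^{*})$ the homeostatic setpoint, and $K_t$ the impact of an outcome on the internal state. A drive function measures the distance from the setpoint, and a primary reward is the drive reduction produced by the outcome:

\[
D(H_t) = \Bigl(\sum_{i=1}^{N} \bigl|h_i^{*} - h_{i,t}\bigr|^{\,n}\Bigr)^{1/m},
\qquad
r(H_t, K_t) = D(H_t) - D(H_t + K_t).
\]

Maximizing the discounted sum of such rewards then coincides with keeping $H_t$ close to $H^{*}$, which is the equivalence the abstract calls physiological rationality; the discount factor favors trajectories that reduce the drive sooner, i.e. shorter paths toward the setpoint.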
Adaptive and Learning Agents Workshop at International Joint Conference on Autonomous Agents and Multiagent Systems
Path planning is typically considered in Artificial Intelligence as a graph search problem, and R* is a state-of-the-art algorithm tailored to solve it. The algorithm decomposes a given path-finding task into a series of subtasks, each of which can be easily (in the computational sense) solved by well-known methods (such as A*). Parameterized random choice is used to perform the decomposition, and as a result R* performance largely depends on the choice of its input parameters. In our work we formulate a range of assumptions concerning possible upper and lower bounds of the R* parameters, their interdependency, and their influence on R* performance. We then evaluate these assumptions by running a large number of experiments. As a result, we formulate a set of heuristic rules which can be used to initialize the values of the R* parameters in a way that leads to the algorithm's best performance.
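A sketch may help to make the role of these parameters concrete. The fragment below is a simplified, greedy illustration of R*'s randomized decomposition, not the reference algorithm: `delta` (the number of sampled successors) and `d` (the distance at which subgoals are placed) correspond to the kind of input parameters studied here, while the heuristic weight would live inside the supplied `local_plan` solver (e.g. weighted A*); all names are placeholders, and R*'s OPEN list, AVOID labels, and re-expansion machinery are omitted.

```python
import math
import random

def random_subgoals(state, delta, d):
    """Sample `delta` candidate subgoals at distance `d` from `state`
    (randomized decomposition; 2D Euclidean version for illustration)."""
    goals = []
    for _ in range(delta):
        angle = random.uniform(0.0, 2.0 * math.pi)
        goals.append((state[0] + d * math.cos(angle),
                      state[1] + d * math.sin(angle)))
    return goals

def r_star_sketch(start, goal, local_plan, delta=10, d=5.0, max_iter=1000):
    """Greedy skeleton: repeatedly pick a sampled subgoal and solve the
    easy local subproblem with `local_plan`, which should return a list
    of states from its first argument to its second, or None."""
    path, state = [start], start
    for _ in range(max_iter):
        # within one "hop" of the goal, solve the last subproblem directly
        candidates = ([goal] if math.dist(state, goal) <= d
                      else random_subgoals(state, delta, d))
        nxt = min(candidates, key=lambda s: math.dist(s, goal))
        segment = local_plan(state, nxt)
        if segment is None:
            continue            # local subproblem too hard here; resample
        path.extend(segment[1:])
        state = nxt
        if state == goal:
            return path
    return None
```

Even in this stripped-down form the trade-off is visible: a larger `delta` explores more alternatives per step at a higher cost, while `d` controls how "easy" each local subproblem is, which is exactly the kind of interdependency the experiments in this work quantify.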
Humans often change their beliefs or behavior due to the behavior or opinions of others. This study explored, with the use of human event-related potentials (ERPs), whether social conformity is based on a general performance-monitoring mechanism. We tested the hypothesis that conflicts with a normative group opinion evoke a feedback-related negativity (FRN) often associated with performance monitoring and subsequent adjustment of behavior. The experimental results show that individual judgments of facial attractiveness were adjusted in line with a normative group opinion. A mismatch between individual and group opinions triggered a frontocentral negative deflection with the maximum at 200 ms, similar to FRN. Overall, a conflict with a normative group opinion triggered a cascade of neuronal responses: from an earlier FRN response reflecting a conflict with the normative opinion to a later ERP component (peaking at 380 ms) reflecting a conforming behavioral adjustment. These results add to the growing literature on neuronal mechanisms of social influence by disentangling the conflict-monitoring signal in response to the perceived violation of social norms and the neural signal of a conforming behavioral adjustment.
Humans can adapt their behavior by learning from the consequences of their own actions or by observing others. Gradual active learning of action-outcome contingencies is accompanied by a shift from feedback- to response-based performance monitoring. This shift is reflected by complementary learning-related changes of two ACC-driven ERP components, the feedback-related negativity (FRN) and the error-related negativity (ERN), which have both been suggested to signal events "worse than expected," that is, a negative prediction error. Although recent research has identified comparable components for observed behavior and outcomes (observational ERN and FRN), it is as yet unknown whether these components are similarly modulated by prediction errors and thus also reflect behavioral adaptation. In this study, two groups of 15 participants learned action-outcome contingencies either actively or by observation. In active learners, FRN amplitude for negative feedback decreased and ERN amplitude in response to erroneous actions increased with learning, whereas the observational ERN and FRN in observational learners did not exhibit learning-related changes. Learning performance, assessed in test trials without feedback, was comparable between groups, as was the ERN following actively performed errors during test trials. In summary, the results show that action-outcome associations can be learned similarly well actively and by observation. The mechanisms involved appear to differ, with the FRN in active learning reflecting the integration of information about one's own actions and the accompanying outcomes.