Similar literature
1.

Background  

Delays between actions and their outcomes severely hinder reinforcement learning systems, but little is known of the neural mechanism by which animals overcome this problem and bridge such delays. The nucleus accumbens core (AcbC), part of the ventral striatum, is required for normal preference for a large, delayed reward over a small, immediate reward (self-controlled choice) in rats, but the reason for this is unclear. We investigated the role of the AcbC in learning a free-operant instrumental response using delayed reinforcement, performance of a previously-learned response for delayed reinforcement, and assessment of the relative magnitudes of two different rewards.

2.

Background  

Animals must frequently make choices between alternative courses of action, seeking to maximize the benefit obtained. They must therefore evaluate the magnitude and the likelihood of the available outcomes. Little is known of the neural basis of this process, or what might predispose individuals to be overly conservative or to take risks excessively (avoiding or preferring uncertainty, respectively). The nucleus accumbens core (AcbC) is known to contribute to rats' ability to choose large, delayed rewards over small, immediate rewards; AcbC lesions cause impulsive choice and an impairment in learning with delayed reinforcement. However, it is not known how the AcbC contributes to choice involving probabilistic reinforcement, such as between a large, uncertain reward and a small, certain reward. We examined the effects of excitotoxic lesions of the AcbC on probabilistic choice in rats.

3.
Redundant manipulators are widely used in fields such as human-robot collaboration due to their good flexibility. To ensure efficiency and safety, the manipulator is required to avoid obstacles while tracking a desired trajectory in many tasks. Conventional methods for obstacle avoidance of redundant manipulators may encounter joint singularity or exceed joint position limits while tracking the desired trajectory. By integrating deep reinforcement learning (DRL) into the gradient projection method, a reactive obstacle avoidance method for redundant manipulators is proposed. We establish a general DRL framework for obstacle avoidance, and then a reinforcement learning agent is applied to learn motion in the null space of the redundant manipulator Jacobian matrix. The reward function of reinforcement learning is redesigned to handle multiple constraints automatically. Specifically, the manipulability index is introduced into the reward function, and thus the manipulator can maintain high manipulability to avoid joint singularity while executing tasks. To show the effectiveness of the proposed method, a simulation of a planar manipulator with 4 degrees of freedom is given. Compared with the gradient projection method, the proposed method performs better in obstacle-avoidance success rate, average manipulability, and time efficiency.
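The abstract above names two standard building blocks: the gradient projection method, which adds a null-space term to the joint velocity, and the manipulability index. The following is a minimal sketch of both for a planar arm, not the authors' implementation. It assumes the textbook form q_dot = J^+ x_dot + (I - J^+ J) z and Yoshikawa's index sqrt(det(J J^T)); the abstract does not state which manipulability definition is used. The function names, unit link lengths, and the fixed vector z (which in the paper would come from the learned agent) are all hypothetical.

```python
import math


def planar_jacobian(q, link_lengths):
    """2 x n position Jacobian of a planar serial arm (end-effector x, y)."""
    n = len(q)
    # Absolute link angles: theta_k = q_0 + ... + q_k
    cumulative = []
    total = 0.0
    for angle in q:
        total += angle
        cumulative.append(total)
    jac = [[0.0] * n for _ in range(2)]
    for i in range(n):
        # Joint i rotates every link from i onward
        jac[0][i] = -sum(link_lengths[k] * math.sin(cumulative[k]) for k in range(i, n))
        jac[1][i] = sum(link_lengths[k] * math.cos(cumulative[k]) for k in range(i, n))
    return jac


def _jjt(jac):
    """2 x 2 matrix J J^T."""
    n = len(jac[0])
    return [[sum(jac[r][k] * jac[c][k] for k in range(n)) for c in range(2)]
            for r in range(2)]


def manipulability(jac):
    """Yoshikawa index sqrt(det(J J^T)); zero at a singular configuration."""
    a = _jjt(jac)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return math.sqrt(max(det, 0.0))


def pseudo_inverse(jac):
    """n x 2 right pseudo-inverse J^T (J J^T)^-1; assumes full row rank."""
    a = _jjt(jac)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    inv = [[a[1][1] / det, -a[0][1] / det],
           [-a[1][0] / det, a[0][0] / det]]
    n = len(jac[0])
    return [[sum(jac[r][i] * inv[r][c] for r in range(2)) for c in range(2)]
            for i in range(n)]


def gradient_projection_step(jac, x_dot, z):
    """Joint velocity q_dot = J^+ x_dot + (I - J^+ J) z.

    z is an arbitrary joint-space vector (hypothetical here); its projection
    into the null space of J does not disturb the end-effector velocity.
    """
    n = len(jac[0])
    jp = pseudo_inverse(jac)
    task = [sum(jp[i][c] * x_dot[c] for c in range(2)) for i in range(n)]
    jz = [sum(jac[r][k] * z[k] for k in range(n)) for r in range(2)]
    null = [z[i] - sum(jp[i][c] * jz[c] for c in range(2)) for i in range(n)]
    return [task[i] + null[i] for i in range(n)]
```

Because the second term lies in the null space of J, any choice of z leaves the end-effector velocity unchanged, which is what allows a secondary objective such as obstacle avoidance or high manipulability to be pursued without disturbing trajectory tracking.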

4.
We propose a reinforcement learning (RL) approach to compute an expression for the quasi-stationary distribution. Based on the fixed-point formulation of the quasi-stationary distribution, we minimize the KL-divergence between two Markovian path distributions induced by the candidate distribution and the true target distribution. To solve this challenging minimization problem by gradient descent, we apply a reinforcement learning technique by introducing reward and value functions. We derive the corresponding policy gradient theorem and design an actor-critic algorithm to learn the optimal solution and the value function. Numerical examples on finite-state Markov chains are tested to demonstrate the new method.

5.

Background  

The dopamine transporter (DAT) plays a critical role in regulating dopamine neurotransmission. Variations in DAT or changes in basal dopaminergic tone have been shown to alter behavior and drug responses. DAT is one of the three known high-affinity targets for cocaine, a powerful psychostimulant that produces reward and stimulates locomotor activity in humans and animals. We have shown that cocaine no longer produces reward in knock-in mice with a cocaine-insensitive mutant DAT (DAT-CI), suggesting that cocaine inhibition of DAT is critical for its rewarding effect. However, in DAT-CI mice, the mutant DAT has significantly reduced uptake activity, resulting in elevated basal dopaminergic tone, which might cause adaptive changes that alter responses to cocaine. Therefore, the objective of this study is to determine how elevated dopaminergic tone affects how mice respond to cocaine.

6.
The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents. Typically, an agent receives its private observations, which provide a partial view of the true state of the environment. However, in realistic settings, the harsh environment might cause one or more agents to show arbitrarily faulty or malicious behavior, which may suffice to make the current coordination mechanisms fail. In this paper, we study a practical scenario of multi-agent reinforcement learning systems, considering the security issues in the presence of agents with arbitrarily faulty or malicious behavior. The previous state-of-the-art work that coped with extremely noisy environments was designed on the basis that the noise intensity in the environment was known in advance. However, when the noise intensity changes, the existing method has to adjust the configuration of the model to learn in new environments, which limits its practical applications. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) model, which can select not only correct but also relevant information for each agent at every time step in noisy environments. The multihead attention mechanism enables the agents to learn effective communication policies through experience, concurrently with the action policies. Empirical results showed that FT-Attn beats previous state-of-the-art methods in some extremely noisy environments in both cooperative and competitive scenarios, and comes much closer to the upper-bound performance. Furthermore, FT-Attn maintains a more general fault tolerance ability and does not rely on prior knowledge about the noise intensity of the environment.

7.

Background  

Impulsivity is defined as intolerance/aversion to waiting for reward. In intolerance-to-delay (ID) protocols, animals must choose between small/soon (SS) versus large/late (LL) rewards. In probabilistic discount (PD) protocols, animals are faced with a choice between small/sure (SS) versus large/luck-linked (LLL) rewards. It has been suggested that PD protocols also measure impulsivity; however, a clear dissociation has been reported between delay and probability discounting.

8.

Background  

We investigated how temporal context affects the learning of arbitrary visuo-motor associations. Human observers viewed highly distinguishable, fractal objects and learned to choose for each object the one motor response (of four) that was rewarded. Some objects were consistently preceded by specific other objects, while other objects lacked this task-irrelevant but predictive context.

9.
ZnO:Al (AZO) thin films were deposited on glass substrates by RF magnetron sputtering at room temperature and post-annealed in a rapid thermal annealing (RTA) system. The effect of post-annealing temperature on the structural, optical, and electrical properties was investigated. As the post-annealing temperature increased, the electrical conductivity gradually deteriorated owing to a decrease in the mobility or carrier concentration. According to X-ray photoelectron spectroscopy (XPS) analysis, the behavior of the mobility and carrier concentration is attributed to increased O2 absorption on the film surface, which raises the barrier potential at the low post-annealing temperature (200 °C) and reduces the density of donor-like defects at the high post-annealing temperature (400 °C). For post-annealing, minimizing O2 absorption is therefore a very important factor in obtaining better electrical properties.

10.

Background  

Dopamine modulation of neuronal signaling in the frontal cortex, midbrain, and striatum is essential for processing and integrating diverse external sensory stimuli and attaching salience to environmental cues that signal causal relationships, thereby guiding goal-directed, adaptable behaviors. At the cellular level, dopamine signaling is mediated through D1-like or D2-like receptors. Although a role for D1-like receptors in a variety of goal-directed behaviors has been identified, an explicit involvement of D2 receptors has not been clearly established. To determine whether dopamine D2 receptor-mediated signaling contributes to associative and reversal learning, we compared C57Bl/6J mice that completely lack functional dopamine D2 receptors to wild-type mice with respect to their ability to attach appropriate salience to external stimuli (stimulus discrimination) and disengage from inappropriate behavioral strategies when reinforcement contingencies change (e.g. reversal learning).

11.
Three experiments were carried out that employed low-frequency tone complexes with interaural delays that varied across the frequency domain. In the first experiment, threshold interaural delays were measured for three-tone complexes for which one, two, or all three components were delayed. The center frequency was 750 Hz and the frequency spacing (delta f) between components was 20, 50, 100, 250, or 450 Hz. For all delta f's, the presence of two diotic components elevated the threshold interaural delays obtained for the third component relative to that obtained for a pure tone of the same frequency. In the second experiment, observers made left-right judgments regarding the direction of movement of signals for which two components were delayed by 25 microseconds to the left ear during one interval and to the right ear during the other interval, while a third component of a variable time difference was delayed to the opposite side as the tone pair. Subjects reported single intracranial images during each interval, and the data showed that interaural delays of one component to one ear could be offset by interaural delays of the other two components to the other ear. In the final experiment, threshold interaural delays were measured for five-tone complexes in which one, two, three, four, or five components were delayed. The center frequency was 750 Hz and delta f was fixed at 100 Hz. Thresholds decreased in a linear fashion as the number of delayed components increased, falling by about a factor of 5 as the number of delayed components went from one to five. These results are consistent with spectrally synthetic binaural processing, with the lateral position of intracranial images determined by a combination of interaural information across the spectrum. These effects could be brought about by a linear combination of the outputs of frequency-specific cross-correlation networks or by a wideband cross correlation of the signals at the two ears.

12.
This paper investigates the synchronization of time-delayed complex dynamical networks with periodic on-off coupling. Both the theoretical and numerical results show that, in spite of time delays and on-off coupling, two networks may synchronize if the coupling strength and the on-off rate are large enough. It is shown that, for undirected and strongly connected networks, the upper bound of time delays for synchronization is a decreasing function of the absolute value of the minimum eigenvalue of the adjacency matrix. The theoretical analysis confirms the numerical results and provides a better understanding of the influence of time delays and on-off coupling on the synchronization transition. The influence of random delays on the synchronization is also discussed.

13.
Experimental results on the influence of time-delayed pulses on ablation efficiency with short multi-pulses (pulse duration of 5 ps) are reported. A significant improvement of the micro-structuring quality in the relatively high fluence regime in metals is obtained. Less removed or recast matter is observed, and the processed surface appears smoother, with better roughness. Ablation depths and burr heights are compared for single pulses and double pulses in steel, Al and Cu as a function of the number of scans. The best results are obtained for short time delays, typically less than 1 ps. PACS 79.20.Ds; 42.62.Cf; 81.65.Cf

14.
Software Defined Networking (SDN) has been used in many organizations due to its efficiency in transmission. Machine learning techniques have been applied in SDN to improve its efficiency in resource scheduling. The existing models in SDN suffer from overfitting, local optima traps, and lower efficiency in path selection. This study applied Balancing Module (BM)-Spider Monkey Optimization (SMO)-Crow Search Algorithm (CSA) for multi-path selection in SDN to improve its efficiency. The balancing module applies a Gaussian distribution to balance exploration and exploitation in the multi-path selection process. The balancing module helps to escape local optima traps and increases the convergence rate. Deep reinforcement learning (DRL) is applied for resource scheduling in SDN; it uses the reward function to improve the learning performance. The BM-SMO-CSA technique has an energy consumption of 30 J, whereas the existing models DRL and Graph-ACO have energy consumptions of 40 J and 62 J, respectively.

15.
Because environmental pollutants have a strong impact on ecosystems, including human health, methods for their determination and mitigation have received special attention in recent years. Taking advantage of the wide range of data that can be obtained by synchrotron radiation X-ray fluorescence spectroscopy (SRXRF) in the field of environmental sciences, different instrumental setups were used to study the biological fates of toxic elements in volcanic environments. The elemental composition of plants, algae, and bacteria in the Copahue and Domuyo volcanoes of Argentinean Patagonia was determined by SRXRF, and the volcanic elements Ti, Fe, and Zn were abundant in these organisms. Interestingly, a high As concentration was found in cyanobacteria (26.2 μg/g) living in an As-contaminated stream (250 μg/ml). Because arsenic is toxic and a human carcinogen, element-retention capacity, element-protein associations, and arsenic metabolism in this As-resistant organism were analyzed by SRXRF. A high retention capacity (100–95%), in the order Ti > Fe > Cr > Sr > Ni > Cu > Mn > Zn > As, was found after aqueous/alcoholic extraction assisted by ultrasonication. The cyanobacterial proteins were separated by SDS-PAGE, electro-transferred to nitrocellulose, and mapped by SRXRF. Defined protein bands containing Ca, Ti, Mn, Fe, and/or Zn were observed. The ability of the cyanobacteria to metabolize arsenic was revealed by combining SRXRF and X-ray absorption near edge spectroscopy, and dimethylarsenic was found. Based on these results, we speculate that these cyanobacteria could be interesting candidates for water treatment. Finally, we conclude that SRXRF is a valuable tool to study the biological cycle of environmental pollutants, including their accumulation, molecular targets, and metabolism. SRXRF may also assist in remediation research.

16.
Excess conductance fluctuations with peculiar temperature dependence from 1.4 to 250 K were observed in curved nanographite sheets with electrode gap lengths of 300 and 450 nm, whereas the conductance fluctuation is greatly suppressed above 4.2 K when the electrode gap lengths increase to 800 and 1000 nm. The former is discussed in the context of the presence of a small energy bandgap in the nanographite sheets, while the latter is attributed to the crossover from the coherent transport to diffusive transport regime.

17.
This review brings together two fundamental, but unreconciled, aspects of human language: embodiment and compositionality. One major scientific advance in recent decades has been Embodiment: the realization that scientific understanding of mind and language entails detailed modeling of the human brain and how it evolved to control a physical body in a social community. The ability to learn and use language is one of the most characteristically human traits. Many animals signal, but only people can express and understand an essentially unbounded range of messages. The technical term for the ability of human language to support all these messages from a few dozen alphabetic symbols is Compositionality. Rigor is essential for the advancement of any science, but there has been essentially no overlap between efforts to formalize language compositionality and the manifest embodiment of thought. Recent developments suggest that it is feasible to formalize the compositionality of embodied language, but that this requires a focus on conceptual composition and better understanding of contextual best-fit.

18.
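刘莹莹, 潘炜, 江宁, 项水英. Acta Photonica Sinica (《光子学报》), 2012, 41(9): 1023-1027
For mutually coupled semiconductor laser systems with two and three delays, the influence of the mutual coupling delays and the mutual coupling strength on the quality of real-time chaos synchronization is studied. The idea that one of the mutual coupling delays in a two-delay mutually coupled system can be regarded as a feedback delay is proposed, and the conditions and rules for real-time chaos synchronization in multi-delay mutually coupled semiconductor laser systems are revealed. The results show that, in a multi-delay mutually coupled system, a ratio of 2 between the mutual coupling delays of some pair of bidirectional links is the basic condition for achieving high-quality real-time chaos synchronization. Increasing the mutual coupling strength improves the quality of real-time chaos synchronization, and at lower equivalent coupling strengths a two-delay mutually coupled system achieves good real-time chaos synchronization more easily than a three-delay mutually coupled system.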

19.
Mammals have evolved the ability to acquire auditory discriminations. The characteristics of this discriminative ability presumably fit the natural conditions under which discriminations are normally acquired. The purpose of this paper is to review experiments which were directed at showing that auditory discriminations are most rapidly acquired when natural features are incorporated into the experiments. The experiments were also directed at discovering the underlying characteristics of the discriminative ability. When animals were trained to discriminate the position of a sound source in which natural features were incorporated into the experiment, the discrimination was acquired in one trial. Manipulation of the natural features suggested that one trial acquisition depends upon the following. (1) Stimulus novelty; the effect of reinforcement is stronger in the presence of novel than familiar stimuli. (2) Specific behavioral effect of reinforcement; the effect of reinforcing a response in the presence of a novel auditory stimulus is to increase the strength of approaching and manipulating the sound source.

20.
The precedence effect (PE) describes the ability to localize a direct, leading sound correctly when its delayed copy (lag) is present, though not separately audible. The relative contribution of binaural cues in the temporal fine structure (TFS) of lead-lag signals was compared to that of interaural level differences (ILDs) and interaural time differences (ITDs) carried in the envelope. In a localization dominance paradigm participants indicated the spatial location of lead-lag stimuli processed with a binaural noise-band vocoder whose noise carriers introduced random TFS. The PE appeared for noise bursts of 10 ms duration, indicating dominance of envelope information. However, for three test words the PE often failed even at short lead-lag delays, producing two images, one toward the lead and one toward the lag. When interaural correlation in the carrier was increased, the images appeared more centered, but often remained split. Although previous studies suggest dominance of TFS cues, no image is lateralized in accord with the ITD in the TFS. An interpretation in the context of auditory scene analysis is proposed: By replacing the TFS with that of noise the auditory system loses the ability to fuse lead and lag into one object, and thus to show the PE.
