1.

Digitally stimulated Raman passage by deep reinforcement learning

《Physics letters. A》2020,384(14):126266

Preparing an arbitrary preselected coherent superposition of quantum states finds widespread application in physics, including the initialization of trapped-ion and superconducting qubits in quantum computers. Both fractional and integer stimulated Raman adiabatic passage involve smooth Gaussian pulses, designed to ensure adiabaticity and thus keep the system in an eigenstate composed only of the initial and final states. We explore an alternative method for discovering appropriate pulse sequences, based on deep reinforcement learning algorithms and on the constraint that the control laser can only be either on or off instead of being continuously amplitude-modulated. Although the adiabatic condition is violated, we obtain fast and flexible solutions for both integer and fractional population transfer. This method, termed Digital Stimulated Raman Passage (D-STIRaP), proves particularly effective when the system is affected by dephasing, and therefore provides an alternative path towards the control of noisy quantum states, such as trapped ions and superconducting qubits.

2.

Tuning pianos using reinforcement learning

Matthew Millard 《Applied Acoustics》2007,68(5):576-593

The tuning system of a piano has remained relatively unchanged since the instrument’s inception. A piano’s tuning system has been designed to be both inexpensive to manufacture and to preserve the tension and thus pitch of each string over long periods of time. This tuning system requires such a high degree of skill to manipulate that only trained professionals are able to tune pianos. This paper presents a novel adjustable impact tuning hammer and a reinforcement learning control system that may allow piano owners to tune their own pianos in the future.

3.

Optimal control strategy for COVID-19 concerning both life and economy based on deep reinforcement learning

Wei Deng 《Chinese Physics B》2021,30(12):120203-120203

At present, the global COVID-19 situation is still severe. More and more countries have experienced second or even third outbreaks. The epidemic will be far from over until a vaccine is successfully developed and brought to market on a large scale. Inappropriate epidemic control strategies may bring catastrophic consequences. It is essential to maximize epidemic containment while mitigating economic damage. However, studies of the optimal control strategy addressing both sides are rare, and no optimal model has been built. In this paper, the Susceptible-Infectious-Hospitalized-Recovered (SIHR) compartment model is extended to simulate the spread of the epidemic as a function of the isolation rate. An economic model affected by epidemic isolation measures is established. The effective reproduction number and the eigenvalues at the equilibrium point are introduced as indicators of the controllability and stability of the model, and are used to verify the effectiveness of the SIHR model. Based on the Deep Q Network (DQN), one of the deep reinforcement learning (RL) methods, the blocking policy is studied to maximize economic output under the premise of controlling the number of infections in different stages. The epidemic control strategies given by deep RL under different learning strategies are compared for different reward coefficients. The study demonstrates that optimal policies may differ between countries depending on disease spread and the ability to withstand economic risk. The results show that the more economical the strategy, the smaller the economic loss in the short term, which can save economically fragile countries from economic crises. In the second or third outbreak stage, the earlier the government adopts the control strategy, the smaller the economic loss. We recommend the deep RL method for specifying a policy that can control the epidemic while keeping quarantine economically viable.

4.

Optimal chaos control through reinforcement learning

Gadaleta S Dangelmayr G 《Chaos (Woodbury, N.Y.)》1999,9(3):775-788

A general purpose chaos control algorithm based on reinforcement learning is introduced and applied to the stabilization of unstable periodic orbits in various chaotic systems and to the targeting problem. The algorithm does not require any information about the dynamical system nor about the location of periodic orbits. Numerical tests demonstrate good and fast performance under noisy and nonstationary conditions. (c) 1999 American Institute of Physics.

5.

Removing additive noise via neuro-fuzzy-based reinforcement learning

Lin CS Kyriakakis C 《The Journal of the Acoustical Society of America》2008,124(2):1026-1037

In this paper, a systematic treatment for developing a noise removal system based on the fundamental principle of reinforcement learning and the fuzzy cerebellar model articulation controller (FCMAC) is presented. The proposed system improves its performance over time through two mechanisms. First, the modified stochastic real-valued algorithm, which learns from its own mistakes via the reinforcement signal and reinforces its actions to improve future performance, is used to search for the optimal noise spectrum for the overall training system. Second, system states associated with positive reinforcement are memorized by FCMAC-based neurons, so that similar future states share the experiences already stored there and steer the action toward a more positive situation. In this work, the FCMAC's intrinsically poor approximation of rapidly varying functions is addressed by taking the complex semicepstrum. In addition, the FCMAC improves the accuracy of function approximation without losing the property of generalization, which makes high-fidelity digital signal processing possible.

6.

Wave scattering by a strongly elongated irregularity

M. V. Tinin B.-Ch. Kim 《Radiophysics and Quantum Electronics》2004,47(12):947-954

We consider the problem of single scattering by a strongly elongated irregularity which is located in the near zone with respect to the coordinate along its major axis and in the far zone with respect to the transverse coordinates. Expressions describing the scattered-wave field are obtained by applying the stationary-phase method for integration over the longitudinal coordinate in the formula for single scattering. We present some results of modeling of the scattered-wave intensity based on the obtained formula as applied to meteor-burst propagation of radio waves. Translated from Izvestiya Vysshikh Uchebnykh Zavedenii, Radiofizika, Vol. 47, No. 12, pp. 1057–1065, December, 2004.

7.

High-frequency scattering by a strongly elongated body

I. V. Andronov 《Acoustical Physics》2013,59(4):369-372

The problem of scattering of a high-frequency plane wave incident at a small angle to the axis of a strongly elongated spheroid is studied. The asymptotic formula for the scattering cross section is obtained in the case of ideal boundary conditions. The influence of the elongation rate of the spheroid and the angle of incidence of the plane wave is analyzed based on computations.

8.

Control of chaos in Frenkel–Kontorova model using reinforcement learning

《Chinese Physics B》2021,30(5):050503-050503

It is shown that spatiotemporal chaos in the Frenkel–Kontorova (FK) model can be controlled by a model-free control method based on reinforcement learning. The method uses Q-learning to find optimal control strategies based on reward feedback from the environment, maximizing the controller's performance. The optimal control strategies are recorded in a Q-table and then employed to implement controllers. The advantage of the method is that it does not require explicit knowledge of the system, target states, or unstable periodic orbits. All that is needed is the parameters to be controlled and an unknown simulation model that represents the interactive environment. To control the FK model, we apply the perturbation policy to two different kinds of parameters, i.e., the pendulum lengths and the phase angles. We show that both perturbation techniques, i.e., changing the lengths and changing the phase angles, can suppress chaos in the system and make it form periodic patterns. The form of the patterns depends on the initial values of the angular displacements and velocities. In particular, we show that the pinning control strategy, which changes only a small number of lengths or phase angles, can be put into effect.
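
The Q-table idea mentioned in this abstract can be illustrated with a minimal sketch of tabular Q-learning. This is not the authors' FK-model controller: the five-state chain environment `toy_step`, the function name `q_learning`, and all hyperparameter values are assumptions chosen only to show how a Q-table is filled from reward feedback and then read out as a policy.

```python
import random


def q_learning(step, n_states, n_actions, episodes=2000, horizon=20,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning. `step(state, action)` returns (next_state, reward)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = rng.randrange(n_states)
        for _ in range(horizon):
            # Epsilon-greedy choice: mostly exploit the table, sometimes explore.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            next_state, reward = step(state, action)
            # Move Q(s, a) toward the one-step bootstrapped target.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def toy_step(state, action):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.

    A reward of 1 is paid whenever the next state is the right end (state 4).
    """
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward


q_table = q_learning(toy_step, n_states=5, n_actions=2)
# The learned controller is the greedy policy read from the Q-table.
policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(5)]
```

The learner only ever calls `step`, so the environment stays a black box, which mirrors the model-free setting the abstract describes.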

9.

Diffraction by a strongly elongated body of revolution

I. V. Andronov 《Acoustical Physics》2011,57(2):121-126

A high-frequency acoustic field in a penumbra domain on the surface of a strongly elongated body is investigated. A new asymptotic formula expressing the field in the form of an inverse Mellin transform of an expression containing Whittaker functions is derived. The numerical results presented show that increasing the transverse curvature of the body increases field attenuation on a hard surface and decreases it on an acoustically soft surface. This effect agrees with previous results.

10.

Deep reinforcement learning based relaying for buffer-aided cooperative communications

《Physical Communication》2023

Advances in deep reinforcement learning (DRL) have shown great potential for solving physical-layer communication problems. This paper investigates DRL for relay selection in buffer-aided (BA) cooperative networks. The capability of DRL to handle high-dimensional problems with large state and action spaces paves the way for exploring additional degrees of freedom by relaxing the restrictive assumptions around which conventional cooperative networks are usually designed. This direction is examined in our work by devising and analyzing advanced DRL-based BA relaying strategies that can cope with a variety of setups in multifaceted cooperative networks. In particular, we devise novel BA relaying strategies for both parallel-relaying and serial-relaying systems. For parallel-relaying systems, we investigate the added value of merging packets at the relays and of activating the inter-relay links. For serial-relaying (multi-hop) systems, we explore the improvements that can be reaped by merging packets and by allowing the simultaneous activation of sufficiently spaced hops. Simulation results demonstrate that DRL-based BA relaying achieves substantial improvements in network throughput, while an adequate design of the reward/punishment in the learning process ensures fast convergence.

11.

Calculation of diffraction by strongly elongated bodies of revolution

I. V. Andronov 《Acoustical Physics》2012,58(1):22-29

An approach to constructing approximations for the diffracted field on the surface of strongly elongated bodies is suggested. The approach is based on a high-frequency asymptotic decomposition constructed under the supposition that the transverse curvature of the surface is asymptotically large. Leading-order terms of the asymptotic decompositions for the fields diffracted by a spheroid, by one-sheeted and two-sheeted hyperboloids, and by a narrow cone are derived.

12.

Exploring individuals' effective preventive measures against epidemics through reinforcement learning

Ya-Peng Cui Shun-Jiang Ni Shi-Fei Shen 《Chinese Physics B》2021,(4):729-737

Individuals' preventive measures, as an effective way to suppress epidemic transmission and to protect themselves from infection, have attracted much academic conc...

13.

Crystallization by settling in suspensions of hard spheres

Ackerson BJ Paulin SE Johnson B van Megen W Underwood S 《Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics》1999,59(6):6903-6913

14.

Convergence of reinforcement learning to Nash equilibrium: A search-market experiment

《Physica A》2005,355(1):119-130

Since the introduction of Reinforcement Learning (RL) in Game Theory, a growing literature has been concerned with the theoretical convergence of RL-driven outcomes towards Nash equilibrium. In this paper, we examine this issue in a search-theoretic framework (posted-price market) where sellers are confronted with a population of imperfectly informed buyers and take one decision per period (posted prices) with no direct interactions between sellers. We focus on three different scenarios with varying buyers’ characteristics. For each of these scenarios, we quantitatively and qualitatively test whether the learned variable (price strategy) converges to the Nash equilibrium. We also study the impact of the temperature parameter (defining the exploitation/exploration trade-off) on these results.
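
How a temperature parameter sets the exploitation/exploration trade-off can be sketched as follows. The abstract does not spell out the choice rule, so this assumes a standard Boltzmann (softmax) rule; the function name `boltzmann_probs` and the example values are hypothetical.

```python
import math


def boltzmann_probs(values, temperature):
    """Choice probabilities proportional to exp(value / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability;
    # this shift cancels out in the normalized probabilities.
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical learned payoffs for three candidate posted prices.
payoffs = [1.0, 2.0, 3.0]
cold = boltzmann_probs(payoffs, temperature=0.1)   # close to greedy
hot = boltzmann_probs(payoffs, temperature=100.0)  # close to uniform
```

A low temperature concentrates almost all probability on the best-valued option (exploitation), while a high temperature spreads it nearly evenly across options (exploration).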

15.

Statistical mechanics approach to a reinforcement learning model with memory

Adam Lipowski Krzysztof Gontarek 《Physica A》2009,388(9):1849-1856

We introduce a two-player model of reinforcement learning with memory. Past actions of an iterated game are stored in a memory and used to determine the player’s next action. To examine the behaviour of the model, some approximate methods are used and compared with numerical simulations and the exact master equation. When the length of the players’ memory increases to infinity, the model undergoes an absorbing-state phase transition. The performance of the examined strategies is checked in the prisoner’s dilemma game. It turns out that it is advantageous to have a large memory in symmetric games, but it is better to have a short memory in asymmetric ones.

16.

Deep reinforcement learning based IRS-assisted mobile edge computing under physical-layer security

《Physical Communication》2022

In this paper, we investigate an intelligent reflecting surface (IRS)-assisted mobile edge computing (MEC) network under physical-layer security, where users can partially offload confidential and compute-intensive tasks to a computing access point (CAP) with the help of the IRS. We consider an eavesdropping environment, where an eavesdropper steals information from the communication. For the considered MEC network, we first design a secure data transmission rate to ensure physical-layer security. We then formulate the optimization target as minimizing the system cost, a linear combination of the latency and energy consumption (ENCP). Furthermore, we employ a deep deterministic policy gradient (DDPG) to optimize the system performance by allocating the offloading ratio, wireless bandwidth, and computational capability to users. Finally, to account for the impact of different resources, we take our DDPG-based optimization strategy as one criterion and design other criteria with different resource allocation schemes. Simulation results demonstrate that the proposed criterion outperforms the other criteria.

17.

Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems

《Chinese Physics B》2015,(9)

This paper establishes an off-policy integral reinforcement learning (IRL) algorithm to obtain the optimal tracking control of unknown chaotic systems. Off-policy IRL can learn the solution of the Hamilton–Jacobi–Bellman (HJB) equation from the system data generated by an arbitrary control. Moreover, off-policy IRL can be regarded as a direct learning method, which avoids the identification of system dynamics. In this paper, the performance index function is first given based on the system tracking error and control error. An off-policy IRL algorithm is then proposed for solving the HJB equation. It is proven that the iterative control makes the tracking error system asymptotically stable and that the iterative performance index function is convergent. A simulation study demonstrates the effectiveness of the developed tracking control method.

18.

Stable reinforcement learning via temporal competition between LTP and LTD traces

Marco A Huertas Sarah Schwettmann Alfredo Kirkwood Harel Shouval 《BMC neuroscience》2014,15(Z1):O12

19.

UAV path design with connectivity constraint based on deep reinforcement learning

《Physical Communication》2022

Cellular networks are expected to communicate effectively with unmanned aerial vehicles (UAVs) and support various applications. However, existing cellular networks are primarily designed to cover users on the ground; thus, coverage holes in the sky will exist. In this paper, we investigate the problem of path design for cellular-connected UAVs, taking into account the interruption performance throughout the UAV mission, with the goal of minimizing the completion time. Two types of connectivity constraints are considered. The first requires the maximum continuous time interval during which the UAV loses connection with base stations (BSs) to stay below a predefined threshold. The second limits the total outage of the UAV over the entire mission. The UAV is tasked with flying from a starting location to a final destination while minimizing the mission time and satisfying each of the two constraints separately. The formulated path design problem, which involves continuous variables and a dynamic radio environment, is non-convex and thus extremely difficult to solve directly. To tackle this challenge, a deep reinforcement learning (DRL) based trajectory design algorithm is proposed, in which the Dueling Double Deep Q Network (Dueling DDQN) with multi-step learning is applied. Simulation results demonstrate the effectiveness of the proposed DRL algorithm, which achieves a trade-off between the trajectory length of the UAV and the connection quality.

20.

Relay selection scheme based on deep reinforcement learning in wireless sensor networks

《Physical Communication》2022

Cooperative communication technology improves the spectrum utilization of wireless communication systems without requiring any additional equipment; it also ensures reliable transmission, and it is increasingly becoming a research focus in wireless sensor networks (WSNs). Since relay selection is crucial to cooperative communication, this paper proposes two relay selection schemes based on deep reinforcement learning (DRL) to address relay selection in cooperative WSN communications: the Deep-Q-Network Based Relay Selection Scheme (DQN-RSS) and the Proximal Policy Optimization Based Relay Selection Scheme (PPO-RSS). Both are compared with the commonly used Q-learning relay selection scheme (Q-RSS) and with a random relay selection scheme. First, the cooperative communication process in WSNs is modeled as a Markov decision process, and the DRL algorithm is trained according to the outage probability and the mutual information (MI). With the instantaneous channel state information (CSI) unknown, the best relay is adaptively selected from multiple candidate relays. Then, in view of the slow convergence of Q-RSS in high-dimensional state spaces, the DRL algorithm is used to handle the high-dimensional state space and accelerate learning. The experimental results show that, under the same conditions, the random relay selection scheme always performs worst. Compared with Q-RSS, the two proposed relay selection schemes greatly reduce the number of iterations and speed up convergence, thereby reducing the computational complexity and the overhead incurred by the source node in selecting the best relay. In addition, the two proposed schemes achieve lower outage probability, lower energy consumption, and larger system capacity. In particular, PPO-RSS has higher reliability and practicability.