Similar literature
20 similar articles found.
1.
Zipf’s original law deals with the statistics of ranked words in natural languages. It has recently been generalized to “words” defined as n-tuples of symbols derived by translating real-valued univariate time series into a literal sequence. We verify that the rank-frequency plot of these words shows, for fractional Brownian motion, the previously found power laws, but with large finite-length corrections. We verify a finite-size scaling ansatz for these corrections and, as a result, demonstrate greatly improved estimates of the (generalized) Zipf exponents. This allows us to find the correct relation between the Zipf exponent and the Hurst exponent characterizing the fractional Brownian motion.
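As a rough illustration of the construction this abstract describes, the sketch below turns a real-valued series into a two-letter sequence, cuts it into overlapping n-tuples ("words"), and ranks the words by frequency. The up/down coding rule and the function name `rank_frequency` are assumptions made for illustration; the abstract does not specify how the series is translated into letters.

```python
from collections import Counter

def rank_frequency(series, n=3):
    """Return (word, count) pairs sorted by decreasing count."""
    # Assumed coding rule: 'u' if the series rises, 'd' otherwise.
    letters = "".join("u" if b > a else "d" for a, b in zip(series, series[1:]))
    # Overlapping n-tuples of letters play the role of words.
    words = [letters[i:i + n] for i in range(len(letters) - n + 1)]
    # most_common() sorts by count, highest first.
    return Counter(words).most_common()

ranked = rank_frequency([0, 1, 2, 1, 2, 3, 2, 3, 4], n=2)
```

Plotting the counts in `ranked` against their position on log-log axes would give the rank-frequency plot; estimating an exponent reliably needs far longer series than this toy input.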

2.
The complex structure of human language enables us to exchange very complicated information. This communication system obeys some common nonlinear statistical regularities. We investigate four important long-range features of human language, performing our calculations on selected works of seven famous litterateurs. Zipf’s law and Heaps’ law, which imply well-known power-law behaviors, are established in human language and show a qualitative inverse relation with each other. Furthermore, the informational content associated with word ordering is measured using an entropic metric. We also calculate the fractal dimension of words in the text using the box-counting method. The fractal dimension of each word, a positive value less than or equal to one, reflects its spatial distribution in the text. In general, we can claim that human language follows the aforementioned power-law regularities. Power-law relations imply the existence of long-range correlations between word types, which serve to convey a particular idea.

3.
We present in this paper a numerical investigation of literary texts by various well-known English writers, covering the first half of the twentieth century, based upon the results obtained through corpus analysis of the texts. A fractal power law is obtained for the lexical wealth defined as the ratio between the number of different words and the total number of words of a given text. By considering as a signature of each author the exponent and the amplitude of the power law, and the standard deviation of the lexical wealth, it is possible to discriminate works of different genres and writers and show that each writer has a very distinct signature, either considered among other literary writers or compared with writers of non-literary texts. It is also shown that, for a given author, the signature is able to discriminate between short stories and novels.
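The lexical wealth defined in this abstract (number of different words divided by total number of words) can be sketched as follows. The tokenization rule (lowercased runs of letters and apostrophes) and the name `lexical_wealth` are assumptions for illustration; the abstract does not say how the corpus was tokenized.

```python
import re

def lexical_wealth(text):
    """Ratio of distinct words to total words (0.0 for an empty text)."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

# 5 distinct words out of 8 in total.
wealth = lexical_wealth("the cat and the dog and the bird")
```

Evaluating this ratio on growing prefixes of a text would give the curve to which a power law is fitted.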

4.
Iddo Eliazar  Morrel H. Cohen 《Physica A》2011,390(23-24):4293-4303
We establish a “Central Limit Theorem” for rank distributions, which provides a detailed characterization and classification of their universal macroscopic statistics and phase transitions. The limit theorem is based on the statistical notion of Lorenz curves, and is termed the “Lorenzian Limit Law” (LLL). Applications of the LLL further establish: (i) a statistical explanation for the universal emergence of Pareto’s law in the context of rank distributions; (ii) a statistical classification of universal macroscopic network topologies; (iii) a statistical classification of universal macroscopic socioeconomic states; (iv) a statistical classification of Zipf’s law, and a characterization of the “self-organized criticality” it manifests.

5.
Wei Liang  Chi K. Tse 《Physica A》2009,388(23):4901-4909
Co-occurrence networks of Chinese characters and words, and of English words, are constructed from collections of Chinese and English articles, respectively. Four types of collections are considered, namely, essays, novels, popular science articles, and news reports. Statistical parameters of the networks are studied, including diameter, average degree, degree distribution, clustering coefficient, average shortest path length, as well as the number of connected subnetworks. It is found that the character and word networks of each type of article in the Chinese language, and the word network of each type of article in the English language all exhibit scale-free and small-world features. The statistical parameters of these co-occurrence networks are compared within the same language and across the two languages. This study reveals some commonalities and differences between Chinese and English languages, and among the four types of articles in each language from a complex network perspective. In particular, it is shown that expressions in English are briefer than those in Chinese in a certain sense.

6.
Statistical analysis of bacterial genomes has been performed on the basis of 20 complete genome texts obtained from GenBank. It has been revealed that the ranked word distributions are quite well approximated by a logarithmic law. The results obtained in the investigation of absent words show the considerably nonrandom character of DNA texts. Period-3 oscillations have been observed in the behavior of the autocorrelation function in several genomes. Short-range autocorrelations are present in short (n=3) words and practically absent in longer words.

7.
This paper deals with an application of Zipf's law in climatology. This analysis allows the extraction of information not available by standard methods. In particular, rainfall temporal aggregation patterns associated with different climates are characterized by means of exponents derived from the resulting scaling laws. The analogy with linguistic analysis is obtained using a particular coding of precipitation as a discrete variable with four states (corresponding to four standard precipitation thresholds); each weekly symbolic sequence of observed precipitation is considered as a “word”, and each local station defines a “language” characterized by the observed words in a period representative of the climatology. To characterize these precipitation languages, we obtained characteristic exponents derived from Zipf's law for a set of representative stations of the main Köppen's climates and subclimates. We found different scaling behaviors for different subclimates, given by a single exponent in the range 0.6 (humid tropical climates) to 1.4 (polar climates); some humid middle-latitude subclimates exhibit a crossover with two different characteristic exponents corresponding to high- and low-frequency aggregation patterns (no explanation for this behavior is provided).
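The coding step described in this abstract might look like the following minimal sketch: daily precipitation is mapped to one of four letters, and each non-overlapping seven-day block becomes a "word". The cut points in `THRESHOLDS`, the letters `abcd`, and the function name `weekly_words` are hypothetical; the abstract does not state the thresholds actually used.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points in mm/day separating the four states.
THRESHOLDS = (0.1, 2.0, 10.0)

def weekly_words(daily_mm):
    """Code daily precipitation into letters a-d and split into 7-day words."""
    # bisect_right returns 0..3, the index of the state for each value.
    symbols = "".join("abcd"[bisect_right(THRESHOLDS, x)] for x in daily_mm)
    # Non-overlapping weeks; an incomplete final week is dropped.
    return [symbols[i:i + 7] for i in range(0, len(symbols) - 6, 7)]

daily = [0, 0, 5, 20, 0, 1, 0] * 2 + [0] * 7
ranked_words = Counter(weekly_words(daily)).most_common()
```

The ranked counts for one station correspond to that station's "language"; the characteristic exponent would then come from a fit to the rank-frequency curve, which is not attempted here.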

8.
M. Ausloos 《Physica A》2008,387(25):6411-6420
Two English texts written by Lewis Carroll, one (Alice in Wonderland) also translated into Esperanto, the other (Through the Looking Glass), are compared in order to observe whether natural and artificial languages significantly differ from each other. One-dimensional time-series-like signals are constructed using only word frequencies (FTS) or word lengths (LTS). The data are studied through (i) a Zipf method for sorting out correlations in the FTS and (ii) a Grassberger-Procaccia (GP) technique-based method for finding correlations in the LTS. The methods correspond to an equilibrium and a dynamic approach, respectively, to the features of human texts. There are quantitative statistical differences between the original English text and its Esperanto translation, but the qualitative differences are very minute. However, different power laws are observed, with characteristic exponents for the ranking properties and for the phase space attractor dimensionality. The Zipf exponent can take values much less than unity (∼0.50 or 0.30) depending on how a sentence is defined. This variety in exponents can be conjectured to be an intrinsic measure of the book's style or purpose, rather than of the language or the author's vocabulary richness, since a similar exponent is obtained whatever the text. Moreover, the attractor dimension r is a simple function of the so-called phase space dimension n, i.e., r = n^λ, with λ = 0.79. Such an exponent could also be conjectured to be a measure of the author's stylistic versatility, here well preserved in the translation.

9.
A series of phenomena pertaining to economics, quantum physics, language, literary criticism, and especially architecture is studied from the standpoint of synergetics (the study of self-organizing complex systems). It turns out that a whole series of concrete formulas describing these phenomena is identical in these different situations. This is the case for the formulas relating to the Bose-Einstein distribution of particles and the distribution of words from a frequency dictionary. This also allows one to apply a “quantized” form of the Zipf law to the problem of the authorship of Quiet Flows the Don and to the “blending in” of new architectural structures in an existing environment.

10.
《Physica A》1996,231(4):705-711
Various literary works are compared using the “rank distance”, d, between two word-frequency Zipf plots, introduced by S. Havlin (Physica A 216 (1995) 148). We studied 22 books written by six authors. For this ensemble of books we find that the mean distance between books written by the same authors (〈d〉 = 15.2 ± 2.6) is considerably smaller than that between books written by different authors (〈d〉 = 21.8 ± 3.2), in good agreement with earlier results on a smaller sample of books. Our results suggest that the distribution of the rank difference of the same words in different books decays exponentially.

11.
This study examined the effect of presumed mismatches between speech input and the phonological representations of English words by native speakers of English (NE) and Spanish (NS). The English test words, which were produced by a NE speaker and a NS speaker, varied orthogonally in lexical frequency and neighborhood density and were presented to NE listeners and to NS listeners who differed in English pronunciation proficiency. It was hypothesized that mismatches between phonological representations and speech input would impair word recognition, especially for items from dense lexical neighborhoods, which are phonologically similar to many other words and require finer sound discrimination. Further, it was assumed that L2 phonological representations would change with L2 proficiency. The results showed the expected mismatch effect only for words from dense neighborhoods. For Spanish-accented stimuli, the NS groups recognized more words from dense neighborhoods than the NE group did. For native-produced stimuli, the low-proficiency NS group recognized fewer words than the other two groups. The high-proficiency NS participants' performance was as good as the NE group's for words from sparse neighborhoods, but not for words from dense neighborhoods. These results are discussed in relation to the development of phonological representations of L2 words.

12.
Some statistical properties of a network of two-Chinese-character compound words in the Japanese language are reported. In this network, a node represents a Chinese character and an edge represents a two-Chinese-character compound word. It is found that this network has the properties of being “small-world” and “scale-free”. A network formed by only Chinese characters for common use (joyo-kanji in Japanese), which is regarded as a subclass of the original network, also has the small-world property. However, the degree distribution of this network exhibits no clear power law. In order to reproduce the disappearance of the power-law property, a model for the selection process of the Chinese characters for common use is proposed.

13.
Chinese words may begin with /t/ and /d/, but a /t/-/d/ contrast does not exist in word-final position. The question addressed by experiment 1 was whether Chinese speakers of English could identify the final stop in words like beat and bead. The Chinese subjects examined approached the near-perfect identification rates of native English adults and children for words that were unedited, but performed poorly for words from which final release bursts had been removed. Removing closure voicing had a small effect on the Chinese but not the English listeners' sensitivity. A regression analysis indicated that the Chinese subjects' native language (Mandarin, Taiwanese, Shanghainese) and their scores on an English comprehension test accounted for a significant amount of variance in sensitivity to the (burstless) /t/-/d/ contrast. In experiment 2, a small amount of feedback training administered to Chinese subjects led to a small, nonsignificant increase in sensitivity to the English /t/-/d/ contrast. In experiment 3, more training trials were presented for a smaller number of words. A slightly larger and significant effect of training was obtained. The Chinese subjects who were native speakers of a language that permits obstruents in word-final position seemed to benefit more from the training than those whose native language (L1) has no word-final obstruents. This was interpreted to mean that syllable-processing strategies established during L1 acquisition may influence later L2 learning.

14.
The general invariant integral based on the energy conservation law is introduced into physical mesomechanics, taking into account the cosmic, gravitational, mass, elastic, thermal and electromagnetic energy of matter. Physical mesomechanics thus becomes a mega-mechanics embracing most of the scales of nature. Some basic laws following from the general invariant integral are indicated, including Coulomb’s law of electricity generalized for moving electric charges, Newton’s law of gravitation generalized for a coupled gravitational/cosmic field, Archimedes’ law of buoyancy generalized for bodies partially submerged in water, and others. Using the invariant integral, the temperature track behind moving cracks and dislocations is found, and the coupling of elastic and thermal energies is established in fracturing and plastic flow, namely for opening-mode cracks and edge dislocations. For porous materials saturated with a fluid or gas, the notion of a binary continuum is used to introduce the corresponding invariant integrals. As applied to the horizontal drilling and hydrofracturing of boreholes in the Earth’s crust, the fields of pressure and flow rate, as well as the fluid output from both a horizontal borehole and a disk-shaped fracture issuing from the borehole, are derived in the fluid extraction regime. A theory of fracking in shale gas/oil reservoirs is suggested for three basic regimes of drill mud permeation into the multiply fractured rock region, calculating the shape and volume of this region in terms of the geometry parameters and the pressures of rock, drill mud and shale gas. The method of functional equations in the theory of a complex variable and the boundary layer method are also used to solve these problems.

15.
We use the "wavelet transform microscope" to carry out a comparative statistical analysis of DNA bending profiles and of the corresponding DNA texts. In the three kingdoms, one reveals on both signals a characteristic scale of 100-200 bp that separates two different regimes of power-law correlations (PLC). In the small-scale regime, PLC are observed in eukaryotic, in double-strand DNA viral, and in archaeal genomes, which contrasts with their total absence in the genomes of eubacteria and their viruses. This strongly suggests that small-scale PLC are related to the mechanisms underlying the wrapping of DNA in the nucleosomal structure. We further speculate that the large scale PLC are the signature of the higher-order structure and dynamics of chromatin.

16.
We investigate the diffusion coefficient of the time integral of the Kuramoto order parameter in globally coupled nonidentical phase oscillators. This coefficient represents the deviation of the time integral of the order parameter from its mean value on the sample average. In other words, this coefficient characterizes long-term fluctuations of the order parameter. For a system of N coupled oscillators, we introduce a statistical quantity D, which denotes the product of N and the diffusion coefficient. We study the scaling law of D with respect to the system size N. In other well-known models such as the Ising model, the scaling property of D is D ~ O(1) for both coherent and incoherent regimes except at the transition point. In contrast, in the globally coupled phase oscillators, the scaling law of D is different for the coherent and incoherent regimes: D ~ O(1/N^a) with a certain constant a > 0 in the coherent regime and D ~ O(1) in the incoherent regime. We demonstrate that these scaling laws hold for several representative coupling schemes.
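For orientation, the standard Kuramoto order parameter for N oscillators with phases θ_j(t), and one plausible reading of the quantity D described in this abstract, can be written as below. The symbols r, ψ, and R_T, and the variance-based form with the 1/(2T) normalization, are assumptions chosen for illustration; the abstract does not give the authors' exact definition.

```latex
r(t)\,e^{i\psi(t)} = \frac{1}{N}\sum_{j=1}^{N} e^{i\theta_j(t)},
\qquad
R_T = \int_0^T r(t)\,dt,
\qquad
D = N \lim_{T\to\infty} \frac{1}{2T}
    \Big\langle \big( R_T - \langle R_T \rangle \big)^2 \Big\rangle .
```

Here the angle brackets denote the sample average mentioned in the abstract, so D measures how the spread of the time-integrated order parameter grows with T, rescaled by the system size N.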

17.
We show that the laws of Zipf and Benford, obeyed by scores of numerical data generated by many and diverse kinds of natural phenomena and human activity, are related to the focal expression of a generalized thermodynamic structure. This structure is obtained from a deformed type of statistical mechanics that arises when configurational phase space is incompletely visited in a strict way. Specifically, the restriction is that the accessible fraction of this space has fractal properties. The focal expression is an (incomplete) Legendre transform between two entropy (or Massieu) potentials that, when particularized to first digits, leads to a previously existing generalization of Benford’s law. The inverse functional of this expression leads to Zipf’s law, but it naturally includes the bends or tails observed in real data for small and large rank. Remarkably, we find that the entire problem is analogous to the transition to chaos via intermittency exhibited by low-dimensional nonlinear maps. Our results also explain the generic form of the degree distribution of scale-free networks.

18.
The empirical law uncovered by Menzerath and formulated by Altmann, known as the Menzerath–Altmann law (henceforth the MA law), reveals the statistical distribution behavior of human language in various organizational levels. Building on previous studies relating organizational regularities in a language, we propose that the distribution of distinct (or different) words in a large text can effectively be described by the MA law. The validity of the proposition is demonstrated by examining two text corpora written in different languages not belonging to the same language family (English and Turkish). The results show not only that distinct word distribution behavior can accurately be predicted by the MA law, but that this result appears to be language-independent. This result is important not only for quantitative linguistic studies, but also may have significance for other naturally occurring organizations that display analogous organizational behavior. We also deliberately demonstrate that the MA law is a special case of the probability function of the generalized gamma distribution.

19.
Function words, especially frequently occurring ones such as the, that, and, and of, vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation (e.g., ði, ðæt, ænd, ʌv) or a more reduced or lenited pronunciation (e.g., ðə, ðɨt, n, ə). It is based on over 8000 occurrences of the ten most frequent English function words in a 4-h sample from conversations from the Switchboard corpus. Ordinary linear and logistic regression models were used to examine variation in the length of the words, in the form of their vowel (basic, full, or reduced), and whether final obstruents were present or not. For all these measures, after controlling for segmental context, rate of speech, and other important factors, there are strong independent effects that made high-frequency monosyllabic function words more likely to be longer or have a fuller form (1) when neighboring disfluencies (such as filled pauses uh and um) indicate that the speaker was encountering problems in planning the utterance; (2) when the word is unexpected, i.e., less predictable in context; (3) when the word is either utterance initial or utterance final. Looking at the phenomenon in a different way, frequent function words are more likely to be shorter and to have less-full forms in fluent speech, in predictable positions or multiword collocations, and utterance internally. Also considered are other factors such as sex (women are more likely to use fuller forms, even after controlling for rate of speech, for example), and some of the differences among the ten function words in their response to the factors.

20.
We review recent progress in understanding the meaning of mutual information in natural language. Let us define words in a text as strings that occur sufficiently often. In a few previous papers, we have shown that a power-law distribution for words so defined (a.k.a. Herdan's law) is obeyed if there is a similar power-law growth of (algorithmic) mutual information between adjacent portions of texts of increasing length. Moreover, the power-law growth of information holds if texts describe a complicated infinite (algorithmically) random object in a highly repetitive way, according to an analogous power-law distribution. The described object may be immutable (like a mathematical or physical constant) or may evolve slowly in time (like cultural heritage). Here, we reflect on the respective mathematical results in a less technical way. We also discuss the feasibility of deciding to what extent these results apply to actual human communication.
