Similar Articles
20 similar articles found.
6.
An accurate and generally applicable method was developed for estimating the aqueous solubility of a diverse set of 1297 organic compounds, based on multilinear regression and artificial neural network modeling. Molecular connectivity, shape, and atom-type electrotopological state (E-state) indices were used as structural parameters. The data set was divided into a training set of 884 compounds and a randomly chosen test set of 413 compounds. The inputs to a 30-12-1 artificial neural network comprised 24 atom-type E-state indices and six other topological indices. For the test set, the network achieved a predictive r² = 0.92 and s = 0.60. With the same parameters, multilinear regression gave r² = 0.88 and s = 0.71.
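The sketch below gives a rough picture of the multilinear-regression half of this comparison. It fits ordinary least squares on a training set and scores a held-out test set. It assumes that r² is the coefficient of determination and that s is the root-mean-square residual on the test set, since the abstract defines neither statistic. The two-descriptor toy data are hypothetical and stand in for the 30 real structural parameters.

```python
from math import sqrt


def fit_mlr(X, y):
    """Ordinary least squares for y ~ b0 + b1*x1 + ... + bp*xp via the normal equations."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend an intercept column
    p = len(rows[0])
    # Normal equations: (X^T X) b = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for k in range(col + 1, p):
            f = A[k][col] / A[col][col]
            for c in range(col, p):
                A[k][c] -= f * A[col][c]
            rhs[k] -= f * rhs[col]
    # Back substitution
    coef = [0.0] * p
    for i in range(p - 1, -1, -1):
        coef[i] = (rhs[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return coef


def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def r2_and_s(y_true, y_pred):
    """r^2 as 1 - SSE/SST and s as the root-mean-square residual (assumed definitions)."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - sse / sst, sqrt(sse / n)


# Hypothetical toy data generated from y = 1 + 2*x1 - x2 (not real E-state indices)
train_X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
train_y = [1, 3, 0, 2, 4]
test_X = [(3, 2), (2, 3), (4, 0)]
test_y = [5, 2, 9]
coef = fit_mlr(train_X, train_y)
r2, s = r2_and_s(test_y, [predict(coef, x) for x in test_X])
print(coef, r2, s)
```

Because the toy data are noise-free, the fit recovers the generating coefficients, giving r² close to 1 and s close to 0. Real descriptor data would not behave this cleanly.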

8.
A new method, ALOGPS v 2.0 (http://www.lnh.unil.ch/~itetko/logp/), was developed for estimating the n-octanol/water partition coefficient, log P. It is based on neural network ensemble analysis of 12 908 organic compounds from the PHYSPROP database of Syracuse Research Corporation. Atom- and bond-type E-state indices, together with the numbers of hydrogen and non-hydrogen atoms, were used to represent the molecular structures. A preliminary selection of indices was performed by multiple linear regression analysis, and 75 input parameters were chosen. Some of these parameters combined several atom-type or bond-type indices with similar physicochemical properties. The neural network ensemble was trained with an efficient partition algorithm developed by the authors. The ensemble contained 50 neural networks, each with 10 neurons in a single hidden layer. Prediction ability was estimated using both the leave-one-out (LOO) technique and a training/test protocol. The two approaches gave similar results for interseries predictions, i.e., when the molecules in the training and test subsets were drawn at random from the same set of compounds. ALOGPS performed significantly better than the other methods tested. For a subset of 12 777 molecules, the LOO results were a squared correlation coefficient r² = 0.95, a root-mean-square error (RMSE) = 0.39, and a mean absolute error (MAE) = 0.29. In two cross-series predictions, i.e., when the molecules in the training and test sets belonged to different series of compounds, all of the analyzed methods performed less well. This decrease in performance could be explained by differences in the diversity of the molecules in the training and test sets. Even in these difficult cases, however, ALOGPS predicted better than the other methods tested.
We have shown that the diversity of the training sets, rather than the design of the methods, is the main factor determining their ability to predict new data. A comparison of the methods' performance, and of how it depends on the number of non-hydrogen atoms in a molecule, is also presented.
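The LOO protocol and ensemble averaging described above can be sketched in a few lines. In this hypothetical sketch each "network" is replaced by a least-squares line, and each ensemble member is trained on a different subset of the data. The actual ALOGPS networks and the authors' partition algorithm are not reproduced. The sketch computes r² as the squared Pearson correlation between observed and predicted values, which is an assumption about the reported statistic.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Least-squares straight line; a hypothetical stand-in for one trained neural network."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_ensemble(xs, ys, n_members=3):
    """Train each member on a different subset of the data and average their outputs."""
    members = []
    for j in range(n_members):
        idx = [i for i in range(len(xs)) if i % n_members != j]
        members.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in members)


def loo_predictions(xs, ys, fit):
    """Leave-one-out: refit without compound i, then predict compound i."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds


def metrics(y_true, y_pred):
    """Squared Pearson correlation r^2, RMSE, and MAE between observed and predicted values."""
    errs = [p - t for t, p in zip(y_true, y_pred)]
    rmse = sqrt(sum(e * e for e in errs) / len(errs))
    mae = sum(abs(e) for e in errs) / len(errs)
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p), rmse, mae


# Hypothetical noise-free data: log P = 0.5 * (heavy-atom count) - 1
xs = [float(n) for n in range(1, 10)]
ys = [0.5 * x - 1.0 for x in xs]
r2, rmse, mae = metrics(ys, loo_predictions(xs, ys, fit_ensemble))
print(f"LOO r^2 = {r2:.3f}, RMSE = {rmse:.3f}, MAE = {mae:.3f}")
```

Squared correlation can approach 1 even when predictions are systematically offset, so RMSE and MAE are worth reporting alongside r².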

10.
The campaign against drug abuse is fought by all countries, most notably against amphetamine-type stimulant (ATS) drugs. The identification of ATS drugs depends heavily on their molecular structure. However, identification is becoming less reliable as new, sophisticated, and increasingly complex ATS molecular structures are introduced. The distinctive features of ATS drug molecular structures therefore need to be captured accurately. This paper discusses two variants of refined 3D geometric moment invariants for representing ATS drug molecular structures and compares their performance. The comparison used chemical structures of ATS drugs obtained from Isomer Design's PiHKaL.info database, while non-ATS drugs were obtained at random from the ChemSpider database. The assessment identifies the technique best suited for further exploration and improvement in future studies, so that it can be fully adapted to molecular similarity searching for ATS drugs.
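The abstract does not give formulas for its two refined invariant variants. As a point of reference only, the sketch below computes the classical second-order 3D moment invariants from atomic coordinates: J1, J2 and J3 are the trace, the sum of principal 2x2 minors, and the determinant of the central second-moment matrix. It then checks that they are unchanged by translation and rotation. The unit atom weights and the four-atom geometry are hypothetical choices for illustration.

```python
import math


def central_moment(coords, p, q, r, weights=None):
    """mu_pqr about the weighted centroid; weights default to 1 (could be atomic masses)."""
    w = weights or [1.0] * len(coords)
    total = sum(w)
    xc = sum(wi * x for wi, (x, _, _) in zip(w, coords)) / total
    yc = sum(wi * y for wi, (_, y, _) in zip(w, coords)) / total
    zc = sum(wi * z for wi, (_, _, z) in zip(w, coords)) / total
    return sum(wi * (x - xc) ** p * (y - yc) ** q * (z - zc) ** r
               for wi, (x, y, z) in zip(w, coords))


def second_order_invariants(coords, weights=None):
    """J1, J2, J3: trace, sum of principal 2x2 minors, determinant of the second-moment matrix."""
    m = lambda p, q, r: central_moment(coords, p, q, r, weights)
    a, d, f = m(2, 0, 0), m(0, 2, 0), m(0, 0, 2)  # diagonal: mu200, mu020, mu002
    b, c, e = m(1, 1, 0), m(1, 0, 1), m(0, 1, 1)  # off-diagonal: mu110, mu101, mu011
    j1 = a + d + f
    j2 = a * d + a * f + d * f - b * b - c * c - e * e
    j3 = a * (d * f - e * e) - b * (b * f - e * c) + c * (b * e - d * c)
    return j1, j2, j3


def rotate_z(coords, theta):
    """Rotate every point by theta radians about the z axis."""
    ct, st = math.cos(theta), math.sin(theta)
    return [(ct * x - st * y, st * x + ct * y, z) for x, y, z in coords]


# Hypothetical 4-atom geometry, not a real ATS molecule
mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0), (0.3, 0.8, 1.1)]
moved = [(x + 5.0, y - 2.0, z + 3.0) for x, y, z in rotate_z(mol, 0.7)]
print(second_order_invariants(mol))
print(second_order_invariants(moved))  # equal to the line above up to rounding
```

These three values are rotation-invariant because they are the coefficients of the characteristic polynomial of the second-moment matrix. Descriptors intended for similarity search usually add higher-order moments, which this sketch omits.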

11.
A wide range of molecular representations exists today, from human-readable structural diagrams, through line notations such as Wiswesser Line Notation (WLN) and SMILES, to several dozen computer-readable file formats. Nevertheless, these formats are not the method of choice for entering molecular structures into computer systems automatically, because optical recognition cannot read them easily and without error. The present study explores a two-dimensional (PDF417) barcode representation of molecular structures in SMILES format, which lets users read molecular structures and input them into computer systems in a fully automated fashion. A Lempel-Ziv-Welch (LZW) compressed version of SMILES is suggested for cases where the structure exceeds the storage capacity of a PDF417 barcode. Alternatively, the compact ACS format may be used as the structural representation. Input via barcodes is fast, fully automatic, and practically error-free, because the 2D barcodes used incorporate error correction. A Web application interface was developed that interprets these barcodes and exports them as optimized 3D chemical structures. Applications of this representation range from automated storage systems to Web-based tracking of molecular samples. The National Chemical Laboratory, Pune, uses 2D barcode-encoded structures for in-house repository management. There, the barcodes can also be used to query the database in similarity or substructure searches based on the query structure.
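The LZW compression step mentioned above can be illustrated with the textbook algorithm, which replaces repeated substrings with integer codes. This sketch assumes ASCII SMILES and an unbounded code table. The study does not state how the codes were packed into a PDF417 payload (code width, reset policy), so that step is not modeled.

```python
def lzw_compress(text):
    """Textbook LZW: emit integer codes; the table starts with all 256 single-byte characters."""
    table = {chr(i): i for i in range(256)}
    next_code = 256
    w = ""
    out = []
    for ch in text:
        wc = w + ch
        if wc in table:
            w = wc  # keep extending the current match
        else:
            out.append(table[w])
            table[wc] = next_code  # learn the new substring
            next_code += 1
            w = ch
    if w:
        out.append(table[w])
    return out


def lzw_decompress(codes):
    """Rebuild the string, reconstructing the same table the compressor built."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            entry = w + w[0]  # special case: code defined by the step being decoded
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)


smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
codes = lzw_compress(smiles)
print(len(smiles), len(codes))
assert lzw_decompress(codes) == smiles
```

For a SMILES string this short, the 21 characters become 16 codes. Each code needs more than 8 bits, however, so the byte-level saving is small. The gain grows with longer SMILES that repeat substrings more often.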

13.
Chemical structure searching based on databases and machine learning has recently attracted great attention as a way to screen materials rapidly for target functionalities. To this end, we established a high-performance chemical structure database based on the MySQL engine, named MYDB. More than 160,000 metal-organic frameworks (MOFs) have been collected and stored, using new retrieval algorithms for efficient searching and recommendation. Evaluation results show that MYDB achieves fast and efficient keyword searching across millions of records and provides real-time recommendations of similar structures. By combining a machine learning method with the materials database, we developed an adsorption model to determine the adsorption capacity of metal-organic frameworks for argon and hydrogen under specified conditions. We expect that MYDB, together with the machine learning techniques developed here, can support large-scale, low-cost, and highly convenient structural research, accelerating the discovery of materials with target functionalities in computational materials research.
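The abstract does not describe MYDB's retrieval algorithms, so the following is only a generic sketch of the two operations it mentions. Keyword search runs through an inverted index, and similar structures are recommended by Tanimoto similarity over feature sets. The MOF identifiers and feature labels are hypothetical, and an in-memory dictionary stands in for the MySQL tables.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def build_index(records):
    """Inverted index: feature -> set of record IDs, so a keyword lookup avoids a full scan."""
    index = defaultdict(set)
    for rid, feats in records.items():
        for feat in feats:
            index[feat].add(rid)
    return index


def keyword_search(index, *keywords):
    """IDs of records carrying every keyword (intersection of the posting sets)."""
    hits = [index.get(k, set()) for k in keywords]
    return sorted(set.intersection(*hits)) if hits else []


def recommend(records, query_id, k=3):
    """Top-k records most similar to query_id, best first; ties broken by ID."""
    q = records[query_id]
    scored = [(tanimoto(q, f), rid) for rid, f in records.items() if rid != query_id]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:k]


# Hypothetical records; real MOF entries would carry many more descriptors
records = {
    "MOF-A": {"metal:Zn", "linker:BDC", "topology:pcu"},
    "MOF-B": {"metal:Zn", "linker:BDC", "topology:sql"},
    "MOF-C": {"metal:Cu", "linker:BTC", "topology:tbo"},
    "MOF-D": {"metal:Zn", "linker:BTC", "topology:pcu"},
}
index = build_index(records)
print(keyword_search(index, "metal:Zn", "topology:pcu"))
print(recommend(records, "MOF-A", k=2))
```

The exhaustive similarity scan in `recommend` is linear in the number of records. At the scale of 160,000 structures, a real system would likely need pruning or precomputed neighbor lists to stay real-time.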

14.
Molecular weight and electrotopological state (E-state) indices were used with artificial neural networks to estimate the aqueous solubility of a diverse set of 1291 organic compounds. A neural network with a 33-4-1 architecture gave highly predictive results, with r² = 0.91 and RMS error = 0.62. The input parameters included several combinations of E-state indices with similar properties. The results were similar to those published for these data by Huuskonen (2000). However, the current study used only E-state indices, without the additional indices considered in the previous study (molecular connectivity, shape, flexibility, and indicator indices). In addition, the present neural network had one-third as many hidden neurons. Smaller neural networks and a single homogeneous set of parameters provide a more robust model for predicting the aqueous solubility of chemical compounds. Limitations of the method for predicting the solubility of large compounds are discussed. The approach is available online at http://www.lnh.unil.ch/~itetko/logp.
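One way to make "smaller network" concrete is to count trainable parameters (weights plus biases) in a fully connected network. The sketch does this for the 33-4-1 network and for the 30-12-1 architecture reported in entry 6. The forward-pass function uses tanh hidden units and a linear output. That is an assumption, since the abstract does not state the activation functions, and the weights shown are placeholders.

```python
import math


def n_parameters(layers):
    """Number of weights plus biases in a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))


def forward(x, weights, biases):
    """Forward pass: tanh hidden layers, linear output layer (activations are assumed)."""
    a = list(x)
    for layer, (W, b) in enumerate(zip(weights, biases)):
        z = [sum(w * v for w, v in zip(row, a)) + bias for row, bias in zip(W, b)]
        a = z if layer == len(weights) - 1 else [math.tanh(v) for v in z]
    return a


print(n_parameters((33, 4, 1)))   # 141
print(n_parameters((30, 12, 1)))  # 385

# Placeholder weights for a 33-4-1 net, only to show the shapes involved
W1 = [[0.01] * 33 for _ in range(4)]
b1 = [0.0] * 4
W2 = [[1.0] * 4]
b2 = [0.5]
out = forward([1.0] * 33, [W1, W2], [b1, b2])
print(out)
```

With four hidden neurons instead of twelve, the network has 141 rather than 385 parameters to fit. That difference is consistent with the robustness argument in the abstract.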

20.
This article describes the use of the ICL Distributed Array Processor (DAP) for the automatic classification of chemical structure databases with the Jarvis-Patrick clustering method. The method is based on computing a table of the nearest neighbors of each molecule in the database to be clustered. These nearest neighbors can be identified very efficiently on the DAP, since it allows up to 4096 molecules to be compared with a specified molecule in parallel. Experiments with files of 4096 and 8192 structures from the Fine Chemicals Database show that clustering on the DAP is up to 6.7 times as fast as a highly efficient inverted-file algorithm running on an IBM 3083 mainframe.
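A serial sketch of Jarvis-Patrick clustering follows. It builds the nearest-neighbor table, then merges two molecules when each appears in the other's list of k nearest neighbors and the two lists share at least kmin members. The abstract does not name the similarity measure, so storing fingerprints as sets of bit positions and comparing them by Tanimoto similarity are assumptions. The table-building loop is the step the DAP ran against up to 4096 molecules at a time.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def nearest_neighbor_table(fps, k):
    """For each molecule, the set of its k most similar other molecules (ties broken by index).

    This all-pairs step is the one the DAP parallelized; here it runs serially.
    """
    table = []
    for i, fi in enumerate(fps):
        sims = sorted(((tanimoto(fi, fj), j) for j, fj in enumerate(fps) if j != i),
                      key=lambda t: (-t[0], t[1]))
        table.append({j for _, j in sims[:k]})
    return table


def jarvis_patrick(fps, k, kmin):
    """Join i and j if each is in the other's k-NN list and the lists share >= kmin members."""
    nn = nearest_neighbor_table(fps, k)
    parent = list(range(len(fps)))  # union-find forest over molecule indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(fps)):
        for j in nn[i]:
            if i < j and i in nn[j] and len(nn[i] & nn[j]) >= kmin:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(fps)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical fingerprints forming two obvious families
fps = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2, 4}, {10, 11, 12}, {10, 11, 13}, {10, 11, 12, 13}]
print(jarvis_patrick(fps, k=2, kmin=1))  # [[0, 1, 2], [3, 4, 5]]
```

Some descriptions of the method count each molecule as one of its own neighbors, which changes how many shared neighbors a pair needs to reach kmin. This sketch excludes self-matches.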


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417