Korean Chemical Engineering Research,
Vol.48, No.6, 717-724, 2010
Comparison of Partial Least Squares and Support Vector Machine for the Flash Point Prediction of Organic Compounds
The flash point is one of the most important physical properties for assessing the fire and explosion hazards of flammable liquids. Although experimental flash point data are needed for the design and construction of chemical plants, they are often unavailable in practice. This study built and compared two models, partial least squares (PLS) and support vector machine (SVM), that predict flash points from the experimental data of 893 organic compounds taken from DIPPR 801. The independent variables of the models were 65 functional groups, chosen according to the group contribution method, which assumes that each fragment of a molecule contributes a fixed amount to the value of a physical property; the logarithm of the molecular weight was added as a further variable. The parameters of both models were determined from the prediction errors calculated in cross-validation. Because the SVM model has three parameters to determine, an optimization technique was needed, and this work adopted particle swarm optimization, a heuristic optimization method. Since the selection of training data can affect prediction performance, 100 randomly selected data sets were generated and tested. Over the whole data set, the average absolute error ranged from 13.86 K to 14.55 K for PLS and from 7.44 K to 10.26 K for SVM, indicating that the predictive ability of SVM is far superior to that of PLS.
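The parameter search described in the abstract can be illustrated with a minimal particle swarm optimization sketch. It is not the authors' implementation. The three parameters are assumed here to be the penalty C, the kernel width gamma, and the tube width epsilon of an RBF-kernel support vector regression, searched in log10 space; the abstract does not name them. The function `placeholder_cv_error` is a hypothetical smooth surface standing in for the cross-validation error, which in a real study would come from training and validating the SVM at each candidate point. The swarm size, iteration count, and coefficients are likewise illustrative choices.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize objective(position) inside box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # best point seen by each particle
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # best point seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position so the particle stays in the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def placeholder_cv_error(log_params):
    """Hypothetical stand-in for a cross-validated error (not real data)."""
    log_c, log_gamma, log_eps = log_params
    return (8.0 + (log_c - 2.0) ** 2 + (log_gamma + 1.0) ** 2
            + 0.5 * (log_eps + 0.5) ** 2)


# Assumed search box for (log10 C, log10 gamma, log10 epsilon).
bounds = [(-2.0, 4.0), (-4.0, 1.0), (-3.0, 1.0)]
best_log_params, best_error = pso_minimize(placeholder_cv_error, bounds)
print(best_log_params, best_error)
```

Replacing `placeholder_cv_error` with a function that trains an SVM on the training folds and returns the validation error would turn this sketch into a parameter search of the kind the abstract describes. Repeating that search over each of the randomly generated data sets would give a range of errors rather than a single value.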