Data mining is the nontrivial process of extracting valid, novel, potentially useful, and ultimately understandable patterns from large volumes of data stored in databases, data warehouses, or other information repositories. It is widely used in business, where mathematical models are applied to the large amounts of data an enterprise stores in order to identify customer or market segments and to analyze consumer preferences and behavior, giving managers support for their decisions. Put simply, data mining is the process of extracting latent, valuable knowledge, models, or rules from large amounts of data. This paper divides the data analysis process into five stages: defining the analysis objectives, research design, data preprocessing, data organization and mining, and interpretation and analysis of the computed results. Using the cluster analysis and discriminant analysis functions of MATLAB, it analyzes the Internet access logs of users at a university over a certain period and mines the users' online behavior patterns within the sampled time span, providing a basis for managing the network scientifically. Practice shows that the method is simple and easy to use and has broad application value.
Keywords: user behavior pattern; data mining; Apriori algorithm; discriminant analysis; MATLAB
Contents