Research on the Public's Understanding of and Satisfaction with Microalgae Wastewater Treatment Technology

Abstract


To explore the public's awareness of and satisfaction with microalgae water purification technology, this paper conducts a questionnaire survey. Using the user data collected through an online survey, we check the soundness of the questionnaire with a reliability test, a validity test, item analysis and the chi-square test of independence, and show that the survey results are credible.

Based on the survey results, descriptive statistics are used to analyze the respondents' basic information. Drawing on Bloom's taxonomy of educational objectives and its six cognitive levels, we build a user cognitive model weighted by the CRITIC method to measure users' understanding of microalgae water purification technology. Using this model, we analyze users' degree of understanding, then apply K-means cluster analysis to identify potential users and examine the value and characteristics of five potential user groups.

Finally, the paper draws two main conclusions. First, the cognitive analysis shows that 211 respondents had never heard of microalgae, and more than 50% of the remaining 364 respondents who had heard of it were unfamiliar with microalgae; overall understanding of microalgae water purification technology is low. Second, the potential user analysis suggests that publicity and popularization should focus on two groups: highly educated people and experienced middle-aged and elderly people. Based on the questionnaire analysis, the paper also proposes product development suggestions and strategies.

Key words: Cognitive model; Cluster analysis; Potential user; CRITIC weight; Questionnaire analysis

1 Introduction

Pressure on the climate and on aquatic ecosystems is increasing, so finding better methods of water purification and carbon sequestration is important for reaching the dual carbon goals (carbon peaking and carbon neutrality). Conventional wastewater treatment converts the organic carbon and ammonia nitrogen in wastewater into carbon dioxide and nitrogen gas, which are released into the atmosphere through intensive aeration and alternating anaerobic and aerobic operation; the carbon and nitrogen are not recovered as energy or resources. With fossil fuels being depleted and the greenhouse effect intensifying, there is an urgent need for more energy-efficient and environmentally friendly wastewater treatment processes. Treating wastewater with microalgae biotechnology is a promising way to turn wastewater into a resource, and a large body of related research has been carried out in China and abroad in recent years.

However, as an emerging industry, genetically modified microalgae and the related water purification technology will inevitably face public skepticism, and the public's understanding of and satisfaction with microalgae may become a major obstacle to the industry's development. We therefore distributed questionnaires to groups classified by occupation to study their understanding and acceptance of microalgae products and microalgae water purification technology, with the aim of providing references and suggestions for the future development direction and publicity of microalgae water purification products.

2 Investigation Plan and Implementation

2.1 Investigation Plan
2.1.1 Purpose of the Investigation

(1) Collect the public's knowledge of microalgae water purification technology together with the respondents' basic information (gender, age, occupation, education, city of residence, etc.), and analyze how well different groups understand the technology.

(2) Collect public satisfaction with microalgae products and related technologies, to provide a reference for current product promotion methods and future promotion.

2.1.2 Investigation Method

Because of the COVID-19 pandemic, the survey was conducted on an online questionnaire platform, with electronic questionnaires replacing traditional paper ones. The electronic questionnaire uses skip logic (jump questions), so that different occupational groups receive different questions; it also removes the need for manual data entry and thus avoids transcription errors.

2.1.3 Investigation Items

This survey focuses on the public's awareness of microalgae water purification technology and is organized into three dimensions: a factual component, a cognitive component and an emotional component. The factual component describes the current state of something; the cognitive component is knowledge and awareness of it; the emotional component is a preference for it. Survey questions were written for each of these three dimensions with the respondent groups in mind, giving the questionnaire framework shown in Table 1.

Table 1 Questionnaire Design Framework (Excerpt)

(see Annex 1-3 for the complete questionnaire)

Dimension  Question
Factual component  【Single choice】Which of the following categories does your industry fall into?
【Multiple choice】Which of the following methods have been used in your wastewater treatment?
【Single choice】Does your company have a dedicated wastewater discharge supervision department?
Cognitive component  【Multiple choice】In line with the sustainable development strategy, national investment in environmental governance has increased significantly, and comprehensive water pollution control in key rivers and water bodies has been strengthened. Which of the following policies on wastewater discharge supervision do you know of?
【Multiple choice】Which of the following wastewater treatment measures do you know of?
【Multiple choice】Which of the following aspects of wastewater discharge supervision do you know of?
【Scale question】The State Council's 2021 Government Work Report called for solid work on carbon peaking and carbon neutrality and for an action plan to peak carbon emissions before 2030, and microalgae can play an important role in this. How well do you understand the role of microalgae in the ecological environment?
【Multiple choice】What difficulties do you think the development of microalgae factories may face?
【Scale question】The goal of synthetic biology is to build artificial biological systems that behave like electrical circuits. How well do you know synthetic biology?
【Scale question】How well do you know the principles of wastewater treatment by microalgae?
【Scale question】How well do you understand the two principles for corporate wastewater: the treatment principle for corporate domestic wastewater and the principle of combining compliance with risk control?
【Scale question】On November 10, 2016, the State Council issued the "Implementation Plan for the Control of Pollutant Discharge Permit System", which attracted widespread attention and brought phrases such as "reducing the burden on enterprises" into public view. How well do you understand the cost of microalgae technology for wastewater treatment?
【Scale question】How well do you know the process of microalgae wastewater treatment technology?
【Multiple choice】Which of the following cutting-edge technological developments in the field of microalgae are you aware of?
Emotional component  【Multiple choice】What do you think are the disadvantages of traditional environmental monitoring methods?
【Scale question】In 2002, the 16th National Congress of the Communist Party of China made "continuously enhancing the capacity for sustainable development" one of the goals of building a moderately prosperous society in all respects. Do you have high expectations for the large-scale use of microalgae for water pollution control in the future?
【Multiple choice】If microalgae wastewater treatment is promoted and used in the future, which of the following characteristics would you most like it to have?
【Multiple choice】How would you most like to learn about microalgae?
【Scale question】Sustainable development meets the needs of the present while protecting the environment and without harming the needs of future generations. Given these advantages, how accepting are you of microalgae applications in energy, agriculture, food, medicine and industry?
【Scale question】The Standing Committee of the National People's Congress implements the decisions and arrangements of the CPC Central Committee and conducts inspections of the enforcement of the Marine Environmental Protection Law. Do you regularly follow the latest developments in wastewater treatment technology?
【Multiple choice】Which paths do you think your company mainly relies on to achieve carbon neutrality?
Demographic data  【Single choice】Your age?
【Single choice】Your gender?
【Single choice】Your occupation?
【Single choice】Your education level?
【Fill in the blank】Your city of residence?
2.2 Implementation of the Investigation
2.2.1 Organization of the Investigation

Starting in January 2022, the team held thorough discussions to settle the content and purpose of the survey (to study how well different groups understand, and how satisfied they are with, microalgae water purification technology) and drafted the initial questionnaire. From February to April 2022 we ran a two-month pre-survey; based on the pre-survey data, we screened out and replaced unsatisfactory questions to refine the questionnaire. After the formal questionnaire was finalized, we released it to the public and scheduled questionnaire collection, data analysis, report writing and other tasks.

Figure 1 Survey Timeline
2.2.2 Quality Control
(1) Questionnaire Quality

The formal questionnaire includes a set of skip-logic (jump) questions beginning with "What is your current occupation?" and "Do you know anything about microalgae?"; respondents receive different follow-up questions depending on their answers. This distinguishes user categories effectively and makes the questioning more targeted.

(2) Investigation Cost

Conducting the survey with electronic questionnaires avoids the cost of printing and binding paper questionnaires.

3 Pre-survey Data Verification

3.1 Reliability Test

Reliability refers to the consistency or stability of measurement results: the higher the reliability, the more consistent the results. This questionnaire uses Cronbach's alpha coefficient for reliability analysis; an alpha coefficient above 0.7 indicates high reliability. We divided the questions into microalgae-related questions and wastewater treatment-related questions, which were asked of three types of users classified by occupation: corporate employees, biology-related professionals and the general public. SPSS was used to run the reliability test and check the internal consistency of the scale.

Table 1 Reliability Test of Microalgae-Related Questions

User occupation type  Cronbach's alpha (standardized items)  Number of items  Reliability
Corporate employees  0.849  28  very high
Biology-related professionals  0.754  38  very high
General public  0.725  22  very high

Table 2 Reliability Test of Wastewater Treatment-Related Questions

User occupation type  Cronbach's alpha (standardized items)  Number of items  Reliability
Corporate employees  0.718  17  very high
Biology-related professionals  0.794  2  very high
General public  0.758  6  very high

As the tables show, the alpha coefficients for all three user groups exceed 0.7 for both the microalgae-related and the wastewater treatment-related questions, so the questionnaire has high reliability.
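The coefficients above were computed in SPSS. As a minimal sketch of what the statistic measures, the Python code below computes both the raw Cronbach's alpha and the alpha based on standardized items (the variant reported in the tables) from a respondents-by-items score matrix. The function names and the tiny example matrix are illustrative assumptions, not the survey data.

```python
import numpy as np


def cronbach_alpha(items):
    """Raw Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_var = x.sum(axis=1).var(ddof=1)        # variance of respondents' total scores
    return k / (k - 1) * (1 - sum_item_var / total_var)


def standardized_alpha(items):
    """Alpha based on standardized items: k * r_bar / (1 + (k - 1) * r_bar)."""
    corr = np.corrcoef(np.asarray(items, dtype=float), rowvar=False)
    k = corr.shape[0]
    r_bar = (corr.sum() - k) / (k * (k - 1))     # mean off-diagonal inter-item correlation
    return k * r_bar / (1 + (k - 1) * r_bar)


# Hypothetical answers of 3 respondents to 2 scale items
scores = np.array([[1, 2], [2, 2], [3, 4]])
print(round(cronbach_alpha(scores), 4))      # 0.9231
print(round(standardized_alpha(scores), 4))  # 0.9282
```

The standardized version depends only on the inter-item correlations, so it is unaffected by items having different variances.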

3.2 Validity Test

Validity refers to the degree to which a measurement instrument accurately measures what it is intended to measure. Exploratory factor analysis was used to test validity in two main steps:

Step 1: Run Bartlett's test of sphericity to check whether the data are suitable for factor analysis;

Step 2: Perform factor analysis with rotation to obtain the factor loading of each item, and calculate the cumulative proportion of total variance explained.

Bartlett's test of sphericity checks whether the correlation matrix is an identity matrix, that is, whether the variables are uncorrelated. If p < 0.05, the null hypothesis of an identity correlation matrix is rejected: the variables are significantly correlated and the data are suitable for factor analysis. If p > 0.05, the null hypothesis cannot be rejected, the variables can be regarded as mutually independent, and factor analysis is not appropriate.
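Bartlett's statistic has a closed form: with n respondents, p variables and sample correlation matrix R, the statistic is -(n - 1 - (2p + 5)/6) ln|R|, compared against a chi-square distribution with p(p - 1)/2 degrees of freedom. The sketch below computes it with NumPy and SciPy to illustrate the test; the function name and the example data are assumptions, and the paper's own values come from SPSS.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(data):
    """Bartlett's test that the correlation matrix of `data` is an identity matrix.

    data: (n_respondents, p_variables) array. Returns (chi-square, df, p-value).
    """
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    corr = np.corrcoef(x, rowvar=False)
    _, log_det = np.linalg.slogdet(corr)          # ln|R|, computed stably
    statistic = -(n - 1 - (2 * p + 5) / 6) * log_det
    dof = p * (p - 1) // 2
    p_value = chi2.sf(statistic, dof)             # upper-tail probability
    return statistic, dof, p_value


# Hypothetical example: two strongly correlated variables and one unrelated variable
a = np.arange(20, dtype=float)
b = a + np.where(np.arange(20) % 2 == 0, 0.5, -0.5)   # a plus small alternating noise
c = np.tile([1.0, -1.0, -1.0, 1.0], 5)
stat, dof, p = bartlett_sphericity(np.column_stack([a, b, c]))
print(f"chi-square = {stat:.2f}, df = {dof}, p = {p:.3g}")
```

A large statistic and p < 0.05 mean the identity-matrix hypothesis is rejected, which is the situation required before factor analysis.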

Table 3 Bartlett's Test for Microalgae-Related Questions

User occupation type  p-value  Approximate chi-square
Corporate employees  <0.01  193.301
Biology-related professionals  <0.01  868.249
General public  <0.01  151.469

Table 4 Bartlett's Test for Wastewater Treatment-Related Questions

User occupation type  p-value  Approximate chi-square
Corporate employees  <0.01  45.304
Biology-related professionals  <0.01  93.302
General public  <0.01  13.909

The validity test results above show p < 0.05 for every group, so the variables are significantly correlated and the data are suitable for factor analysis, which supports the soundness of the scale's structure.

3.3 Item Analysis

Item analysis tests how well each item in the scale discriminates between respondents, that is, whether respondents with high total scores and respondents with low total scores answer the item differently. If they do, the item has good discrimination. In essence, it compares the item scores of high-scoring and low-scoring respondents.

Take the question "Have you felt the benefits of microalgae applications in your daily life?" as an example. The question has 5 options, ranging from "strongly felt them" to "have neither heard of nor felt them", which were scored from 1 to 4 points. Respondents whose total scores fell in the lowest 27% were assigned to the low group and those above the 73rd percentile to the high group, and the two groups were compared with an independent-samples t-test. The results are shown in the table below. The difference between the high and low groups is significant (p < 0.05), indicating that the item discriminates well between respondents with different attitudes. The other items also passed the item analysis, so the questionnaire as a whole discriminates well.

Table 4 Item Analysis of Subjective Attitude Questions

Item  p-value (equal variances assumed)
How receptive are you to the application of microalgae in the following areas? 0.00
Which of the following characteristics of microalgae technology are you most optimistic about? 0.00
What do you think is the main reason why the development of microalgal cell factories may be limited? 0.00
How well do you research the relevant functional mechanisms of microalgae? 0.00
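The extreme-group procedure described above can be sketched as follows: split respondents at the 27th and 73rd percentiles of their total scores and compare the item scores of the two groups with an independent-samples t-test assuming equal variances (matching the "equal variances assumed" column). This is an illustrative Python version under those assumptions rather than the paper's SPSS workflow; the function name and data are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind


def extreme_group_test(item_scores, total_scores, cut=0.27):
    """Compare one item's scores between the low and high total-score groups.

    Respondents at or below the `cut` quantile of the total score form the low group;
    those at or above the (1 - cut) quantile form the high group.
    Returns (t statistic, two-sided p-value) assuming equal variances.
    """
    item = np.asarray(item_scores, dtype=float)
    total = np.asarray(total_scores, dtype=float)
    low_cut, high_cut = np.quantile(total, [cut, 1 - cut])
    low = item[total <= low_cut]
    high = item[total >= high_cut]
    result = ttest_ind(high, low, equal_var=True)
    return result.statistic, result.pvalue


# Hypothetical data: 100 respondents whose item score rises with their total score
total = np.arange(100)
item = total % 4 + total // 25
t, p = extreme_group_test(item, total)
print(f"t = {t:.2f}, p = {p:.3g}")   # p < 0.05 means the item discriminates
```

An item unrelated to the total score (for example `total % 4` alone) would give p well above 0.05 and would be a candidate for removal.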

3.4 Chi-square Test of Independence

The chi-square test of independence determines whether two categorical variables are associated or independent. As an example, we apply it to the variables "Have you experienced the benefits of microalgae in your daily life?" and "How well do you know fucoxanthin (an extract of brown algae) and its effects?". The hypotheses are:

H0: The two variables are independent

H1: The two variables are associated

Table 5 Chi-square Test of Independence

Value Degrees of freedom Asymptotic significance (two-sided)
Pearson's chi-square 66.542 16 <0.001
Likelihood ratio 66.091 16 <0.001
Number of valid cases 86

All asymptotic significance values are below 0.05, so the null hypothesis is rejected and the two variables are associated: respondents who have experienced the benefits of microalgae in daily life tend to know more about fucoxanthin. Applying the test to the other pairs of variables in turn shows that most variables are pairwise associated.
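For readers reproducing Table 5 outside SPSS, SciPy's `chi2_contingency` returns the Pearson chi-square and, with `lambda_="log-likelihood"`, the likelihood-ratio statistic from a contingency table of answer counts. The 16 degrees of freedom reported above are consistent with, for example, a 5 × 5 table, since (5 - 1)(5 - 1) = 16. The sketch below uses a hypothetical 2 × 2 table rather than the survey counts, and the function name is an assumption.

```python
import numpy as np
from scipy.stats import chi2_contingency


def independence_test(table):
    """Pearson and likelihood-ratio chi-square tests of independence.

    table: 2-D array of observed counts (rows: answers to question A,
    columns: answers to question B).
    """
    observed = np.asarray(table)
    # correction=False disables Yates' continuity correction (it only affects 2 x 2 tables)
    pearson, p_pearson, dof, expected = chi2_contingency(observed, correction=False)
    g_stat, p_g, _, _ = chi2_contingency(observed, correction=False,
                                         lambda_="log-likelihood")
    return {
        "pearson": (pearson, dof, p_pearson),
        "likelihood_ratio": (g_stat, dof, p_g),
        "valid_cases": int(observed.sum()),
        "min_expected": float(expected.min()),   # for checking the expected-count rule of thumb
    }


# Hypothetical counts
result = independence_test([[10, 20], [20, 10]])
print(result["pearson"])   # chi-square of 20/3 with 1 degree of freedom
```

With raw answers, `pandas.crosstab(df["question_a"], df["question_b"])` (hypothetical column names) produces the count table to pass in.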


4 Data Processing and Analysis

4.1 Data Processing
4.1.1 Data Output

The online questionnaire platform can export each user's detailed responses to an Excel file. In the exported file, the answer options of each question are coded with numbers in ascending order of their position, so each value records the user's answer to that question.

4.1.2 Data Cleaning

(1) Exclude samples that did not answer all required questions, i.e., incomplete questionnaires;

(2) Exclude samples whose completion time, based on the start and submission times, was unusually short or unusually long.
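The two cleaning rules can be expressed in a few lines of pandas. The column names (`duration_s` for completion time in seconds, and the list of required-question columns) and the 60 s and 3600 s limits below are hypothetical placeholders; the paper does not state the time thresholds it used.

```python
import pandas as pd


def clean_responses(df, required_cols, duration_col="duration_s",
                    min_seconds=60, max_seconds=3600):
    """Apply the two cleaning rules to exported questionnaire data.

    Rule 1: drop rows with a missing answer in any required question.
    Rule 2: drop rows whose completion time is outside [min_seconds, max_seconds].
    The thresholds are illustrative defaults, not values from the survey.
    """
    complete = df.dropna(subset=required_cols)
    in_range = complete[duration_col].between(min_seconds, max_seconds)
    return complete[in_range].reset_index(drop=True)


# Hypothetical export: q1 and q2 are required questions
raw = pd.DataFrame({
    "q1": [1, 2, None, 3, 1, 2],
    "q2": [2, 2, 1, 4, 3, 1],
    "duration_s": [10, 100, 120, 130, 5000, 300],
})
cleaned = clean_responses(raw, required_cols=["q1", "q2"])
print(len(cleaned))   # 3 rows remain (durations 100, 130 and 300)
```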

4.2 Descriptive Statistics
4.2.1 Age Distribution of Sample People



Figure 2 Age Distribution

The respondents cover six age groups and are concentrated mainly between the ages of 18 and 25. This group is also the main adopter and consumer of new things, so its attitude toward and acceptance of using genetically modified microalgae to develop new energy can, to some extent, reflect the industry's development trend over the next few years.

4.2.2 Educational Distribution of Sample Personnel



Figure 3 Educational Distribution

The respondents have seven different educational backgrounds, and the majority hold a bachelor's degree or above; the exact proportions are shown in the figure above. Because this population has more relevant knowledge, it can make better-informed judgments about microalgae-based new energy, and the data represent the attitudes of highly educated people toward new energy produced by genetically modified microalgae.

4.2.3 Occupational distribution of sample personnel



Figure 4 Occupation Distribution

Table 6 Analysis of the Degree of Understanding of Microalgae by Different Occupations


The respondents span thirteen occupations, with students forming the majority. Among freelancers, awareness of microalgae reached 88.9%, although chance due to their small sample cannot be ruled out. In every other occupation, more than 30% of respondents had never heard of microalgae. This lack of awareness across society may hinder the promotion of the microalgae industry.

5. User Cognition Analysis Based on Bloom's Cognitive Model
5.1 Bloom's Cognitive Process Dimension

In Bloom's taxonomy of educational objectives, the cognitive process is divided into six levels (memory, understanding, application, analysis, evaluation and creation), which represent successive stages in understanding something[2].

First, the memory level refers to recognizing and recalling information. It involves identifying specific or abstract knowledge in a form very close to the ideas and phenomena the learner first encountered.

Second, the understanding level refers to grasping the meaning of something. It does not require deep understanding; the understanding is preliminary and may be superficial. It includes translation, interpretation, inference and similar processes.

Third, the application level refers to applying learned concepts, rules and principles. Learners must correctly apply abstract concepts in suitable situations without being shown how to solve the problem. Application here means preliminary, direct use rather than comprehensive use of knowledge through analysis.

Fourth, the analysis level refers to breaking material down into its components so that the relationships between concepts and the organizational structure of the material become clear, and the underlying theories and principles are worked out in detail.

Fifth, the evaluation level requires not a judgment based on intuition or surface observation, but a rational, well-reasoned judgment about the essential value of something. It integrates internal and external data and information to draw inferences consistent with the objective facts.

Sixth, the creation level builds on analysis by recombining the components into a new whole in order to solve problems comprehensively and creatively. It involves original expression, drawing up reasonable plans and feasible steps, and deriving general rules from basic materials. It emphasizes originality and initiative and is the highest-level requirement.

5.2 User Cognitive Model based on CRITIC
5.2.1 The Weights are Calculated by the CRITIC Method

To determine at which cognitive level users' understanding of microalgae wastewater treatment technology lies, we analyze how much each question contributes and use the CRITIC weighting method to measure the objective weight of each question, taking into account both the variability of each question and the correlations between questions[3]. Variability is measured by the standard deviation: the larger the standard deviation, the greater the spread of values for a question, the more information it carries, and the stronger its influence. Correlation is measured by the correlation coefficient: the more strongly a question correlates with the others, the less it conflicts with them and the more information it duplicates, which weakens its influence.

We first selected the cognitive-dimension questions in the questionnaire, such as "Do you understand how microalgae help achieve carbon neutrality?" and "How well do you know the development of microalgae treatment technology?". Higher values on these questions indicate a higher degree of understanding, and lower values a lower degree. A larger standard deviation means a larger spread in a question's values and therefore a greater influence on whether the respondent knows the microalgae product; a larger correlation coefficient means the question shares more information with the others and therefore has less influence. Combining the two factors gives each question's information quantity, from which its weight is obtained. The steps are as follows:

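Step 1: Forward processing and dimensionless processing of the data. Let $x_{ij}$ be the answer of respondent $i$ ($i = 1, \dots, m$) to question $j$ ($j = 1, \dots, n$). A cost-type question is first converted to benefit type by replacing $x_{ij}$ with $\max_i x_{ij} - x_{ij}$, and every question is then min-max normalized:

$$x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} \qquad (1)$$

Step 2: Calculate the variability of each question as the standard deviation of its normalized values, where $\bar{x}'_j$ is the mean of column $j$;

$$S_j = \sqrt{\frac{1}{m-1}\sum_{i=1}^{m}\left(x'_{ij} - \bar{x}'_j\right)^2} \qquad (2)$$

Step 3: Calculate the conflict of each question, where $r_{kj}$ is the correlation coefficient between questions $k$ and $j$;

$$R_j = \sum_{k=1}^{n}\left(1 - r_{kj}\right) \qquad (3)$$

Step 4: Calculate the information quantity;

$$C_j = S_j R_j \qquad (4)$$

Step 5: Calculate the weights.

$$W_j = \frac{C_j}{\sum_{k=1}^{n} C_k} \qquad (5)$$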
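Steps 1 to 5 can be written compactly in Python. This sketch mirrors the MATLAB code in Appendix IV (min-max normalization, sample standard deviation, conflict as the column sum of 1 - r, information quantity and normalized weights); the function name, the `cost_columns` argument and the example matrix are assumptions for illustration.

```python
import numpy as np


def critic_weights(x, cost_columns=()):
    """CRITIC objective weights for an (m_respondents, n_indicators) matrix."""
    x = np.asarray(x, dtype=float).copy()
    # Step 1: turn cost-type indicators into benefit type, then min-max normalize
    for j in cost_columns:
        x[:, j] = x[:, j].max() - x[:, j]
    x = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
    # Step 2: variability = sample standard deviation of each column
    s = x.std(axis=0, ddof=1)
    # Step 3: conflict = sum over all indicators of (1 - correlation coefficient)
    r = np.corrcoef(x, rowvar=False)
    conflict = (1 - r).sum(axis=0)
    # Step 4: information quantity
    c = s * conflict
    # Step 5: normalize the information quantities into weights
    return c / c.sum()


# Hypothetical answers of 4 respondents to 3 cognitive questions
answers = np.array([[1, 2, 3],
                    [2, 1, 5],
                    [3, 4, 4],
                    [4, 3, 1]])
w = critic_weights(answers)
print(np.round(w, 4))
```

A question whose answers are all identical would make the normalization divide by zero, so such columns need to be removed before weighting.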

The weight calculation results are shown in Table 6:

Table 6 Weights of the Six Cognitive Dimensions

Cognitive dimension  Memory  Understanding  Application  Analysis  Evaluation  Creation
Weight  0.0967  0.1127  0.4642  0.1189  0.1000  0.1076

5.2.2 Prediction of Users' Degree of Understanding

Using the weights of the six cognitive dimensions, the following equation is established:

$$Y = \sum_{i=1}^{6} \beta_i X_i = \beta_1X_1 + \beta_2X_2 + \beta_3X_3 + \beta_4X_4 + \beta_5X_5 + \beta_6X_6$$

where the coefficient $\beta_i$ ($i = 1, 2, \dots, 6$) is the weight of the $i$-th cognitive dimension and $X_i$ is the score of the options the user selected for questions of that dimension. The calculated value $Y$ is the user's understanding score for the technology.

The scores indicate each user's understanding of microalgae water purification technology. Among the users surveyed, 211 had never heard of microalgae; they receive a score of 0 and are defined as users to be reached by popularization. Among the remaining users who know of microalgae, 24.17% have a predicted score above 0.8, 2.60% have a score below 0.3, and the rest fall in between, a distribution consistent with expectations.

Score range  Number of people
0 211
0-10 3
10-20 5
20-30 7
30-40 2
40-50 3
50-60 15
60-70 63
70-80 137
80-90 95
90-100 34
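As an illustration of how the weights in Table 6 can turn into score bands like those above, the sketch below computes Y as the weighted sum of six dimension scores and counts users per 10-point band. It assumes each $X_i$ has been scaled to [0, 1] and Y multiplied by 100 to match the 0 to 100 bands; these scalings, the function names and the example users are assumptions. The weights are renormalized because the rounded values in Table 6 sum to 1.0001.

```python
import numpy as np

# CRITIC weights of the six cognitive dimensions from Table 6
# (memory, understanding, application, analysis, evaluation, creation)
WEIGHTS = np.array([0.0967, 0.1127, 0.4642, 0.1189, 0.1000, 0.1076])
WEIGHTS = WEIGHTS / WEIGHTS.sum()   # rounded values sum to 1.0001


def understanding_scores(x):
    """Y = sum_i beta_i * X_i on a 0-100 scale, assuming each X_i is in [0, 1]."""
    y = 100 * np.asarray(x, dtype=float) @ WEIGHTS
    return np.clip(y, 0, 100)   # guard against floating-point overshoot


def score_segments(scores, heard_of):
    """Count users per score band; users who never heard of microalgae score 0."""
    scores = np.asarray(scores, dtype=float)
    heard_of = np.asarray(heard_of, dtype=bool)
    # np.histogram bins are [a, b) except the last one, which is [90, 100]
    counts, edges = np.histogram(scores[heard_of], bins=np.arange(0, 101, 10))
    table = {"0": int((~heard_of).sum())}
    for lo, hi, n in zip(edges[:-1], edges[1:], counts):
        table[f"{int(lo)}-{int(hi)}"] = int(n)
    return table


# Hypothetical users: six normalized dimension scores each
users = np.array([[0.95] * 6, [0.0] * 6, [0.55] * 6])
heard = [True, False, True]
print(score_segments(understanding_scores(users), heard))
```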

6. Potential User Mining Model Based on K-means Clustering

6.1 Model Background

Modeling the questionnaire results shows that users differ in their understanding of microalgae water purification technology. Further analysis of the characteristics of groups with different levels of understanding can help determine suitable science popularization methods and the scope of publicity, providing a reference for promoting the technology and raising its recognition.

6.2 User Mining Model Building
6.2.1 Cluster Analysis based on AESPU

To cluster the questionnaire data for potential users, the clustering indicators must be selected first[4]. Based on user characteristics and the earlier user cognitive model, we set five clustering indicators: A, E, S, P and U.

Table 8 Meaning of the Clustering Indicators

Indicator  A  E  S  P  U
Meaning  Age  Gender  Education level  Occupation  Degree of understanding

The collected user data were analyzed by computing the sum of squared errors (SSE) and the silhouette coefficient for different values of k, and the users were divided into 5 clusters using Python.

Figure 6 Performance Indicator Analysis
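The choice of the number of clusters rests on the SSE (elbow) curve and the silhouette coefficient. A minimal scikit-learn sketch of that comparison is given below, assuming the indicators are z-score standardized as in Appendix V; the synthetic three-cluster data, the range of k and the `n_init=10` setting are illustrative assumptions, not the survey data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def evaluate_k(z, k_values=range(2, 9), random_state=2018):
    """Return (k, SSE, silhouette coefficient) for each candidate number of clusters."""
    results = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(z)
        # inertia_ is the within-cluster sum of squared errors (SSE)
        results.append((k, model.inertia_, silhouette_score(z, model.labels_)))
    return results


# Synthetic, standardized data with three well-separated groups
rng = np.random.default_rng(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
raw = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in centers])
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)

for k, sse, sil in evaluate_k(z, range(2, 6)):
    print(f"k={k}: SSE={sse:.1f}, silhouette={sil:.3f}")
```

The usual rule is to pick the k where the SSE curve stops falling steeply and the silhouette coefficient is high; on this synthetic data that is k = 3, while the paper settled on five clusters for its own data.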

6.2.2 Potential User Composition

The user data were clustered; the results are shown in Table 9.

Table 9 Clustering Results

Cluster  Z_age  Z_sex  Z_education  Z_occupation  Z_score  Number of users
0  0.739849  -0.262072  0.137494  -0.391089  0.048058  190
1  -0.335075  -0.354205  0.019592  2.693135  0.027030  34
2  -0.275357  2.706168  0.313416  0.396105  0.078421  26
3  -1.290563  -0.311768  0.153617  -0.372527  -0.067940  102
4  0.739849  1.790504  -3.892901  0.803027  -0.083163  13

Radar charts of the user groups were drawn to compare the attributes of the five groups. The charts show that each group has clearly different characteristics: user group 1 scores highest on age and understanding and lowest on occupation; user group 3 is relatively low on education and understanding; user group 4 is balanced across all attributes; user group 5 scores highest on gender, with its other attributes evenly distributed.


Figure 7 Radar Chart of User Characteristics


Based on these characteristics, this paper defines four user categories: important potential users, important development users, general users and low-value potential users. The characteristics of each type are as follows:

(1) Important potential users: mainly the first group. Their most distinctive features are higher education and older age; they may be university teachers or highly educated older people with rich experience, who pay more attention to, and know more about, microalgae water purification and related knowledge.

(2) Important development users: mainly the third group, who are highly educated but young; most are college students in higher education. As an emerging force in society, they readily accept new things. Their awareness of microalgae water purification technology still needs to be improved, and they may become important potential users in the future.

(3) General users: this group, the second cluster, is relatively heterogeneous. Its members are mainly women with lower educational attainment who have some understanding of microalgae water purification technology.

(4) Low-value potential users: the fourth and fifth clusters. Apart from the fifth cluster being mostly female, the characteristics of these two groups are fairly stable, with no features that differ markedly from other users.


7. Conclusions and Strategies

7.1 Research conclusions

Using the user data obtained through the online questionnaire, this paper studies users' understanding of microalgae water purification technology in terms of the factual, cognitive and emotional components. The main conclusions are as follows:

Users are not well informed about microalgae or microalgae water purification technology. Non-specialists know little about microalgae and the principles of its water purification technology, and their understanding mostly stays at the memory level: they have simply heard of related products but have no concrete knowledge of microalgae applications in various fields or of the corresponding water purification measures and policies, let alone enough understanding to suggest improvements to microalgae water purification.

In developing potential users, enterprises should focus on highly educated people and experienced older people, who are a crucial group for microalgae water purification technology. At the same time, college students in higher education should be treated as the next group of customers to develop: stronger publicity can raise their understanding of the technology, and bringing this highly knowledgeable group on board will further promote the industry's development.

7.2 Suggestions
7.2.1 Expand Product Publicity Channels
(1) Strengthen science popularization and publicity

Users' understanding of microalgae and its water purification technology comes largely from microalgae products, and this is closely related to their limited awareness of transgenic microalgae technology. Enterprises should therefore emphasize science communication about transgenic microalgae products through systematic information channels: maintaining publicity in traditional mass media such as television and newspapers, strengthening publicity on new media platforms such as WeChat official accounts and Weibo, curbing the spread of pseudoscientific information, and ensuring that product publicity is credible and authoritative.

(2) Pay attention to key groups

Because different groups differ in their understanding of microalgae and water purification and in their contribution to spreading related information, microalgae enterprises should focus on these cognitive differences, spread information from key groups outward to the wider public, and improve the efficiency of publicity.

7.2.2 Ensure Product Safety and Improve Laws and Regulations
(1) Improve the detection technology of microalgae products

Testing transgenic microalgae products is necessary to ensure their quality and safety. With the continued development of GM food, countries and international organizations are actively developing detection technologies for transgenic products, and PCR is currently the most mature of these. Building on international and domestic technology, enterprises should look for more accurate, faster and safer detection methods and strengthen the safety assessment of GM products so that users can use transgenic microalgae products with confidence.

(2) Improve relevant laws and regulations

Promoting transgenic microalgae products requires improving the relevant laws and regulations and strictly controlling the scientific fields and species in which transgenic products are used. Such regulation should not only protect consumers' legitimate rights and interests, such as the right to know and the right to choose, but also pay closer attention to controlling the harm microalgae may cause to the ecological environment; enterprises and factories must be strictly prohibited from disrupting or damaging the operation of the ecosystem[5].



          
Appendix IV MATLAB code for calculating the CRITIC weights
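For reference, the steps of the script below can be written out as formulas. The notation is introduced here for illustration and is not the paper's own: x_ij is respondent i's score on indicator j, n is the number of indicators, r_jk is the Pearson correlation between normalized indicators j and k, and a cost-type indicator is assumed to be negated (x_ij replaced by -x_ij) before normalization, as the script does for the third column.

```latex
\begin{aligned}
\tilde{x}_{ij} &= \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}, \\
\sigma_j &= \operatorname{std}_i\!\left(\tilde{x}_{ij}\right), \qquad
R_j = \sum_{k=1}^{n} \left(1 - r_{jk}\right), \\
C_j &= \sigma_j R_j, \qquad
w_j = \frac{C_j}{\sum_{k=1}^{n} C_k}.
\end{aligned}
```

Here sigma_j measures contrast intensity, R_j measures conflict with the other indicators, and C_j is the information content that is normalized into the weight w_j.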

clc, clear
data = xlsread('data1.xlsx', 'Sheet1', 'G1:L576');
data(:, 3) = -data(:, 3);   % convert the cost-type indicator into a benefit-type indicator
[m, n] = size(data);
for i = 1:n
    % min-max normalization of each indicator (column)
    data(:, i) = (data(:, i) - min(data(:, i))) / (max(data(:, i)) - min(data(:, i)));
end
corr = corrcoef(data);      % correlation coefficient matrix
corr_1 = sum(1 - corr);     % conflict of each indicator with the others
data_std = std(data);       % standard deviation of each column (contrast intensity)
C = data_std .* corr_1;     % information content of each indicator
w = C ./ sum(C);            % weights
w
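For readers working in Python, the same CRITIC computation can be sketched with NumPy. This is a minimal sketch, assuming the indicator scores are already loaded into a numeric array (one row per respondent, one column per indicator), that column index 2 (the third column) is the only cost-type indicator as in the MATLAB script, and that no column is constant. The function name `critic_weights` and the toy data are illustrative and not part of the original study.

```python
import numpy as np


def critic_weights(data, cost_cols=(2,)):
    """CRITIC weights for the columns of a 2-D array (rows = samples).

    cost_cols: indices of cost-type indicators (smaller is better). The
    default mirrors data(:, 3) = -data(:, 3) in the MATLAB script (0-based 2).
    Assumes no column is constant, otherwise the normalization divides by zero.
    """
    x = np.asarray(data, dtype=float).copy()
    for j in cost_cols:
        x[:, j] = -x[:, j]                      # turn a cost-type indicator into benefit-type
    # Min-max normalization of each column
    x = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
    corr = np.corrcoef(x, rowvar=False)         # correlation matrix between indicators
    conflict = (1 - corr).sum(axis=0)           # conflict of each indicator with the rest
    contrast = x.std(axis=0, ddof=1)            # sample std, matching MATLAB's default std
    info = contrast * conflict                  # information content C_j
    return info / info.sum()                    # weights that sum to 1


if __name__ == "__main__":
    # Hypothetical toy data: 5 respondents x 3 indicators
    toy = np.array([[1, 2, 5],
                    [2, 1, 4],
                    [3, 4, 3],
                    [4, 3, 2],
                    [5, 5, 1]])
    print(critic_weights(toy))
```

In the toy data the flipped third column becomes identical to the first after normalization, so the two receive equal weight, and the weights come out as [0.25, 0.5, 0.25] up to floating-point rounding. `ddof=1` is chosen so the standard deviation agrees with MATLAB's default `std`.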

Appendix V Python code for k-means clustering

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Data preparation
data = pd.read_excel('C://Users//15722//Desktop//data.xlsx')
columns = ["year", "gender", "degree", "profession", 'score']
data = data.to_numpy(dtype=float)
# Concise z-score standardization; any other desired transformation can be applied the same way
data = (data - data.mean(axis=0)) / data.std(axis=0)
columns = ['Z_' + i for i in columns]

# Set the number of clusters
n_clusters = 5
# Build the clustering model object
kmodel = KMeans(n_clusters=n_clusters, random_state=2018)
# Train the clustering model
kmodel.fit(data)

r1 = pd.Series(kmodel.labels_).value_counts()   # number of samples in each cluster
r2 = pd.DataFrame(kmodel.cluster_centers_)        # cluster centres
max_val = r2.values.max()
min_val = r2.values.min()
r = pd.concat([r2, r1], axis=1)
r.columns = list(columns) + ['cluster']  # rename the table header; the last column holds the cluster size
print(r)

# Optional: detailed output of the cluster assigned to each sample
# labelled = pd.DataFrame(data, columns=columns)
# labelled['cluster'] = kmodel.labels_

# Drawing
plt.style.use('ggplot')
plt.rcParams['font.sans-serif'] = 'SimHei'
plt.rcParams['font.size'] = 12.0
plt.rcParams['axes.unicode_minus'] = False

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, polar=True)
center_num = r.values
feature = ["Age", "Gender", "Education", "Occupation", "Understanding"]
N = len(feature)
# One axis per feature; repeat the first angle to close each polygon
angles = np.linspace(0, 2 * np.pi, N, endpoint=False)
angles = np.concatenate((angles, [angles[0]]))
for i, v in enumerate(center_num):
    # v[:-1] is the cluster centre, v[-1] is the cluster size
    center = np.concatenate((v[:-1], [v[0]]))
    ax.plot(angles, center, 'o-', linewidth=2, label="Cluster %d, %d people" % (i + 1, v[-1]))
    ax.fill(angles, center, alpha=0.25)
ax.set_thetagrids(angles[:-1] * 180 / np.pi, feature, fontsize=15)
ax.set_ylim(min_val - 0.1, max_val + 0.1)
plt.title('User group feature analysis', fontsize=20)
ax.grid(True)
plt.legend(loc='upper right', bbox_to_anchor=(1.3, 1.0), ncol=1, fancybox=True, shadow=True)
plt.show()
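Because the radar chart plots cluster centres in z-score units, it can help to translate the centres back to the original questionnaire scales when describing each group. The sketch below does this with scikit-learn's `StandardScaler`, whose `inverse_transform` undoes the same (x - mean) / std standardization used above (population standard deviation, as with NumPy's default). The function name `cluster_profiles`, the column names, and the toy data are hypothetical, and `n_init=10` with `random_state=2018` are assumed settings rather than the study's.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_profiles(df, n_clusters, random_state=2018):
    """Cluster z-scored columns of df; return centres in original units and labelled rows."""
    scaler = StandardScaler()  # z-score using the population std (ddof=0)
    z = scaler.fit_transform(df.to_numpy(dtype=float))
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(z)
    # Undo the standardization so each centre is expressed on the original scales
    centres = pd.DataFrame(scaler.inverse_transform(km.cluster_centers_), columns=df.columns)
    centres["size"] = np.bincount(km.labels_, minlength=n_clusters)  # respondents per cluster
    labelled = df.assign(cluster=km.labels_)  # cluster label for every respondent
    return centres, labelled


# Hypothetical toy data: two clearly separated groups of respondents
toy = pd.DataFrame({"year": [20, 21, 22, 60, 61, 62], "score": [1, 1, 2, 4, 5, 5]})
centres, labelled = cluster_profiles(toy, n_clusters=2)
print(centres)
```

Since the inverse transform is affine, each returned centre is simply the mean of its members in the original units, which makes the group descriptions easier to read than z-scores.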

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import metrics


def km_sse_cs(data):
    """
    Evaluate the KMeans clustering effect for different numbers of clusters.

    1. Within-cluster sum of squared errors (SSE), elbow method: the SSE indicates how
       well the clusters fit the data. Using the elbow method as a graphical tool, plot
       the within-cluster SSE against the number of clusters; the point where the rate
       of decline suddenly slows down (the inflection point) is taken as the optimal k.
       After a KMeans model is trained, the within-cluster SSE is available from its
       built-in inertia_ attribute.
    2. Silhouette coefficient: combines the cohesion and separation of the clusters.
       The mean silhouette coefficient lies in [-1, 1]; the larger it is, the better the
       clustering. A negative value for a point implies that it may have been assigned
       to the wrong cluster.
    """
    sse_list = []     # SSE for each number of clusters
    silhouettes = []  # mean silhouette coefficient for each number of clusters
    # Loop over different numbers of clusters
    for i in range(2, 15):
        model = KMeans(n_clusters=i)
        model.fit(data)
        # The inertia_ attribute of a fitted KMeans model gives the within-cluster SSE
        sse_list.append(model.inertia_)
        # Mean silhouette coefficient over all samples
        silhouette = metrics.silhouette_score(data, model.labels_, metric='euclidean')
        silhouettes.append(silhouette)

    # Plot the within-cluster SSE curve
    plt.subplot(211)
    plt.title('KMeans within-cluster SSE')
    plt.plot(range(2, 15), sse_list, marker='*')
    plt.xlabel('The number of clusters')
    plt.ylabel('SSE')
    # Plot the silhouette coefficient curve
    plt.subplot(212)
    plt.title('KMeans Silhouette Coefficient')
    plt.plot(range(2, 15), silhouettes, marker='o')
    plt.xlabel('The number of clusters')
    plt.ylabel('Silhouette Coefficient')
    plt.tight_layout()
    plt.show()


# Run the evaluation on the standardized survey data
km_sse_cs(data)
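The silhouette curve can also be read numerically instead of by eye. The sketch below assumes that the mean silhouette coefficient alone is the selection criterion (the elbow of the SSE curve still has to be judged visually) and simply returns the k with the largest score. The function name `best_k_by_silhouette` and the synthetic points are hypothetical, and `n_init=10` with `random_state=2018` are assumed settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(data, k_values=range(2, 15), random_state=2018):
    """Return (best k, {k: mean silhouette coefficient}) over the candidate k values."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(data)
        scores[k] = silhouette_score(data, labels, metric="euclidean")
    best_k = max(scores, key=scores.get)  # k with the highest mean silhouette coefficient
    return best_k, scores


# Hypothetical data: three tight groups of four points around far-apart centres
offsets = np.array([[0, 0], [0.5, 0], [0, 0.5], [0.5, 0.5]])
group_centres = np.array([[0, 0], [10, 0], [0, 10]])
X = np.vstack([c + offsets for c in group_centres])
best_k, scores = best_k_by_silhouette(X, k_values=range(2, 6))
print(best_k, scores)
```

On these points, splitting a tight group (k above 3) or merging two groups (k = 2) both lower the mean silhouette, so k = 3 is selected; on real survey data the curve is usually flatter, and the result should be weighed together with the elbow plot.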
