Build a Python Chatbot

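Introduction
Chatbots are useful to both businesses and their customers. Most people prefer to talk directly from a chat box rather than call a service center. Figures published by Facebook illustrate the value of bots: more than 2 billion messages are sent between people and companies every month. HubSpot research reports that 71% of people want to get customer support through messaging apps. Messaging is a quick way to get problems solved, so chatbots have a bright future in organizations.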

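Today we are going to build an exciting chatbot project. We will implement a chatbot from scratch that can understand what the user is saying and give an appropriate response.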

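Prerequisites
To implement the chatbot, we will use Keras (a deep learning library), NLTK (the Natural Language Toolkit, a natural language processing library), and a few helper libraries. Run the following command to make sure all of them are installed (pickle ships with the Python standard library, so it does not need to be installed separately):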

pip install tensorflow keras nltk


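How do chatbots work?
A chatbot is simply intelligent software that can interact and communicate with people the way a human would. Interesting, isn't it? Now let's look at how they actually work.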

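All chatbots are built on NLP (Natural Language Processing) concepts. NLP consists of two parts:

NLU (Natural Language Understanding): the ability of a machine to understand human language, such as English.

NLG (Natural Language Generation): the ability of a machine to generate text that resembles sentences written by humans.

Imagine a user asking a chatbot: "Hey, what's in the news today?"

The chatbot breaks the user's sentence down into two things: an intent and an entity. The intent of this sentence could be get_news, because it names the action the user wants to perform. The entity supplies specific details about the intent, so "today" is the entity. In this way, a machine learning model is used to recognize the intents and entities of a chat message.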
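
To make the intent/entity split concrete, here is a minimal sketch of what a parsed message might look like. The ParsedMessage class, the parse function, and the keyword rules are all hypothetical and stand in for a trained model; only the get_news intent and the "today" entity come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedMessage:
    """Hypothetical container for the result of understanding one message."""
    intent: str
    entities: dict = field(default_factory=dict)


def parse(message: str) -> ParsedMessage:
    """Toy keyword rules standing in for a trained NLU model (assumption)."""
    text = message.lower()
    intent = "get_news" if "news" in text else "unknown"
    entities = {}
    if "today" in text:
        entities["date"] = "today"
    return ParsedMessage(intent, entities)


parsed = parse("Hey, what's in the news today?")
print(parsed.intent)    # get_news
print(parsed.entities)  # {'date': 'today'}
```

A real chatbot replaces the keyword rules with a model learned from examples, which is what the rest of this project builds for the intent part.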

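Project file structure
Once the project is complete, you will be left with the following files. Let's go through each of them quickly; this will give you an idea of how the project is implemented.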

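train_chatbot.py: in this file, we build and train the deep learning model that classifies and identifies what the user is asking the bot.

gui_chatbot.py: in this file, we build a graphical user interface for chatting with our trained chatbot.

intents.json: the intents file holds all the data we use to train the model. It contains a collection of tags with their corresponding patterns and responses.

chatbot_model.h5: a Hierarchical Data Format (HDF5) file in which we store the weights and architecture of the trained model.

classes.pkl: a pickle file that stores all the tag names used to classify a message at prediction time.

words.pkl: a pickle file that contains all the unique words that make up our model's vocabulary.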
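
For orientation, the sketch below shows the general shape the intents data takes once loaded, based on the tags, patterns, and responses described above and on the keys the code reads (intents, tag, patterns, responses). The two sample intents and their texts are made up for illustration and are not taken from the downloadable dataset.

```python
import json

# Hypothetical sample in the shape of intents.json (contents are invented)
sample_intents = {
    "intents": [
        {
            "tag": "greeting",
            "patterns": ["Hi", "Hello there", "Good morning"],
            "responses": ["Hello!", "Hi, how can I help?"],
        },
        {
            "tag": "goodbye",
            "patterns": ["Bye", "See you later"],
            "responses": ["Goodbye!", "Talk to you soon."],
        },
    ]
}

# Round-trip through JSON text, as if the data had been read from a file
loaded = json.loads(json.dumps(sample_intents))
for intent in loaded["intents"]:
    print(intent["tag"], len(intent["patterns"]), "patterns")
```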

Download the source code and dataset:

https://drive.google.com/drive/folders/1r6MrrdE8V0bWBxndGfJxJ4Om62dJ2OMP?usp=sharing

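How to build your own chatbot
I have broken the construction of this chatbot down into 5 steps:

Step 1. Import libraries and load the data
Create a new Python file and name it train_chatbot.py, then import all the required modules. After that, we read the JSON data file into our Python program.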

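import json
import pickle
import random

import nltk
import numpy as np
from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.optimizers import SGD
from nltk.stem import WordNetLemmatizer

# One-time downloads of the tokenizer and WordNet data used by NLTK
# (newer NLTK releases look for 'punkt_tab', older ones for 'punkt')
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()

# Load the training data (tags, patterns, responses)
with open('intents.json') as f:
    intents = json.load(f)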

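Step 2. Preprocess the data
The model cannot take raw data. The data has to go through a good deal of preprocessing before a machine can make sense of it. For text data, many preprocessing techniques are available. The first is tokenization, in which we break sentences into words.

Looking at the intents file, we can see that each tag contains a list of patterns and responses. We tokenize each pattern and add the words to a list. We also build a list of classes and a list of documents that pairs each pattern with its intent.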

words = []
classes = []
documents = []
ignore_letters = ['!', '?', ',', '.']

for intent in intents['intents']:
    for pattern in intent['patterns']:
        # tokenize each pattern into a list of words
        word_list = nltk.word_tokenize(pattern)
        words.extend(word_list)
        # add (tokens, tag) to the corpus
        documents.append((word_list, intent['tag']))
        # add the tag to our classes list
        if intent['tag'] not in classes:
            classes.append(intent['tag'])

print(documents)
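
To see what these lists end up holding, here is a small self-contained sketch run on made-up data. It uses a simple regular expression as a stand-in for nltk.word_tokenize so that it runs without downloading NLTK data; the real tokenizer handles many more cases, and the sample intents are invented.

```python
import re

# Invented sample data in the same shape as intents.json
sample = {"intents": [
    {"tag": "greeting", "patterns": ["Hello there!", "Hi"]},
    {"tag": "goodbye", "patterns": ["See you later."]},
]}


def simple_tokenize(text):
    """Rough stand-in for nltk.word_tokenize: words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


words, classes, documents = [], [], []
for intent in sample["intents"]:
    for pattern in intent["patterns"]:
        tokens = simple_tokenize(pattern)
        words.extend(tokens)
        documents.append((tokens, intent["tag"]))
        if intent["tag"] not in classes:
            classes.append(intent["tag"])

print(documents[0])  # (['Hello', 'there', '!'], 'greeting')
print(classes)       # ['greeting', 'goodbye']
```

Each document keeps the tokens of one pattern together with its tag, which is what the training step later turns into an input row and an output row.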

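Another technique is lemmatization. We convert words to their lemma (dictionary) form so that all inflected variants collapse into one word. For example, the words play, playing, plays, played, etc. can all be replaced with play. This reduces the total number of words in our vocabulary. Note that NLTK's WordNetLemmatizer treats every word as a noun unless a part-of-speech tag is passed, so verb forms such as playing are only reduced to play when pos='v' is given. So now we lemmatize each word and remove duplicates.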

# lemmatize and lowercase each word, dropping punctuation
words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_letters]
# remove duplicates and sort
words = sorted(set(words))
classes = sorted(set(classes))

# documents = combinations of patterns and intents
print(len(documents), "documents")
# classes = intents
print(len(classes), "classes", classes)
# words = all words, the vocabulary
print(len(words), "unique lemmatized words", words)

# save the vocabulary and class names for use at prediction time
with open('words.pkl', 'wb') as f:
    pickle.dump(words, f)
with open('classes.pkl', 'wb') as f:
    pickle.dump(classes, f)

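In the end, words contains the vocabulary of our project and classes contains all the tags (intents) to classify. To save Python objects to a file, we used the pickle.dump() method. These files will be useful after training, when we predict on chat messages.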
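
As a quick illustration of how these files are used later, the sketch below saves a small list with pickle.dump() and reads it back with pickle.load(). The sample words and the temporary directory are assumptions made so the example does not touch the project's real words.pkl.

```python
import os
import pickle
import tempfile

sample_words = ["bye", "hello", "thanks"]  # invented vocabulary

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "words.pkl")
    # write the Python object to disk in binary mode
    with open(path, "wb") as f:
        pickle.dump(sample_words, f)
    # read it back, as the GUI script does at start-up
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == sample_words)  # True
```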

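Step 3. Create training and testing data
To train the model, we convert each input pattern into numbers. First, we lemmatize each word of the pattern and create a list of zeros whose length equals the total number of words in the vocabulary. We set the value to 1 only at the indexes of words that appear in the pattern. In the same way, we create the output by setting a 1 at the position of the class the pattern belongs to.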

# create the training data
training = []
# template for the one-hot output row
output_empty = [0] * len(classes)

# training set: one bag of words per pattern
for doc in documents:
    # list of tokenized words for the pattern
    word_patterns = doc[0]
    # lemmatize each word so related forms share one vocabulary entry
    word_patterns = [lemmatizer.lemmatize(word.lower()) for word in word_patterns]
    # bag of words: 1 if the vocabulary word occurs in the pattern, else 0
    bag = [1 if word in word_patterns else 0 for word in words]

    # output is 0 for every tag except the current one, which is 1
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1
    training.append([bag, output_row])

# shuffle the samples, then split them into features and labels.
# The two columns have different lengths, so they are separated with list
# comprehensions instead of slicing one ragged NumPy array, which recent
# NumPy versions refuse to build.
random.shuffle(training)
train_x = [bag for bag, _ in training]        # X - patterns
train_y = [output for _, output in training]  # Y - intents
print("Training data is created")
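
The encoding can be checked by hand on a tiny example. The vocabulary, classes, and pattern below are invented, and the tokens are assumed to be already lowercased and lemmatized.

```python
words = ["bye", "hello", "how", "there", "you"]   # sorted vocabulary (invented)
classes = ["goodbye", "greeting"]                 # sorted tags (invented)

pattern_tokens = ["hello", "there"]               # already lemmatized
pattern_tag = "greeting"

# 1 where the vocabulary word occurs in the pattern, else 0
bag = [1 if w in pattern_tokens else 0 for w in words]

# one-hot row: 1 at the index of the pattern's class
output_row = [0] * len(classes)
output_row[classes.index(pattern_tag)] = 1

print(bag)         # [0, 1, 0, 1, 0]
print(output_row)  # [0, 1]
```

The bag always has one entry per vocabulary word, so every pattern becomes a fixed-length input no matter how many words it contains. Word order and repetition are discarded.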

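Step 4. Train the model
Our model is a neural network with 3 dense layers. The first layer has 128 neurons, the second has 64, and the last layer has as many neurons as there are classes. Dropout layers are added to reduce overfitting. We use the SGD optimizer and fit the data to start training. After 200 epochs of training, we save the trained model with the Keras model.save("chatbot_model.h5") function.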

# deep neural network model
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))

# Compile the model. SGD with Nesterov accelerated gradient gives good results for this model.
# Recent Keras releases name the argument learning_rate (formerly lr) and
# no longer accept decay, so the original decay=1e-6 is left out here.
sgd = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

# train and save the model
hist = model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
model.save('chatbot_model.h5')

print("model is created")
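
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: a dense layer with n inputs and m neurons has n * m weights plus m biases, and dropout layers add none. The vocabulary size of 88 and the 9 classes used below are placeholder values, not figures from the dataset; substitute len(train_x[0]) and len(train_y[0]) from your own run.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


vocab_size = 88   # placeholder for len(train_x[0])
num_classes = 9   # placeholder for len(train_y[0])

layer_sizes = [vocab_size, 128, 64, num_classes]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [11392, 8256, 585]
print(sum(per_layer))  # 20233
```

Most of the parameters sit in the first layer, so the vocabulary size is what mainly drives the size of the model.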

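Step 5. Interact with the chatbot
Our model is ready to chat, so now let's create a nice graphical user interface for the chatbot in a new file. You can name the file gui_chatbot.py.

In the GUI file, we use the Tkinter module to build the structure of the desktop application. We then capture the user's message and again apply some preprocessing before feeding the message into our trained model.

The model then predicts the tag of the user's message, and we randomly select a response from the list of responses in the intents file.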
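
The prediction step can be sketched without the trained model by feeding in a made-up probability vector. The class names, probabilities, responses, and fallback text below are all invented for illustration; the 0.25 threshold matches the value used in the GUI code.

```python
import random

classes = ["goodbye", "greeting", "thanks"]  # invented tags
responses = {                                # invented replies per tag
    "goodbye": ["Bye!"],
    "greeting": ["Hello!", "Hi there!"],
    "thanks": ["You're welcome."],
}
ERROR_THRESHOLD = 0.25


def rank_intents(probabilities):
    """Keep classes above the threshold, strongest first."""
    kept = [(classes[i], p) for i, p in enumerate(probabilities) if p > ERROR_THRESHOLD]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def pick_response(ranked):
    """Random reply for the top intent, or a fallback if nothing qualified."""
    if not ranked:
        return "Sorry, I didn't understand that."
    return random.choice(responses[ranked[0][0]])


ranked = rank_intents([0.05, 0.65, 0.30])
print(ranked)                 # [('greeting', 0.65), ('thanks', 0.3)]
print(pick_response(ranked))  # one of the greeting replies
```

Handling the empty case matters because a message made up entirely of unknown words can leave every class below the threshold.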

Here is the complete source code for the GUI file.

import json
import pickle
import random
from tkinter import *

import nltk
import numpy as np
from keras.models import load_model
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# load the trained model and the data saved by train_chatbot.py
model = load_model('chatbot_model.h5')
with open('intents.json') as f:
    intents = json.load(f)
with open('words.pkl', 'rb') as f:
    words = pickle.load(f)
with open('classes.pkl', 'rb') as f:
    classes = pickle.load(f)


def clean_up_sentence(sentence):
    # tokenize the sentence into a list of words
    sentence_words = nltk.word_tokenize(sentence)
    # lemmatize every word, reducing it to its base form
    sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
    return sentence_words


def bag_of_words(sentence, words, show_details=True):
    # return a bag-of-words array: 1 for each vocabulary word in the sentence, else 0
    sentence_words = clean_up_sentence(sentence)
    bag = [0] * len(words)
    for s in sentence_words:
        for i, word in enumerate(words):
            if word == s:
                # assign 1 at the vocabulary position of the current word
                bag[i] = 1
                if show_details:
                    print("found in bag: %s" % word)
    return np.array(bag)


def predict_class(sentence):
    p = bag_of_words(sentence, words, show_details=False)
    res = model.predict(np.array([p]))[0]
    # filter out predictions below the threshold
    ERROR_THRESHOLD = 0.25
    results = [[i, r] for i, r in enumerate(res) if r > ERROR_THRESHOLD]
    # sort by probability, strongest first
    results.sort(key=lambda x: x[1], reverse=True)
    return_list = []
    for r in results:
        return_list.append({"intent": classes[r[0]], "probability": str(r[1])})
    return return_list


def getResponse(ints, intents_json):
    # Fallback reply for when no intent clears the threshold or the tag is
    # unknown (the wording is an assumption, not part of the original code)
    result = "Sorry, I didn't understand that."
    if not ints:
        return result
    tag = ints[0]['intent']
    for i in intents_json['intents']:
        if i['tag'] == tag:
            result = random.choice(i['responses'])
            break
    return result


# Creating tkinter GUI
def send():
    msg = EntryBox.get("1.0", 'end-1c').strip()
    EntryBox.delete("0.0", END)

    if msg != '':
        ChatBox.config(state=NORMAL)
        ChatBox.insert(END, "You: " + msg + '\n\n')
        ChatBox.config(foreground="#446665", font=("Verdana", 12))

        ints = predict_class(msg)
        res = getResponse(ints, intents)

        ChatBox.insert(END, "Bot: " + res + '\n\n')

        ChatBox.config(state=DISABLED)
        ChatBox.yview(END)


root = Tk()
root.title("Chatbot")
root.geometry("400x500")
root.resizable(width=FALSE, height=FALSE)

# Create Chat window
ChatBox = Text(root, bd=0, bg="white", height="8", width="50", font="Arial")
ChatBox.config(state=DISABLED)

# Bind scrollbar to Chat window
scrollbar = Scrollbar(root, command=ChatBox.yview, cursor="heart")
ChatBox['yscrollcommand'] = scrollbar.set

# Create Button to send message
SendButton = Button(root, font=("Verdana", 12, 'bold'), text="Send", width="12", height=5,
                    bd=0, bg="#f9a602", activebackground="#3c9d9b", fg='#000000',
                    command=send)

# Create the box to enter message
EntryBox = Text(root, bd=0, bg="white", width="29", height="5", font="Arial")
# EntryBox.bind("<Return>", send)

# Place all components on the screen
scrollbar.place(x=376, y=6, height=386)
ChatBox.place(x=6, y=6, height=386, width=370)
EntryBox.place(x=128, y=401, height=90, width=265)
SendButton.place(x=6, y=401, height=90)

root.mainloop()

Source: https://data-flair.training/blogs/python-projects-with-source-code/

To run: first execute python train_chatbot.py to train and save the model, then python gui_chatbot.py to start chatting.
