코딩딩딩
Building My Own Chatbot (Fine-tuning "gpt-3.5-turbo")
For a detailed walkthrough of fine-tuning the "gpt-3.5-turbo" model, please refer to the post below.
https://michelangeloo.tistory.com/44
Fine-tuning "GPT-3.5-turbo" (Fine-tuning ChatGPT on my own data)
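With the method above, you can train the model on your own data and build a chatbot of your own.
I built a chatbot for the economics domain.
1. Preparing the dataset
The dataset covers the current economic situation of the semiconductor industry. It also includes examples about companies' stock-price outlooks, a topic that ChatGPT declines to answer.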
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "What are the positive aspects of the current semiconductor industry?"}, {"role": "assistant", "content": "The driving force behind the future semiconductor market is the increasing demand for chips used in electric vehicles (EVs) and artificial intelligence (AI)."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "Tell me about the positive aspects of the current semiconductor industry"}, {"role": "assistant", "content": "Securities firms are favoring semiconductor stocks and adjusting their outlooks upward."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "Are there any positive aspects in the current semiconductor industry?"}, {"role": "assistant", "content": "The adjustment of target prices for semiconductor stocks is largely attributed to high expectations for improved performance in the domestic memory semiconductor industry due to Nvidia's strong earnings."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "Tell me about the negative aspects in the current semiconductor industry?"}, {"role": "assistant", "content": "Investment in memory semiconductors mainly used in smartphones has decreased by 44% compared to the previous year, and investment in system semiconductors used in PCs and data centers has also dropped by 14%."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "Are there any negative aspects in the current semiconductor industry?"}, {"role": "assistant", "content": "The facility investment of the top 10 semiconductor companies has decreased for the first time in 4 years."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "What are the negative parts in the current semiconductor industry?"}, {"role": "assistant", "content": "Especially this year, the decline in semiconductor equipment investment is the largest in the past 10 years."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "What is the outlook for Samsung Electronics' share price?"}, {"role": "assistant", "content": "Analysts say that Samsung Electronics' acquisition of new HBM customers can act as a trigger for a stock price rise."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "What is the outlook for Samsung Electronics' share price?"}, {"role": "assistant", "content": "It is estimated that Samsung Electronics has simultaneously completed the final quality approval of its AI (artificial intelligence) semiconductors (HBM3) and packaging from a North American graphics processing unit (GPU) company."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "What is the outlook for Samsung Electronics' share price?"}, {"role": "assistant", "content": "Next year at the latest, memory semiconductor prices will normalize and earnings will improve, and then stock prices will inevitably rise."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "What is the outlook for Samsung Electronics' share price?"}, {"role": "assistant", "content": "If the semiconductor industry does not decline again, there is a possibility that the stock price of 63,000 to 65,000 won at the time of Samsung Electronics' production cut announcement will be a short-term low."}]}
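Before uploading, it can help to check that every line follows the chat format shown above (one JSON object per line, each holding a `messages` list of system, user, and assistant turns). The following is a minimal sketch that is not part of the original post. `make_example`, `validate_line`, and `validate_file` are hypothetical helper names, and the checks are only rough local checks, not OpenAI's own validation. Also note that OpenAI's fine-tuning guide asks for at least 10 examples, which this 10-line dataset just meets.

```python
import json

# Kept identical to the system prompt used in the dataset and in app.py
SYSTEM_PROMPT = "Marv is a economic chatbot."


def make_example(question, answer, system_prompt=SYSTEM_PROMPT):
    """Build one chat-format training line (system, user, assistant)."""
    record = {"messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]}
    return json.dumps(record, ensure_ascii=False)


def validate_line(line):
    """Return a list of problems found in one JSONL line (empty list = OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ("system", "user", "assistant"):
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has no string 'content'")
    # Without an assistant turn there is nothing for the model to learn from
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message to learn from")
    return problems


def validate_file(path):
    """Map 1-based line numbers to their problems, skipping blank lines."""
    report = {}
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if line.strip():
                problems = validate_line(line)
                if problems:
                    report[lineno] = problems
    return report
```

For example, `validate_file("economic_chatbot.jsonl")` (a hypothetical file name) returns an empty dict when every non-blank line passes these checks.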
2. Uploading the data
See the previous post.
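The upload step itself is covered in the previous post. For readers who skip it, here is a hedged sketch assuming the pre-1.0 `openai` Python SDK (the same interface `app.py` in step 4 uses), where a training file is uploaded with `openai.File.create(..., purpose="fine-tune")`. The helper name `upload_training_file` is hypothetical, and the SDK module is passed in as a parameter so the function can be exercised without network access.

```python
def upload_training_file(openai_module, path):
    """Upload a JSONL training file and return the ID of the uploaded file.

    `openai_module` is assumed to be the pre-1.0 `openai` package, or any
    object exposing a compatible `File.create(file=..., purpose=...)`.
    """
    with open(path, "rb") as f:
        uploaded = openai_module.File.create(file=f, purpose="fine-tune")
    # The returned object supports dict-style access to its fields
    return uploaded["id"]
```

Typical use, after setting `openai.api_key`: `file_id = upload_training_file(openai, "economic_chatbot.jsonl")`, where the file name is hypothetical.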
3. Fine-tuning the model
See the previous post.
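Likewise, here is a minimal sketch of starting the job and waiting for it to finish, again assuming the pre-1.0 SDK's `openai.FineTuningJob.create` and `openai.FineTuningJob.retrieve`. The function names and the 60-second polling interval are illustrative choices. When the job status becomes `succeeded`, its `fine_tuned_model` field holds the model ID to put into `model = ...` in `app.py`.

```python
import time


def start_fine_tune(openai_module, training_file_id, base_model="gpt-3.5-turbo"):
    """Create a fine-tuning job for an uploaded file and return the job ID."""
    job = openai_module.FineTuningJob.create(training_file=training_file_id, model=base_model)
    return job["id"]


def wait_for_fine_tuned_model(openai_module, job_id, poll_seconds=60):
    """Poll the job until it ends; return the fine-tuned model ID on success."""
    while True:
        job = openai_module.FineTuningJob.retrieve(job_id)
        status = job["status"]
        if status == "succeeded":
            # This is the ID that app.py should use as its `model`
            return job["fine_tuned_model"]
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"fine-tuning job {job_id} ended with status {status!r}")
        # Still validating, queued, or running: wait and check again
        time.sleep(poll_seconds)
```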
4. Implementing the chatbot with Streamlit
Save the code below as app.py and run it with:
streamlit run app.py
import os

import openai
import streamlit as st
from streamlit_chat import message

# This app uses the pre-1.0 OpenAI SDK interface (openai.ChatCompletion),
# which was removed in openai>=1.0, so install e.g. `pip install "openai<1.0"`.

# Setting page title and header
st.set_page_config(page_title="Economic Chatbot", page_icon=":robot_face:")
st.markdown("<h1 style='text-align: center;'>ECONOMIC Chatbot 💰</h1>", unsafe_allow_html=True)

# Set org ID and API key (read from environment variables instead of hard-coding them)
openai.organization = os.getenv("OPENAI_ORGANIZATION")
openai.api_key = os.getenv("OPENAI_API_KEY")

# Initialise session state variables
if 'generated' not in st.session_state:
    st.session_state['generated'] = []  # assistant replies shown in the UI
if 'past' not in st.session_state:
    st.session_state['past'] = []  # user inputs shown in the UI
if 'messages' not in st.session_state:
    # Full conversation sent to the API; the system prompt matches the training data
    st.session_state['messages'] = [
        {"role": "system", "content": "Marv is a economic chatbot."}
    ]
if 'model_name' not in st.session_state:
    st.session_state['model_name'] = []
if 'total_tokens' not in st.session_state:
    st.session_state['total_tokens'] = []

# Sidebar - let user choose a model and clear the current conversation
st.sidebar.title("Sidebar")
model_name = st.sidebar.radio("Choose a model:", ("EconomicChatbot",))
counter_placeholder = st.sidebar.empty()
clear_button = st.sidebar.button("Clear Conversation", key="clear")

# Map model names to OpenAI model IDs
if model_name == "EconomicChatbot":
    # Replace with the ID of your fine-tuned model (it starts with "ft:")
    model = "fine-tuned-model_name"

# reset everything
if clear_button:
    st.session_state['generated'] = []
    st.session_state['past'] = []
    st.session_state['messages'] = [
        {"role": "system", "content": "Marv is a economic chatbot."}
    ]
    st.session_state['model_name'] = []
    st.session_state['total_tokens'] = []


# generate a response
def generate_response(prompt):
    # Append the user turn and send the whole conversation to the fine-tuned model
    st.session_state['messages'].append({"role": "user", "content": prompt})
    completion = openai.ChatCompletion.create(
        model=model,
        messages=st.session_state['messages']
    )
    response = completion.choices[0].message.content
    st.session_state['messages'].append({"role": "assistant", "content": response})
    # print(st.session_state['messages'])
    total_tokens = completion.usage.total_tokens
    prompt_tokens = completion.usage.prompt_tokens
    completion_tokens = completion.usage.completion_tokens
    return response, total_tokens, prompt_tokens, completion_tokens


# container for chat history
response_container = st.container()
# container for text box
container = st.container()

with container:
    with st.form(key='my_form', clear_on_submit=True):
        user_input = st.text_area("You:", key='input', height=100)
        submit_button = st.form_submit_button(label='Send')

    if submit_button and user_input:
        output, total_tokens, prompt_tokens, completion_tokens = generate_response(user_input)
        st.session_state['past'].append(user_input)
        st.session_state['generated'].append(output)
        st.session_state['model_name'].append(model_name)
        st.session_state['total_tokens'].append(total_tokens)

# Render the chat history: each user message followed by the model's reply
if st.session_state['generated']:
    with response_container:
        for i in range(len(st.session_state['generated'])):
            message(st.session_state["past"][i], is_user=True, key=str(i) + '_user')
            message(st.session_state["generated"][i], key=str(i))
            st.write(
                f"Model used: {st.session_state['model_name'][i]}; Number of tokens: {st.session_state['total_tokens'][i]}")
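The app creates `counter_placeholder` in the sidebar but never writes to it, and the `prompt_tokens` and `completion_tokens` returned by `generate_response` are not stored. One hedged way to use them is to estimate the cost of the conversation. In the sketch below, `conversation_cost` is a hypothetical helper, and the per-1K-token prices are deliberately parameters rather than hard-coded numbers; check OpenAI's pricing page for the current rates of fine-tuned models.

```python
def conversation_cost(prompt_tokens, completion_tokens, input_price_per_1k, output_price_per_1k):
    """Estimate the cost of a conversation from per-request token counts.

    prompt_tokens / completion_tokens: lists with one entry per API call,
    e.g. collected from completion.usage in generate_response().
    Prices are per 1,000 tokens, in whatever currency you pass in.
    """
    if len(prompt_tokens) != len(completion_tokens):
        raise ValueError("token lists must have one entry per request")
    input_cost = sum(prompt_tokens) / 1000 * input_price_per_1k
    output_cost = sum(completion_tokens) / 1000 * output_price_per_1k
    return input_cost + output_cost
```

In `app.py`, you could append both token counts to `st.session_state` next to `total_tokens` and then show the result with `counter_placeholder.write(f"Total cost of this conversation: ${cost:.5f}")`.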
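5. Running the app
Even with a small amount of data (10 examples), the chatbot gave plausible answers.
Training it on a substantially larger dataset should make its answers more accurate.
Also, because you can build a chatbot that answers well in one specific domain,
it can save the time you would otherwise spend looking up information yourself and give you the answer you need directly.
Try building a chatbot of your own!
Thank you. :)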
- Reference code -
https://github.com/marshmellow77/streamlit-chatgpt-ui
GitHub - marshmellow77/streamlit-chatgpt-ui: A minimal ChatGPT-like UI built with Streamlit