Building Your Own Chatbot (Fine-tuning "gpt-3.5-turbo")

komizke 2023. 9. 2. 15:48

For a detailed walkthrough of how to fine-tune the "gpt-3.5-turbo" model, please see the post below.

 

https://michelangeloo.tistory.com/44

 


 

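With that method you can train the model on your own data and build a chatbot of your own.

I built a chatbot for the economics domain.

1. Preparing the dataset

I included data on the state of the semiconductor industry, as well as data on company stock price outlooks, a topic ChatGPT does not answer.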

{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "What are the positive aspects of the current semiconductor industry?"}, {"role": "assistant", "content": "The driving force behind the future semiconductor market is the increasing demand for chips used in electric vehicles (EVs) and artificial intelligence (AI)."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "Tell me about the positive aspects of the current semiconductor industry"}, {"role": "assistant", "content": "Securities firms are favoring semiconductor stocks and adjusting their outlooks upward."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "Are there any positive aspects in the current semiconductor industry?"}, {"role": "assistant", "content": "The adjustment of target prices for semiconductor stocks is largely attributed to high expectations for improved performance in the domestic memory semiconductor industry due to Nvidia's strong earnings."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "Tell me about the negative aspects in the current semiconductor industry?"}, {"role": "assistant", "content": "Investment in memory semiconductors mainly used in smartphones has decreased by 44% compared to the previous year, and investment in system semiconductors used in PCs and data centers has also dropped by 14%."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "Are there any negative aspects in the current semiconductor industry?"}, {"role": "assistant", "content": "The facility investment of the top 10 semiconductor companies has decreased for the first time in 4 years."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "What are the negative parts in the current semiconductor industry?"}, {"role": "assistant", "content": "Especially this year, the decline in semiconductor equipment investment is the largest in the past 10 years."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "What is the outlook for Samsung Electronics' share price?"}, {"role": "assistant", "content": "Analysts say that Samsung Electronics' acquisition of new HBM customers can act as a trigger for a stock price rise."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "What is the outlook for Samsung Electronics' share price?"}, {"role": "assistant", "content": "It is estimated that Samsung Electronics has completed the final quality approval of AI (artificial intelligence semiconductor (HBM3) and packaging) from a North American graphic processing device company at the same time."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "What is the outlook for Samsung Electronics' share price?"}, {"role": "assistant", "content": "Next year at the latest, memory semiconductor prices will normalize and earnings will improve, and then stock prices will inevitably rise."}]}
{"messages": [{"role": "system", "content": "Marv is a economic chatbot."}, {"role": "user", "content": "What is the outlook for Samsung Electronics' share price?"}, {"role": "assistant", "content": "If the semiconductor industry does not decline again, there is a possibility that the stock price of 63,000 to 65,000 won at the time of Samsung Electronics' production cut announcement will be a short-term low."}]}
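Before uploading, it can help to check that every line of the JSONL file parses and has the expected shape. The following is a minimal sketch, not OpenAI's own validation: the function name validate_chat_jsonl and the specific checks (a non-empty "messages" list, known roles, string content, at least one assistant message) are assumptions chosen for illustration.

```python
import json

# Roles that appear in the dataset above.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_jsonl(lines):
    """Return a list of (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
            continue
        well_formed = all(
            isinstance(m, dict)
            and m.get("role") in ALLOWED_ROLES
            and isinstance(m.get("content"), str)
            for m in messages
        )
        if not well_formed:
            problems.append((number, "message with unknown role or non-string content"))
            continue
        if not any(m["role"] == "assistant" for m in messages):
            problems.append((number, "no assistant message to learn from"))
    return problems
```

For a file on disk, something like `validate_chat_jsonl(open("data.jsonl", encoding="utf-8"))` would work, where data.jsonl is a placeholder file name.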

 

2. Uploading the data

See the previous post.

3. Fine-tuning the model

See the previous post.
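Steps 2 and 3 are covered in the previous post. For orientation only, here is a rough sketch of what the two steps might look like with the pre-1.0 openai package, the same generation as the openai.ChatCompletion call used in app.py. The function name start_fine_tune and its api parameter are hypothetical; passing the module in as an argument is only a convenience so the function can be tried without network access.

```python
def start_fine_tune(api, jsonl_path, base_model="gpt-3.5-turbo"):
    """Upload a JSONL training file and create a fine-tuning job.

    `api` is assumed to be the pre-1.0 `openai` module (or any object that
    exposes the same `File.create` and `FineTuningJob.create` calls).
    Returns the ID of the created fine-tuning job.
    """
    # Step 2: upload the training data with the "fine-tune" purpose.
    with open(jsonl_path, "rb") as training_file:
        uploaded = api.File.create(file=training_file, purpose="fine-tune")

    # Step 3: start a fine-tuning job on the uploaded file.
    job = api.FineTuningJob.create(training_file=uploaded["id"], model=base_model)
    return job["id"]
```

In real use this would be called as `start_fine_tune(openai, "data.jsonl")` after setting openai.api_key, with data.jsonl as a placeholder file name. The uploaded file may need some time to be processed before a job can be created on it, so a retry or short wait could be necessary.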

4. Implementing the chatbot with Streamlit

Run app.py with the command below.

streamlit run app.py

The contents of app.py:

# Requires the pre-1.0 "openai" package (openai.ChatCompletion was removed in openai 1.0)
import openai
import streamlit as st
from streamlit_chat import message

# Setting page title and header
st.set_page_config(page_title="Economic Chatbot", page_icon=":robot_face:")
st.markdown("<h1 style='text-align: center;'>ECONOMIC Chatbot 💰</h1>", unsafe_allow_html=True)

# Set org ID and API key (fill in your own values; avoid committing real keys)
openai.organization = ""
openai.api_key = ""

# Initialise session state variables
if 'generated' not in st.session_state:
    st.session_state['generated'] = []
if 'past' not in st.session_state:
    st.session_state['past'] = []
if 'messages' not in st.session_state:
    st.session_state['messages'] = [
        {"role": "system", "content": "Marv is a economic chatbot."}
    ]
if 'model_name' not in st.session_state:
    st.session_state['model_name'] = []
if 'total_tokens' not in st.session_state:
    st.session_state['total_tokens'] = []

# Sidebar - let the user choose a model and clear the current conversation
st.sidebar.title("Sidebar")
model_name = st.sidebar.radio("Choose a model:", ("EconomicChatbot",))
clear_button = st.sidebar.button("Clear Conversation", key="clear")

# Map model names to OpenAI model IDs
# Replace the placeholder below with the ID of your own fine-tuned model
if model_name == "EconomicChatbot":
    model = "fine-tuned-model_name"

# Reset the conversation history and the per-message metadata
if clear_button:
    st.session_state['generated'] = []
    st.session_state['past'] = []
    st.session_state['messages'] = [
        {"role": "system", "content": "Marv is a economic chatbot."}
    ]
    st.session_state['model_name'] = []
    st.session_state['total_tokens'] = []

# Generate a response from the fine-tuned model
def generate_response(prompt):
    """Send the full conversation plus the new prompt; return the reply and token usage."""
    st.session_state['messages'].append({"role": "user", "content": prompt})

    completion = openai.ChatCompletion.create(
        model=model,
        messages=st.session_state['messages']
    )
    response = completion.choices[0].message.content
    # Store the reply so it is included as context in the next request
    st.session_state['messages'].append({"role": "assistant", "content": response})

    total_tokens = completion.usage.total_tokens
    prompt_tokens = completion.usage.prompt_tokens
    completion_tokens = completion.usage.completion_tokens
    return response, total_tokens, prompt_tokens, completion_tokens

# container for chat history
response_container = st.container()
# container for text box
container = st.container()

with container:
    with st.form(key='my_form', clear_on_submit=True):
        user_input = st.text_area("You:", key='input', height=100)
        submit_button = st.form_submit_button(label='Send')

    if submit_button and user_input:
        output, total_tokens, prompt_tokens, completion_tokens = generate_response(user_input)
        st.session_state['past'].append(user_input)
        st.session_state['generated'].append(output)
        st.session_state['model_name'].append(model_name)
        st.session_state['total_tokens'].append(total_tokens)

if st.session_state['generated']:
    with response_container:
        for i in range(len(st.session_state['generated'])):
            message(st.session_state["past"][i], is_user=True, key=str(i) + '_user')
            message(st.session_state["generated"][i], key=str(i))
            st.write(
                f"Model used: {st.session_state['model_name'][i]}; Number of tokens: {st.session_state['total_tokens'][i]}")
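
Because generate_response sends the whole of st.session_state['messages'] on every request, the prompt grows with each turn of the conversation. One possible way to bound it, sketched below, is to keep the system message and only the most recent messages. The helper name trim_history and the default of 10 messages are hypothetical choices, not part of the original app.

```python
def trim_history(messages, max_turn_messages=10):
    """Keep every system message plus the most recent non-system messages.

    `messages` is a list of {"role": ..., "content": ...} dicts, as stored in
    st.session_state['messages']. The input list is not modified.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # A slice of [-0:] would return the whole list, so handle 0 explicitly.
    recent = rest[-max_turn_messages:] if max_turn_messages > 0 else []
    return system + recent
```

It could be applied at the call site, for example by passing `messages=trim_history(st.session_state['messages'])` to the completion call. An even value for max_turn_messages keeps user and assistant messages paired.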

 

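5. Execution screen

Even with a small amount of data, the chatbot gave plausible answers.

With substantially more training data, it would likely answer more accurately.

Also, because you can build a chatbot that answers well in one specific domain, you can get the answers you want without spending time searching for the information yourself.

Try building your own chatbot!

Thank you. :)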

 

 

-Reference code-

 

https://github.com/marshmellow77/streamlit-chatgpt-ui

 


 
