[실험] Prompt 조사 및 Prompt에 따른 Chatbot 답변 변화

ML_AI/24_여름방학 인공지능 스터디

[실험] Prompt 조사 및 Prompt에 따른 Chatbot 답변 변화

안정민 2024. 7. 3. 12:52

■ 프롬프트
:  누군가의 특정한 작업 수행을 도우려 전달하는 메시지를 의미
:  생성형 AI에게 어떤 행동을 해야 하는지 자연어로 설명해 원하는 결과물을 출력할 수 있게 하는 입력값
:  프롬프트는 AI를 작동시키기 위해 사용자가 언어 모델에 입력하는 모든 것 (명령, 지시, 질문, 요청)
:  프롬프트의 구성을 만들어 가는 것을 프롬프트 엔지니어링
:  https://www.codestates.com/blog/content/%ED%94%84%EB%A1%AC%ED%94%84%ED%8A%B8-%ED%94%84%EB%A1%AC%ED%94%84%ED%8A%B8%EC%97%94%EC%A7%80%EB%8B%88%EC%96%B4%EB%A7%81

AI 프롬프트와 프롬프트 엔지니어링 | 정의와 예시 - 코드스테이츠 공식 블로그

생성형 AI 서비스가 우리 삶에 미치는 영향이 커지면서 생성형 AI 서비스와 소통할 때 필요한 '프롬프트'에 대한 관심도 커졌는데요. 프롬프트와 프롬프트 엔지니어링이 무엇인지 자세히 알아보

www.codestates.com

[실험] llama3:70b 을 활용하여 SQL 쿼리 코딩 테스트 문제 Chatbot 응답 비교

[Case 1] Param - default

Used Model : llama3:70b
Used Prompt (1) :

"""
You are a SQL query optimization expert. I need your help to solve a problem related to database querying. 
Provide the complete SQL code along with an explanation of how the algorithm works and why it is efficient.
Please write an optimized SQL query to solve the following problem:
"""

가장 기본적인 형태의 Prompt
Chatbot의 Role 지정 + 요구 사항 전달
해결의 대상이 될 문제는 사용자 입력에 의존적일 수 있도록 가장 기본적인 위의 두 가지 사항만 전달

Original Question text used :

Table: Employee

+--------------+---------+
| Column Name | Type |
+--------------+---------+
| id | int |
| name | varchar |
| salary | int |
| departmentId | int |
+--------------+---------+
id is the primary key (column with unique values) for this table.
departmentId is a foreign key (reference column) of the ID from the Department table.
Each row of this table indicates the ID, name, and salary of an employee. It also contains the ID of their department.

Table: Department

+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| name | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table indicates the ID of a department and its name.

A company's executives are interested in seeing who earns the most money in each of the company's departments. A high earner in a department is an employee who has a salary in the top three unique salaries for that department.

Write a solution to find the employees who are high earners in each of the departments.

Return the result table in any order.

The result format is in the following example.

Example 1:

Input:
Employee table:
+----+-------+--------+--------------+
| id | name | salary | departmentId |
+----+-------+--------+--------------+
| 1 | Joe | 85000 | 1 |
| 2 | Henry | 80000 | 2 |
| 3 | Sam | 60000 | 2 |
| 4 | Max | 90000 | 1 |
| 5 | Janet | 69000 | 1 |
| 6 | Randy | 85000 | 1 |
| 7 | Will | 70000 | 1 |
+----+-------+--------+--------------+
Department table:
+----+-------+
| id | name |
+----+-------+
| 1 | IT |
| 2 | Sales |
+----+-------+
Output:
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Max | 90000 |
| IT | Joe | 85000 |
| IT | Randy | 85000 |
| IT | Will | 70000 |
| Sales | Henry | 80000 |
| Sales | Sam | 60000 |
+------------+----------+--------+
Explanation:
In the IT department:

Max earns the highest unique salary
Both Randy and Joe earn the second-highest unique salary
Will earns the third-highest unique salary

In the Sales department:

Henry earns the highest salary
Sam earns the second-highest salary
There is no third-highest salary as there are only two employees

모든 파라미터를 기본으로 설정하고 답변 출력

Here's an optimized SQL query to solve this problem:

WITH ranked_employees AS (
  SELECT 
    e.name AS Employee,
    d.name AS Department,
    e.salary,
    DENSE_RANK() OVER (PARTITION BY e.departmentId ORDER BY e.salary DESC) AS salary_rank
  FROM Employee e
  JOIN Department d ON e.departmentId = d.id
)
SELECT 
  Department,
  Employee,
  Salary
FROM ranked_employees
WHERE salary_rank <= 3;

Here's an explanation of how the algorithm works and why it is efficient:

Step 1: Create a Common Table Expression (CTE)

We create a CTE named ranked_employees to perform the ranking operation. This allows us to avoid using subqueries, which can be slower.

Step 2: Join Employee and Department tables

We join the Employee table with the Department table on the departmentId column to link each employee to their department.

Step 3: Use DENSE_RANK() function

We use the DENSE_RANK() function to assign a ranking to each employee within their department based on their salary in descending order (highest salary first). The PARTITION BY clause groups employees by department, and the ORDER BY clause sorts them by salary.

Step 4: Filter top 3 earners per department

In the final query, we select only the columns of interest (Department, Employee, and Salary) from the CTE. We filter the results to include only employees with a salary_rank of 3 or less, which corresponds to the top 3 earners in each department.

Why this algorithm is efficient:

CTE: Using a CTE allows us to perform the ranking operation without creating intermediate tables or using subqueries.
DENSE_RANK(): This function is optimized for ranking operations and can handle large datasets efficiently.
Partitioning and ordering: By partitioning by department and ordering by salary, we minimize the amount of data that needs to be processed.
Filtering: By filtering only the top 3 earners per department, we reduce the number of rows in the final result set.

Overall, this query should perform well even with large datasets, as it leverages optimized ranking functions and minimizes unnecessary data processing.

Result

Leet 사이트에 직접 출력 답변을 넣고 실행시킨 결과 문제에 대한 정답이 오류 없이 컴파일 된다.
더하여 Step by Step 형식으로 문제 풀이에 대한 자세한 설명을 LLM이 제공, 요구 사항에 부합함.
나아가 해당 문제 풀이에서 활용된 알고리즘이 “왜” 효율적인지 설명하여 출력한 코드와 풀이 방식에 대한 정당성을 입증한다.

→ LLM 이 요구된 SQL query expert 화자로써 optimized query를 출력해냈다는 것에 대해 스스로 입증

[Case 2] System Prompt 조정 및 출력 답변 비교

<개요>

Prompt Tuning은 인공지능 모델, 특히 자연어 처리 모델을 사용할 때 사용되는 기법으로, 이 기법의 핵심은 모델에 입력되는 Prompt(명령이나 요청 등의 텍스트)를 조정하여, 모델이 원하는 방식으로 응답하도록 유도하는 방식
같은 모델을 사용하면서도 사용자 요구에 따라 다양한 종류의 출력을 얻기 위함이며, 이는 사용자의 입력에 따라 유연하게 반응하는 것을 목표로 한다
추가적인 학습이나 데이터 없이 여러 상황에 대한 다양하고 유연한 답변을 출력할 수 있게 되는 장점

<기본 설정>

Used Model : llama3:70b 로 Case1과 동일하게 유지
Original Question text used : Case1과 동일하게 유지

<변수 설정>

System Prompt에 변화를 줌
기존 System Prompt

"""
You are a SQL query optimization expert. I need your help to solve a problem related to database querying. 
Provide the complete SQL code along with an explanation of how the algorithm works and why it is efficient.
Please write an optimized SQL query to solve the following problem:
"""

조정된 System Prompt (1) : 같은 답을 출력하는 다양한 SQL 코드 출력에 초점

"""
You are a SQL query optimization expert. I need your help to solve a problem related to database querying. 
Provide the complete SQL code along with an explanation of how the algorithm works and why it is efficient.
You should provide more than 2 possible answer with different algorithm for the problem solution
Please write an optimized SQL query to solve the following problem:
"""

"You should provide more than 2 possible answer with different algorithm for the problem solution" 이 추가됨

조정된 System Prompt (2) : 어투 조정

"""
You are a SQL query optimization expert. I need your help to solve a problem related to database querying. 
Provide the complete SQL code along with an explanation of how the algorithm works and why it is efficient.
You should provide your answer in a tone of a teacher explaining to the kindergardeners.
Prefer to answer with a metaphor.
Please write an optimized SQL query to solve the following problem:
"""

"You should provide your answer in a tone of a teacher explaining to the kindergardeners.Prefer to answer with a metaphor." 이 추가됨

"""
You are a SQL query optimization expert. I need your help to solve a problem related to database querying. 
Provide the complete SQL code along with an explanation of how the algorithm works and why it is efficient.
You should provide your answer in a tone of a professor who's giving a lecture in front of a large audience.
Prefer to answer using charts and graph.
Please write an optimized SQL query to solve the following problem:
"""

"You should provide your answer in a tone of a professor who's giving a lecture in front of a large audience.Prefer to answer using charts and graph." 이 추가됨

<변수에 따른 결과 도출>

조정된 System Prompt (1) : 같은 답을 출력하는 다양한 SQL 코드 출력에 초점

Here are three different solutions to the problem with varying algorithms:

Solution 1: Using Row Numbering

This solution uses row numbering to assign a ranking to each employee within their department based on their salary. We then select the top 3 rows for each department.

WITH ranked_employees AS (
    SELECT 
        d.name AS Department,
        e.name AS Employee,
        e.salary,
        ROW_NUMBER() OVER (PARTITION BY e.departmentId ORDER BY e.salary DESC) AS rank
    FROM 
        Employee e
    JOIN 
        Department d ON e.departmentId = d.id
)
SELECT 
    Department,
    Employee,
    Salary
FROM 
    ranked_employees
WHERE 
    rank <= 3;

Solution 2: Using Dense Ranking

This solution uses dense ranking to assign a ranking to each employee within their department based on their salary. We then select the top 3 rows for each department.

WITH ranked_employees AS (
    SELECT 
        d.name AS Department,
        e.name AS Employee,
        e.salary,
        DENSE_RANK() OVER (PARTITION BY e.departmentId ORDER BY e.salary DESC) AS rank
    FROM 
        Employee e
    JOIN 
        Department d ON e.departmentId = d.id
)
SELECT 
    Department,
    Employee,
    Salary
FROM 
    ranked_employees
WHERE 
    rank <= 3;

Solution 3: Using Subqueries

This solution uses subqueries to find the top 3 salaries for each department, and then selects the employees with those salaries.

SELECT 
    d.name AS Department,
    e.name AS Employee,
    e.salary AS Salary
FROM 
    Employee e
JOIN 
    Department d ON e.departmentId = d.id
WHERE 
    (e.departmentId, e.salary) IN (
        SELECT 
            departmentId,
            salary
        FROM 
            (
                SELECT 
                    departmentId,
                    salary,
                    DENSE_RANK() OVER (PARTITION BY departmentId ORDER BY salary DESC) AS rank
                FROM 
                    Employee
            ) subquery
        WHERE 
            rank <= 3
    );

All of these solutions should be efficient as they use window functions or subqueries to avoid self-joins and reduce the number of rows being processed. The choice of solution depends on the specific database system being used, as some systems may optimize certain types of queries better than others.

요구 사항에 부합하도록 총 세 가지의 서로 다른 알고리즘을 출력

어떤 알고리즘을 활용하였는지, 어떤 방식으로 코드가 설계 되었는지 설명함

그러나 답변 다양성을 보장하기 위하여 알고리즘 다양성에 답변 출력 초점을 맞추다 보니, LLM이 제시한 세 개의 해답 모두 Runtime Error가 발생. (MySQL 버전 호환 관련 에러)

다양성을 확보하기 위해 variance을 높이면, 정확성이 떨어지는 결과를 낸다는 결론을 내릴 수 있음.

조정된 System Prompt (2) : 어투 조정 및 응답의 난이도 조정

"You should provide your answer in a tone of a teacher explaining to the kindergardeners. Prefer to answer with a metaphor." 이 추가된 Prompt의 출력은 아래와 같았다

My dear kindergarten friends, today we're going to solve a problem that's like finding the top three toys in each box of toys!

Imagine we have many boxes (departments) and inside each box, we have many toys (employees) with different prices (salaries). We want to find the top three most expensive toys in each box.

Here's the optimized SQL query to solve this problem:

WITH ranked_employees AS (
  SELECT 
    e.departmentId,
    e.name AS Employee,
    e.salary,
    DENSE_RANK() OVER (PARTITION BY e.departmentId ORDER BY e.salary DESC) AS salary_rank
  FROM Employee e
)
SELECT 
  d.name AS Department,
  re.Employee,
  re.salary
FROM ranked_employees re
JOIN Department d ON re.departmentId = d.id
WHERE re.salary_rank <= 3;

Let me explain how this query works like a toy factory!

Step 1: Create a temporary table with ranking
We use a Common Table Expression (CTE) named ranked_employees to create a temporary table that ranks each employee within their department by salary in descending order ( highest salary first). We use DENSE_RANK() to ensure that if there are multiple employees with the same salary, they will have the same rank.

Step 2: Join the ranked employees with departments
We join the ranked_employees CTE with the Department table on the departmentId column. This allows us to link each employee with their department name.

Step 3: Filter only top three earners in each department
Finally, we filter the results to only include employees with a salary_rank of 1, 2, or 3 (the top three earners in each department).

The final result is a table that shows the department name, employee name, and salary for each high earner in each department.

Why is this query efficient?

We use a CTE to avoid complex subqueries and improve readability.
We use DENSE_RANK() instead of ROW_NUMBER() to handle ties (employees with the same salary).
We filter the results using the salary_rank column, which reduces the number of rows to process.

Now, go ahead and run this query in your database management system to find those high-earning toys... err, employees!

제공된 코드를 실행하였을 때 Runtime Error을 비롯한 에러가 발생하지 않고 정확한 정답을 출력함.

“My dear kindergarten friends, today we're going to solve a problem that's like finding the top three toys in each box of toys!” 와 같이 SQL 문제를 장난감 찾기 문제에 빗대어 비유적인 표현을 통해 청자인 아이들이 이해하기 쉬운 문장 구성을 출력하였다. 나아가 “Imagine we have many boxes (departments) and inside each box, we have many toys (employees) with different prices (salaries).” 와 같이 어려운 어휘들을 쉬운 어휘로 치환하여 출력하는 것을 확인할 수 있었다.

"You should provide your answer in a tone of a professor who's giving a lecture in front of a large audience. Prefer to answer using charts and graph." 이 추가된 Prompt의 출력은 아래와 같았다.

Ladies and gentlemen, today we're going to tackle a fascinating problem in SQL query optimization. Our goal is to find the high earners in each department, where a high earner is an employee who has a salary in the top three unique salaries for that department.

To solve this problem, we'll use a combination of window functions, Common Table Expressions (CTEs), and clever indexing. So, let's dive into the solution!

The Optimized SQL Query:

WITH ranked_employees AS (
  SELECT 
    e.departmentId,
    e.name AS Employee,
    e.salary,
    DENSE_RANK() OVER (PARTITION BY e.departmentId ORDER BY e.salary DESC) AS salary_rank
  FROM Employee e
)
SELECT 
  d.name AS Department,
  re.Employee,
  re.salary
FROM ranked_employees re
JOIN Department d ON re.departmentId = d.id
WHERE re.salary_rank <= 3;

Explanation:

CTE (Common Table Expression): We define a CTE named ranked_employees to perform the complex ranking operation.
Window Function: We use the DENSE_RANK() window function to assign a unique rank to each employee within their department, based on their salary in descending order (highest salary gets rank 1).
Partitioning: The PARTITION BY e.departmentId clause ensures that the ranking is done separately for each department.
Join and Filtering: We join the ranked_employees CTE with the Department table on the departmentId column. Then, we filter out employees with a salary rank greater than 3 using the WHERE clause.

Why this query is efficient:

Indexing: By partitioning the data by department ID, we can take advantage of indexing on the departmentId column, reducing the number of rows to process.
Window Function: The DENSE_RANK() function is optimized for this type of ranking operation, making it more efficient than using self-joins or subqueries.
CTE: By defining the complex ranking operation in a CTE, we can avoid repeating the same calculation multiple times.

Performance Analysis:

Here's a rough estimate of the query performance:

Operation	Cost
CTE (ranking)	O(n log n)
Join	O(n)
Filtering	O(n)

Total cost: O(n log n)

As you can see, the dominant operation is the ranking step in the CTE, which has a logarithmic complexity due to the DENSE_RANK() function. The join and filtering operations have linear complexity.

Visualizing the Query Plan:

Here's a simplified query plan graph:

          +---------------+
          |  Employee   |
          +---------------+
                  |
                  | (CTE)
                  v
+---------------+     +---------------+
| ranked_employees |     | Department    |
+---------------+     +---------------+
                  |
                  | (JOIN)
                  v
+---------------+     +---------------+
| Result        |     | Filtering  |
+---------------+     +---------------+

In conclusion, this optimized SQL query efficiently finds the high earners in each department by leveraging window functions, CTEs, and clever indexing. The query plan is well-structured, and the performance analysis shows that the dominant operation is the ranking step in the CTE.

마찬가지로 어투만 변경하는 Prompt Engineering에서 출력하는 예시 코드는 Leet에서 실행시켰을 때 어떠한 에러도 발생 시키지 않고 정답을 출력하는 것을 확인할 수 있었다.

“Ladies and gentlemen, today we're going to tackle a fascinating problem in SQL query optimization.” 의 문장으로 답변 출력을 시작하며, 화자를 Prompt에서 요구한 것과 같이 강연을 하고 있는 대학교수로 설정하였음을 확인 가능했다. 나아가 Kindergarten이 청자로 설정되었던 바로 이전 답변과 다르게 답변에 사용되는 어휘의 난이도가 전반적으로 상승하였음을 확인할 수 있다.

또 계획을 그래프를 활용하여 설명하고, 전문적인 용어인 시간 복잡도를 설명하는 표를 제시하며 이전과는 어투가 조금 더 난도 있고 전문적인 방향으로 변화하였다는 것을 확인할 수 있었다.

Prompt를 통해 설정한 화자에 따른 어투가 서로 다르게 LLM에 기본적으로 장착되고, 이에 따라 서로 다른 답변 난이도로 답변이 출력될 수 있음을 확인하였고, 이는 간편하게 Prompt 조정을 통해 이루어낼 수 있다는 결론을 도출할 수 있었다.

[Case 3] 파라미터 조정

Temperature 파라미터 조정 (Default 0.8 → 0)

Temperature 파라미터를 0으로 조정한 뒤 출력된 답변은 다음과 같았다.

Here is an optimized SQL query to solve this problem:

WITH ranked_employees AS (
  SELECT 
    e.departmentId,
    e.name AS Employee,
    e.salary,
    DENSE_RANK() OVER (PARTITION BY e.departmentId ORDER BY e.salary DESC) AS salary_rank
  FROM Employee e
)
SELECT 
  d.name AS Department,
  re.Employee,
  re.salary
FROM ranked_employees re
JOIN Department d ON re.departmentId = d.id
WHERE re.salary_rank <= 3;

Here's an explanation of how the algorithm works and why it is efficient:

Step 1: Create a Common Table Expression (CTE) to rank employees by salary

We use a CTE to create a temporary result set that ranks each employee within their department by salary in descending order. We use DENSE_RANK() instead of RANK() to ensure that there are no gaps in the ranking, even if there are ties.

Step 2: Join the ranked employees with the Department table

We join the ranked_employees CTE with the Department table on the departmentId column. This allows us to associate each employee with their department name.

Step 3: Filter out employees who are not high earners

Finally, we filter out employees who have a salary rank greater than 3 using the WHERE clause. This ensures that only the top 3 earners in each department are returned.

Why this query is efficient:

Indexing: If there are indexes on the departmentId column in both tables and the salary column in the Employee table, the query optimizer can use these indexes to speed up the join and ranking operations.
CTE vs. subquery: Using a CTE instead of a subquery allows the database to optimize the ranking operation more efficiently.
DENSE_RANK() vs. ROW_NUMBER(): Using DENSE_RANK() instead of ROW_NUMBER() ensures that there are no gaps in the ranking, even if there are ties.
Filtering early: By filtering out employees who are not high earners early on, we reduce the number of rows that need to be joined and processed.

This query should perform well even with large datasets, as long as the necessary indexes are in place.

Temperature 0.8 (Default)

Temperature 0 (manipulated)

구조: 구조가 잘 정리되어 있지만, 단계별 번호가 매겨지지 않았고 설명이 조금 더 일반적입니다.
세부 사항: 알고리즘의 단계별 설명이 있지만, 일부 세부 사항은 응답 1보다 덜 구체적입니다.
효율성 설명: CTE 사용의 이점, DENSE_RANK()의 최적화, 파티셔닝과 정렬, 필터링의 중요성 등을 설명하지만, 인덱스에 대한 언급은 없습니다.

더 일반적이고 이해하기 쉬운 설명을 제공합니다. 알고리즘의 효율성을 설명할 때 CTE와 DENSE_RANK()의 최적화 측면을 강조합니다.

구조: 더 구조화되고 간결하며, 단계가 명확하게 구분되고 번호가 매겨져 있습니다.
세부 사항: 알고리즘에 대한 자세한 단계별 설명을 제공하며, DENSE_RANK()를 사용하는 등의 특정 선택에 대해 집중적으로 설명합니다.
효율성 설명: 인덱스, CTE와 서브쿼리 비교, DENSE_RANK()와 ROW_NUMBER() 비교, 필터링의 장점 등을 포함하여 상세히 설명합니다.

더 기술적이고 구체적입니다. 효율성에 대한 설명에서 인덱스 사용의 중요성을 강조합니다.

Temperature 파라미터를 낮게 조정하면 답변의 출력이 조금 더 구체적이고 전문적인 성향을 보이는 데에 반면, 높게 설명하면 조금 더 난이도가 낮고 일반적인 답변을 출력함을 알 수 있었다.

즉,

Temperature는 답변의 랜덤성을 조절하는 설정 하이퍼 파라미터이고, temperature가 높으면 더 예측 불가능하고 창의적인 결과를 내고 낮으면 흔하고 보수적인 답변을 한다는 결론에 도달 가능하다.

저작자표시