[code]BERT example code

뚜둔뚜둔 2023. 7. 20. 00:13

import torch
from transformers import BertTokenizerFast
from torch.utils.data import Dataset, DataLoader


class TokenDataset(Dataset):

    def __init__(self, dataframe, tokenizer_pretrained):
        # DataFrame with a 'document' column (sentence text) and a 'label' column
        self.data = dataframe
        # Create the Hugging Face tokenizer
        self.tokenizer = BertTokenizerFast.from_pretrained(tokenizer_pretrained)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sentence = self.data.iloc[idx]['document']
        label = self.data.iloc[idx]['label']

        # Tokenize the sentence
        tokens = self.tokenizer(
            sentence,                  # a single sentence
            return_tensors='pt',       # return PyTorch tensors
            truncation=True,           # allow truncation
            padding='max_length',      # pad up to the maximum length
            add_special_tokens=True    # add special tokens
        )

        input_ids = tokens['input_ids'].squeeze(0)             # 2D -> 1D
        attention_mask = tokens['attention_mask'].squeeze(0)   # 2D -> 1D
        token_type_ids = torch.zeros_like(attention_mask)

        # Return three inputs: input_ids, attention_mask, token_type_ids
        # input_ids: token ids
        # attention_mask: 1 where a real token exists, 0 where there is padding
        #                 (the padding token id itself is not necessarily 0)
        # token_type_ids: id that separates sentences; all 0 for a single sentence

        return {
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'token_type_ids': token_type_ids,
        }, torch.tensor(label)
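
To make the three returned fields concrete, here is a small hand-written sketch of what padding to a fixed length produces. It is not the tokenizer's actual implementation: the helper name pad_and_mask, the token ids, and pad_id=1 are all made up for illustration, and unlike a real tokenizer it does not preserve special tokens when truncating.

```python
def pad_and_mask(token_ids, max_length, pad_id):
    """Pad (or truncate) one sentence's token ids and build the matching masks.

    Hypothetical helper for illustration only; a real tokenizer does more
    (for example, it keeps special tokens in place when truncating).
    """
    ids = list(token_ids[:max_length])           # naive truncation
    n_real = len(ids)
    n_pad = max_length - n_real
    input_ids = ids + [pad_id] * n_pad           # fill the tail with the pad id
    attention_mask = [1] * n_real + [0] * n_pad  # 1 = real token, 0 = padding
    token_type_ids = [0] * max_length            # single sentence -> all zeros
    return input_ids, attention_mask, token_type_ids


# Made-up token ids; pad_id=1 is chosen on purpose to show that the zeros in
# attention_mask do not depend on the padding token id being 0.
input_ids, attention_mask, token_type_ids = pad_and_mask(
    [2, 1537, 809, 3], max_length=6, pad_id=1
)
print(input_ids)       # [2, 1537, 809, 3, 1, 1]
print(attention_mask)  # [1, 1, 1, 1, 0, 0]
print(token_type_ids)  # [0, 0, 0, 0, 0, 0]
```

The model should rely on attention_mask, not on the value of the pad id, to tell real tokens from padding.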

Create an instance of the TokenDataset class, which extends torch.utils.data.Dataset.
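
Once an instance exists, it is typically wrapped in a DataLoader. The sketch below shows how a DataLoader batches items shaped like (dict of tensors, label). To stay runnable without downloading a tokenizer, it uses a hypothetical stand-in class, StandInTokenDataset, with made-up tensor values; the sizes (5 items, sequence length 8, batch size 2) are assumptions for the example only.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class StandInTokenDataset(Dataset):
    """Hypothetical stand-in that returns items shaped like a tokenized sample."""

    def __init__(self, num_items, seq_len):
        self.num_items = num_items
        self.seq_len = seq_len

    def __len__(self):
        return self.num_items

    def __getitem__(self, idx):
        # Fake "token ids": every position just holds the item index
        input_ids = torch.full((self.seq_len,), idx, dtype=torch.long)
        attention_mask = torch.ones(self.seq_len, dtype=torch.long)
        inputs = {
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'token_type_ids': torch.zeros_like(attention_mask),
        }
        return inputs, torch.tensor(idx % 2)  # fake binary label


loader = DataLoader(StandInTokenDataset(num_items=5, seq_len=8),
                    batch_size=2, shuffle=False)

# The default collate function stacks each dict entry and the labels
inputs, labels = next(iter(loader))
print(inputs['input_ids'].shape)  # torch.Size([2, 8])
print(labels.shape)               # torch.Size([2])
```

Each 1D tensor from __getitem__ becomes one row of a 2D batch tensor, which is why the dataset squeezes the tokenizer output down to 1D before returning it.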

Reference: https://teddylee777.github.io/huggingface/bert-kor-text-classification/

This post was written with reference to the page above.

To run the code end to end, see the reference site, which has the complete original.

License: Attribution-NonCommercial-ShareAlike (CC BY-NC-SA)
