[R Programming Language] Word Cloud

Arc Lab. 2016. 10. 17. 17:35

[Updated 2016.11.16 02:05]

R has packages that can build and display a word cloud from unstructured data such as text. This post summarizes how to create a word cloud with them.

1) Installing the packages

Install the following four packages:

> install.packages('tm')
> install.packages('SnowballC')
> install.packages('wordcloud')
> install.packages('RColorBrewer')
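If the script is rerun on a machine where some of these packages are already present, a small guard avoids reinstalling them. This is a minimal sketch that is not part of the original post; the helper name missing_pkgs and the CRAN mirror URL are assumptions.

```r
pkgs <- c("tm", "SnowballC", "wordcloud", "RColorBrewer")

# Return the subset of `pkgs` whose namespace cannot be loaded (i.e. not installed).
# missing_pkgs is a hypothetical helper name used only in this sketch.
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_pkgs(pkgs)
if (length(to_install) > 0) {
  # The mirror URL is an assumption; any CRAN mirror works.
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

Running it a second time installs nothing, because every package in pkgs now loads.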

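2) Loading the packages

After installation, load the packages so they can be used as shown below. (NLP is installed automatically as a dependency of tm, which is why it did not need its own install.packages call.)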

> library(NLP)
> library(tm)
> library(SnowballC)
> library(RColorBrewer)
> library(wordcloud)  
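The same loading step can also be written as a loop over a character vector, which makes it easy to confirm that everything was attached. A sketch, assuming all five packages are already installed:

```r
pkgs <- c("NLP", "tm", "SnowballC", "RColorBrewer", "wordcloud")

# library() needs character.only = TRUE when the package name is stored in a variable
invisible(lapply(pkgs, library, character.only = TRUE))

# Each attached package appears on the search path as "package:<name>"
stopifnot(all(paste0("package:", pkgs) %in% search()))
```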

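3) Loading the text file and calling the analysis functions

Now load the unstructured text data to be analyzed.

The loaded text is then preprocessed: it is converted to uppercase; numbers, punctuation, English stopwords and extra whitespace are removed; and each word is stemmed to its root form. The cleaned corpus is turned into a term-document matrix, and the per-term counts are collected into a data.frame with one row per word (word) and its total frequency (freq), sorted in descending order. The head function then shows the first N rows, i.e. the N most frequent words. Finally, the wordcloud function draws the word cloud.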
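To make the counting part of this pipeline concrete, here is a dependency-free base-R analogue on two made-up lines of text. It is not the tm pipeline used in this post: the regex cleaning, the whitespace tokenizer and the two-word stop_list are simplifications chosen for illustration. The result has the same word/freq shape as the terms data.frame, though.

```r
# Toy input: two "documents" (lines), standing in for readLines() output
text <- c("Big data is data", "Stop calling it big data")

# Minimal preprocessing: lowercase, then drop digits and punctuation
clean <- tolower(text)
clean <- gsub("[[:digit:]]+", "", clean)
clean <- gsub("[[:punct:]]+", "", clean)

# Split on whitespace and drop a tiny hypothetical stopword list
# (far smaller than tm's stopwords("english"))
tokens <- unlist(strsplit(clean, "[[:space:]]+"))
tokens <- tokens[nzchar(tokens)]
stop_list <- c("is", "it")
tokens <- tokens[!tokens %in% stop_list]

# Count, sort in descending order, and build a word/freq data.frame
freq <- sort(table(tokens), decreasing = TRUE)
terms <- data.frame(word = names(freq), freq = as.integer(freq),
                    stringsAsFactors = FALSE)
head(terms, 3)
```

With this input, data (3 occurrences) comes first and big (2) second; calling and stop tie at 1.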

The input text file (R_WordCloud_TextData.txt in the code below) was created by copying the content of the following page.

Source: http://blog.datalicious.com/big-data-analysis/to-find-the-true-value-of-big-data-stop-calling-it-big-data/

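> text <- readLines("C:\\R_WordCloud_TextData.txt")
> docs <- Corpus(VectorSource(text))                       # one document per line
> docs <- tm_map(docs, content_transformer(tolower))       # lowercase first: stopwords("english") is all lowercase
> docs <- tm_map(docs, removeNumbers)                      # drop digits
> docs <- tm_map(docs, removeWords, stopwords("english"))  # drop English stopwords (case-sensitive match)
> docs <- tm_map(docs, removePunctuation)                  # drop punctuation
> docs <- tm_map(docs, stripWhitespace)                    # collapse repeated whitespace
> docs <- tm_map(docs, stemDocument)                       # Snowball stemming via SnowballC
> docs <- tm_map(docs, content_transformer(toupper))       # uppercase only at the end, for display
> docu_matrix <- TermDocumentMatrix(docs, control = list(tolower = FALSE))  # keep the uppercase terms
> docu_matrix <- as.matrix(docu_matrix)                    # terms x documents
> terms <- sort(rowSums(docu_matrix), decreasing = TRUE)  # total frequency per term
> terms <- data.frame(word = names(terms), freq = terms)
> head(terms, 15)                                          # 15 most frequent terms
> set.seed(1)                                              # reproducible layout
> wordcloud(words = terms$word, freq = terms$freq, min.freq = 2, max.words = 200,
+           random.order = FALSE, rot.per = 0.3, colors = brewer.pal(8, "Dark2"))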
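The order of the cleaning steps matters: stopwords("english") returns lowercase words and removeWords matches case-sensitively, so converting the text to uppercase before removing stopwords leaves every stopword in place. The made-up sentence below shows the difference between the two orders (VCorpus is used here so each document's text can be read back with content()).

```r
library(tm)

stop_en <- stopwords("english")  # tm's English stopword list is all lowercase
doc <- VCorpus(VectorSource("The value of the data"))

# Uppercase first, then remove stopwords: nothing matches, nothing is removed
upper_first <- tm_map(doc, content_transformer(toupper))
upper_first <- tm_map(upper_first, removeWords, stop_en)

# Lowercase first, remove stopwords, tidy spaces, then uppercase for display
lower_first <- tm_map(doc, content_transformer(tolower))
lower_first <- tm_map(lower_first, removeWords, stop_en)
lower_first <- tm_map(lower_first, stripWhitespace)
lower_first <- tm_map(lower_first, content_transformer(toupper))

print(content(upper_first[[1]]))
print(trimws(content(lower_first[[1]])))
```

The first result still contains THE and OF, while the second keeps only VALUE and DATA.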

<Final result>
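To keep the result as a file rather than only in the plot window, the same wordcloud call can be wrapped in a graphics device such as pdf(). This is a sketch with a small hypothetical word/freq table standing in for terms; the output path in tempdir() is also an assumption.

```r
library(wordcloud)
library(RColorBrewer)

# Hypothetical word/freq table standing in for the `terms` data.frame
terms <- data.frame(word = c("DATA", "BIG", "VALUE", "STOP", "CALL"),
                    freq = c(12, 9, 6, 4, 2))

out_file <- file.path(tempdir(), "wordcloud.pdf")  # hypothetical output path
pdf(out_file, width = 7, height = 7)               # open a PDF graphics device
set.seed(1)                                        # reproducible layout
wordcloud(words = terms$word, freq = terms$freq, min.freq = 2, max.words = 200,
          random.order = FALSE, rot.per = 0.3, colors = brewer.pal(8, "Dark2"))
dev.off()                                          # close the device and write the file
```

png() or jpeg() can be used the same way when a raster image is preferred.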

<GitHub>
https://github.com/AsyncBridge/Analytics/blob/master/R/R_WordCloud.R

License: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)
