Python stemming (from a pandas dataframe)

I created a dataframe with sentences that have to be stemmed. I want to use SnowballStemmer to achieve higher accuracy with my classification algorithm. How can I do this?


import pandas as pd
from nltk.stem.snowball import SnowballStemmer

# Use English stemmer.
stemmer = SnowballStemmer("english")

# Sentences to be stemmed.
data = ["programers program with programing languages", "my code is working so there must be a bug in the interpreter"]

# Create the Pandas dataFrame.
df = pd.DataFrame(data, columns=['unstemmed'])

# Split the sentences to lists of words.
df['unstemmed'] = df['unstemmed'].str.split()

# Make sure we see the full column.
pd.set_option('display.max_colwidth', None)

# Print dataframe.
df

+----+--------------------------------------------------------------+
|    | unstemmed                                                    |
|----+--------------------------------------------------------------|
|  0 | ['programers', 'program', 'with', 'programing', 'languages'] |
|  1 | ['my', 'code', 'is', 'working', 'so', 'there', 'must',       |
|    | 'be', 'a', 'bug', 'in', 'the', 'interpreter']                |
+----+--------------------------------------------------------------+
Answer by 八刀丁二:
You have to apply the stemmer to every word and save the result in a new column, "stemmed".


df['stemmed'] = df['unstemmed'].apply(lambda x: [stemmer.stem(y) for y in x])  # Stem every word.
df = df.drop(columns=['unstemmed'])  # Get rid of the unstemmed column.
df  # Print dataframe.

+----+--------------------------------------------------------------+
| | stemmed |
|----+--------------------------------------------------------------|
| 0 | ['program', 'program', 'with', 'program', 'languag'] |
| 1 | ['my', 'code', 'is', 'work', 'so', 'there', 'must', |
| | 'be', 'a', 'bug', 'in', 'the', 'interpret'] |
+----+--------------------------------------------------------------+
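If the goal is a classification algorithm, most vectorizers (for example scikit-learn's `CountVectorizer`) expect plain strings rather than lists of tokens, so the stemmed words usually need to be joined back together. A minimal sketch of that last step, using the same stemmer as above (the `stemmed_text` column name is my own choice, not from the original answer):

```python
import pandas as pd
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english")

data = ["programers program with programing languages"]
df = pd.DataFrame(data, columns=['unstemmed'])

# Split, stem every word, then join the tokens back into one string per row.
df['stemmed'] = df['unstemmed'].str.split().apply(
    lambda words: [stemmer.stem(w) for w in words])
df['stemmed_text'] = df['stemmed'].str.join(' ')

print(df['stemmed_text'][0])  # program program with program languag
```

The joined `stemmed_text` column can then be passed directly to a text vectorizer in place of the raw sentences.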
