首页 技术 正文
技术 2022年11月9日
0 收藏 904 点赞 4,547 浏览 3552 个字

大部分的代码、思路参考了《Ansi Common Lisp》P138~P141。

问题:给一篇英文文本,如何让计算机依据此文本而生成随机但可读的文本。如:

|Venture|

The National Venture Capital Association estimates that wealth associated with a deal a big spending by regulations that will spend one another’s main reason these projects .

这是计算机学习了Paul Graham的一些文章后生成的随机文本。它根据Venture这个单词向两边延伸成一个句子。令人惊喜的是,文本常常是可读的。

算法:记录每个单词后面出现的单词以及出现的次数,如I leave在原文中出现了5次,I want出现了3次,除此之外,其它地方没有出现过I,所以在生成随机文章的时候,当遇到I,有5/8的概率选择leave为下一个单词。假如选择了leave的话,则看看leave后面出现过哪些单词,重复以上过程。

现用lisp来解决问题。

lisp里的符号类型,即symbol,可以很好记录各种字符串还有标点符号,所以采用它来记录。采用内附的hashtable来建立列表:

(defparameter *words* (make-hash-table :size 10000))

那如何建立列表呢?

(let ((prev '|.|))
(defun see (sym)
(let ((pair (assoc sym (gethash prev *words*))))
(if pair
(incf (cdr pair))
(push (cons sym 1) (gethash prev *words*))))
(setf prev sym)))

以当前单词为keyword,以assoc-list关系列表为该keyword下的值。

如I下有( (|leave| . 5) (|want| . 3) )。没有单词word的话,则push入(word . 1)。

如何随机选一个词呢?

(defun random-word (word ht)
(let* ((choices (gethash word ht))
(x (random (reduce #'+ choices :key #'cdr))))
(dolist (pair choices)
(decf x (cdr pair))
(if (minusp x)
(return (car pair))))))

这里巧妙用了reduce函数。

现在再来思考,如何将给定一个词向两侧延伸成一句话呢?

1)先将文本反向,得到一个反向的列表,也即I leave,I want变成leave I,want I。

2)将hashtable反向,得到另外一个hashtable,以后一个单词为关键字,前面可能出现的单词及次数构成assoc-list。

3)碰运气,从一个标点开始延续文章,直到出现给定单词为止。

我用了第二个方法:

(defparameter *r-words* (make-hash-table :size 10000))(defun push-words (w1 w2 n)
(push (cons w2 n) (gethash w1 *r-words*)))(defun get-reversed-words ();a cat -> cat a
(maphash #'(lambda (k lst)
(dolist (pair lst)
(push-words (car pair) k (cdr pair))))
*words*))

遍历原来的hashtable,再把每一对单词先后换个位置插入另外一个hashtable。

给出双向延伸句子的自动生成文本代码:

(defparameter *words* (make-hash-table :size 10000))
(defconstant maxword 100)
(defparameter nwords 0)
(defconstant debug nil)
(let ((prev '|.|))
(defun see (sym)
(incf nwords)
(let ((pair (assoc sym (gethash prev *words*))))
(if pair
(incf (cdr pair))
(push (cons sym 1) (gethash prev *words*))))
(setf prev sym)))(defun check-punc (c);char to symbol
(case c
(#\. '|.|) (#\, '|,|)
(#\; '|;|) (#\? '|?|)
(#\: '|:|) (#\! '|!|)))(defun read-text (pathname)
(with-open-file (str pathname :direction :input)
(let ((buf (make-string maxword))
(pos 0))
(do ((c (read-char str nil 'eof)
(read-char str nil 'eof)))
((eql c 'eof))
(if (or (alpha-char-p c)
(eql c #\'))
(progn
(setf (char buf pos) c)
(incf pos))
(progn
(unless (zerop pos)
(see (intern (subseq buf 0 pos)))
(setf pos 0))
(let ((punc (check-punc c)))
(if punc
(see punc)))))))))(defun print-ht (ht)
(maphash #'(lambda (k v)
(format t "~A ~A~%" k v))
ht))(defparameter *r-words* (make-hash-table :size 10000))(defun push-words (w1 w2 n)
(push (cons w2 n) (gethash w1 *r-words*)))(defun get-reversed-words ();a cat -> cat a
(maphash #'(lambda (k lst)
(dolist (pair lst)
(push-words (car pair) k (cdr pair))))
*words*))(defun print-a-word (word ht)
(maphash #'(lambda (k lst)
(if (eql k word)
(format t "~A ~A~%" k lst)))
ht))(if debug
(print-a-word '|leave| *r-words*))(defun punc-p (sym);symbol to char,nil when fails.
(check-punc (char (symbol-name sym) 0)))(defun random-word (word ht)
(let* ((choices (gethash word ht))
(x (random (reduce #'+ choices :key #'cdr))))
(dolist (pair choices)
(decf x (cdr pair))
(if (minusp x)
(return (car pair))))))(defun gen-former (word str)
(let ((last (random-word word *r-words*)))
(if (not (punc-p last))
(progn
(gen-former last str)
(format str "~A " last)))))(defun gen-latter (word str)
(let ((next (random-word word *words*)))
(format str "~A " next)
(if (not (punc-p next))
(gen-latter next str))));(gen-latter '|leave| t)(defun get-a-word (ht);get a random word
(let ((x (random nwords)))
(maphash #'(lambda (k v)
(dolist (pair v)
(decf x (cdr pair))
(if (minusp x)
(return-from get-a-word (car pair)))))
ht)))
;(get-a-word *words*)
(defun gen-sentence (word str)
(gen-former word str)
(format str "~A " word)
(gen-latter word str))(defun test ()
(setf nwords 0)
(read-text "essay.txt")
(get-reversed-words)
(let ((word (get-a-word *words*)))
(print word)
(gen-sentence word t)))
(test)

文本语料库、lisp源代码见:
Here

相关推荐
python开发_常用的python模块及安装方法
adodb:我们领导推荐的数据库连接组件bsddb3:BerkeleyDB的连接组件Cheetah-1.0:我比较喜欢这个版本的cheeta…
日期:2022-11-24 点赞:878 阅读:9,484
Educational Codeforces Round 11 C. Hard Process 二分
C. Hard Process题目连接:http://www.codeforces.com/contest/660/problem/CDes…
日期:2022-11-24 点赞:807 阅读:5,899
下载Ubuntn 17.04 内核源代码
zengkefu@server1:/usr/src$ uname -aLinux server1 4.10.0-19-generic #21…
日期:2022-11-24 点赞:569 阅读:6,732
可用Active Desktop Calendar V7.86 注册码序列号
可用Active Desktop Calendar V7.86 注册码序列号Name: www.greendown.cn Code: &nb…
日期:2022-11-24 点赞:733 阅读:6,485
Android调用系统相机、自定义相机、处理大图片
Android调用系统相机和自定义相机实例本博文主要是介绍了android上使用相机进行拍照并显示的两种方式,并且由于涉及到要把拍到的照片显…
日期:2022-11-24 点赞:512 阅读:8,125
Struts的使用
一、Struts2的获取  Struts的官方网站为:http://struts.apache.org/  下载完Struts2的jar包,…
日期:2022-11-24 点赞:671 阅读:5,286