Quick Start

To get started quickly, we will work through a standard wordscores analysis using some example documents: British political party manifestos from the 1992 and 1997 elections. The manifestos are available as a zip file from the wordscores homepage.

Extract the contents of the zip file to your Stata working folder, and launch Stata. In the transcript, your input (the lines beginning with a period, which is Stata's command prompt) is shown in white and Stata's responses in green.

Before beginning the analysis, we increase the amount of memory available to Stata. 10 megabytes should be enough for this number of documents. (Stata 12 and later manage memory automatically, so this step is only needed in older versions.)

.  set memory 10m

(10240k)

Now we compute word frequencies from all the manifestos:

.  wordfreq lab92.txt con92.txt ld92.txt lab97.txt con97.txt ld97.txt

  Starting WORDFREQ ...
    lab92.txt --> tlab92
    con92.txt --> tcon92
    ld92.txt --> tld92
    lab97.txt --> tlab97
    con97.txt --> tcon97
    ld97.txt --> tld97
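
For intuition about what this step produces, the counting can be sketched outside Stata. The Python below is not the wordfreq implementation: the tokenizing rule (lower-casing, treating runs of letters as words) and the function name word_frequencies are assumptions made for illustration only.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word occurs in a text.

    Assumption: a word is a run of letters, optionally with internal
    apostrophes, and upper and lower case are treated as the same word.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(tokens)

# A made-up two-sentence "manifesto" to show the shape of the result.
sample = "We will cut taxes. We will not raise taxes."
freqs = word_frequencies(sample)

for word, count in sorted(freqs.items()):
    print(f"{word:8s} {count}")
```

Each manifesto is reduced in this way to a table of words and counts, which is all the later steps need.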

We will use the 1992 manifestos as reference documents, so we will assign scores to them:

.  setref tlab92 5.35 tld92 8.21 tcon92 17.21

We can take a look at the texts we're working with using the describetext command:

.  describetext t*92 t*97

            |     Ref   Total  Unique   Mean   Median
       Text |   Score   Words   Words   Freq.   Freq.
------------+----------------------------------------
      tld92 |    8.21  17,671   3,167    5.58    1.00
     tcon92 |   17.21  29,413   4,028    7.30    2.00
     tlab92 |    5.35  11,445   2,372    4.83    1.00
      tld97 |       .  13,959   2,418    5.77    2.00
     tcon97 |       .  21,129   3,174    6.66    2.00
     tlab97 |       .  17,567   2,994    5.87    2.00
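
The summary columns can be read as simple functions of each text's word counts. In the table, Mean Freq. equals Total Words divided by Unique Words; the sketch below (Python, with the hypothetical helper name describe) checks this against the figures above. It also assumes, without confirmation from the source, that Median Freq. is the median of the per-word counts.

```python
from collections import Counter
from statistics import median

def describe(freqs):
    """Summarize a word-count table in the style of the columns above."""
    total = sum(freqs.values())   # Total Words
    unique = len(freqs)           # Unique Words
    return {
        "total": total,
        "unique": unique,
        "mean": total / unique,             # Mean Freq.
        "median": median(freqs.values()),   # Median Freq. (assumed definition)
    }

# (Total Words, Unique Words, Mean Freq.) copied from the describetext output.
table = {
    "tld92":  (17671, 3167, 5.58),
    "tcon92": (29413, 4028, 7.30),
    "tlab92": (11445, 2372, 4.83),
    "tld97":  (13959, 2418, 5.77),
    "tcon97": (21129, 3174, 6.66),
    "tlab97": (17567, 2994, 5.87),
}

# Largest gap between total/unique and the reported mean, over all six texts.
max_gap = max(abs(total / unique - mean_freq)
              for total, unique, mean_freq in table.values())
print(f"largest difference from reported Mean Freq.: {max_gap:.4f}")

# A made-up count table to show the helper itself.
toy = describe(Counter({"tax": 4, "spend": 1, "market": 1}))
print(toy)
```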

Now we set the name of the dimension we used for the reference scores:

.  wordscore economic
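
The scoring method behind this step can be sketched as follows. This is a minimal Python illustration of the procedure published by Laver, Benoit and Garry (2003), on the assumption that the Stata commands implement it; it is not the package's code. The reference scores are those given to setref above, but the word counts are invented for the example, and the function names word_scores and score_text are hypothetical.

```python
def word_scores(ref_counts, ref_scores):
    """Score each word as the average of the reference scores, weighted by
    the probability that a given occurrence of the word comes from each text."""
    # Relative frequency of every word within each reference text.
    rel = {}
    for text, counts in ref_counts.items():
        total = sum(counts.values())
        rel[text] = {word: n / total for word, n in counts.items()}

    vocab = set()
    for counts in ref_counts.values():
        vocab.update(counts)

    scores = {}
    for word in vocab:
        denom = sum(rel[text].get(word, 0.0) for text in rel)
        scores[word] = sum(
            rel[text].get(word, 0.0) / denom * ref_scores[text] for text in rel
        )
    return scores

def score_text(counts, scores):
    """Raw score of a virgin text: the mean word score, weighted by the
    relative frequency of each scored word. Unscored words are ignored."""
    scored = {word: n for word, n in counts.items() if word in scores}
    total = sum(scored.values())
    return sum(n / total * scores[word] for word, n in scored.items())

ref_scores = {"tlab92": 5.35, "tld92": 8.21, "tcon92": 17.21}
ref_counts = {  # invented counts, for illustration only
    "tlab92": {"public": 6, "market": 2, "tax": 2},
    "tld92":  {"public": 4, "market": 4, "tax": 2},
    "tcon92": {"public": 2, "market": 6, "tax": 2},
}

scores = word_scores(ref_counts, ref_scores)
virgin = score_text({"public": 1, "market": 1, "europe": 5}, scores)
print(scores, virgin)
```

In this toy example a word used equally often in all three reference texts ("tax") receives the plain average of the three reference scores, while words used more by one party are pulled toward that party's score. The word "europe" does not occur in any reference text, so it contributes nothing to the virgin text's score.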

Finally, we infer the positions of the 1997 manifestos on this dimension:

.  textscore economic tlab97 tld97 tcon97

Wordscore v0.36  (c) 2003 Kenneth Benoit
Dimension: ECONOMIC

            |                    Unique    Trans-    Trans-                          Total      %
     Virgin |     Raw      Raw   Scored    formed    formed       Transformed        Words    Tot
       Text |   Score       SE    Words     Score        SE   [95% Conf. Interval]  Scored   Sc'd
------------+-------------------------------------------------------------------------------------
     tlab97 | 10.3718   0.0149    2,247    9.1274    0.3459     8.4356     9.8192   16,616   94.6
      tld97 | 10.1934   0.0153    1,949    4.9922    0.3559     4.2804     5.7039   13,380   95.9
     tcon97 | 10.7184   0.0137    2,341   17.1640    0.3175    16.5290    17.7989   20,072   95.0

Each row reports the results for one virgin text, that is, a text whose position is being estimated rather than assumed. The main quantities of interest in this table are the inferred score of each text on the economic scale, in the column marked 'Transformed Score', and the standard error of that estimate, in the column marked 'Transformed SE'.
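
The link between the raw and transformed columns can be checked by hand. The Python sketch below rests on two assumptions that the transcript does not state: that the transformation is the rescaling described by Laver, Benoit and Garry (2003), which keeps the mean of the raw virgin scores and stretches their spread to match that of the reference scores, and that the reported interval is the transformed score plus or minus two standard errors. All variable names are illustrative.

```python
from statistics import mean, stdev

# Values copied from the transcript above.
ref_scores = [5.35, 8.21, 17.21]   # scores passed to setref
raw = {                            # text: (Raw Score, Raw SE)
    "tlab97": (10.3718, 0.0149),
    "tld97":  (10.1934, 0.0153),
    "tcon97": (10.7184, 0.0137),
}

raw_values = [score for score, _ in raw.values()]
raw_mean = mean(raw_values)
# Assumed rescaling factor: spread of reference scores over spread of raw scores.
scale = stdev(ref_scores) / stdev(raw_values)

results = {}
for text, (score, se) in raw.items():
    t_score = (score - raw_mean) * scale + raw_mean   # Transformed Score
    t_se = se * scale                                 # Transformed SE
    results[text] = (t_score, t_se, t_score - 2 * t_se, t_score + 2 * t_se)
    print(f"{text}: {t_score:.4f}  {t_se:.4f}  "
          f"[{results[text][2]:.4f}, {results[text][3]:.4f}]")
```

Under these assumptions the computed values should agree with the table to about two decimal places; the remaining differences are of the size expected from the raw scores and standard errors being rounded to four decimals in the output.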

