An Introduction to R

This is an introduction to R (“GNU S”), a language and environment for statistical computing and graphics. R is similar to the award-winning S system, which was developed at Bell Laboratories by John Chambers et al. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).

This manual provides information on data types, programming elements, statistical modelling and graphics.

This manual is for R, version 4.1.2 (2021-11-01).

Copyright © 1990 W. N. Venables
Copyright © 1992 W. N. Venables & D. M. Smith
Copyright © 1997 R. Gentleman & R. Ihaka
Copyright © 1997, 1998 M. Maechler
Copyright © 1999–2021 R Core Team

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

Preface

This introduction to R is derived from an original set of notes describing the S and S-PLUS environments written in 1990–2 by Bill Venables and David M. Smith when at the University of Adelaide. We have made a number of small changes to reflect differences between the R and S programs, and expanded some of the material.

We would like to extend warm thanks to Bill Venables (and David Smith) for granting permission to distribute this modified version of the notes in this way, and for being a supporter of R from way back.

Comments and corrections are always welcome. Please address email correspondence to R-help@R-project.org.

Suggestions to the reader

Most R novices will start with the introductory session in Appendix A. This should give some familiarity with the style of R sessions and, more importantly, some instant feedback on what actually happens.

Many users will come to R mainly for its graphical facilities. See Graphical procedures, which can be read at almost any time and need not wait until all the preceding sections have been digested.

1 Introduction and preliminaries

- The R environment
- Related software and documentation
- R and statistics
- R and the window system
- Using R interactively
- An introductory session
- Getting help with functions and features
- R commands, case sensitivity, etc.
- Recall and correction of previous commands
- Executing commands from or diverting output to a file
- Data permanency and removing objects

1.1 The R environment

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. Among other things it has

- an effective data handling and storage facility,
- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for data analysis,
- graphical facilities for data analysis and display either directly at the computer or on hardcopy, and
- a well developed, simple and effective programming language (called ‘S’) which includes conditionals, loops, user defined recursive functions and input and output facilities. (Indeed most of the system supplied functions are themselves written in the S language.)
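A few of the facilities listed above can be seen in a handful of lines. The following is only a minimal sketch: the matrix A, the vector b and the function name fact are made up for illustration and do not come from the manual.

```r
# A small matrix and a vector to operate on (illustrative values).
A <- matrix(c(2, 0, 1, 3), nrow = 2)   # filled column by column
b <- c(1, 2)

A %*% b          # matrix-vector product
t(A)             # transpose
x <- solve(A, b) # solve the linear system A x = b

# A user-defined recursive function built around a conditional.
fact <- function(n) {
  if (n <= 1) 1 else n * fact(n - 1)
}

# A loop that calls the function for several inputs and prints each result.
for (k in 1:5) {
  cat(k, "! = ", fact(k), "\n", sep = "")
}
```

The operators and the function are ordinary R objects, so they can be combined freely in later steps of an analysis.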

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

R is very much a vehicle for newly developing methods of interactive data analysis. It has developed rapidly, and has been extended by a large collection of packages. However, most programs written in R are essentially ephemeral, written for a single piece of data analysis.

1.3 R and statistics

Our introduction to the R environment did not mention statistics, yet many people use R as a statistics system. We prefer to think of it as an environment within which many classical and modern statistical techniques have been implemented. A few of these are built into the base R environment, but many are supplied as packages. There are about 25 packages supplied with R (called “standard” and “recommended” packages) and many more are available through the CRAN family of Internet sites (via https://CRAN.R-project.org) and elsewhere. More details on packages are given later (see Packages).

Most classical statistics and much of the latest methodology is available for use with R, but users may need to be prepared to do a little work to find it.
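As a small illustration of a classical technique that ships with R, the sketch below fits a linear regression with lm() on the built-in cars data set. The result is kept in an object (named fit here purely for illustration) and individual quantities are then extracted from it with further functions.

```r
# Fit a simple linear regression on the built-in 'cars' data set
# and keep the result in an object instead of printing everything.
fit <- lm(dist ~ speed, data = cars)

fit                        # printing the object gives only brief output
coef(fit)                  # extract the estimated coefficients
summary(fit)$r.squared     # pull a single quantity out of the summary
head(residuals(fit))       # first few residuals
```

Because the fit is stored, later steps such as diagnostics or prediction can reuse it without refitting the model.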

There is an important difference in philosophy between S (and hence R) and the other main statistical systems. In S a statistical analysis is normally done as a series of steps, with intermediate results being stored in objects. Thus whereas SAS and SPSS will give copious output from a regression or discriminant analysis, R will give minimal output and store the results in a fit object for subsequent interrogation by further R functions.

1.4 R and the window system

The most convenient way to use R is at a graphics workstation running a windowing system. This guide is aimed at users who have this facility. In particular we will occasionally refer to the use of R on an X window system although the vast bulk of what is said applies generally to any implementation of the R environment.

Most users will find it necessary to interact directly with the operating system on their computer from time to time. In this guide, we mainly discuss interaction with the operating system on UNIX machines. If you are running R under Windows or macOS you will need to make some small adjustments.

Setting up a workstation to take full advantage of the customizable features of R is a straightforward if somewhat tedious procedure, and will not be considered further here. Users in difficulty should seek local expert help.

1.5 Using R interactively

When you use the R program it issues a prompt when it expects input commands. The default prompt is ‘>’, which on UNIX might be the same as the shell prompt, and so it may appear that nothing is happening. However, as we shall see, it is easy to change to a different R prompt if you wish. We will assume that the UNIX shell prompt is ‘$’.
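One way to change the prompt from inside a session is the prompt option. This is a minimal sketch; the replacement string "R> " is an arbitrary choice made for illustration.

```r
# options() returns the previous values, so they can be restored later.
old <- options(prompt = "R> ")

current <- getOption("prompt")   # the prompt now in effect
current

options(old)                     # restore the original prompt
```

A setting made this way lasts only for the current session.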

In using R under UNIX the suggested procedure for the first occasion is as follows:

1. Create a separate sub-directory, say work, to hold data files on which you will use R for this problem. This will be the working directory whenever you use R for this particular problem.
2. Start the R program with the command R.
3. At this point R commands may be issued (see later).
4. To quit the R program the command is q().

At this point you will be asked whether you want to save the data from your R session. On some systems this will bring up a dialog box, and on others you will receive a text prompt to which you can respond yes, no or cancel (a single letter abbreviation will do) to save the data before quitting, quit without saving, or return to the R session. Data which is saved will be available in future R sessions.

Further R sessions are simple.

1. Make work the working directory and start the program as before.
2. Use the R program, terminating with the q() command at the end of the session.
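The working directory can also be inspected and changed from inside a running session. The sketch below assumes a sub-directory named work, as in the procedure above, and creates it under a temporary location chosen purely for illustration; in practice it would sit wherever the data files are kept.

```r
base <- tempdir()                        # illustrative location only
work <- file.path(base, "work")
dir.create(work, showWarnings = FALSE)   # create the sub-directory if it is missing

old <- setwd(work)    # setwd() returns the previous working directory
getwd()               # confirm where R is now working
list.files()          # files placed here can be referred to by relative name

setwd(old)            # return to the original directory
```

The save question at the end of a session can be skipped by passing the save argument to q(), for example q(save = "no") to quit without saving.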

To use R under Windows the procedure to follow is basically the same. Create a folder as the working directory, and set that in the Start In field in your R shortcut. Then launch R by double clicking on the icon.

1.6 An introductory session

Readers wishing to get a feel for R at a computer before proceeding are strongly advised to work through the introductory session given in A sample session.

1.7 Getting help with functions and features

R has an inbuilt help facility similar to the man facility of UNIX. To get more information on any specific named function, for example solve, the command is help(solve).

For a feature specified by special characters, the argument must be enclosed in double or single quotes, making it a “character string”, for example help("[["). This is also necessary for a few words with syntactic meaning including if, for and function.

Either form of quote mark may be used to escape the other, as in the string "It's important". Our convention is to use double quote marks for preference.
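The help calls and quoting rules described in this section can be sketched as follows. The strings s1 and s2 are illustrative names, and the second string is an invented example rather than one from the manual.

```r
# Look up a function by name; ?solve is a shorthand for the same request.
help(solve)
?solve

# Special characters and words with syntactic meaning must be quoted.
help("[[")
help("if")

# Either kind of quote mark can enclose the other.
s1 <- "It's important"
s2 <- 'She said "yes"'
nchar(s1)   # number of characters, counting the apostrophe
```

How a help page is displayed (pager, browser or IDE pane) depends on the platform and on how R is being run.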

On most R installations help is available in HTML format by running help.start().
