Introduction To HTML


Table of Contents

Introduction
Accessing Information
HTML
Web Page Template

Introduction

The World Wide Web, usually termed the Web, grew out of the desire of the high-energy physics community at CERN to share information via the Internet in a flexible, evolvable manner. Beginning in late 1990, they defined a protocol for information exchange (http, the HyperText Transfer Protocol, which complements ftp and telnet) and a system for formatting documents that employs the HyperText Markup Language (HTML). These protocols were first demonstrated in December 1991, and were first used by physics researchers in early 1993.

The grand design for information exchange on the Web has several novel features. 

These design guidelines have been warmly received, and use of the Web has grown exponentially over a significant period of time. Users find the Web's interface much easier to use, each person can easily establish their own node on the Web, and its protocols constitute a form of electronic publication. Printed references on using the Web are out-of-date by the time they reach the bookstores. The Web, its protocols, and HTML are evolving much too rapidly for the standard book publishing process to keep up. Instead, use the Web itself to learn of the latest supported protocols and what information resources have emerged. 

Accessing Information

A user requests information by clicking special text or image fields in a window displayed by a browser; Mosaic and Netscape are the most popular browsers. Underlying these special fields is the URL (Uniform Resource Locator), a specification of the protocol used to retrieve information and of where the information is located. Taken apart, a URL consists of three basic units.
       http:   //www.owlnet.rice.edu/   ~dhj
        /\              /\               /\
     Protocol     Internet Address    path to file
                   of Computer
A URL begins with a protocol specification followed by a colon and two slashes. Protocols include
Protocol Name  Description
http  Hypertext Transfer Protocol: the standard protocol for transferring Web pages.
ftp  File Transfer Protocol: the oldest Internet protocol that transfers files from one computer to another; it translates file formats automatically.
file  Followed by the pathname of a file; the specified file is loaded into the browser and displayed. This means that while developing a page, you can debug and view it before it is placed on the Web.
mailto  The URL specifies an e-mail address.
gopher  An older way of transferring information around the Internet. It is primitive, and displays files literally according to the system's hierarchical directory structure.
Internet addresses for URLs frequently begin with www; thus, to explore whether a site provides information for the Web, try combining this prefix with a network address. For example, movie previews of a major Hollywood studio can be found at http://www.mca.com/ and MathWorks at http://www.mathworks.com/.

If no pathname is given, a default filename, usually index.html, is used. This default name is determined by the site being accessed, not the Web. When given, the path is rooted somewhere in the site's file system, the exact location again determined by the information site's operating system. UNIX conventions are used for pathnames. A user's login name beginning with a tilde (~) can also appear as a path (~dhj in the example above). Here, the browser searches the user's login directory for the subdirectory public_html, and loads the file index.html. Continuing the example, the Web page is located at ~dhj/public_html/index.html. If no // is given, the information is assumed to be located on the computer from which the last retrieval was made, and a path constitutes the remainder of the URL.
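
The URL forms described above can be sketched as a short list of links, written with the anchor command introduced in the HTML section. This is only an illustration: the first two URLs come from this document, while ftp.example.org, someone@example.org, and every file and directory name in the remaining entries are placeholders, not real resources.

```html
<UL>
<!-- Protocol, Internet address, and a ~user path (the example taken apart above) -->
<LI> <A HREF="http://www.owlnet.rice.edu/~dhj">A user's page</A>
<!-- No path given: the site supplies its default file, usually index.html -->
<LI> <A HREF="http://www.rice.edu/">A site's home page</A>
<!-- Placeholder ftp host and path -->
<LI> <A HREF="ftp://ftp.example.org/pub/readme.txt">A file retrieved with ftp</A>
<!-- Placeholder local pathname, for viewing a page before it is placed on the Web -->
<LI> <A HREF="file:///home/user/public_html/index.html">A local file</A>
<!-- Placeholder e-mail address -->
<LI> <A HREF="mailto:someone@example.org">An e-mail address</A>
<!-- No // given: only a path, resolved on the computer last accessed -->
<LI> <A HREF="notes/page2.html">A path-only URL</A>
</UL>
```

Saving this fragment in a file and opening it through a file URL is one way to see how a browser treats each form.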

Clicking a field requests that information be transferred across the network and displayed in a page according to an HTML specification expressed in a file corresponding to the selected field. Each Web site will typically have an overview document known as a home page that guides users to general information resources maintained there and elsewhere. Individuals and groups also maintain home pages. The user's browser maintains search state: the sequence of URLs corresponding to the previously and currently displayed information. "Information" should be interpreted in a very broad sense: text, tables, graphics, images, video, and audio can all be "displayed" using current browsers. In creating Web pages, you should not be inhibited by the notion of an error destroying your browser. Browsers have been written to be very resilient, and will display something despite massive HTML errors. Not only does HTML specify which information to display and how the display should appear, it expresses where on the Web the information resides. The browser uses the URL to access the information sites with the specified protocol. Information files can be transferred using the classic ftp method, or using the Web's new information transfer protocol, http. Browsers allow the user to print the displayed information and to print the HTML source for any page.

HTML

Markup languages, such as HTML, specify where and how text and graphics are to be positioned. Because the browser, the user (he or she selects which browser to use, window size, and overall font size), and the HTML file writer conspire to control the display, only general formatting can be specified in HTML. HTML files consist solely of text-based commands, which means any editor can create an HTML file. TeX users will be familiar with this way of formatting text; WYSIWYG users might find this approach cumbersome, but a text-based specification means that the file is portable across all platforms and operating systems. A section of text is formatted according to paired instructions that surround the text. For example, the HTML phrase
     <A HREF="http://www.rice.edu/">Rice University</A>
specifies that the text Rice University can be clicked for information corresponding to the URL http://www.rice.edu/. In HTML-ese, special locations in a file are anchors, and each is sandwiched by an <A>...</A> pair. HTML formatting commands are always enclosed in angle brackets, with the formatting instruction consisting of one or more letters (A in this case) and the terminal member of the pair consisting of the same instruction enclosed in angle brackets and preceded by a slash (/). This example also illustrates that formatting commands can have options. Here, HREF is an option to the anchor command, and indicates that the anchor can be clicked to load the specified HTML file. An option consists of the option's name, an equal sign (=), and its value. The file's URL is the value of the HREF option in this case, and corresponds to Rice University's home page. Commands are case-insensitive, even to the extent that upper and lower cases can be mixed within a command:
     <a HrEf="http://www.rice.edu/">Rice University</A>
works just as well in the example. HTML commands can specify headings, bulleted and numbered lists, tables, and limited equation formatting. Various text styles (boldface, italics, etc.) can be displayed, and ways of receiving user information specified. The various commands can be nested to achieve more extensive effects. For example, to let an image specify a URL, you would use the construct
    <A HREF="http://www.somewhere/file.html"><IMG SRC="arrow.gif"></A>
Some example HTML commands are

Selected HTML Formatting Commands

<HTML>
<HEAD>
<TITLE>
page title
</TITLE>
</HEAD>
<BODY>
Rest of page
</BODY>
</HTML>
Outline of what is minimally needed to construct a Web page. The <HTML> declaration means that HTML is the language used to express what is on the page. In this way, new markup languages can be specified and used as they arise. The <TITLE> command, nested inside <HEAD>, places a title in the window's title bar. <BODY> frames the information displayed on the page. An example minimal page is
<HTML>
<HEAD>
<TITLE>
Simple HTML Example
</TITLE>
</HEAD>
<BODY>
<H2>HTML is easy to learn</H2>
<P>Welcome to the world of HTML.
This is the first paragraph.  While it is short, it is still
a paragraph!
<P>
And this is the <B>second</B> paragraph.
</BODY>
</HTML>
<H1>Text</H1>  Produces a level-1 headline that appears in a large, boldface font. Levels 1-6 are supported, with level 1 corresponding to the largest font. The headline size used here is H3.
<B>text</B>  Produces boldface text. The <I> command produces italics, <U> underlined text, and <TT> typewriter characters.
<P>  Begins a new paragraph. To end a line, but not start a new paragraph, use the <BR> command. Neither of these requires pairing: no </P> is needed, for example.
<UL>
<LI> item 1
<LI> item 2
</UL>
Unordered list of the indicated items. Each item is preceded by a bullet in an unordered list. As an example, see the list that occurred earlier in this document. Note that the <LI> needs no pairing. An ordered list, in which items are assigned numbers, is produced by the <OL> command. 
<A>text</A>  Creates an anchor. The anchor is given the name label when the option NAME="label" is added after the A in the opening anchor command. If this command were located in the file named file.html, a URL ending in file.html#label indicates not only to load the file, but to start the display at the anchor having the name label. The option HREF="url" makes the anchor into a link, enabling loading of information located at the specified URL when text is clicked.
<IMG SRC="url">
Display the image specified by the URL. The image representation format is gleaned from the URL's postfix: .gif specifies the GIF format, .tiff the TIFF format, etc. 
<HR>  Produce a horizontal rule. Useful for separating items. 
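
The commands in the table above can be combined on one page. What follows is a hedged sketch rather than a recommended layout: the page text and the anchor name steps are invented for illustration, arrow.gif is a placeholder image assumed to sit in the same directory as the page, and the ALT option on <IMG> (text shown when the image cannot be displayed) is not covered in the table. A URL consisting only of #steps refers to the named anchor within the page currently displayed.

```html
<HTML>
<HEAD>
<TITLE>Combined Example</TITLE>
</HEAD>
<BODY>
<H1>Combined Example</H1>
<!-- HREF with only #name links to a named anchor on this same page -->
<P>Jump to the <A HREF="#steps">list of steps</A>.
<HR>
<!-- NAME marks the place in the file that #steps refers to -->
<H2><A NAME="steps">Steps</A></H2>
<OL>
<LI> Write the page in <I>any</I> text editor.
<LI> View it with a <TT>file</TT> URL.
<LI> Place it on the Web.
</OL>
<HR>
<!-- Nested commands: the image and the text are both part of the link.
     arrow.gif is a placeholder file name. -->
<P><A HREF="http://www.rice.edu/"><IMG SRC="arrow.gif" ALT="arrow"> <B>Rice University</B></A>
</BODY>
</HTML>
```

Because browsers are resilient, the page still displays if arrow.gif is missing; the ALT text then typically appears in place of the image.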
A detailed description of HTML can be found on the Web. The Beginner's Guide to HTML is a good reference, even for moderately advanced users. However, perhaps the best way to learn is to view others' pages; the browsers allow you to easily view the HTML file (the View Source menu item in Netscape, for example) that corresponds to a displayed page.