Signal Processing and the World Wide Web

Don H. Johnson
Computer and Information Technology Institute
Department of Electrical and Computer Engineering
Rice University, MS #366
Houston, TX 77251-1892
dhj@rice.edu

Supported by grant MIP-9301646 from the National Science Foundation.

Introduction
Information Access
Using Browsers
HTML
Signal Processing Resources
Bibliography

Abstract The World Wide Web offers an amazing amount of information useful to the signal processing community. Using the Web, information having a variety of different forms can be transferred in a cohesive fashion. This paper describes the rudiments of accessing the Web and how to create your own information resources. We focus on currently available signal processing resources and how the Web catalyzes signal processing research and development.

Introduction

The World Wide Web, usually termed the Web, grew out of the desire of the high-energy physics community at CERN to share information via the Internet in a flexible, evolvable manner. Beginning in late 1990, they defined a protocol for information exchange (http, the HyperText Transfer Protocol, which complements ftp and telnet) and a system for formatting documents that employs the HyperText Markup Language (HTML). These protocols were first demonstrated in December 1991, and were first used by physics researchers in early 1993[1].

The grand design for information exchange on the Web has several novel features.

Point-and-click interfaces should be the dominant way a user accesses information.
No arbitrary search structure, such as a tree, should be imposed on the user as he or she browses for information.
Multimedia information should be expressly allowed without restricting the representations for text, audio, image, and movie information.
The user's access software rather than the information provider's system maintains each user's search parameters: Searches are stateless from the source's viewpoint. This design means that the provider can field requests from an arbitrary number of users.

These design guidelines have been warmly received, and use of the Web has grown exponentially over a significant period of time[2].

This article surveys how to use the Web, with a focus on accessing signal processing information resources. Printed references on using the Web are out of date by the time they reach the bookstores: the Web, its protocols, and HTML are evolving much too rapidly for the standard book publishing process to keep up. Instead, use the Web itself to learn of the latest supported protocols and what information resources have emerged. Sprinkled throughout this article are network locations for basic information about the Web and its resources. Another resource is Bob Alden's ongoing column, "Traveling the information highway on the Web and the Internet," which appears in The Institute.

General Web Resources
http://www.w3.org/	Source for information about the World Wide Web, its protocols, and its future.
http://www.cern.ch/	The URL for where it all began: the CERN high-energy physics laboratory in Switzerland (ch is the Internet abbreviation for Switzerland).
ftp.netscape.com, ftp.ncsa.uiuc.edu	Internet addresses for obtaining (via ftp) copies of the public domain versions of Netscape and Mosaic, respectively. As described in a previous article[1], use anonymous ftp to acquire these software systems.
http://www.rice.edu/	Regarded as one of the best sites for starting general information searches.
Digital Signal Processing Resources (good initial starting points)
http://www.ieee.org/sp/	The home page for the Signal Processing Society.
http://spib.rice.edu/spib.html	The Signal Processing Society's online database (supported by the National Science Foundation).

Information Access

Accessing the Web requires the user's computer to have an Internet address. Modem access is possible with protocols such as PPP and SLIP, which let your computer behave as if it were on the Internet. Provided your computer is "on the Internet," Web information retrieval works quite simply. A user requests information by clicking special text or image fields in a window displayed by a browser, Mosaic and Netscape being the most popular. Underlying each of these fields is a URL (Uniform Resource Locator), which specifies the protocol used to retrieve the information and where the information is located (the computer's Internet address and the file's name). Clicking a field requests that the information be transferred across the network and displayed as a page, according to the HTML specification contained in the file corresponding to the selected field. Each Web site typically has an overview document, known as a home page, that guides users to general information resources maintained there and elsewhere. Individuals and groups also maintain home pages. The user's browser maintains the search state: the sequence of URLs corresponding to the previously and currently displayed information.
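
The idea that the browser, not the information provider, keeps the search state can be illustrated with a minimal Python sketch. The class name BrowsingHistory and its methods are hypothetical and say nothing about how Mosaic or Netscape are actually implemented; the sketch only assumes that the state is an ordered list of visited URLs plus a pointer to the one currently displayed.

```python
class BrowsingHistory:
    """Hypothetical client-side record of visited URLs (not a real browser API)."""

    def __init__(self):
        self._urls = []      # URLs in the order they were visited
        self._current = -1   # index of the page now displayed, -1 if none

    def visit(self, url):
        # Following a new link discards any pages "ahead" of the current one.
        del self._urls[self._current + 1:]
        self._urls.append(url)
        self._current += 1

    def back(self):
        # Step to the previous page if there is one.
        if self._current > 0:
            self._current -= 1
        return self.current()

    def forward(self):
        # Step to the next page if there is one.
        if self._current < len(self._urls) - 1:
            self._current += 1
        return self.current()

    def current(self):
        return self._urls[self._current] if self._urls else None
```

Because all of this bookkeeping lives in the user's software, the information site only ever sees independent requests for individual URLs.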

Anatomy of a Uniform Resource Locator

                  http://www.ieee.org/sp/SPS.html

              protocol://site/path

A URL begins with a protocol specification followed by a colon and, for most protocols, two slashes. Protocols include http, ftp, file (which means that local files can be viewed), mailto (the URL specifies an e-mail address), and gopher. Internet addresses for URLs frequently begin with www; thus, to explore whether a site provides information for the Web, try combining this prefix with a network address. For example, movie previews of a major Hollywood studio can be found at http://www.mca.com/ and MathWorks at http://www.mathworks.com/. If no pathname is given, a default filename, usually index.html, is used. This default name is determined by the site being accessed, not the Web. When given, the path is rooted somewhere in the site's file system, the exact location again being determined by the information site's operating system. UNIX conventions are used for pathnames. If no // is given, the information is assumed to be located on the computer from which the last retrieval was made, and a path constitutes the remainder of the URL.
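
As a rough illustration of this anatomy, the following Python sketch splits a URL into the three parts named in the figure and resolves a URL that lacks the // against the page last retrieved. It uses the standard urllib.parse module; the helper names anatomy and resolve are made up for this example, and calendar.html is a hypothetical filename, not a page known to exist.

```python
from urllib.parse import urljoin, urlsplit


def anatomy(url):
    """Split a URL into the protocol, site, and path shown in the figure."""
    parts = urlsplit(url)
    return {"protocol": parts.scheme, "site": parts.netloc, "path": parts.path}


def resolve(last_retrieved, link):
    """Interpret a link relative to the page it appeared on.

    A link without a protocol and // is taken to live on the same site
    as the page from which it was selected.
    """
    return urljoin(last_retrieved, link)


print(anatomy("http://www.ieee.org/sp/SPS.html"))
# calendar.html is a hypothetical file used only to show the resolution rule.
print(resolve("http://www.ieee.org/sp/SPS.html", "calendar.html"))
```

Note that a mailto URL has no site component at all: splitting mailto:spib@spib.rice.edu yields the protocol mailto, an empty site, and the address as the path.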

"Information" should be interpreted in a very broad sense: text, tables, graphics, images, video, and audio can all be "displayed" using current browsers. Not only does HTML specify which information to display and how the display should appear, it expresses where on the Web the information resides. The browser uses the URL to access the information site with the specified protocol. Information files can be transferred using the classic ftp method, or using http, the Web's own transfer protocol. gopher searches and retrievals can also be made within the context of the Web. (See the article[1] on the Signal Processing Information Base for a description of these classic transfer protocols.) Browsers allow the user to print the displayed information and to print the HTML source for any page. Information can also be sent from the user to a Web site using CGI (Common Gateway Interface). Here, fields can be selected and text entries filled in, then sent to a URL for processing or storage. Thus, the Web can be, and is, used for completing application forms and for controlling simulations.
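
To make the form mechanism a little more concrete, here is a minimal Python sketch of how filled-in fields might be packed into a single text string before being sent to a URL, and unpacked again by the receiving program. The field names (dataset, start, length) are hypothetical, and the sketch assumes the common name=value encoding joined by ampersands; it does not describe any particular site's CGI program.

```python
from urllib.parse import parse_qs, urlencode


def encode_form(fields):
    """Pack form fields into one name=value&name=value string."""
    return urlencode(fields)


def decode_form(query):
    """Recover the fields on the receiving side (first value of each name).

    Fields that were left blank are dropped by parse_qs by default.
    """
    return {name: values[0] for name, values in parse_qs(query).items()}


# Hypothetical fields a user might fill in to request part of a data file.
request = {"dataset": "machine gun", "start": "0", "length": "1024"}
query = encode_form(request)
print(query)
print(decode_form(query))
```

Spaces and other special characters are escaped in the encoded string, which is why the receiving program must decode the values before using them.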

Using Browsers

Browsers are available free of charge for all the common computing platforms, be they PC, Macintosh, or UNIX based. UNIX browsers typically use an X Window System interface. Commercial, fully supported browsers are appearing on the market and can, of course, be purchased over the Internet. The browsers most commonly used are Mosaic, developed at the National Center for Supercomputing Applications (NCSA), and Netscape, which has both public domain and commercial versions. These browsers display text and graphics by translating the formatting expressed in an HTML file. The user controls font size and window size; thus, HTML files can only express formatting information broadly. An example home page is shown in the accompanying figure, along with the HTML source file. Text or graphics that can be clicked to obtain more information are highlighted in some fashion (for the moment, text is underlined and displayed in a special color, and graphics are surrounded by a special border). Typically, as one moves the cursor over a highlighted section, the cursor's shape changes, indicating that it has been positioned correctly and that the information is just a click away. While information is being loaded, the browser indicates how much is left to transfer, estimates the time remaining, and shows that it is busy with a dynamic graphic in one of the window's upper corners. Note how the browser, Netscape in our example, uses purple to indicate which information resources (links, in Web parlance) have already been selected and viewed. The ones in blue have not yet been viewed. (All browsers allow the user to alter these colors.) The time frame used by the browser to define a previous search is not limited to the current session; the user can define how long the search history is kept.

Browsers are equipped to display text in a variety of fonts and styles, and to display images represented in GIF (Graphics Interchange Format). Sound, movies, PostScript files, and images in other formats (JPEG, TIFF, etc.) are displayed using helper applications. Which applications are used can be set from within the browser; the choices are heavily system dependent.
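
The division of labor between the browser and its helper applications can be sketched as a simple lookup. In the Python sketch below, the table of helpers, the fallback of saving to disk, and the function name handler_for are all assumptions made for illustration; real browsers let the user edit such a table and generally also take into account the content type reported by the server, which this sketch ignores in favor of the filename suffix alone.

```python
import posixpath
from urllib.parse import urlsplit

# Formats the browser is assumed to display itself.
INLINE_FORMATS = {".html", ".gif"}

# Hypothetical user-configurable table of helper applications.
HELPERS = {
    ".jpeg": "image viewer",
    ".tiff": "image viewer",
    ".ps": "PostScript previewer",
    ".au": "sound player",
    ".mpeg": "movie player",
}


def handler_for(url):
    """Decide what displays the information found at the given URL."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    # No suffix usually means a site's default page, which the browser shows.
    if extension == "" or extension in INLINE_FORMATS:
        return "browser"
    # Unknown formats fall back to saving the file (an assumption).
    return HELPERS.get(extension, "save to disk")


print(handler_for("http://www.somewhere/arrow.gif"))
print(handler_for("http://www.somewhere/paper.ps"))
```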

HTML

Markup languages, such as HTML, specify where and how text and graphics are to be positioned. Because the browser, the user (who selects which browser to use, the window size, and the overall font size), and the HTML file writer conspire to control the display, only general formatting can be specified in HTML. HTML files consist solely of text-based commands, which means any editor can create an HTML file. An example page and its corresponding source are shown in the example displaying SPIB's home page. TeX users will be familiar with this way of formatting text; WYSIWYG users might find this approach cumbersome, but a text-based specification means that the file is portable across all platforms and operating systems. A section of text is formatted according to paired instructions that surround the text. For example, the HTML phrase

         <A HREF="http://www.rice.edu/">Rice University</A>

specifies that the text Rice University can be clicked for information corresponding to the URL http://www.rice.edu/. In HTMLese, special locations in a file are anchors, and each is sandwiched by an <A>text</A> pair. HTML formatting commands are always enclosed in angle brackets, with the formatting instruction consisting of one or more letters (A in this case) and the terminal member of the pair consisting of the same instruction enclosed in angle brackets and preceded by a slash (/). This example also illustrates that formatting commands can have options. Here, HREF is an option to the anchor command, and indicates that the anchor can be clicked to load the specified HTML file. An option consists of the option's name, an equal sign (=), and its value. The file's URL is the value of the HREF option in this case, and corresponds to Rice University's home page. Commands are case-insensitive, even to the extent that upper and lower cases can be mixed within a command: <a HrEf="http://www.rice.edu/"> works just as well in the example. HTML commands can specify headings, bulleted and numbered lists, tables, and limited equation formatting. (Well, almost. At the time of this writing, HTML 3.0 was being defined. The language will eventually be extended to express equations, but its equation formatting instructions do not correspond to TeX's. This said, recall the previous caution about the timeliness of printed descriptions of Web software and protocols.) Various text styles (boldface, italics, etc.) can be displayed, and ways of receiving user information specified. The various commands can be nested to achieve more extensive effects. For example, to let an image specify a URL, you would use the construct

      <A HREF="http://www.somewhere/file.html"><IMG SRC="arrow.gif"></A>

Example HTML commands are shown in the table.

Selected HTML Formatting Commands
<HEAD> <TITLE> page title </TITLE> </HEAD> <BODY> Rest of page </BODY>	Outline of what is minimally needed to construct a Web page. The <TITLE> command, placed inside <HEAD>, puts a title in the window's title bar. <BODY> frames the information displayed on the page.
<H1>Text</H1>	Produces a level-1 headline that appears in a large, boldface font. Levels 1-6 are supported, with level 1 corresponding to the largest font.
<B>text</B>	Format text in boldface. The <I> command produces italics, <U> underlined text, and <TT> typewriter characters.
<P>	Begin a new paragraph. To end a line, but not start a new paragraph, use the <BR> command. Neither of these requires pairing: no </P> is needed, for example.
<UL> <LI> item 1 <LI> item 2 </UL>	Unordered list of the indicated items. Each item is preceded by a bullet in an unordered list. Note that the <LI> needs no pairing. An ordered list, in which items are assigned numbers, is produced by the <OL> command.
<A>text</A>	Create an anchor. It can be given the optional name label by adding NAME="label" to the anchor command. If this command were located in the file named file.html, a URL ending in file.html#label indicates not only to load the file, but to start the display at the anchor having the name label. The option HREF="url" makes the anchor into a link, enabling loading of information located at the specified URL when text is clicked.
<IMG SRC="url">	Display the image specified by the URL. The image representation format is gleaned from the URL's suffix: .gif specifies the GIF format, .tiff the TIFF format, etc.

A detailed description of HTML can be found at http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html. However, perhaps the best way to learn is to view others' pages; the browsers allow you to easily view the HTML file that corresponds to a displayed page.
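
When studying someone else's page, it can help to list the links its source contains. The following Python sketch does so with the standard html.parser module; the class and function names are made up for this example. It also illustrates case-insensitivity, since the parser reports command and option names in lower case however they were typed.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (URL, clickable text) pairs from anchors that have an HREF."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # URL of the anchor being read, if any
        self._text = []     # pieces of text seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # Tag and option names arrive lower-cased from the parser.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse line breaks and runs of spaces in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html_source):
    collector = LinkCollector()
    collector.feed(html_source)
    collector.close()
    return collector.links


print(extract_links('<a HrEf="http://www.rice.edu/">Rice University</a>'))
```

An anchor that wraps only an image, as in the nested example earlier in this section, is reported with empty link text.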

SPIB Home Page Viewed from Netscape

HTML Source

<HEAD>
<TITLE>Signal Processing Information Base (SPIB)</TITLE>
<H1>Signal Processing Information Base (SPIB)</H1>
<HR SIZE=4>
</HEAD>
<BODY>
The Signal Processing Information Base (SPIB) is a project sponsored 
by the Signal Processing Society and the National Science Foundation.  
SPIB contains information repositories of data, papers, software, 
newsgroups, bibliographies, links to other repositories, and 
addresses, all of which are relevant to signal processing research and 
development.
<P>
For general information, send e-mail to
<A HREF="mailto:spib@spib.rice.edu">spib@spib.rice.edu</A>
containing the message:
<PRE>
        send help
</PRE>
<UL>
<LI> <A HREF="gopher://spib.rice.edu:70/11/SPIB/addresses"> 
addresses</A>
<LI> <A HREF="gopher://spib.rice.edu:70/11/SPIB/bibliography"> Signal 
processing bibliography</A>
<LI> <A HREF="http://spib.rice.edu/directory.html"> data</A>
<LI> <A HREF="gopher://spib.rice.edu:70/11/SPIB/help"> help</A>
<LI> newsgroup and e-letter archives
<UL>
  <LI> <A HREF="gopher://spib.rice.edu:70/11/SPIB/news/e-letter"> 
        E-Letter on digital signal processing</A>
  <LI> <A HREF="gopher://spib.rice.edu:70/11/SPIB/news/imdsp-e-letter">
        E-Letter on image and multidimensional signal processing</A>
  <LI> <A HREF="gopher://spib.rice.edu:70/11/SPIB/news/comp.dsp"> 
        USENET digital signal processing newsgroup</A>
</UL>
<LI> <A HREF="http://spib.rice.edu/papers.html"> papers</A>
</UL>
<HR>
<ADDRESS>
--
<A HREF="mailto:spib-admin@spib.rice.edu">spib-admin@spib.rice.edu</A>
1/3/95
</ADDRESS>

Signal Processing Resources

The Signal Processing Society maintains a home page that provides a good entry point to signal processing resources located on the Web. Its URL is http://www.ieee.org/sp/. There, you will find descriptions of the Society and its governance, a calendar of workshops and conferences, and links to signal processing companies and research groups around the world. URLs for interactive signal processing design systems are also provided there.

Of special interest to the signal processing community is the Signal Processing Information Base (http://spib.rice.edu/spib.html). It serves as a repository of data and reference materials, such as preprints of articles (or links to them) that provide early dissemination of results in a primitive electronic form. Many of the data files are quite large. Consequently, we have designed an HTML interface to Matlab programs that allows users to preview data and to extract data segments (or download the entire file). At the moment, waveform and spectral displays are provided; an example of how this interface can be used is shown in the accompanying figure. Depicted there is a spectrogram of a segment taken from an acoustic recording of a machine gun. We intend to add more previewing schemes in the future. To access the previewer, go to the SPIB home page and select data.
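
The Matlab programs behind the previewer are not described here. Purely as an illustration of the two operations mentioned, extracting a data segment and producing a spectral display, the following Python sketch computes a simple magnitude spectrogram from overlapping, unwindowed frames. All function names and parameters are hypothetical, and a direct DFT is used for clarity rather than speed.

```python
import cmath
import math


def extract_segment(samples, start, length):
    """Return the excerpt samples[start : start + length]."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return samples[start:start + length]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0 .. n/2 of one frame (direct summation)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_length, hop):
    """One row of DFT magnitudes per frame; frames start every hop samples."""
    starts = range(0, len(samples) - frame_length + 1, hop)
    return [dft_magnitudes(samples[s:s + frame_length]) for s in starts]


# Synthetic test signal: a cosine completing 4 cycles every 32 samples.
tone = [math.cos(2 * math.pi * 4 * n / 32) for n in range(128)]
segment = extract_segment(tone, 16, 64)
rows = spectrogram(segment, frame_length=32, hop=16)
# Report, for each frame, the bin with the largest magnitude.
print([max(range(len(row)), key=row.__getitem__) for row in rows])
```

A practical previewer would apply a window to each frame and use a fast Fourier transform; both are omitted here to keep the sketch short.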

Data Previewing using Web/Matlab

Bibliography

[1] D.H. Johnson and P.N. Shami. The Signal Processing Information Base. Signal Processing Magazine, 10: 36-42, October 1993.
[2] Regulating cyberspace. Science, 268: 628-629, 5 May 1995.
[3] http://www.w3.org/hypertext/WWW/WWW/