C++ AND THE WORLD-WIDE WEB

Marcus Speh
Deutsches Elektronen-Synchrotron DESY
Notkestr. 85, 22607 Hamburg, Germany

and

Carlye Dinnell
The Science Policy Support Group
22 Henrietta Street, London WC2E 8NA, UK

[Accepted for publication in: C++ Report, ed. S.Lippman, Jan/Feb 95]

Abstract

C++ documents and resources are made available on the Internet via the World-Wide Web. We argue that an international community of C++ users and vendors can profit from this tool for information retrieval, training and service, and we explain how.

These days, newspapers and journals are full of articles on the Internet and its phenomenal potential. Occasionally the authors discuss the World-Wide Web (WWW), but fail to explain how the Internet differs from the Web. Even more often, the reader's justified question "Why should I care?" remains unanswered.

This article deals briefly with getting on the Web, and discusses information resources of particular interest to the C++ users' community. At the end, you should have a rough idea of what is waiting for you out there on the Web. Even more importantly, you will know how you can contribute to the Web yourself, as a lone C++ programmer or as a software company, and how you can use it as a sophisticated tool for teaching and learning C++. Only a minimum of technical information is necessary to get to the Web and use it effectively.

The World-Wide Web is a "wide-area hypermedia information retrieval initiative aiming to give universal access to a large universe of documents" which was developed at CERN, the European particle physics facility. First proposed in 1989 by Tim Berners-Lee (CERN) to promote the availability of information within the high energy physics community, it has by now expanded throughout the whole Internet and embraces all scientific disciplines, the liberal arts and the business world.
The fantastic growth rate of the Web users' community has made people recognize it as the Internet tool with the greatest potential for the future of the electronic information age. There are other retrieval systems that, individually, provide more limited services on the Internet (gophers, netnews, ftp, telnet etc.), but the Web gives access to all of those services combined (provided of course that the respective programs are installed on your computer).

Most WWW documents are written in Hyper-Text Markup Language (HTML). This is normal text with embedded links to other documents. A link appears as a highlighted or underlined word, and you move between linked documents by clicking on those words. Hidden behind each link is something known as a Uniform Resource Locator (URL), an address attached to the file to be accessed. If you only followed links from one document to another, you would not even need to know the concept of URLs, as the link automatically goes to that address and plugs you in to the document. However, you will certainly want to access documents directly at some point. URLs can be intimidatingly long - the file containing this article has the address

    http://www.desy.de/user/projects/C++/report.html

The string "http" tells you that this document is delivered using the Hyper-Text Transfer Protocol. The second part, "www.desy.de", is an Internet domain name, and the last part is the local path to the requested document. A WWW history mechanism called a "hotlist" (not available for all browsers) simplifies the process considerably: with a single click or keystroke it stores the URL of the document you are currently viewing, without your ever having to see it. This is extremely useful when you have followed a long path through a number of documents. Most of the references in this article consist of URLs.

The two crucial pieces of software for the Web are document servers and browsers.
Servers are for those providing information, but are irrelevant for those merely interested in retrieving it (see [2] for more important information, in particular relevant to companies with access restrictions). However, you must have a browser, also known as a client, to use the Web. The Internet links computers together, and the Web links the documents stored on those computers. A browser allows you to actually look at the documents. The choice of browser depends on your computer and how you are connected to the net. Among the most popular clients are NCSA Mosaic, a graphical client, and Lynx, a simple line-mode browser, both of which are freely available on a variety of computer architectures [2].

You can quickly check out the Web without installing any new software on your system in one of two ways. If you have the "telnet" program and access to the Internet, you can get onto the Web with CERN's simple browser by telnetting to "info.cern.ch". At this point you can follow links by entering the numbers conveniently placed next to them in brackets, or you can use the "go" command to reach a particular URL. In the telnet session, the command

    go http://www.desy.de/user/projects/C++.html

would bring you to the C++ Virtual Library [2].

On the other hand, if you do not even have telnet, you may have e-mail with Internet access. Then you can use a mail robot to serve particular WWW documents to you in ASCII format. The e-mail returned by this program contains the page text and, more importantly, a list of references (URLs) at the bottom, which you can use to climb further down a documentation tree. For example, to get to the C++ Virtual Library home page, send e-mail to "test-list@info.cern.ch" with only the line

    send http://www.desy.de/user/projects/C++.html

in the body of the message. The generic format of the "send" command is "send ".
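As an illustration of the URL anatomy and the mail-robot request described above, here is a minimal sketch in Python. It only splits a URL into protocol, host and local path and composes the one-line request message; it does not send anything. The function names are hypothetical, and the robot address is simply the one quoted in the text, which may well no longer be in service.

```python
from email.message import EmailMessage
from urllib.parse import urlsplit


def split_url(url):
    """Return the (protocol, host, local path) parts of a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


def build_robot_request(url, robot_address="test-list@info.cern.ch"):
    """Compose (but do not send) a mail-robot request for one document.

    The default address is the one quoted in the article; treat it as
    historical rather than as a working service.
    """
    protocol, host, _path = split_url(url)
    if protocol != "http" or not host:
        raise ValueError(f"not an http URL: {url!r}")
    message = EmailMessage()
    message["To"] = robot_address
    # The body consists of a single line: the "send" command and the URL.
    message.set_content(f"send {url}")
    return message


if __name__ == "__main__":
    library = "http://www.desy.de/user/projects/C++.html"
    print(split_url(library))
    print(build_robot_request(library).get_content())
```

Actually delivering the message would require a mail transport (for example an SMTP server), which is deliberately left out of this sketch.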
Once you are familiar with Web pages, the simplicity of HTML and the special appeal of pages seen through graphical browsers like Mosaic are very likely to make you wish to write your own hypertext pages. Once people and companies realise that the Web is a strong paradigm whose usefulness and fascination are not tied to the Internet, they often start out by providing local Web services. These might include self-descriptions (the so-called "hyplans" [3]), virtual resumes, diaries and pictures, or a whole environment, software product palette or campus-wide information system. In between these stages may lie several months of getting used to editing HTML, convincing reluctant collaborators of the Web's irresistible charm, and finally a jump onto the Internet, since only there does the full power of WWW unfold. Whether the new friends of the Web realise it or not, they have taken the first step towards virtual management.

The C++ virtual library [2] grew out of a need to be in touch with the C++ world outside a single research institution and to organise search results in an orderly way. It is thanks to the power of the Web that it has become a resource used by over 4,000 people a week, and it is clear that the Web is an important resource for making distributed information centrally available in a networked environment. For example, the Web can instantly solve a problem that plagues the daily life of a C++ programmer, namely, dealing with out-of-date information: no FAQ, library or README information needs to be copied to a single site. All that needs to be maintained (often with the help of sophisticated scripts [1,4]) is the correctness and accessibility of the link.

In early 1993, the C++ virtual library, a central repository of C++ related documents and resources, became available to the public via WWW [2].
Before it entered the WWW subject index maintained at CERN, it contained only specific information on the use of C++ in High Energy Physics, but since then it has evolved into a huge catalogue of over 250 documents on C++, mainly in HTML format. Examples from the home page [illustration 1] include links to:

  o A long list of separate pages with C++ and OOP FAQ documents for "Getting Start(l)ed"; interesting C++ applications, book reviews, other archives (mainly with FTPable material), topical resources of interest to C++ users such as parallel applications, the current list of ANSI/ISO resolutions etc.
  o "Learning C++", a page with links to V. Carpenter's course resource list and the Virtual GNA C++ Course (see also below).
  o A page with links to access (although not to post to) Usenet newsgroups dealing with discussion about C++.
  o Lists of freely available and commercial C++ libraries and packages.
  o List of C++ and OOP conferences.
  o Editing: An introduction to Barry Warsaw's clever "c++-mode" for the GNU Emacs editor.
  o Links to other programming and application hierarchies (like general OOP and Literate Programming), and to pages prepared by software companies offering C++ products.
  o A list of upcoming OOP and computing conferences.

The more frequent visitor to this tree is informed about recent changes through a "What's new" page. An additional essential requirement, namely quick access to the desired resource, is met by a keyword-searchable index, based on the ICE indexing package by C. Neuss, which is part of the CERN httpd server distribution [1]. At the bottom of the page, there is a link to an HTML fill-out form, which can be used to report errors or suggest additional resources directly to the author. Fill-out forms and indices, based on CGI (Common Gateway Interface [5]) scripts, are among the most useful of the more recent additions to the technical equipment on the server side.
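To give a feeling for what a CGI script behind such a fill-out form does, here is a minimal sketch in Python. It is not the script used by the C++ virtual library; the field names "resource" and "comment" and the function name are assumptions made up for this illustration. It assumes the form is submitted with the GET method, in which case the server passes the URL-encoded fields to the script in the QUERY_STRING environment variable.

```python
import html
import os
from urllib.parse import parse_qs


def handle_feedback(query_string):
    """Turn a URL-encoded form submission into a complete CGI response.

    The field names "resource" and "comment" are hypothetical.
    """
    fields = parse_qs(query_string)
    # parse_qs maps each field name to a list of values; take the first one.
    resource = fields.get("resource", [""])[0]
    comment = fields.get("comment", [""])[0]
    # Escape user input before echoing it back inside an HTML page.
    body = (
        "<html><body><h1>Thank you</h1>"
        f"<p>Resource: {html.escape(resource)}</p>"
        f"<p>Comment: {html.escape(comment)}</p>"
        "</body></html>"
    )
    # A CGI response is a header block, a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a Web server as a CGI script, the form data of a GET
    # request arrives in the QUERY_STRING environment variable.
    print(handle_feedback(os.environ.get("QUERY_STRING", "")), end="")
```

A real feedback script would also store or forward the submission, for instance by mailing it to the maintainer; that part is omitted here.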
Forms are becoming an indispensable mechanism for feedback from the Web user to the information provider. Companies advertising their products can put out a "guest book"-like form to be signed by those visiting their pages, or advertise jobs for programmers.

Since its inception and fantastic growth, the Web has been plagued by an "old" Internet disease: the lack of overview for the surfing user. Therefore, the thoughtful use of indices is strongly recommended to everyone who wishes to offer information via WWW. Still, the kind of index search generally offered by the information provider does not take into account the fact that the resources on the Web are often quite different in character: a single document should be distinguished from a list of documents in an FTP area, or from a volume of issues of a particular magazine. To account for these differences, one must introduce a "coverage code" concept, assigning different coverage indices to resources and forcing the indexing program to follow their hierarchy. The price to be paid is that one cannot fully rely on automated indexing - a human has to evaluate a resource to assign the proper coverage code to it, thus creating the problem of outdated indices again. The only Internet (and Web document) index known to the authors which follows a coverage code principle is the "Meta-Library" from the Globewide Network Academy [6].

The initial confusion which arose from the fact that different servers provided a widely varying arsenal of tools disappeared in 1994, and you can now choose between a number of equally well-supported software products (all of which are freely available [1]). On the client/browser side, it will remain true that not all software supports all the different gadgets, and which browser you can use depends strongly on the hardware available to you - it is the responsibility of the information provider to ensure usefulness of a resource for as wide an audience as possible.
Every public page should be cross-checked with at least one graphical and one non-graphical browser (like Mosaic and Lynx or Emacs-W3), and information providers should not forget the visually impaired user when considering a fancy graphical HTML page layout.

The Web has a special attraction if you want to make C++ code available to the world, or even just to a group of developers, since access authorisation or strictly local service is possible. Information is accessible not only as a packed file for complete retrieval, but also by using the Web to provide an intelligent hyper-map through code samples, class libraries and complicated design solutions. Available examples fall into two different groups: straightforward, manual preparation of HTML pages for the code/design, or an HTML front-end (either integrated in the coding environment or as script collections).

Examples of straightforward HTMLisation are T. Burnett's GISMO development code and the AIPS++ library for Radio Astronomy [7] - here the Web solution merely consists of a nice embedding of raw source code in a hypertext guide. Another method is to interface C++ code and HTML using a manually run script like D. Bruck's classdoc.awk [8], turning the class header files into Unix manual pages which can be HTMLised on the fly using suitable scripts [1,23]. A nice presentation of various C++ projects prepared in this way can be inspected through the "High Energy Physics C++ Catalog" [23].

A bit fancier is the use of clickable image maps [9] [illustration 2], allowing for the graphical presentation of your C++ design process. These can only be accessed with a WWW browser that supports this feature, like NCSA Mosaic. Such a map can contain links either to more design documents or directly to the code, which may or may not be marked up with HTML, or which may be combined with a class2man translation tool like the one mentioned above.
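The idea of embedding raw source code in a hypertext guide can be sketched in a few lines of Python. This is not how classdoc.awk or any of the packages mentioned above work; it is a deliberately naive illustration, with hypothetical function names, that finds class declarations in a header file with a regular expression and produces one HTML page containing an index of classes linked to named anchors in the source listing. A regular expression is no C++ parser: it will, for instance, be fooled by class declarations inside comments.

```python
import html
import re

# Matches "class Name" or "struct Name" followed by an optional base-class
# list and an opening brace. Forward declarations ("class Name;") and
# template parameters ("<class T>") are not followed by a brace and are
# therefore skipped.
CLASS_PATTERN = re.compile(
    r"\b(?:class|struct)\s+([A-Za-z_]\w*)\s*(?::[^{;]*)?\{"
)


def class_names(header_text):
    """Return the names of the classes defined in a C++ header."""
    return CLASS_PATTERN.findall(header_text)


def header_to_html(title, header_text):
    """Build an HTML page: an index of classes plus the annotated source."""
    index, listing, position = [], [], 0
    for match in CLASS_PATTERN.finditer(header_text):
        name = match.group(1)
        index.append(f'<li><a href="#{name}">{name}</a></li>')
        # Escape the source up to the class name, then place a named
        # anchor around the name so that the index entry can jump to it.
        listing.append(html.escape(header_text[position:match.start(1)]))
        listing.append(f'<a name="{name}">{name}</a>')
        position = match.end(1)
    listing.append(html.escape(header_text[position:]))
    return (
        f"<html><head><title>{html.escape(title)}</title></head><body>\n"
        f"<h1>{html.escape(title)}</h1>\n"
        "<ul>\n" + "\n".join(index) + "\n</ul>\n"
        "<pre>" + "".join(listing) + "</pre>\n"
        "</body></html>\n"
    )


SAMPLE_HEADER = """\
class Shape;                     // forward declaration, not indexed
class Point {
public:
    double x, y;
};
template <class T> class Stack : public Container<T> {
public:
    void push(const T& item);
};
"""

if __name__ == "__main__":
    print(class_names(SAMPLE_HEADER))
    print(header_to_html("Sample header", SAMPLE_HEADER))
```

The same index could serve as the target list for a clickable image map, with each region of the design diagram pointing at the corresponding anchor.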
The solution I personally prefer is the use of an HTML front end to code which is written in a literate programming environment [10] like N. Ramsey's NOWEB processor [11]: here, the code document formatted in LaTeX (a package of macros for Knuth's TeX program which is very popular among scientists) gets turned into HTML with N. Drakos' latex2html translator [12]. As an example, the HTML version of this article [13] was prepared using latex2html. If you do not want to follow the literate approach, but nevertheless want to use latex2html for automatic translation into HTML, you can, for example, try the c++2latex program by J. Heitkötter [14].

Another interesting solution to the presentation of C++ libraries on the Web has been given by P. Murray-Rust from the Glaxo Protein Research group, UK, together with his DEMOCRITOS library for bioinformatics [15]: he runs a tk/tcl [16] script on header files consistently formatted for documentation purposes to produce a very appealing marked-up C++ library reference. To display information, another tcl script can be used which runs a simple, home-made class browser.

For all the different approaches mentioned, the main reason that software authors are interested in publishing their code in hypertext format is that they are seeking international collaboration: the Web material often addresses a user community which is still debating the usefulness of C++ (or even of object-orientation, since Fortran is still the dominant language in scientific applications). The library on the Web often consists of research code that is far from being finished, and this attracts hackers, beta-testers and programmers worldwide.

Only a few software companies have already spotted the market here and offer C++ products over the Web, but their number is growing. They then often feel the need to contribute to the maze of resources for C++ developers. Examples are the C++ Forum from Quadralay Corp. and the product information from ParaSoft Corp. (see [2]).
Valuable free resources often have attractive home pages, like J. Smart's free GUI toolkit 'wxWindows' in C++ [2].

With the present global labor market changes, the need for training and retraining is growing. Distance education is an attractive means to face these changes adequately in an increasingly networked working environment. On this premise the Macvicar School of Educational Technology (MSET), a school under the umbrella of the Globewide Network Academy, Inc. (GNA) [6], a young online university, offered a C++ course fully delivered via the Internet, with the World-Wide Web as its backbone [17]. The course was taught for the first time from May to August 1994, with 75 students from over 18 countries and a faculty of 9 consultants from Canada, Germany, Korea, the UK and the USA.

During the course, there was no real-life contact between the teachers and the students: instead, realtime interaction was offered online using Diversity University MOO [18], a multi-user environment with OOP capabilities, originally developed at Xerox PARC, Palo Alto, in which a "virtual classroom" had been built. Transcripts of interesting MOO discussions were added to the course Web as well. In addition, the students and teachers were able to have discussions on a dedicated mailing list archived on the WWW using K. Hughes' HyperMail program [19].

The course units followed a hyper-textbook served on the Web and based on the Coronado tutorial by Gordon Dodrill [illustration 3]. This hypertextbook, which is keyword-searchable as a whole, contains links to compilation notes prepared by the teachers, a glossary of C++ terms, solutions to exercises sent in by students and a fair amount of additional material based on student questions and prepared during the course.
All sample programs are available both as hypertext (marked up as C++ code), to be read in parallel with the text by opening several browser windows (comparable to reading two pages of a printed book), and in raw ASCII format for quick pasting and compilation [illustration 4].

No tuition or fee was asked from the students for this prototypical course. Instead, they were expected to contribute to the Web material, and most of them did - 20 people alone were involved in the preparation of the hypertext notes, and some applied their fresh knowledge of C++ to writing helpful conversion programs. During the course, some interesting programming problems materialized as student projects that lasted beyond the end of the course, covering a CGI wrapper library, matrix and string classes, etc. These are now organized as the "GNA Global C++ Library" (GCL) project, which was sponsored by MSET to conduct research and development on reusable software tools. This project is being developed in the same spirit as GNU software development, with copyleft protection and drawing on the vast Internet resources of programming talent, free software and emerging advanced teleconferencing software [20].

The response to this new way of learning using the World-Wide Web was overwhelmingly positive: at the 1st International WWW conference at CERN, Geneva, in May, the course won the 1994 award for "Best Educational Service on the Web". Its successor, planned for October 94, was already oversubscribed before the first course had terminated, and the massive feedback from the first class is leading to a major review and restructuring of the material. The way the different components of the course - WWW, mailing lists and online consulting - can work together still needs to be optimized, but the concept of distributed learning with the Web, even for a topic as complex as the C++ language, has undoubtedly proven to be very successful and is already giving rise to imitation.
At workshops like the one on "Teaching and Learning with the Web" during the 1st WWW conference [21], the long-distance education community is now meeting and discussing with WWW wizards how to profit from this new approach to teaching and learning. Programming courses in particular promise to be rewarding targets for this kind of learning on the Web: the students can initially be assumed to be computer literate, and the material is usually easy to mark up as HTML (compared to, for example, a course relying on complicated mathematical formulae). Also, no field is better represented on the net or the Web than computer science, so that any course can immediately offer an enormous pool of secondary resources.

More and more software vendors and academic institutions are trying to pave their way into the electronic world of the 21st century. The World-Wide Web, as the most powerful paradigm around on the Internet, is here to stay. Through various services, both from research labs (which still make up the majority of server sites) and companies, C++ is already well represented on WWW. As automatic tools to translate existing documents into HTML - the standard Web markup language - become more widely available, whole software product palettes can conveniently be advertised and distributed on the Web. WWW is not limited to use on the Internet: rather, companies may already gain considerably from an in-house Web server. On the design and development side, there are several interesting proposals of how to lucidly display C++ library information. For C++ training and programming courses in general, virtual coursework following the example of the GNA C++ course can be organized and delivered to customers, employees and students using the Web.

Additional reading: K.
Hughes' freely available text "Entering the World-Wide Web: A Guide to Cyberspace" [22] answers most of the immediate questions which this article with its limited scope cannot address.

REFERENCES

[1] Central WWW software repository at CERN, at URL "http://info.cern.ch/", branching into the mother of all WWW home pages with various subject libraries, and software lists for both the server and the client side.
[2] URL "http://www.desy.de/user/projects/C++.html".
[3] The hyplans of the authors, for example, are at URLs "http://www.desy.de/www/marcus.html" and "http://www.desy.de/www/carlye.html".
[4] Oscar Nierstrasz' collection of scripts is marvellous, see "http://cui_www.unige.ch/ftp/PUBLIC/oscar/scripts/README.html". See also [1].
[5] Details on CGI are at URL "http://hoohoo.ncsa.uiuc.edu:80/cgi/".
[6] The GNA's home on the Web, awarded "Best Campus-Wide Information System" in the WWW contest 1994, is at URL "http://uu-gna.mit.edu:8001/uu-gna/".
[7] See the "Free Packages" link in ref. [2]. An article on the "GISMO" project appeared in the March/April 1993 issue of the "C++ Report".
[8] classdoc.awk is distributed together with the CLHEP library, see "Free Packages" in ref. [2], and also [23].
[9] An example by one of the authors is available at URL "http://www.desy.de/user/projects/MG/MGLIB.html".
[10] A separate Web hierarchy maintained by one of the authors is at URL "http://www.desy.de/user/projects/LitProg.html", and simple examples of HTML from literate C++ code are in "http://www.desy.de/gna/html/cc/text/tutorial3/minimal/index.html".
[11] For information on how to get NOWEB, see URL "ftp://bellcore.com/pub/norman/www/noweb/intro.html".
[12] See URL "http://cbl.leeds.ac.uk/nikos/tex2html/doc/latex2html/latex2html.html".
[13] See link in URL "http://www.desy.de/user/projects/C++/report.html".
[14] Available from your local comp.sources.misc newsgroup archive.
[15] See URL "http://www.dl.ac.uk/CBMT/democ/HOME.html".
[16] tk/tcl is a widely used programming system for developing and using graphical user interfaces. See e.g. URL "ftp://ftp.cs.berkeley.edu/ucb/tcl" for more.
[17] To access the course Web material from America, try the URL "http://uu-gna.mit.edu:8001/uu-gna/text/cc/index.html". From Europe, the course is mirrored at "http://www.desy.de/gna/html/cc/index.html". A paper on the course, by D.Perron, is available at URL "http://uu-gna.mit.edu:8001/uu-gna/text/cc/papers/2nd_conf.html".
[18] Information about these virtual meeting and teaching places and about Diversity University in particular is at URL "http://pass.wayne.edu/DU.html".
[19] This and other useful software is freely available from Enterprise Integration Technologies at URL "http://www.eit.com/".
[20] People interested in this project should contact its coordinator, Jeffrey Thompson, at .
[21] See URL "http://tecfa.unige.ch/edu-ws94/ws.html".
[22] Available from [17] or via anonymous FTP from "ftp.eit.com" in the "pub/web.guide" directory.
[23] See URL "http://afal01.cern.ch/C++/Catalog/Tools/Tools.html" for a script collection used to present libraries in the "HEP C++ Catalog", URL "http://afal01.cern.ch/C++/Catalog/Catalog.html".

ILLUSTRATIONS

1: View on [2] from Mosaic.
2: MG++ imagemap from "http://www.desy.de/user/projects/MG/MGLIBgraph.html" from Mosaic. [fully developed and clickable by October 94].
3: View on [15] from Mosaic.
4: View on "http://uu-gna.mit.edu:8001/uu-gna/text/cc/text/tutorial2/html/concom.html" from Mosaic.