DL-VT416
A Digital Library Testbed for Research Related to 4/16/2007 at Virginia Tech



Introduction

This research proposes to support a wide range of research studies, as well as inquiries from the general public, related to the tragedy that occurred on the morning of April 16, 2007 on the Virginia Tech campus in Blacksburg, VA. The target audience includes those interested in how technology aids in detecting, preventing, and responding to disasters in highly connected settings. Concern with many other issues, such as social support, psychological health, and coping, leads behavioral and social scientists to request support for data curation as well as special services involving data mining and information visualization. A key question is how digital libraries can work in rapid-response settings, as well as in studying the aftermath of tragedies: testing hypotheses, collecting and analyzing related data, visualizing findings, discovering trends and patterns, modeling and simulation, and building and validating improved theories and models. This project will lead to further development of the theory and software support for large-scale digital libraries that also allow researchers to apply closely coupled data mining and visualization services, e.g., so that archived content can be efficiently and conveniently analyzed, and so that trends and outliers can be spotted. Computer and information scientists, following legal, policy, and human-subject guidelines, can study portions, or the whole complex, of the resulting testbed: its content, services, usage logs, etc.

Project Description

Our multidisciplinary team will research how digital libraries (DLs) [1, 2] can provide immediate and ongoing support during crises and their aftermath, especially on university campuses. We will develop a testbed supporting a wide range of research studies, including those aided by data mining, visualization, and social network analysis. We will validate our approach with data and multimedia information related to the events on 4/16/2007 at Virginia Tech (VT416), when 33 members of the university community were killed by a student turned gunman. As can be seen from Figure 1, our DL will be at the heart of our research activities and should have significant broader impacts when used by large numbers of scholars, as well as the general public. The specification of the DL will be integrated with the activities of a related VT416 study now being proposed to NSF, for a workshop aimed at developing a research agenda.

It is extremely important that our digital library be put into operation as soon as possible, so that data now being captured by various parties to assist in understanding VT416 can be brought together and used to support the research of sociologists, psychologists, and others interested in crises, tragedies, stress, grief, coping, and many related topics. This must be done in a flexible way (see next section), so we can adapt the DL to ever-changing needs, including those that will be specified in connection with the above-mentioned workshop, led by Dr. Shoemaker, which aims to devise a research agenda addressing such events.

DL Generation
DL-VT416 will be built using our semi-automatic approach to rapid development of digital libraries [3, 4], which PI Fox and his colleagues have been developing for over five years in connection with the 5S framework [5]. In addition to a number of related doctoral dissertations [6-8], several Master's theses [9, 10] have advanced this approach; the most recent is by Gorton [11], who summarized the overall situation. We will generate a DL, and revise it as needed, building initially upon the popular DSpace system [12, 13]. We will connect it with data mining, social network analysis, and visualization capabilities [14], leading to a flexible support infrastructure that makes rapid testing of hypotheses possible, as follows.

Supporting Systems-Level Science
Our goal is to support "systems-level" science on the social dynamics associated with crisis events. Inspired by current research trends in biology and the life sciences, systems-level science seeks to understand the functioning of very large and complex systems, and all the interactions therein, in a holistic fashion. This is in contrast to more traditional reductionist science, which narrows the problem down to focus on specific individual variables. Systems-level science is more exploratory in nature, and encourages the development of new hypotheses. In the life sciences, the "system" of systems-level science refers to the functioning of complex biological organisms. In Virginia Tech's case, the "system" under study is the complex communities of people and how they respond to crises. Systems-level science is more challenging to implement than other approaches, but can offer deeper insights into the underlying phenomena. Systems-level science requires:

- Rapid and continuous collection of massive data:
The recent growth of systems-level science in biology was supported by the invention of microarrays and similar instrumentation that enable simultaneous collection of data about thousands of genes and proteins at the cellular level, thereby offering a picture that is both detailed and broad. Similarly, our digital library will be organized to ingest large collections, such as email logs, that are collected and curated over time.

- Integration of diverse, heterogeneous data:
Our digital library must be able to bridge diverse data sources, including vast electronic logs such as Google searches and email logs, as well as rich personal sources such as Facebook pages, surveys, and interviews. We will investigate new kinds of data that could be captured, for example through volunteer tracking and deployment of our two Microsoft SenseCam/Memex units.

- Realtime exploratory analysis:
To gain deep insight, our digital library must link with analysis tools to support rich exploratory analysis of complex patterns and interrelationships for theory development. Realtime analysis must maintain synchronization with incoming information to enable awareness of breaking hypotheses and quick response. Access, analysis, dissemination, and utilization will be supported by visualizations and data mining tools.

Visualization
For access, analysis, and dissemination of our library contents, we will integrate a set of visualization tools, in work led by Dr. North. Supporting heterogeneous and dynamic data collections requires flexible visualization capabilities. We will integrate our Visualization Schemas framework with our 5S digital libraries framework to offer visualization capabilities that users can model and curate in much the same way that they do digital library content. We will liaise with other institutions that offer diverse library visualization tools that could plug into this environment, such as Pacific Northwest National Laboratories' InSpire system and Penn State's Improvise system. To enable new social science research through realtime analysis of our continuously changing library content, we will link the digital library with Virginia Tech's GigaPixel Display. The GigaPixel Display project (http://infovis.cs.vt.edu/gigapixel/) offers nearly 200 million pixels of display space for massive data visualization and situational awareness. Recent research results indicate that such large displays significantly expand and enhance human abilities for visualizing large data and maintaining awareness of dynamic data. This can enable a new form of social research that occurs in realtime. Scientists can examine trends as they occur, such as later changes in crisis reactions by certain population groups, and potentially work to affect outcomes. The GigaPixel Display will give the library a living presence.

Data Mining
We will investigate a multi-pronged approach to mining and harnessing the collection of information brought together in DL-VT416. First, led by Dr. Ramakrishnan, we will mine the time-stamped series of documents to uncover the key trends that characterized the tragedy and the ensuing response. Next, we will mine the network of relationships induced by communications as recorded on various social networking sites and characterize this network temporally in light of the trends identified in the first step. This will aid in understanding whether particular forms of communication were especially prevalent during different stages of the unfolding sequence of events. Finally, we can explore different "projections" of the multi-dimensional data space and determine whether trends that manifest globally are also reflected in the local views. The results of data mining will ideally be parameters of information diffusion that can then be used to drive a system-wide model of human-human communication, which in turn can be used for simulating synthetic scenarios. The techniques we will explore include Kleinberg's burst detection algorithm; storyline extraction from collections of documents; graph characterizations of networks, such as connectedness, average shortest path length, and clustering coefficient; and multi-dimensional aggregates and views.
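
To make the idea of mining time-stamped documents for bursts more concrete, the following is a minimal sketch of a simplified two-state burst detector in the spirit of Kleinberg's algorithm. It is an illustration under stated assumptions, not the project's implementation: the function name `detect_bursts`, the parameters `s` and `gamma`, the binning of documents into equal time slots, and the exact form of the transition cost are all choices made for this example.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Label each time bin as baseline (0) or burst (1).

    relevant[t] is the number of documents in bin t that mention a topic,
    totals[t] is the number of all documents in bin t. The scaling factor s
    and the transition weight gamma are assumptions of this sketch.
    """
    n = len(relevant)
    if n == 0:
        return []
    p0 = sum(relevant) / sum(totals)      # baseline rate over the whole series
    if not 0.0 < p0 < 1.0:
        return [0] * n                    # no variation to explain
    p1 = min(s * p0, 0.9999)              # elevated rate for the burst state
    probs = (p0, p1)

    def fit_cost(state, r, d):
        # Negative log of the binomial likelihood of r relevant out of d.
        p = probs[state]
        return -(math.log(math.comb(d, r))
                 + r * math.log(p) + (d - r) * math.log(1.0 - p))

    up_cost = gamma * math.log(n)         # entering a burst costs something
    cost = [0.0, math.inf]                # start in the baseline state
    back = []                             # back[t][j] = best previous state
    for r, d in zip(relevant, totals):
        new_cost, pointers = [], []
        for j in (0, 1):
            candidates = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if candidates[0] <= candidates[1] else 1
            pointers.append(best)
            new_cost.append(candidates[best] + fit_cost(j, r, d))
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest state sequence backwards (Viterbi).
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    states.reverse()
    return states


if __name__ == "__main__":
    # Hypothetical daily counts: documents mentioning a term, out of 100 per day.
    print(detect_bursts([1, 1, 1, 10, 12, 11, 1, 1], [100] * 8))
```

Raising `gamma` makes the detector more reluctant to enter the burst state, while raising `s` requires a larger jump in the topic's share of documents before a bin is labeled as a burst.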

Social Network Analysis
Dr. Fan has extensive expertise in focused crawling, text mining, and social network analysis [15-23]. He will help with crawling data from different sources and with applying text mining and social network analysis to identify emerging patterns in the testbed. Social network analysis is a proven technique widely used in social science to understand the properties of a social system. It will help us understand not only global properties of a network, such as the average betweenness of its nodes or the average in-degree and out-degree across all nodes, but also the properties of individual nodes, such as a node's centrality in the network and its in-degree and out-degree. Many data sets in our collection will contain networks of relationships. For example, each email exchange (which we can study with IRB approval if released for research by all parties involved) will set up a link between a sender and a receiver. Similarly, in the popular Facebook discussion forum, every message will include a message originator and a respondent. Analyzing these kinds of social exchange data using social network analysis combined with text mining techniques will help us answer interesting research questions related to the 04/16/2007 VT tragedy such as:

- How do different communication channels (online forums in Facebook, email) help people affected by tragedy cope with the stress and grief, and improve their healing process?
- Who do people respond to when they first hear of a tragedy?
- What types of roles are played by different types of users?
- What are the different sub-communities? How do they evolve over time?
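
As a small illustration of how sender-receiver exchanges induce a network whose node properties can be measured, the sketch below builds a directed graph from (sender, receiver) pairs and reports each node's in-degree, out-degree, and a normalized degree centrality. The function name `degree_summary`, the record layout, and the sample names are hypothetical; real message data would be used only under the approvals described above.

```python
from collections import defaultdict


def degree_summary(messages):
    """Summarize a directed communication network.

    messages is an iterable of (sender, receiver) pairs. Repeated messages
    between the same pair count as one link, and self-messages are ignored;
    both are simplifying assumptions of this sketch.
    """
    out_neighbors = defaultdict(set)
    in_neighbors = defaultdict(set)
    nodes = set()
    for sender, receiver in messages:
        nodes.update((sender, receiver))
        if sender != receiver:
            out_neighbors[sender].add(receiver)
            in_neighbors[receiver].add(sender)

    n = len(nodes)
    summary = {}
    for node in sorted(nodes):
        # Distinct contacts in either direction, for degree centrality.
        contacts = out_neighbors[node] | in_neighbors[node]
        summary[node] = {
            "in_degree": len(in_neighbors[node]),
            "out_degree": len(out_neighbors[node]),
            # Share of the other nodes this node is directly linked to.
            "centrality": len(contacts) / (n - 1) if n > 1 else 0.0,
        }
    return summary


if __name__ == "__main__":
    # Hypothetical exchanges; the names are placeholders, not real data.
    sample = [("ann", "bob"), ("ann", "cat"), ("bob", "ann"),
              ("cat", "ann"), ("dan", "ann")]
    for name, stats in degree_summary(sample).items():
        print(name, stats)
```

Nodes with high in-degree are those many others write to, which bears on the question of whom people turn to first; betweenness and sub-community detection would need path-based computations beyond this sketch.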

Sustainability and Broader Impacts
Virginia Tech University Libraries, including the various branches concerned with Special Collections and Archives, has been in touch with the Library of Congress and other groups, and will maintain into the future an archive related to VT416. We are coordinating closely with them, as well as with the Center for Digital Discourse and Culture (see fuller list of interested parties under Supplemental Documents), and will make sure that the DL is sustainable. Our focus will be on handling digital information, collecting it as quickly as possible, and supporting the broader impacts of such information through a variety of services aimed at the needs of researchers and the general public (recall Figure 1). Our approach, partially described above, will be refined as we collaborate with those forming a research agenda for this field, those providing data, and those engaged in research and education activities. We will have a web site and will widely disseminate our findings through online, conference, and journal venues. We also will build upon work previously supported by NSF, including the studies described below.

Results from Prior NSF Support
Drs. Fan and Fox are completing four years of work with NSF (ITR) funding, through grant IIS-0325579, entitled "Information Technology Research: Managing complex information applications: An archaeology digital library." This was launched by an archaeologist, Project PI James Flanagan (CWRU), with the IT aspects led by VT PI Fox and co-PI Fan. VT's subcontract was for $189,500, covering 9/1/03-12/31/05, but a no-cost extension allowed continuation of research into the summer of 2007. ETANA-DL (a digital library; see http://etana.dlib.vt.edu for publications, presentations, and a link to the system) provides an integration framework and a broad set of services operating on data from sites in Jordan and Israel. Two dissertations, two theses, and a number of papers have been published [4, 24-38]. Tools have been developed for schema mapping and integration, and the system supports search, multi-dimensional browsing, visualization, comparison, data export, and extensibility to more sites and other domains.

Dr. North, serving as PI and working with colleagues Doug Bowman, Roger Ehrich, and Steve Harrison, is completing work on "Towards Boundless Display: Developing a Reconfigurable Research Testbed for Large-scale, High-resolution Visual Displays." NSF grant #CNS-04-23611 (08/16/04 - 08/15/07) supported the construction of the GigaPixel Display Laboratory (http://infovis.cs.vt.edu/gigapixel/), hosted by Virginia Tech's Department of Computer Science and the Center for Human-Computer Interaction (CHCI), and directed by Dr. Chris North. This NSF-funded facility (see Figure 3) contains reconfigurable ultra-high resolution displays, totaling approximately 200 million pixels, one of the highest resolutions in the world. In addition to resolution, a unique aspect of this facility is its diversity of technologies and reconfigurability. Display technologies include rear-projection blocks and LCD panels. Interactive devices include touch panels, 6 DoF trackers, laser trackers, RFID, and various handhelds. Reconfigurability enables the display blocks and panels to be rearranged into arbitrary form factors, with plug-n-play flexibility of input devices. The facility is supported by computational clusters and by software that supports rapid reconfiguration. The facility is co-located with the CHCI's AwareLab, providing VICON vision-based tracking for interactive input, and the 3Di Laboratory, providing immersive 3D displays. This facility provides an ideal research testbed for exploring fundamental questions of the design of future human-computer interfaces. It also provides a resource for advanced visualization and analysis of very large data. The massive number of pixels enables analysts to efficiently visualize much larger quantities of data than on traditional desktop displays. A current project resulting from this facility is designing and evaluating visualizations for intelligence data analysis for the National Geospatial-Intelligence Agency. Research results have shown significant user performance advantages of such large-scale visualizations over their small-scale counterparts. Initial results are published in [39-45].

References
[1] E. A. Fox and G. Marchionini, "Toward a Worldwide Digital Library: Guest Editors' Introduction to Special Section on Digital Libraries: Global Scope, Unlimited Access," Comm. ACM, vol. 41, pp. 29-32, 1998. http://purl.lib.vt.edu/dlib/pubs/CACM199804; http://doi.acm.org/10.1145/273035.273043
[2] E. A. Fox and S. Urs, "Digital Libraries," in Annual Review of Information Science and Technology (ARIST), vol. 36, B. Cronin, Ed.: American Society for Information Science, 2002, pp. 503-589.
[3] M. A. Goncalves and E. A. Fox, "5SL - A Language for Declarative Specification and Generation of Digital Libraries," in Proc. JCDL'2002, Second ACM / IEEE-CS Joint Conference on Digital Libraries, July 14-18, Portland, Oregon, G. Marchionini, Ed.: ACM, 2002, pp. 263-272.
[4] R. Shen, M. A. Goncalves, W. Fan, and E. Fox, "Requirements gathering and modeling of domain-specific digital libraries with the 5S framework: An archaeological case study with ETANA," in Research and Advanced Technology for Digital Libraries, vol. 3652, Lecture Notes in Computer Science, 2005, pp. 1-12.
[5] M. A. Goncalves, E. A. Fox, L. T. Watson, and N. A. Kipp, "Streams, Structures, Spaces, Scenarios, Societies (5S): A Formal Model for Digital Libraries," ACM Transactions on Information Systems, vol. 22, pp. 270-312, 2004. http://doi.acm.org/10.1145/984321.984325
[6] R. Shen, Applying the 5S Framework To Integrating Digital Libraries (Doctoral Dissertation). Blacksburg, VA, USA: Virginia Tech, 2006. http://scholar.lib.vt.edu/theses/available/etd-04212006-135018/
[7] M. A. Goncalves, "Streams, Structures, Spaces, Scenarios, and Societies (5S): A Formal Digital Library Framework and Its Applications," in Computer Science. Blacksburg, VA: Virginia Tech, 2004, pp. 161. http://scholar.lib.vt.edu/theses/available/etd-12052004-135923/
[8] H. Suleman, "Open Digital Libraries," in Department of Computer Science. Blacksburg: Virginia Tech, 2002. http://scholar.lib.vt.edu/theses/available/etd-11222002-155624/
[9] Q. Zhu, "5SGraph: A Modeling Tool for Digital Libraries," Virginia Tech – Department of Computer Science, 2002. http://scholar.lib.vt.edu/theses/available/etd-11272002-210531/
[10] R. Kelapure, "Scenario-Based Generation of Digital Library Services," in Computer Science. Blacksburg, VA: Virginia Tech, 2003. http://scholar.lib.vt.edu/theses/available/etd-06182003-055012/
[11] D. C. Gorton, "Practical Digital Library Generation into DSpace with the 5S Framework," in Computer Science Master's Thesis. Blacksburg: Virginia Tech, 2007. http://scholar.lib.vt.edu/theses/available/etd-04252007-161736/
[12] MIT, "DSpace: Durable Digital Depository," vol. 2004. Cambridge, MA: MIT, 2003. http://dspace.org
[13] R. Tansley, M. Bass, D. Stuve, M. Branschofsky, D. Chudnov, G. McClellan, and M. Smith, "The DSpace Institutional Digital Repository System: current functionality," presented at Proc. of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries, Houston, Texas, 2003. http://portal.acm.org/citation.cfm?id=827151
[14] J. Wang, VIDI: A lightweight protocol between visualization systems and digital libraries. Blacksburg, VA: Virginia Tech, Department of Computer Science Masters thesis, 2002. http://scholar.lib.vt.edu/theses/available/etd-07012002-145841/
[15] J. N. Cummings, "Work groups, structural diversity, and knowledge sharing in a global organization," Management Science, vol. 50, pp. 352, 2004.
[16] D. Maloney-Krichmar and J. Preece, "The meanings of an online health community in the lives of its members: Roles, relationships and group dynamics," presented at 2002 International Symposium on Technology and Society ISTAS'02, 2002.
[17] S. Borgatti and R. Cross, "A relational view of information seeking and learning in social networks," Management Science, vol. 49, pp. 432-445, 2003.
[18] G. Marwell and P. Oliver, "Social networks and collective action: A theory of the critical mass III," American Journal of Sociology, vol. 94, pp. 502-534, 1988.
[19] N. Friedkin, "Information flow through strong and weak ties in intraorganizational social networks," Social Networks, vol. 3, pp. 273-285, 1982.
[20] B. Wellman, "Computer networks as social networks," Science, vol. 293, pp. 2031-2034, 2001.
[21] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press, 1994.
[22] S. P. Borgatti and R. Cross, "A relational view of information seeking and learning in social networks," Management Science, vol. 49, pp. 432-445, 2003.
[23] L. C. Freeman, "Centrality in social networks: Conceptual clarification," Social Networks, vol. 1, pp. 215-240, 1979.
[24] U. Ravindranathan, R. Shen, M. Goncalves, W. Fan, E. A. Fox, and J. W. Flanagan, "ETANA-DL: a digital library for integrated handling of heterogeneous archaeological data," in the Proceedings of 2004 ACM-IEEE Joint Conference on Digital Libraries. Tucson, AZ, 2004.
[25] U. Ravindranathan, R. Shen, M. Goncalves, W. Fan, E. A. Fox, and J. W. Flanagan, "Prototyping digital libraries handling heterogeneous data sources - the ETANA-DL case study," in The Proceedings of the European Conference on Digital Libraries (ECDL 2004). Bath, UK, 2004.
[26] U. Ravindranathan, R. Shen, M. A. Goncalves, W. Fan, E. A. Fox, and J. W. Flanagan, "ETANA-DL: Managing complex information applications: an archaeology digital library," in JCDL 2004: Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries - Global Reach and Diverse Impact, H. Chen, M. Christel, and E. P. Lim, Eds., 2004, p. 414.
[27] E. A. Fox, F. Das Neves, X. Y. Yu, R. Shen, S. Kim, and W. Fan, "Exploring the computing literature with visualization and stepping stones & pathways," Communications of the ACM, vol. 49, pp. 53-58, 2006.
[28] A. Raghavan, N. S. Vemuri, R. Shen, M. A. Goncalves, W. Fan, and E. A. Fox, "Incremental, semi-automatic, mapping-based integration of heterogeneous collections into archaeological digital libraries: Megiddo case study," in Research and Advanced Technology for Digital Libraries, vol. 3652, Lecture Notes in Computer Science, 2005, pp. 139-150.
[29] R. Shen, N. S. Vemuri, W. G. Fan, R. da S Torres, and E. A. Fox, "Exploring digital libraries: Integrating browsing, searching, and visualization," in Opening Information Horizons, 2006, pp. 1-10.
[30] N. S. Vemuri, R. Shen, S. Tupe, W. Fan, and E. A. Fox, "ETANA-ADD: An interactive tool for integrating archaeological DL collections," in Opening Information Horizons, 2006, pp. 161-162.
[31] D. Gorton, R. Shen, N. S. Vemuri, W. Fan, and E. A. Fox, "ETANA-GIS: GIS for archaeological digital libraries," in Opening Information Horizons, 2006, p. 379.
[32] A. Raghavan, D. Rangarajan, R. Shen, M. A. Goncalves, N. S. Vemuri, W. Fan, and E. A. Fox, "Schema mapper: A visualization tool for DL integration," in Proceedings of the 5th ACM/IEEE Joint Conference on Digital Libraries, 2005, p. 414.
[33] R. Shen, "Applying the 5S Framework To Integrating Digital Libraries," in Computer Science. Blacksburg, VA: Virginia Tech, 2006. http://scholar.lib.vt.edu/theses/available/etd-04212006-135018/
[34] M. A. Goncalves, "Streams, Structures, Spaces, Scenarios, and Societies (5S): A Formal Digital Library Framework and Its Applications," in Computer Science. Blacksburg, VA, 2004. http://scholar.lib.vt.edu/theses/available/etd-12052004-135923/
[35] U. Ravindranathan, "Prototyping Digital Libraries Handling Heterogeneous Data Sources - An ETANA-DL Case Study," in Computer Science. Blacksburg, VA: Virginia Tech, 2004. http://scholar.lib.vt.edu/theses/available/etd-04262004-153555/
[36] E. A. Fox, M. A. Goncalves, and R. Shen, "The Role of Digital Libraries in Moving Toward Knowledge Environments," in From Integrated Publication and Information Systems to Information and Knowledge Environments: Essays Dedicated to Erich J. Neuhold on the Occasion of His 65th Birthday, Lecture Notes in Computer Science, Volume 3379, M. Hemmje, C. Niederee, and T. Risse, Eds.: Springer-Verlag GmbH, 2005, pp. 96-106. http://www.springerlink.com/openurl.asp?genre=article&issn=0302-9743&volume=3379&spage=96
[37] R. Shen, N. S. Vemuri, W. Fan, and E. A. Fox, "What is a Successful Digital Library?," in Proc. ECDL 2006, Alicante, Spain, Sept. 17-21, 2006, Research and Advanced Technology for Digital Libraries, ISBN 978-3-540-44636-1, Lecture Notes in Computer Science, Volume 4172, ISSN 03
[38] E. Fox, R. Shen, S. Vemuri, W. Fan, L. Cantara, J. Eustis, and J. Flanagan, "ETANA-DL: Leveraging digital library technologies to support archaeology," in Proc. CAA2006, Computer Applications and Quantitative Methods in Archaeology Annual Conference, April 18-21. Fargo, ND, 2006.
[39] R. Ball and C. North, "An Analysis of User Behavior on High-Resolution Tiled Displays," in Tenth IFIP International Conference on Human-Computer Interaction (INTERACT 2005), 2005, pp. 350-364.
[40] R. Ball, M. Varghese, B. Carstensen, E. D. Cox, C. Fierer, M. Peterson, and C. North, "Evaluating the Benefits of Tiled Displays for Navigating Maps," in IASTED International Conference on Human-Computer Interaction, 2005.
[41] R. Ball and C. North, "Realizing Embodied Interaction for Visual Analytics through Large Displays," Computers & Graphics (C&G), vol. 31, 2007.
[42] A. Sabri, R. Ball, S. Bhatia, A. Fabian, and C. North, "High-Resolution Gaming: Interfaces, Notifications and the User Experience," Computer Games (Special Issue on HCI Issues), vol. 19, pp. 151-166, 2007.
[43] B. Yost and C. North, "The Perceptual Scalability of Visualization," IEEE Transactions on Visualization and Computer Graphics (also from Proc. IEEE Symposium on Information Visualization, InfoVis 2006), vol. 12, pp. 837-844, 2006.
[44] R. Ball, M. DellaNoce, T. Ni, F. Quek, and C. North, "Applying Embodied Interaction and Usability Engineering to Visualization on Large Displays," in ACM British HCI - Workshop on Visualization & Interaction, 2006, pp. 57-65.
[45] L. Shupp, R. Ball, B. Yost, J. Booker, and C. North, "Evaluation of viewport size and curvature of large, high-resolution displays," in Proceedings of the 2006 conference on Graphics interface (GI). Quebec, Canada: ACM, 2006, pp. 123-130.

Project Website

   http://www.dl-vt-416.org/DL-VT-416/Home.html

Documents

   Original Proposal submitted to NSF-ITR, April 2007 (PDF)

Contacts

PI:
  Dr. Edward Fox - Professor, Computer Science

Co-PIs:
  Dr. Weiguo (Patrick) Fan - Associate Professor, Accounting and Information Systems
  Dr. Christopher North - Associate Professor, Computer Science
  Dr. Naren Ramakrishnan - Associate Professor, Computer Science
  Dr. Donald Shoemaker - Professor, Sociology

GRA:
  Christopher Andrews
  Szu-Chia Lu
  YiFei Ma
  Venkat Srinivasan

Last updated: 18 August 2007