The next series of posts will be about how emerging technologies are changing the practice of science and the career of scientists.

Since mid-2015, there has been increasing talk about the impact of artificial intelligence (AI), machine learning (ML), and automation on the future of jobs, the economy, and business competitiveness. So far, the commentary has covered general themes and possible trends.

What I aim to do differently is to cover specific topics at a level that affects people studying and performing their craft.

I will use chemistry-related examples, since I can benchmark these against my own education in chemistry. However, similar examples can be found across scientific fields.

Let’s start at the beginning: what students should be learning and what practicing scientists need to know.

Today this means not just learning the subject matter, but also learning how to search for information, how to present data, how to share it, and understanding the principles of database standards and ethical use.

Here is what I mean:

The University of Pennsylvania’s course in Chemical Information for chemistry students

A few months ago, I attended a scientific symposium in Philadelphia about the Cambridge Structural Database (CSD). The agenda featured an unexpected presentation, a sleeper of a topic, that turned out to be absolutely captivating in its content and its implications for knowledge management and modern science curricula.

The title of the presentation was “Examining research data through a crystal lens: teaching students about primary data, data representation, and data management using crystal structure databases,” presented by Judith Currano, head of the chemistry library at the University of Pennsylvania.

Judith described the course she teaches on Chemical Information. It is a required course for all chemistry PhD students at “Penn” and it is an elective for chemistry undergraduate and master’s degree students.

The course teaches

  • the organization and retrieval of chemical information,
  • the ways in which information tools are constructed, and
  • a broad range of effective search techniques.

She gives assignments using the key chemistry databases and information resources to demonstrate the techniques being taught.

She shows students how to find indexes to chemical structures and how to go back to the primary literature sources if necessary. These databases are rule-based. One needs to know the rules, their exceptions, and their shortfalls to search properly.

The homework shows how students think and how they look at the data. She says students develop different search strategies and go about their analyses differently. Taking the CSD as an example, some students use CSD's Mercury, while others export the data to Excel. There are different ways to get at what they need to do, and her job is to help them with their strategy.

The goal of the course is to teach students how to think about information for optimal and most efficient retrieval and use.

All around me, there were recent PhD graduates and scientists at various stages of their careers. I asked them all: did you have a course like that when you were in graduate school? The universal answer was “no, but I wish there was one like that.”

Me too. In my time, it was optional to sign up for a library session that lasted about one hour. The experience was mostly limited to learning how to set up an account, get online, and do a few searches. It sounds like this is still very much the norm today. Judith’s course is an example of where this needs to go.

Customization by scientific sub-discipline

Judith takes it even further. The course is sectioned into the four major chemistry subdisciplines: organic chemistry, inorganic chemistry, physical chemistry, and analytical chemistry. Students take the one in which they are specializing.

The course consists of fourteen classes. Figure 1 shows the syllabus for the organic chemistry stream. Organic chemists engage quickly, within the first few classes, because organic chemistry depends on chemical structures, reaction mechanisms, and tracking down lab methods from the very start.

Figure 1: slide from presentation by Judith Currano, Head of Chemistry Library, University of Pennsylvania

In contrast, physical chemists are less engaged in these topics. They start to engage once they begin accessing data. Figure 2 shows the syllabus for the physical chemistry stream.

Figure 2: slide from presentation by Judith Currano, Head of Chemistry Library, University of Pennsylvania

FAIR data

Data is the starting point for analytics, machine learning, and artificial intelligence. Before any of these approaches is possible, one needs accurate data that is usable for this purpose.

This is where the concept of “FAIR data” will assume increasing importance in science and engineering.

FAIR data are data which meet standards of Findability, Accessibility, Interoperability, and Reusability.
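To make the four criteria a little more concrete, here is a minimal sketch in Python of how one might check a dataset's metadata record against them. The field names (identifier, access_url, file_format, license, provenance) and the example record are my own illustrative assumptions, not part of any official FAIR checklist or of Judith's course. Real FAIR assessments are considerably more nuanced.

```python
# Hypothetical mapping from each FAIR principle to metadata fields that
# would support it. These field names are illustrative assumptions only.
FAIR_FIELDS = {
    "Findable": ["identifier", "title"],
    "Accessible": ["access_url"],
    "Interoperable": ["file_format"],
    "Reusable": ["license", "provenance"],
}


def missing_fair_fields(record):
    """Return, per FAIR principle, the illustrative fields that are absent or empty."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        absent = [name for name in fields if not record.get(name)]
        if absent:
            gaps[principle] = absent
    return gaps


# A made-up record for a deposited crystal structure (placeholder values).
example = {
    "identifier": "doi:10.0000/example",
    "title": "Crystal structure of an example compound",
    "access_url": "https://repository.example.org/datasets/1",
    "file_format": "CIF",
    "license": "",  # left blank by the depositor
}

print(missing_fair_fields(example))
# prints {'Reusable': ['license', 'provenance']}
```

In this sketch, the record can be found, retrieved, and read by standard tools, but nobody downstream would know whether or how they are allowed to reuse it, or where the data came from.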

Judith introduces this important emerging topic.

Figure 3 illustrates how data ends up in databases. Her slide is adapted from her 2017 ACS presentation “Three degrees of interpretation: Why structure searches fail and how to maximize success.”

Figure 3: slide from presentation by Judith Currano, Head of Chemistry Library, University of Pennsylvania

Figure 4 shows one mechanism by which students, going about their work with no awareness of these concepts, end up failing to find the information they are seeking. Worse, researchers can deposit data into information sources without using FAIR data standards. Over time, these information sources become increasingly unreliable.

Figure 4: slide from presentation by Judith Currano, Head of Chemistry Library, University of Pennsylvania

The Master of Library and Information Science (MLIS)

To have a course like this, we need people who have the skill sets to teach it. The path is through a Master of Library and Information Science (MLIS) degree.

When I first created the competitive technical intelligence group at a biopharmaceutical company almost two decades ago, one of our first hires was an MLIS professional. We needed someone who knew how to access specialized literature sources and expensive databases efficiently and cost-effectively.

The MLIS degree and career path are still much the same today. However, in the deep science fields, I would say that we need an MLIS professional who also has a background in the specific field. This cross-disciplinary expertise is much harder to find, and it is an example of the interdisciplinary expertise we need now and in the future.

Judith Currano is such an example. After her presentation, I asked her how she chose to get into this field. She said she heard about this career option when she was in high school and decided right then that this was what she wanted to do.

She did her bachelor's degree in chemistry and English at the University of Rochester, and then her MLIS degree at the University of Illinois at Urbana–Champaign. She joined the chemistry library at the University of Pennsylvania thereafter.

She has also been very active in the professional chemistry community, as a member of the American Chemical Society and a presenter at its meetings. She is also currently Chair of the Board of Trustees of the Cambridge Crystallographic Data Centre. This is the kind of participation that allows one to teach a course that is relevant to the changing practice of science today.

Science information in the age of digital access and data analytics

If any undergraduate asked for my advice, my recommendation would be to absolutely take a course like this, even if it is not required in their studies, because it is critical to one’s ability to access information.

If any senior manager or human resources department at a company were wondering what continuous training is critical for their technical staff, this would be at the top of the list. Any data and digital strategy, which is essential to a company’s competitiveness, starts with having staff who have these information skills.

At another scientific workshop I attended two months ago, a director at a large global pharmaceutical company affirmed that these skills are indeed critical for any research-driven company. The company aims to have all of its technical staff acquire these skills eventually. It finds that recent graduates coming from the top schools already have them. Meanwhile, it is investing in training its in-house staff.

The way emerging technologies are changing the practice of science and the career of scientists starts with learning how to access and use information and data.

Recently, I was speaking with a professor at a major pharmacy school in the United States. They were updating their curriculum. I suggested that they seriously consider a course formally dedicated to this topic.

This introduction to FAIR data will lead to my next post on Electronic Lab Notebooks and data lakes.

Science education in the age of data analytics, ML and AI