Fargo, North Dakota, USA
December 2, 2016
By Prithviraj Lakkakula, Research Assistant Professor, NDSU Agribusiness and Applied Economics Department
In recent years, there has been much buzz around the term "big data," and most of the time, the term has been used loosely in the media.
The increase in the amount of data created, collected and stored in the past decade has been astronomical. In 2000, only 25 percent of all the world's stored information was digital, whereas 98 percent is digital today.
Today, more than 30,000 gigabytes of data are generated every second, creating huge data sets. Extracting value from these data sets is key to improving decision making and efficiency in any field.
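For a sense of scale, the per-second figure can be converted to a daily total. Here is a minimal Python sketch. It assumes the rate holds steady at 30,000 gigabytes per second and uses decimal units (1 exabyte = one billion gigabytes). The variable names are illustrative only.

```python
# Scale a rate of about 30,000 gigabytes per second to one full day.
# Assumes a constant rate and decimal units: 1 exabyte = 10**9 gigabytes.
GB_PER_SECOND = 30_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day

gb_per_day = GB_PER_SECOND * SECONDS_PER_DAY
exabytes_per_day = gb_per_day / 10**9

print(f"{gb_per_day:,} GB per day")          # 2,592,000,000 GB per day
print(f"{exabytes_per_day:.2f} EB per day")  # 2.59 EB per day
```

Since the article says "more than" 30,000 gigabytes, roughly 2.6 exabytes per day is a lower bound.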
What are big data? In layman's terms, big data are data sets too large for the storage capacity and processing power of an ordinary computer.
According to the National Science Foundation, big data are large, diverse, complex, longitudinal and/or distributed data sets generated from instruments, sensors, internet transactions, email, video, click streams and/or other digital sources.
The most common definition of big data in industry is that they are high-volume, high-velocity and high-variety information assets that demand cost-effective and innovative forms of information processing for enhanced insight and decision making.
Industry mostly describes big data in terms of three V's (volume, velocity and variety) and an A (analytics). Some add a fourth V, veracity.
Volume refers to the size of the data sets, while velocity refers to the speed at which the data are generated and must be processed, often in real time. Variety refers to the diverse types and sources of data. For example, data may be structured (spreadsheets or database tables), semi-structured (XML files or emails) and/or unstructured (video files).
In agriculture, big data are often confused with precision agriculture.
The National Research Council refers to precision agriculture as a management strategy that uses information technologies to bring data from multiple sources to bear on decisions associated with crop production.
Some agricultural economists argue that the main difference between precision agriculture and big data is one of scope: precision agriculture involves collecting data concentrated in a specific area or field, spread through time and space.
Moreover, analytics is not usually part of precision agriculture. Instead, the common practice is to compare field maps visually and identify nutrient-deficient or low-yielding areas of the field. However, because precision agriculture supplies input data for big data analytics, the two can be considered complementary.
Applications of machine learning and big data in agriculture include tracking the particular crop or commodity seeds sold in a season, Google satellite imaging, pest and disease detection from satellite images, drone use, predicting commodity supply and demand, and monitoring water supplies to assess drought or flood risk.
One particularly interesting application is using leaf images collected by drones to detect or predict plant disease with tools such as TensorFlow.
The ability to effectively manage and use the massive data sets associated with big data is a huge challenge.
Combining data sets and applying the algorithms needed to gain insights will require specialized expertise. Determining data ownership, privacy and security poses further challenges. Researchers will need to acquire the skills to store and access huge amounts of data for modeling and analytics.