In both of these examples, we can use unstructured data as input.
The first example enables the automatic extraction of information from documents that organizations have accumulated over time. Until now, tagging such documents with metadata and converting them into structured data had to be done manually or, at best, with heuristics that depended heavily on the structure of the documents themselves.
The second example, on the other hand, forms the basis of modern chatbots. It leverages the ability of LLMs to generate natural language text and maintain a high-quality conversation with users, allowing these chatbots to interact with the document base they need to answer questions accurately. Of course, this methodology can also be used with structured data, for example, by having the model generate a coherent SQL query based on the user’s request, thus retrieving data in a structured format. The real value, however, comes from combining both functionalities.
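To make the structured-data variant more concrete, the following is a minimal sketch of the pattern in Python, not a definitive implementation. The function `generate_sql` is a hypothetical stand-in for a call to a language model and simply returns a hard-coded query, so the example runs without any external service. The `orders` table, its columns, and the sample rows are also assumptions made purely for illustration. The sketch checks that the generated statement is a single `SELECT` before running it, since model output should not be executed blindly.

```python
import sqlite3


def generate_sql(question: str, schema: str) -> str:
    """Hypothetical stand-in for an LLM call.

    A real system would send the question and the schema to a model and
    return the SQL it generates. Here the query is hard-coded so the
    sketch runs on its own.
    """
    return (
        "SELECT customer, SUM(amount) AS total "
        "FROM orders GROUP BY customer ORDER BY total DESC"
    )


def is_read_only(sql: str) -> bool:
    """Accept a single SELECT statement only; reject anything else."""
    statement = sql.strip().rstrip(";")
    return statement.lower().startswith("select") and ";" not in statement


def answer_question(conn: sqlite3.Connection, question: str) -> list[tuple]:
    """Turn a natural language question into rows of structured data."""
    schema = "orders(customer TEXT, amount REAL)"  # illustrative schema
    sql = generate_sql(question, schema)
    if not is_read_only(sql):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("Acme", 120.0), ("Globex", 75.5), ("Acme", 30.0)],
    )
    return conn


if __name__ == "__main__":
    rows = answer_question(build_demo_db(), "Which customers spend the most?")
    print(rows)
```

The read-only check is a small instance of the control discussed in this section: the model proposes the query, but the surrounding process decides what is allowed to run against the data.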
In general, we can assert that unstructured data is becoming increasingly important. On one hand, new artificial intelligence tools make it easier to use this data and to convert it into a structured form, which then fits into more traditional usage processes. On the other hand, this type of data is essential both for training these new tools and for the use cases that bring real value to organizations.
As with all disciplines within Data Management, these new uses of artificial intelligence must be controlled and should fall within a defined Data Governance process at the organizational level.