Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI, and deep learning. Enjoy!
Generative AI isn’t the only way forward. Commentary by Gary Sangha, Founder and CEO of LexCheck
“Like most industries, the legal industry will be revolutionized by generative AI (GenAI). As a lawyer and serial legal tech entrepreneur, I see the value in technology, but I also urge my fellow legal professionals and leaders to carefully consider generative AI’s value and risk before implementation. Headlines are already showcasing the technology’s risks. For example, one lawyer submitted a brief containing fake court citations generated by ChatGPT. Existing GenAI models are prone to hallucinations — mostly due to their probabilistic approach to modeling language — which means they can generate content not grounded in factual information, leading to costly legal errors.
Additionally, GenAI models are usually trained on large, diverse datasets from publicly available sources, which means they might fail to understand the nuance of legal language. This limits their reliability in tasks requiring contextual understanding. To be used in the legal field, AI systems must be designed with privacy and confidentiality in mind to ensure no sensitive information is compromised. Therefore, GenAI models might work best when used as part of a robust workflow in conjunction with other mature tools. Rule-based AI systems and machine learning systems trained on domain-specific data can both positively impact legal practice while controlling for GenAI-related risks. Regardless of what technology you use, human involvement is essential. AI serves as a copilot to legal professionals, allowing them to accomplish more meaningful and engaging work. Thoughtfully consider a new tool’s impact on the results, the workflow, and your team before implementing it.”
3 reasons why AI excels at spotting revenue leaks. Commentary by Vlad Voskresensky, co-founder and CEO of Revenue Grid
“Revenue leakage can cost businesses significant amounts of money, and it often goes unnoticed. Research conducted by EY reveals that companies lose up to 5% of their EBITDA annually to leakage, resulting in substantial financial losses. Imagine a $100 million company throwing away $5 million every year on losses it could have avoided. Ouch. However, artificial intelligence (AI) offers a glimmer of hope by providing practical solutions to tackle revenue leakage head-on. AI can dive into real-time data, analyze it, and make predictions based on historical data to help leaders make more informed decisions moving forward.
AI also has the ability to uncover patterns hidden within your historical data, revealing insights and dropping strategic recommendations that supercharge revenue generation – and that’s where the real magic happens. By tapping into the power of AI, businesses can take on the challenge of revenue leakage proactively. It’s all about plugging those sneaky leaks and reclaiming lost revenue that would otherwise have slipped through the cracks. With AI’s real-time data analysis, accurate forecasts, and savvy recommendations, you’ll be on the fast track to driving growth and raking in profits.”
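To make the leak-spotting idea concrete, here is a minimal sketch in Python. The invoice data, the 5% threshold, and the column names are invented for illustration; a real system would learn such thresholds from historical patterns rather than hard-coding them.

```python
import pandas as pd

# Invented invoice data: what each account contracted for vs. what was billed.
invoices = pd.DataFrame({
    "account": ["A", "B", "C", "D"],
    "contracted": [10_000, 25_000, 8_000, 40_000],
    "billed": [10_000, 22_500, 8_000, 31_000],
})

invoices["leakage"] = invoices["contracted"] - invoices["billed"]

# Flag accounts leaking more than 5% of their contracted revenue.
leaks = invoices[invoices["leakage"] > 0.05 * invoices["contracted"]]
print(leaks[["account", "leakage"]])
```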
Data teams are bogged down by repetitive tasks. Here’s why continuous deployment is a necessity for the modern enterprise. Commentary by Michael Berthold, CEO of KNIME
“The need for automation has grown tremendously in the past decade, but data science teams are still struggling with it. Complications can stem from the sheer complexity of putting models into a deployable application, moving that application into production, and then keeping up with the constant need for testing, validation, and re-training. Beyond that, there is the challenge of scaling data so it is actionable and useful for everyone in a given organization.
This is where the practice of continuous deployment becomes critical for organizations and their data science teams: it enables quick and frequent model updates and reduces errors, cultivating more accurate insights that are usable by anyone in an organization. But where to start? To automate repetitive and increasingly complex tasks and overcome the challenges laid out above, it’s best to approach automation in three phases: (i) automating deployment; (ii) automating testing and validation; (iii) automating monitoring and re-training.
Deployment typically takes a lot of manual steps and re-coding. To eliminate these repetitive tasks, it’s important to create an environment where data scientists can move everything they’ve created into production with a single click of a button. For this, a transparent development environment is a must — ideally one that also easily enables the reuse of best practices and templates.
Once you’ve automated deployment, it’s important to automate testing and validation to ensure models remain compliant prior to going into production. To enable this, communication between data scientists, compliance, and IT is critical to define proper validation and monitoring steps for each application.
Finally, automation of model monitoring and re-training is paramount for effective and compliant continuous deployment. Data science models need to perform properly in an ever-changing reality. For instance, buying behavior and machines are constantly changing, which means models will need to be consistently updated. Data scientists should submit monitoring workflows alongside their production processes to automatically retrain models when performance degrades. As an added layer of automation, data scientists should have alarm systems in place that alert the team if reality has changed too drastically for the model to run properly.”
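To make that final monitoring-and-retraining phase concrete, here is a minimal sketch in Python. The thresholds, the stub alert/retrain functions, and the population stability index (PSI) drift score are illustrative assumptions for this example, not KNIME’s implementation.

```python
import numpy as np

ACCURACY_FLOOR = 0.85  # retrain once live performance degrades below this
DRIFT_CEILING = 0.30   # alert once inputs drift too far to retrain blindly

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """Rough drift score between training-time and live feature distributions."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

def alert_team(message: str) -> None:
    print("ALERT:", message)  # stub: page the on-call data scientist

def retrain_and_redeploy() -> None:
    print("Triggering automated retraining...")  # stub: kick off the pipeline

def monitoring_step(train_feature, live_feature, live_accuracy: float) -> None:
    drift = population_stability_index(train_feature, live_feature)
    if drift > DRIFT_CEILING:
        alert_team(f"Input drift PSI={drift:.2f}: reality has changed too much")
    elif live_accuracy < ACCURACY_FLOOR:
        retrain_and_redeploy()

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)  # feature distribution at training time
live = rng.normal(0.4, 1.0, 2_000)    # moderately drifted live distribution
monitoring_step(train, live, live_accuracy=0.82)  # triggers retraining
```

The two branches mirror the commentary: moderate drift with degraded accuracy triggers automatic retraining, while extreme drift raises an alarm instead, since retraining on unrecognizable data is riskier than pausing the model.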
Facial recognition in stores in the UK and the US. Commentary by Leila Nashashibi, Campaigner at Fight for the Future
“Today, people’s knee-jerk response to store theft and other poverty-related societal issues is ‘more police’ and ‘more surveillance.’ The powers that be are responsible for that: they’ve convinced us policing and surveillance are the answers because they don’t have to deal with the consequences, like the increased trauma, racial profiling, harassment, and physical violence that poor people and BIPOC communities experience as a result of policing.
As more stores adopt facial recognition, they’re creating mass databases of people’s extremely sensitive biometric information that can be accessed by cops and ultimately used to track people in real time. The enormous harm that will result from the spread of this tech will never justify its use in public places; we must find other solutions.
The UK government’s attempts to push facial recognition on stores, in apparent coordination with facial recognition company Facewatch, also remind us of the enormous power of private military and police tech companies in influencing government policies, from defense budgets to outright war. We’ve succeeded in holding facial recognition at bay in the US, most recently through a campaign in which over 100 artists and venues publicly denounced the tech, but what we truly need across the US and globally is legislation banning this tech in public, like Intro #1014, recently introduced in New York City.”
AI in customer service centers to jump 366%! Commentary by Mike Myer, CEO of Quiq
“This is a trend that matches our experience and our future expectations, but we anticipated that the numbers from Gartner’s report would be even higher. We are seeing a lot of interest from brands who want to take advantage of Large Language Models (LLMs), the AI that underlies ChatGPT. Companies are beginning to understand how much more powerful the latest AI is and how it can improve their CX departments. They are turning their attention to implementing an AI-based customer service solution in the next few years.
New capabilities provided by AI create a better platform for customer service center staff, offer better service to customers, and also cut costs. The Large Language Model AI now underlying automated conversations far surpasses the frustrating chatbots of days past. The new “chatbots” (we refer to them as Assistants on our platform) actually understand the questions customers are asking and can generate a concise and personalized response! This is a revolutionary development for the contact center, since a great deal of customer service interactions can be resolved satisfactorily by AI, as Gartner’s research confirms.
The current economic climate cannot be overlooked as an additional factor behind this uptake. Companies are simply looking for ways to do more with less. This has accelerated the interest in AI solutions faster than expected. The timing of the market conditions in combination with the revolution of AI technology has gotten the attention of brands across the board.”
AI Is Revolutionizing Investment Risk Management in Finance. Commentary by Chandini Jain, Chief Executive Officer, Auquan
“Technologies such as natural language processing (NLP) and generative AI are helping many vertical market industries overcome the dual challenge of information overload and hard-to-get information. This allows research teams to uncover the elusive unknown unknowns of global companies faster and more easily than ever before.
Financial market participants can use AI to shift their “decision window” earlier. They can either address issues proactively with companies before they blow up into front-page controversies, or they can act on emerging idiosyncratic risks before markets price them in. However, AI is not without its own considerations and risks for institutional investors. Investors should avoid “black box” approaches to AI, as these cannot be understood or explained and can lead to data bias and poor decisions.
The real strength of AI lies in its ability to complement human professionals in their respective roles, help teams scale their efforts with the resources they have, and drive improved outcomes. AI should never be seen as a replacement for humans and human decision-making. Firms that get AI right will emerge as the real winners by identifying and managing risks more effectively than their peers and consistently generating alpha.”
Be Aware of the Environmental Impact of AI. Commentary by Hope Wang, Developer Advocate at Alluxio
“As AI advances, so too does its voracious appetite for energy. The rise of AI applications has led to more data centers, facilities housing the infrastructure that enables these data-intensive workloads. In 2020, computing consumed 4-6% of global electricity, and by 2030 that figure may reach 8-21%, per University of Pennsylvania research. We must remain aware of the environmental impact of AI and continue pushing for sustainability. This will require rethinking AI infrastructure design, orchestrating machine learning pipelines, and optimizing hardware utilization to improve efficiency. Optimizing data access through techniques like caching, smart data loading, and data tiering can significantly reduce unnecessary data movement and compute costs. Hybrid/multi-cloud distributed training architectures can also help by allowing flexible scaling of compute resources.”
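As a minimal sketch of the caching idea (the shard loader, latency, and payload below are hypothetical stand-ins, not any particular product’s API), memoizing remote reads means repeated training epochs stop paying the fetch cost:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=128)
def load_training_shard(shard_id: int) -> bytes:
    """Stand-in for an expensive fetch from remote object storage."""
    time.sleep(0.1)      # simulated network latency
    return bytes(1024)   # placeholder 1 KiB payload

start = time.perf_counter()
for epoch in range(3):
    for shard in range(4):
        load_training_shard(shard)  # only epoch 0 actually hits "storage"
elapsed = time.perf_counter() - start
print(f"3 epochs over 4 shards: {elapsed:.2f}s (4 cold fetches, 8 cache hits)")
```

The same principle, applied at cluster scale with distributed caches and tiered storage, is what cuts the redundant data movement behind much of AI’s energy cost.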
The Loophole in Generative AI. Commentary by Katy Salamati, Senior Manager of Advanced Analytics Lifecycle at SAS
“Most large companies hold themselves to rigorous standards when it comes to data governance and compliance. However, as these companies begin to negotiate with emerging AI companies to share data and use their innovations, like generative AI tools and products, loopholes appear within seemingly strict relationship guidelines and contracts. For large companies, there is always the risk that even their de-identified data—i.e., proprietary data that is masked to separate the information from the original company or source—might be joined with other datasets. The final dataset made through this combination provides a greater level of detail than what was originally intended by the large company, creating a “loophole” in what initially might have been a clear data-usage contract.
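A toy illustration of that loophole (all records and column names below are invented): two datasets that are individually unremarkable can, once joined, re-attach exactly the detail that de-identification was meant to remove.

```python
import pandas as pd

# De-identified dataset a large company shares: no names, only coarse attributes.
shared = pd.DataFrame({
    "zip": ["02139", "02139", "94103"],
    "birth_year": [1980, 1991, 1980],
    "diagnosis": ["flu", "asthma", "flu"],
})

# A second dataset the recipient obtains elsewhere, with identities attached.
public = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "zip": ["02139", "94103"],
    "birth_year": [1980, 1980],
})

# The join yields more detail than the sharing company ever intended to release.
print(shared.merge(public, on=["zip", "birth_year"]))
```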
Traditionally, there has been extremely limited transparency about the data used in AI. For example, the following information should be available to anyone, yet it is almost always unknown: the sources of the data, the original location of the data, which data sources are joined, the quality of the data, how data quality issues such as missing values are handled, and when the data was collected. AI models can quickly adapt to changing data patterns, but keeping them accurate requires keeping their data current. Ideas for increasing data transparency include attaching some type of expiration date to data or requiring constant monitoring to ensure accuracy at all times, very similar to regular maintenance of a vehicle!
A transparency mismatch between companies managing large data sets and emerging AI companies creates a ‘black box’ effect on how data is being used and results are generated. Since 80% of the value derived from an AI model lies in the data it uses, better data management can strengthen partnerships between AI companies and data-rich companies, eliminating this ‘black box’ effect. There needs to be more transparency regarding AI models and associated data. Companies must clearly explain how their AI models are trained, including disclaimers about the training data used and expiration dates of their AI tools. In addition, there needs to be improved government standardization and approvals for the management of data within AI, much like the drug development process. Lastly, we need greater awareness and understanding of the risks and dangers associated with companies unknowingly creating biased models by training their neural networks on blended data sets. These are critical steps to enhancing transparency between large data-rich companies and emerging AI companies to help ensure AI is used accurately and responsibly.”
Three Tips for Avoiding Pitfalls in Your Data Supply Chain. Commentary by Chris Struttmann, founder and CTO of ALTR
“Data governance used to be a side hustle that data engineers would tackle while doing their “real jobs”: building pipelines, right-sizing warehouses, and creating indexes, views, raw zones, presentation layers, and data contracts. In between, they’d mask some data or throw in a row-level policy. But as data regulations have become more strict, numerous, and prominent, data governance has become a real job of its own, with data stewards or compliance teams focused on determining policies.
Imagine a data engineer in the middle of this data flow, in charge of a warehouse where trucks keep showing up and dropping off data pallets. Where did the data come from? Who sent it? What kind of data is it? What are the requirements for storing and sharing it? Brick-and-mortar warehouses have this down to a science through their supply chains. Enterprises need to bring the same rigor to their data supply chain.”
Here are Struttmann’s three tips for avoiding these data supply chain pitfalls: (i) Make Your Data Governance Policies Visible: Snowflake developers can easily write a masking policy in a few minutes; writing code is what they do! But while this is a no-brainer for the here-and-now, and can even work long-term when teams are small, once you’re enterprise-size and dealing with data moving from one team to another, single, one-off policies become a dead end. Basically, you’ve applied a policy locally that only technical Snowflake developers can see (an example of such a policy appears below); (ii) Get Out of Your Silo: Today, we see a lot of the “right hand” not knowing what the “left hand” is doing across the data supply chain. Line of business (LoB) users who consume the data are so far removed from the data stewards tasked with protecting it that it’s like they’re in different worlds. LoB users are busy figuring out how to shave costs; they’re not thinking about HIPAA or PCI regulations. So it’s critical for data middlemen to step out of their silos and understand how all business functions interact with data; (iii) Look for Tools That Make Integrations Easy: Building a modern data supply chain may not be what you had in mind when you woke up this morning (“Hey, I’m just the data guy!”). But if your company is buying a data catalog here and an ETL tool there and just crossing its fingers hoping they’ll all work together, that will quickly lead to headaches for you and your colleagues.
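As an illustration of tip (i), here is a minimal sketch of the kind of one-off policy the commentary describes; the policy name, role, and table are hypothetical, and the DDL string would be run through Snowflake’s own tooling:

```python
# Hypothetical masking policy: users with the PII_READER role see real email
# addresses; everyone else sees a masked placeholder. Illustrative names only.
MASKING_POLICY_DDL = """
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val ELSE '*** MASKED ***' END;

ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;
"""

print(MASKING_POLICY_DDL)  # in practice, executed via Snowflake's console or a connector
```

The catch the tip points out: a policy like this lives only inside Snowflake, invisible to compliance teams and line-of-business users unless it is surfaced somewhere they can see it.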
Today’s Inflection Point: Trusted AI. Commentary by Louis Landry, Engineering Fellow at Teradata
“Artificial intelligence has been around for decades, but Large Language Model systems like ChatGPT have reignited the discussion around “Trusted AI,” the key tenet of which is that AI should benefit the greater society without harming individuals or groups. It’s a noble aim, but today’s AI landscape is a bit like the wild west, with AI developments far outpacing regulations. In fact, competition likely will be AI’s ultimate regulator – along with the public and media, who will hold big tech companies accountable. So how do we achieve AI that guards against bias/discrimination, ensures data privacy and security for all, and is transparent around digital sourcing and digital rights? Trusted AI is largely about checks and balances. Here are three keys: 1) Ensure developers are using a data analytics platform with a model governance and management environment that provides full transparency, data quality, and lineage, so the products and solutions being created can be trusted implicitly. 2) Create continuous dialogue and knowledge transfer between data scientists and business stakeholders. Define how trusted AI will be executed for each user; involve business users deeply in model creation, business rules, and the key principles and methodologies driving those models. 3) Carefully track access, policies, and activities for governance models. As more decisions are based on the scoring of data with increasingly sophisticated models, both the business and regulators will need to know how models were trained, and on what data, to audit and understand how decisions were reached.”
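A minimal sketch of that third key (the fields, names, and storage URI below are hypothetical, not any specific platform’s governance API): record what a model was trained on, who approved it, and when, so later decisions can be audited.

```python
import datetime
import hashlib
import json

def audit_record(model: str, version: str, training_data_uri: str,
                 training_data: bytes, approved_by: str) -> dict:
    """Append-only style audit entry recording a model's training lineage."""
    return {
        "model": model,
        "version": version,
        "training_data": training_data_uri,
        "training_data_sha256": hashlib.sha256(training_data).hexdigest(),
        "approved_by": approved_by,
        "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

# Hypothetical usage: log the lineage of a scoring model before release.
snapshot = b"frozen training snapshot bytes"
record = audit_record("churn-scorer", "1.4.0",
                      "s3://models/train/2024-06-snapshot", snapshot,
                      approved_by="model-risk-committee")
print(json.dumps(record, indent=2))
```

Hashing the frozen training snapshot gives auditors a verifiable link between a deployed model version and the exact data it saw.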